Examples of proper utilization of SAN
- 
 @DustinB3403 said in Examples of proper utilization of SAN: Large scale out storage is the only logical use. Storage way above what could be fit in a single server. SAN is scale up, not scale out. 
- 
 @Dashrender said in Examples of proper utilization of SAN: @DustinB3403 said in Examples of proper utilization of SAN: @davide-bonavita said in Examples of proper utilization of SAN: We deployed a StarWind vSAN in HA to store some critical VMs, it works quite well (cfr. "StarWind Virtual SAN Installation and Configuration of HyperConverged 2 Nodes with Hyper-V Cluster" technical paper). While that is a good example of a reason to deploy a SAN solution, I think @EddieJennings is referring to physical SAN products and not the logical vSAN solutions that we know about today. OK that brings up a good point - are physical SANs even really worth it much anymore today considering the abilities of vSANs? I mean I'm sure there are times where it can be worthwhile - but likely not for anyone really hanging out on these forums. vSANs were there first. Physical SAN came later, by definition. We get all weird when talking about SANs, but in the real world, everything is software first and appliance later. NAS is an appliance of a file server. SAN is an appliance of a block storage server. SAN became "so famous" and so treated as a magic black box, that people had to go back and rename the original product a vSAN so that people would know what it was. Would be the same as calling a normal file server a vNAS today. Sounds stupid, but that's how stupid vSAN is. We've never had a time when vSAN wasn't everywhere. 
- 
 @Pete-S said in Examples of proper utilization of SAN: If you look at it a SAN is very desirable when you're running large databases, typical of the enterprise, because you need low latency block storage. Actually that's where you avoid it. Specifically for that reason. Because for databases the additional latency and risk of the SAN doesn't normally make sense. That's why high performance databases were one of the first places to abandon SAN: they needed something faster. Remember, SAN is the slow option, not the fast one. Simple physics says that a SAN has to be slower than its local equivalent. Maybe not a lot, but it's physically impossible for it to be as fast or faster. It has to be at least a tiny bit slower. SAN is always chosen despite performance losses. 
- 
 @DustinB3403 said in Examples of proper utilization of SAN: @Pete-S said in Examples of proper utilization of SAN: If you look at it a SAN is very desirable when you're running large databases, typical of the enterprise The part quoted is the only bit that makes sense. Not really, because databases tend to need speed and reliability and they all replicate at the application layer and can't be replicated blindly at the storage layer nor is there a real use case of multiple RDBMS seeing a single pool of storage - the locking problems would be terrible for performance. So for myriad reasons, we would expect databases to be among the worst use cases for SAN. And in the real world, that's exactly where we saw SANs avoided first. Databases were where we exposed the needs for local storage options first. Of course, the problem is, most places just throw money at solutions until even the worst option works. Then people who see that working assume it was a good choice, instead of a bad one, because they see it "in use" rather than see the evaluation, cost and risk involved. 
- 
 @Pete-S said in Examples of proper utilization of SAN: So it's implied that if we are talking about SAN we are talking about shared block storage - meaning local storage is out. Local storage is the best way to share block storage. In no way whatsoever does needing shared block imply that a SAN is a need. https://smbitjournal.com/2013/07/replicated-local-storage/ Much of the world runs on shared local. Some do it by using block protocols like a vSAN, some do it without through things like Gluster or SCRIBE. But RLS is the best way to get high performance shared block, if you need shared. 
- 
 @Pete-S said in Examples of proper utilization of SAN: @DustinB3403 said in Examples of proper utilization of SAN: The only reasonable use case for SAN is with massive scale out storage requirements. Wrong. Low latency shared block storage for OLTP applications don't have to be massive to make sense. Just need high performance requirements. Also, for instance a HPC cluster might fit in one rack but need a high performance storage solution. As Travis had pointed out, SAN guarantees more latency, not less. So any low latency requirements make SAN less desirable, not more. Your belief that SAN provides more performance than local storage does is causing you to think SAN would solve problems where it doesn't (or doesn't as well as more obvious solutions.) This is basic computing physics.... the same storage local or distant has to be faster when local. Maybe not much faster, but it can't be slower or the same. There is just less latency - less wire, fewer hops. Using SAN implies only two things.... distant, and block. Anything else assumed about SAN is just incorrect, it's not part of SAN. How most people approach it is that they assume their SAN is crazy expensive and their local is cheap and then use that to show that SAN is "faster" by comparing apples and oranges. But the same NVMe drive local vs. hooked to a separate server and shared over even NVMeoF is a tiny bit slower. 
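 To put rough numbers on the "same drive, local vs. distant" argument above, here is a minimal sketch. The `Path` class and every latency figure in it are hypothetical values picked for illustration, not measurements of any real drive or fabric. The only thing it shows is that once the device is held constant, each added hop can only add latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One I/O path from host to drive (all figures are illustrative assumptions)."""
    device_us: float         # time the drive itself needs to service one I/O
    hops: int = 0            # network hops between host and drive (0 means local)
    per_hop_us: float = 0.0  # round-trip cost added by each hop

    def latency_us(self) -> float:
        # Total latency is the device time plus whatever the fabric adds.
        return self.device_us + self.hops * self.per_hop_us


# Same hypothetical NVMe drive, attached locally and then shared over a fabric.
local = Path(device_us=80.0)
remote = Path(device_us=80.0, hops=1, per_hop_us=10.0)

print(local.latency_us())   # 80.0
print(remote.latency_us())  # 90.0
```

 With a non-negative per-hop cost the remote path can never come out ahead of the local one, which is the whole point: a SAN only looks "faster" when the drives behind it are not the same drives.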
- 
 @scottalanmiller said in Examples of proper utilization of SAN: @Pete-S said in Examples of proper utilization of SAN: @DustinB3403 said in Examples of proper utilization of SAN: The only reasonable use case for SAN is with massive scale out storage requirements. Wrong. Low latency shared block storage for OLTP applications don't have to be massive to make sense. Just need high performance requirements. Also, for instance a HPC cluster might fit in one rack but need a high performance storage solution. As Travis had pointed out, SAN guarantees more latency, not less. So any low latency requirements make SAN less desirable, not more. Your belief that SAN provides more performance than local storage does is causing you to think SAN would solve problems where it doesn't (or doesn't as well as more obvious solutions.) This is basic computing physics.... the same storage local or distant has to be faster when local. Maybe not much faster, but it can't be slower or the same. There is just less latency - less wire, fewer hops. Using SAN implies only two things.... distant, and block. Anything else assumed about SAN is just incorrect, it's not part of SAN. How most people approach it is that they assume their SAN is crazy expensive and their local is cheap and then use that to show that SAN is "faster" by comparing apples and oranges. But the same NVMe drive local vs. hooked to a separate server and shared over even NVMeoF is a tiny bit slower. You assumed I made assumptions I didn't make. Yes, local is always faster but local is not shared. So then it all becomes just a question how we share and access the data. If we put the shared storage on dedicated servers we have a SAN. If we put the storage on the same servers that we are running compute we have hyperconverged storage. In the first case we can optimize both hardware and software and it's only running this single task. In the second case we usually run both compute and storage on the same hardware. 
By pure logic the first case, the SAN, has to be the higher performing option. Looking at replicated local storage though, that implies that we can fit the storage on one server. Of course this is almost as fast as local storage (assuming synchronous replication). But it also means that the SAN advantage of consolidating storage is lost. So in order of performance on equal hardware we have:
- local storage
- replicated local storage
- SAN
- vSAN and similar
 Of course then we have local storage cache for vSAN solutions and other things to mess this up. Also of course in real life it's the cost of it all that determines what is the best solution. When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation. But you need enough workloads and servers for it to make sense to consolidate. Consolidation increases utilization (lower cost) by sacrificing some performance and increasing complexity. 
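 The utilization side of that trade-off is simple arithmetic. A minimal sketch, assuming four hosts with made-up usage figures and made-up capacities (none of these numbers come from a real deployment):

```python
def utilization(used_tb: list[float], provisioned_tb: float) -> float:
    """Fraction of the provisioned capacity that is actually in use."""
    return sum(used_tb) / provisioned_tb


# Hypothetical per-host usage in TB.
used = [2.0, 5.0, 1.0, 4.0]

# Local storage: every host is sized for the worst case on its own.
per_host_capacity_tb = 8.0
local_util = utilization(used, per_host_capacity_tb * len(used))

# Consolidated storage: one shared pool sized for the combined load plus headroom.
pooled_capacity_tb = 16.0
pooled_util = utilization(used, pooled_capacity_tb)

print(local_util)   # 0.375
print(pooled_util)  # 0.75
```

 The pool gets better utilization only because several hosts share the headroom, which is why the number of attaching workloads matters so much for whether consolidation pays off.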
- 
 @Pete-S said in Examples of proper utilization of SAN: Yes, local is always faster but local is not shared But local CAN be shared. SAN is not shared either, but CAN be shared. Both are both shared or not shared. 
- 
 @Pete-S said in Examples of proper utilization of SAN: In the first case we can optimize both hardware and software and it's only running this single task. In the second case we usually run both compute and storage on the same hardware. By pure logic the first case, the SAN, has to be the higher performing option. That is in no way logical. That actually is both incredibly unlikely due to logic, and totally not true in the real world. I have no idea what kind of logic would make you think that making them far away from each other would be fast and local slow based on "dedicated hardware" when the resource needs of storage are so tiny that they are of no consequence in modern devices. 
- 
 @scottalanmiller said in Examples of proper utilization of SAN: @Pete-S said in Examples of proper utilization of SAN: Yes, local is always faster but local is not shared But local CAN be shared. SAN is not shared either, but CAN be shared. Both are both shared or not shared. Local or not local has to be a question of where the data is used. It's always local somewhere. 
- 
 @Pete-S said in Examples of proper utilization of SAN: Looking at replicated local storage though, that implies that we can fit the storage on one server. Of course this is almost as fast as local storage (assuming synchronous replication). But it also means that SAN advantage of consolidating storage is lost. RLS does imply that, yes. But SAN does, too. SAN and RLS both have the "fit it in one server" limitation. In almost all cases, you need SAN to replicate, too. So any replication overhead in 99.9% of cases is the same RLS vs SAN. If your SAN doesn't need to be replicated, then chances are your local storage does not. There are extreme cases where you need shared storage that isn't replicated for reliability where SAN has a consolidation advantage for lower criticality workloads. 
- 
 @Pete-S said in Examples of proper utilization of SAN: @scottalanmiller said in Examples of proper utilization of SAN: @Pete-S said in Examples of proper utilization of SAN: Yes, local is always faster but local is not shared But local CAN be shared. SAN is not shared either, but CAN be shared. Both are both shared or not shared. Local or not local has to be a question of where the data is used. It's always local somewhere. Yes, local to the computer or local to a remote dedicated storage server. RLS means that the data is LOCAL to multiple locations. 
- 
 @Pete-S said in Examples of proper utilization of SAN: So in order of performance on equal hardware we have:
- local storage
- replicated local storage
- SAN
- vSAN and similar
 I would not break it down that way. RLS can be as fast as any other local storage, if you don't require full sync. Async is an option and can have no performance overhead. vSAN is not slower than SAN. A SAN and vSAN are the same speed. And in the real world, since vSAN options are more flexible, they are actually faster. 
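 The sync vs. async point can be sketched as a simple latency model. This is an assumption-laden illustration, not how any particular RLS or vSAN product works: the function name, its parameters, and the numbers are all hypothetical, and it assumes the local and remote writes are issued in parallel.

```python
def write_latency_us(local_us: float, peer_us: float, link_rtt_us: float,
                     synchronous: bool) -> float:
    """Latency the application sees for one replicated write (illustrative model)."""
    if synchronous:
        # The write is acknowledged only after both copies are durable.
        # Both writes run in parallel, so the slower path sets the latency.
        return max(local_us, link_rtt_us + peer_us)
    # Asynchronous: acknowledge after the local write, replicate in the background.
    return local_us


print(write_latency_us(20.0, 20.0, 10.0, synchronous=True))   # 30.0
print(write_latency_us(20.0, 20.0, 10.0, synchronous=False))  # 20.0
```

 In this model async replication costs nothing on the write path, at the price of a window in which the peer copy lags behind the local one.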
- 
 @Pete-S said in Examples of proper utilization of SAN: When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation I see. The problem with that is that Gluster is specifically a super slow mechanism. It's not that it is RLS that makes it slow, it's the Gluster mechanism itself. It's just a solution not designed around speed. So yes, it's slow. But the Netgear SC101 SAN is way slower, even though it is a real SAN on dedicated hardware. If we look at the faster SAN and RLS options, like a top end EMC vs. a Starwind vSAN then we are at blinding speeds in both cases with, I believe, the Starwind pulling ahead with things like NVMeoF that is pretty much as fast as things get, and RAM cache replication over RDMA on 100Gb/s InfiniBand. 
- 
 @scottalanmiller said in Examples of proper utilization of SAN: @Pete-S said in Examples of proper utilization of SAN: When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation I see. The problem with that is that Gluster is specifically a super slow mechanism. It's not that it is RLS that makes it slow, it's the Gluster mechanism itself. It's just a solution not designed around speed. So yes, it's slow. But the Netgear SC101 SAN is way slower, even though it is a real SAN on dedicated hardware. If we look at the faster SAN and RLS options, like a top end EMC vs. a Starwind vSAN then we are at blinding speeds in both cases with, I believe, the Starwind pulling ahead with things like NVMeoF that is pretty much as fast as things get, and RAM cache replication over RDMA on 100Gb/s InfiniBand. But look at next-gen SANs like Pure Storage and NetApp's NVMe arrays also running NVMe over Fabric. I think they are well ahead. I don't know how Starwind vSAN can be run but if it's on a hypervisor it's severely limited by I/O congestion through the kernel. NVMe drives are causing problems that were of no concern whatsoever with spinners. Both KVM and Xen have done a lot of work to limit their I/O latency and use polling techniques now but it's still a problem. That's why you really need SR-IOV on NVMe drives so any VM can bypass the hypervisor and just have its own kernel to slow things down. 
- 
 @Pete-S said in Examples of proper utilization of SAN: @scottalanmiller said in Examples of proper utilization of SAN: @Pete-S said in Examples of proper utilization of SAN: When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation I see. The problem with that is that Gluster is specifically a super slow mechanism. It's not that it is RLS that makes it slow, it's the Gluster mechanism itself. It's just a solution not designed around speed. So yes, it's slow. But the Netgear SC101 SAN is way slower, even though it is a real SAN on dedicated hardware. If we look at the faster SAN and RLS options, like a top end EMC vs. a Starwind vSAN then we are at blinding speeds in both cases with, I believe, the Starwind pulling ahead with things like NVMeoF that is pretty much as fast as things get, and RAM cache replication over RDMA on 100Gb/s InfiniBand. But look at next-gen SANs like Pure Storage and NetApp's NVMe arrays also running NVMe over Fabric. I think they are well ahead. I don't know how Starwind vSAN can be run but if it's on a hypervisor it's severely limited by I/O congestion through the kernel. NVMe drives are causing problems that were of no concern whatsoever with spinners. Both KVM and Xen have done a lot of work to limit their I/O latency and use polling techniques now but it's still a problem. That's why you really need SR-IOV on NVMe drives so any VM can bypass the hypervisor and just have its own kernel to slow things down. Anyway, I'm sure every vendor is working on making things faster all the time. There are just too many options to make blanket statements on anything without being specific. For any discussion like this you'd have to have some idea of exactly what you want to accomplish and what the budget is to determine if option A or option B is the best course of action. 
- 
 @scottalanmiller said in Examples of proper utilization of SAN: Just Google: When to Consider a SAN. Et voila, first hit. Reading that is what prompted me to make this thread. I am looking for examples of $enterprises running $applications which require infrastructure that would necessitate the use of SAN. To efficiently leverage consolidation it is necessary to have scale and this is where SANs really shine – when scale both in capacity and, more importantly, in the number of attaching nodes become very large. SANs are best suited to large scale storage consolidation. This is their sweet spot and what makes them nearly ubiquitous in large enterprises and very rare in small ones. I'm trying to think of examples of where this situation exists. Would this be a valid example? Take Vultr, who's providing VPS services. Because of the number of hosts hosting the VMs which are eventually used by customers for their various projects, the best way to present block storage to these hosts is from a SAN. 
- 
 @DustinB3403 said in Examples of proper utilization of SAN: @EddieJennings what conversation is going on that you're looking for more information regarding SAN (products I assume). Which SAN isn't something you can purchase, it's something you have to build. Been busy this morning, and haven't been able to follow the thread closely. I know that SAN is something that's built, and what I was looking for are real-world examples of infrastructures that utilize SAN and see what about their needs leads to the decision that building a SAN is necessary. Likely there isn't any kind of clear example of the process that leads to said decision, but at least it got folks talking, so I can go back in a bit, read what's there, and see if there's wisdom to gain :). 
- 
 @EddieJennings said in Examples of proper utilization of SAN: @DustinB3403 said in Examples of proper utilization of SAN: @EddieJennings what conversation is going on that you're looking for more information regarding SAN (products I assume). Which SAN isn't something you can purchase, it's something you have to build. Been busy this morning, and haven't been able to follow the thread closely. I know that SAN is something that's built, and what I was looking for are real-world examples of infrastructures that utilize SAN and see what about their needs leads to the decision that building a SAN is necessary. Likely there isn't any kind of clear example of the process that leads to said decision, but at least it got folks talking, so I can go back in a bit, read what's there, and see if there's wisdom to gain :). Good luck wading through all the straw-man arguments and examples lol. 
- 
 I don't know how Starwind vSAN can be run but if it's on a hypervisor it's severely limited by I/O congestion through the kernel. NVMe drives are causing problems that were of no concern whatsoever with spinners. Both KVM and Xen have done a lot of work to limit their I/O latency and use polling techniques now but it's still a problem. That's why you really need SR-IOV on NVMe drives so any VM can bypass the hypervisor and just have its own kernel to slow things down. Anton: There are no problems with polling these days. You normally spawn a SPDK-enabled VM (Linux is unbeatable here as most of the new gen I/O development happens there) and pass thru RDMA-capable network hardware (virtual function with SR-IOV or whole card with PCIe pass-thru, this is really irrelevant...) and NVMe drives and... magic starts happening. This is how our NVMe-oF target works on ESXi & Hyper-V (KVM & Xen have no benefits here architecturally, this is where you're either wrong or I failed to get your arguments).
It's possible to port SPDK into Windows user-mode but lack of NVMe and NIC polling drivers takes away all the fun: to move the same amount of data we normally use ~4x more CPU horsepower on "Pure Windows" vs. "Linux-SPDK-VM-on-Windows" models. Microsoft is trying to bring SPDK to Windows kernel (so does VMware from what I know), but it needs a lot of work from NIC and NVMe engineers and... nobody wants to contribute. Really. Just my $0.02



