Examples of proper utilization of SAN
-
@Pete-S said in Examples of proper utilization of SAN:
So it's implied that if we are talking about SAN we are talking about shared block storage - meaning local storage is out.
Local storage is the best way to share block storage. In no way whatsoever does needing shared block imply that a SAN is a need.
https://smbitjournal.com/2013/07/replicated-local-storage/
Much of the world runs on shared local. Some do it using block protocols like a vSAN, some do it without, through things like Gluster or SCRIBE. But RLS is the best way to get high performance shared block, if you need shared.
-
@Pete-S said in Examples of proper utilization of SAN:
@DustinB3403 said in Examples of proper utilization of SAN:
The only reasonable use case for SAN is with massive scale out storage requirements.
Wrong. Low latency shared block storage for OLTP applications doesn't have to be massive to make sense. It just needs high performance requirements. Also, for instance, an HPC cluster might fit in one rack but need a high performance storage solution.
As Travis had pointed out, SAN guarantees more latency, not less. So any low latency requirements make SAN less desirable, not more. Your belief that SAN provides more performance than local storage does is causing you to think SAN would solve problems where it doesn't (or doesn't as well as more obvious solutions.)
This is basic computing physics.... the same storage has to be faster when local than when distant. Maybe not much faster, but it can't be slower or the same. There is just less latency - less wire, fewer hops.
Using SAN implies only two things.... distant, and block. Anything else assumed about SAN is just incorrect, it's not part of SAN.
How most people approach it is that they assume their SAN is crazy expensive and their local is cheap, and then use that to show that SAN is "faster" by comparing apples and oranges. But the same NVMe drive used locally vs. hooked to a separate server and shared over even NVMe-oF is a tiny bit slower in the remote case.
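If you want to put numbers on that, here is a rough sketch (assuming Linux and Python, run as root, with placeholder device paths) that measures average 4K random-read latency using O_DIRECT so the page cache doesn't hide the wire. Point it at a local NVMe drive and at the same model attached over NVMe-oF and compare:

```python
# Minimal latency sketch: average 4K random-read latency on a block device.
# Uses O_DIRECT with a page-aligned buffer so reads actually hit the device.
import mmap
import os
import random
import time

def avg_read_latency_us(dev_path, samples=1000, block=4096):
    fd = os.open(dev_path, os.O_RDONLY | os.O_DIRECT)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        buf = mmap.mmap(-1, block)               # page-aligned buffer, required for O_DIRECT
        total = 0.0
        for _ in range(samples):
            offset = random.randrange(0, size // block) * block
            start = time.perf_counter()
            os.preadv(fd, [buf], offset)         # one 4K read at a random aligned offset
            total += time.perf_counter() - start
        return total / samples * 1e6
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Placeholder names: a local drive vs. the same model attached over NVMe-oF.
    for dev in ("/dev/nvme0n1", "/dev/nvme1n1"):
        print(dev, f"{avg_read_latency_us(dev):.1f} us avg")
```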
-
@scottalanmiller said in Examples of proper utilization of SAN:
@Pete-S said in Examples of proper utilization of SAN:
@DustinB3403 said in Examples of proper utilization of SAN:
The only reasonable use case for SAN is with massive scale out storage requirements.
Wrong. Low latency shared block storage for OLTP applications doesn't have to be massive to make sense. It just needs high performance requirements. Also, for instance, an HPC cluster might fit in one rack but need a high performance storage solution.
As Travis had pointed out, SAN guarantees more latency, not less. So any low latency requirements make SAN less desirable, not more. Your belief that SAN provides more performance than local storage does is causing you to think SAN would solve problems where it doesn't (or doesn't as well as more obvious solutions.)
This is basic computing physics.... the same storage has to be faster when local than when distant. Maybe not much faster, but it can't be slower or the same. There is just less latency - less wire, fewer hops.
Using SAN implies only two things.... distant, and block. Anything else assumed about SAN is just incorrect, it's not part of SAN.
How most people approach it is that they assume their SAN is crazy expensive and their local is cheap, and then use that to show that SAN is "faster" by comparing apples and oranges. But the same NVMe drive used locally vs. hooked to a separate server and shared over even NVMe-oF is a tiny bit slower in the remote case.
You assumed I made assumptions I didn't make.
Yes, local is always faster but local is not shared. So then it all becomes just a question of how we share and access the data. If we put the shared storage on dedicated servers we have a SAN. If we put the storage on the same servers that we are running compute on, we have hyperconverged storage.
In the first case we can optimize both hardware and software and it's only running this single task. In the second case we usually run both compute and storage on the same hardware. By pure logic the first case, the SAN, has to be the higher performing option.
Looking at replicated local storage though, that implies that we can fit the storage on one server. Of course this is almost as fast as local storage (assuming synchronous replication). But it also means that SAN advantage of consolidating storage is lost.
So in order of performance on equal hardware we have:
- local storage
- replicated local storage
- SAN
- vSAN and similar
Of course then we have local storage cache for vSAN solutions and other things to mess this up. Also, of course, in real life it's the cost of it all that determines what is the best solution.
When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation. But you need enough workloads and servers for it to make sense to consolidate. Consolidation increases utilization (lowers cost) by sacrificing some performance and increasing complexity.
-
@Pete-S said in Examples of proper utilization of SAN:
Yes, local is always faster but local is not shared
But local CAN be shared. SAN is not inherently shared either, but CAN be shared. Both can be either shared or not shared.
-
@Pete-S said in Examples of proper utilization of SAN:
In the first case we can optimize both hardware and software and it's only running this single task. In the second case we usually run both compute and storage on the same hardware. By pure logic the first case, the SAN, has to be the higher performing option.
That is in no way logical. That actually is both incredibly unlikely due to logic, and totally not true in the real world. I have no idea what kind of logic would make you think that putting storage far away from the compute would be fast and local would be slow based on "dedicated hardware", when the resource needs of storage are so tiny that they are of no consequence on modern devices.
-
@scottalanmiller said in Examples of proper utilization of SAN:
@Pete-S said in Examples of proper utilization of SAN:
Yes, local is always faster but local is not shared
But local CAN be shared. SAN is not inherently shared either, but CAN be shared. Both can be either shared or not shared.
Local or not local has to be a question of where the data is used. It's always local somewhere.
-
@Pete-S said in Examples of proper utilization of SAN:
Looking at replicated local storage though, that implies that we can fit the storage on one server. Of course this is almost as fast as local storage (assuming synchronous replication). But it also means that SAN advantage of consolidating storage is lost.
RLS does imply that, yes. But SAN does, too. SAN and RLS both have the "fit it in one server" limitation.
In almost all cases, you need the SAN to replicate, too. So any replication overhead is, in 99.9% of cases, the same for RLS vs. SAN. If your SAN doesn't need to be replicated, then chances are your local storage does not either. There are extreme cases where you need shared storage that isn't replicated for reliability, and there SAN has a consolidation advantage for lower criticality workloads.
-
@Pete-S said in Examples of proper utilization of SAN:
@scottalanmiller said in Examples of proper utilization of SAN:
@Pete-S said in Examples of proper utilization of SAN:
Yes, local is always faster but local is not shared
But local CAN be shared. SAN is not inherently shared either, but CAN be shared. Both can be either shared or not shared.
Local or not local has to be a question of where the data is used. It's always local somewhere.
Yes, local to the computer or local to a remote dedicated storage server.
RLS means that the data is LOCAL to multiple locations.
-
@Pete-S said in Examples of proper utilization of SAN:
So in order of performance on equal hardware we have:
local storage
replicated local storage
SAN
vSAN and similar
I would not break it down that way. RLS can be as fast as any other local storage, if you don't require full sync. Async is an option and can have no performance overhead.
vSAN is not slower than SAN. A SAN and vSAN are the same speed. And in the real world, since vSAN options are more flexible, they are actually faster.
-
@Pete-S said in Examples of proper utilization of SAN:
When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation
I see. The problem with that is that Gluster is specifically a super slow mechanism. It's not that it is RLS that makes it slow, it's the Gluster mechanism itself. It's just a solution not designed around speed. So yes, it's slow. But the Netgear SC101 SAN is way slower, even though it is a real SAN on dedicated hardware.
If we look at the faster SAN and RLS options, like a top end EMC vs. a Starwind vSAN, then we are at blinding speeds in both cases with, I believe, the Starwind pulling ahead with things like NVMe-oF, which is pretty much as fast as things get, and RAM cache replication over RDMA on 100Gb/s InfiniBand.
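For context on what "shared over NVMe-oF" involves, here is a minimal sketch of exporting a local drive with the stock Linux kernel target via configfs - not how Starwind or EMC do it internally, just an illustration, and the NQN, address, and device path are placeholders:

```python
# Hypothetical sketch: export a local NVMe drive over NVMe-oF via the Linux
# nvmet configfs interface (NVMe/TCP here; RDMA is the same flow with
# trtype "rdma"). Assumes the nvmet and nvmet-tcp modules are loaded and
# configfs is mounted; run as root.
import os

CFG = "/sys/kernel/config/nvmet"
NQN = "nqn.2024-01.example:shared-nvme"   # placeholder subsystem NQN
DEV = "/dev/nvme0n1"                      # placeholder backing device
ADDR = "192.0.2.10"                       # placeholder listen address

def write(path, value):
    with open(path, "w") as f:
        f.write(value)

# 1. Create the subsystem; allow any host to connect (fine for a demo only).
subsys = f"{CFG}/subsystems/{NQN}"
os.mkdir(subsys)
write(f"{subsys}/attr_allow_any_host", "1")

# 2. Back namespace 1 with the local drive and enable it.
os.mkdir(f"{subsys}/namespaces/1")
write(f"{subsys}/namespaces/1/device_path", DEV)
write(f"{subsys}/namespaces/1/enable", "1")

# 3. Create a fabric port, then expose the subsystem on it.
port = f"{CFG}/ports/1"
os.mkdir(port)
write(f"{port}/addr_trtype", "tcp")
write(f"{port}/addr_adrfam", "ipv4")
write(f"{port}/addr_traddr", ADDR)
write(f"{port}/addr_trsvcid", "4420")
os.symlink(subsys, f"{port}/subsystems/{NQN}")
```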
-
@scottalanmiller said in Examples of proper utilization of SAN:
@Pete-S said in Examples of proper utilization of SAN:
When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation
I see. The problem with that is that Gluster is specifically a super slow mechanism. It's not that it is RLS that makes it slow, it's the Gluster mechanism itself. It's just a solution not designed around speed. So yes, it's slow. But the Netgear SC101 SAN is way slower, even though it is a real SAN on dedicated hardware.
If we look at the faster SAN and RLS options, like a top end EMC vs. a Starwind vSAN, then we are at blinding speeds in both cases with, I believe, the Starwind pulling ahead with things like NVMe-oF, which is pretty much as fast as things get, and RAM cache replication over RDMA on 100Gb/s InfiniBand.
But look at next-gen SANs like Pure Storage and NetApp's NVMe arrays also running NVMe over Fabrics. I think they are well ahead.
I don't know how Starwind vSAN can be run, but if it's on a hypervisor it's severely limited by I/O congestion through the kernel. NVMe drives are causing problems that were of no concern whatsoever with spinners. Both KVM and Xen have done a lot of work to limit their I/O latency and use polling techniques now, but it's still a problem. That's why you really need SR-IOV on NVMe drives so any VM can bypass the hypervisor and just have its own kernel to slow things down.
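To illustrate the passthrough idea, here is a rough sketch of the host-side step on Linux: unbinding an NVMe device (or one of its SR-IOV virtual functions) from the host driver and handing it to vfio-pci so a VM can own it directly and bypass the hypervisor's I/O path. The PCI address is a placeholder:

```python
# Hypothetical sketch: rebind a PCI device to vfio-pci for guest passthrough.
# Run as root on Linux with the vfio-pci module loaded; the guest then gets
# the device via normal PCI passthrough (e.g. a libvirt <hostdev> entry).
import os

BDF = "0000:3b:00.0"                      # placeholder PCI address of the NVMe device or VF
dev = f"/sys/bus/pci/devices/{BDF}"

def write(path, value):
    with open(path, "w") as f:
        f.write(value)

# 1. Release the device from whatever host driver currently owns it (e.g. nvme).
if os.path.exists(f"{dev}/driver"):
    write(f"{dev}/driver/unbind", BDF)

# 2. Force the next probe to pick vfio-pci for this device, then reprobe it.
write(f"{dev}/driver_override", "vfio-pci")
write("/sys/bus/pci/drivers_probe", BDF)

print("bound", BDF, "->", os.path.basename(os.readlink(f"{dev}/driver")))
```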
-
@Pete-S said in Examples of proper utilization of SAN:
@scottalanmiller said in Examples of proper utilization of SAN:
@Pete-S said in Examples of proper utilization of SAN:
When I said that the SAN is the low latency option for OLTP or HPC, it's compared to things like gluster or vSAN - as they are comparable when it comes to storage consolidation
I see. The problem with that is that Gluster is specifically a super slow mechanism. It's not that it is RLS that makes it slow, it's the Gluster mechanism itself. It's just a solution not designed around speed. So yes, it's slow. But the Netgear SC101 SAN is way slower, even though it is a real SAN on dedicated hardware.
If we look at the faster SAN and RLS options, like a top end EMC vs. a Starwind vSAN, then we are at blinding speeds in both cases with, I believe, the Starwind pulling ahead with things like NVMe-oF, which is pretty much as fast as things get, and RAM cache replication over RDMA on 100Gb/s InfiniBand.
But look at next-gen SANs like Pure Storage and NetApp's NVMe arrays also running NVMe over Fabrics. I think they are well ahead.
I don't know how Starwind vSAN can be run, but if it's on a hypervisor it's severely limited by I/O congestion through the kernel. NVMe drives are causing problems that were of no concern whatsoever with spinners. Both KVM and Xen have done a lot of work to limit their I/O latency and use polling techniques now, but it's still a problem. That's why you really need SR-IOV on NVMe drives so any VM can bypass the hypervisor and just have its own kernel to slow things down.
Anyway, I'm sure every vendor is working on making things faster all the time. There are just too many options to make blanket statements on anything without being specific. For any discussion like this you'd have to have some idea of exactly what you want to accomplish and what the budget is to determine if option A or option B is the best course of action.
-
@scottalanmiller said in Examples of proper utilization of SAN:
Just Google: When to Consider a SAN
And voila, first hit.
Reading that is what prompted me to make this thread. I am looking for examples of $enterprises running $applications which require infrastructure that would necessitate the use of SAN.
To efficiently leverage consolidation it is necessary to have scale and this is where SANs really shine – when scale both in capacity and, more importantly, in the number of attaching nodes becomes very large. SANs are best suited to large scale storage consolidation. This is their sweet spot and what makes them nearly ubiquitous in large enterprises and very rare in small ones.
I'm trying to think of examples of where this situation exists. Would this be a valid example? Take Vultr, who provides VPS services. Because of the number of hosts hosting the VMs which are eventually used by customers for their various projects, the best way to present block storage to these hosts is from a SAN.
-
@DustinB3403 said in Examples of proper utilization of SAN:
@EddieJennings what conversation is going on that you're looking for more information regarding SAN (products I assume)? Which, SAN isn't something you can purchase, it's something you have to build.
Been busy this morning, and haven't been able to follow the thread closely. I know that SAN is something that's built, and what I was looking for are real-world examples of infrastructures that utilize SAN, to see what about their needs leads to the decision that building a SAN is necessary.
Likely there isn't any kind of clear example of the process that leads to said decision, but at least it got folks talking, so I can go back in a bit, read what's there, and see if there's wisdom to gain :).
-
@EddieJennings said in Examples of proper utilization of SAN:
@DustinB3403 said in Examples of proper utilization of SAN:
@EddieJennings what conversation is going on that you're looking for more information regarding SAN (products I assume)? Which, SAN isn't something you can purchase, it's something you have to build.
Been busy this morning, and haven't been able to follow the thread closely. I know that SAN is something that's built, and what I was looking for are real-world examples of infrastructures that utilize SAN, to see what about their needs leads to the decision that building a SAN is necessary.
Likely there isn't any kind of clear example of the process that leads to said decision, but at least it got folks talking, so I can go back in a bit, read what's there, and see if there's wisdom to gain :).
Good luck wading through all the straw-man arguments and examples lol.
-
@Pete-S said in Examples of proper utilization of SAN:
I don't know how Starwind vSAN can be run, but if it's on a hypervisor it's severely limited by I/O congestion through the kernel. NVMe drives are causing problems that were of no concern whatsoever with spinners. Both KVM and Xen have done a lot of work to limit their I/O latency and use polling techniques now, but it's still a problem. That's why you really need SR-IOV on NVMe drives so any VM can bypass the hypervisor and just have its own kernel to slow things down.
Anton: There are no problems with polling these days. You normally spawn a SPDK-enabled VM (Linux is unbeatable here as most of the new gen I/O development happens there) and pass thru RDMA-capable network hardware (virtual function with SR-IOV or whole card with PCIe pass-thru, this is really irrelevant...) and NVMe drives and... magic starts happening. This is how our NVMe-oF target works on ESXi & Hyper-V (KVM & Xen have no benefits here architecturally, this is where you're either wrong or I failed to get your arguments). It's possible to port SPDK into Windows user-mode but lack of NVMe and NIC polling drivers takes away all the fun: to move the same amount of data we normally use ~4x more CPU horsepower on "Pure Windows" vs. "Linux-SPDK-VM-on-Windows" models. Microsoft is trying to bring SPDK to the Windows kernel (so is VMware from what I know), but it needs a lot of work from NIC and NVMe engineers and... nobody wants to contribute. Really.
Just my $0.02
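As an illustration of the workflow Anton describes, here is a rough sketch of the RPC calls that would build such a target inside the SPDK-enabled Linux VM, using SPDK's own scripts/rpc.py against a running nvmf_tgt process. The PCI address, NQN, and IP are placeholders, and the exact RPC names depend on the SPDK release:

```python
# Hypothetical sketch: wrap a passed-through NVMe drive in an SPDK bdev and
# publish it through SPDK's userspace NVMe-oF target over RDMA.
import subprocess

RPC = "scripts/rpc.py"                          # path inside an SPDK checkout
NQN = "nqn.2024-01.example:spdk-vm"             # placeholder subsystem NQN
PCI = "0000:00:05.0"                            # placeholder passed-through NVMe device
ADDR = "192.0.2.20"                             # placeholder RDMA-capable address

def rpc(*args):
    # Each call talks to the running nvmf_tgt over its local RPC socket.
    subprocess.run([RPC, *args], check=True)

rpc("nvmf_create_transport", "-t", "RDMA")                        # polled userspace transport
rpc("bdev_nvme_attach_controller", "-b", "Nvme0", "-t", "PCIe", "-a", PCI)
rpc("nvmf_create_subsystem", NQN, "-a", "-s", "SPDK00000000000001")
rpc("nvmf_subsystem_add_ns", NQN, "Nvme0n1")                      # expose the drive as namespace 1
rpc("nvmf_subsystem_add_listener", NQN, "-t", "rdma",
    "-a", ADDR, "-s", "4420", "-f", "ipv4")
```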