StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far)

Dashrender

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

The reason software RAID outperforms hardware these days is much simpler - hardware RAID ASICs never got as much investment and boosting as regular CPUs, so what we have is modern massive CPUs vs RAID controllers that haven't seen much progress since the late 90s. And since nobody cares enough to invest in them or make them cheaper, they simply die out, which is well and proper.

I've been wondering about this very point. Clearly the CPUs in systems have gotten better and better - hell, we know because of crypto mining that ASICs are getting better and better (job specific). So why is hardware RAID slower than software?
The only thing I can come up with is trace length latency in the system. Assuming the storage is local in both cases, I would expect a modern, currently developed RAID ASIC would match or trash a CPU doing the same task - the difference then being that the RAID controller has to then hand the data off to the RAM and CPU for actual processing - so there 'might' be a step saving by having the CPU doing it all.

@scottalanmiller ?

dyasny

@Dashrender said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

I've been wondering about this very point. Clearly the CPUs in systems have gotten better and better - hell, we know because of crypto mining that ASICs are getting better and better (job specific). So why is hardware RAID slower than software?

Because these ASICs aren't priority - mining ASICs and speed trading ASICs make money, it's a worthwhile investment. A RAID controller ASIC does a job and sells a controller for $200 once, with the customer grumbling about being able to do it all in software for free anyway.

The only thing I can come up with is trace length latency in the system. Assuming the storage is local in both cases, I would expect a modern, currently developed RAID ASIC would match or trash a CPU doing the same task - the difference then being that the RAID controller has to then hand the data off to the RAM and CPU for actual processing - so there 'might' be a step saving by having the CPU doing it all.

Not really. Depending on the RAID level, there are few things to do - mirror writes and balance reads for RAID 1(+N), and calculate parity for striped arrays. None of this is very specialized, and none of it would be much better in a separate ASIC, given a powerful enough generic CPU. The operations are in any case happening under the driver level, transparently for the IO issuing layer.
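
To illustrate how basic the parity math mentioned above is, here is a minimal sketch of single-parity (RAID 5 style) XOR parity in Python. The block contents and the helper name `xor_parity` are made up for the example; real implementations work on much larger stripes and use optimized routines, but the arithmetic is the same byte-wise XOR.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks that would sit on three different disks.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block that would be written to a fourth disk.
parity = xor_parity(data)

# Simulate losing the second disk: XOR the survivors with the parity
# block to rebuild the missing data.
rebuilt = xor_parity([data[0], data[2], parity])
```

The same function computes the parity and rebuilds a lost block, because XOR is its own inverse. Dual-parity schemes such as RAID 6 add a second, more involved calculation that this sketch does not cover.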

scottalanmiller

@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?

It is for sure, they talked about it at MangoCon

OK..wonder how Starwind HCA/vSAN compares to VMware vSAN!

Only requires two nodes, is available for free, has some really breakthrough tech, is cross platform, Network RAID vs RAIN, etc.

Forgetting the number of nodes (for a minute), are you saying it performs better than VMware's vSAN?

It performs better than anyone. It's insanely fast.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

It is yet another niche approach to doing specific things and not the solution to everything under the sun, like the people pushing it claim.

Actually, it basically is. Because HCI is essentially just "logical design". It's not some magic, it's just the obvious, logical way to build systems of any scale. One can easily show that every standalone server is HCI, too. Basically HCI encompasses everything that isn't IPOD or just overbuilt SAN infrastructure, which has a place, but is incredibly niche.

HCI is the only logical approach to 95% of the world's workloads. Just loads and loads of people either get by with terrible systems, or use HCI and don't realize it.

But the real issue is that HCI alternatives come with massive caveats and have only niche use cases that make sense.

scottalanmiller

@Dashrender said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

So why is hardware RAID slower than software?

Because:

It's an insanely low needs function so there is no benefit to investing there. There is essentially "no work" being done.
It's extremely basic IO, not something that an ASIC can do better than a CPU that is already designed for exactly that task.
The spare overhead of the CPU is so much that there is no cost effective way to duplicate the power.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Because these ASICs aren't priority - mining ASICs and speed trading ASICs make money, it's a worthwhile investment. A RAID controller ASIC does a job and sells a controller for $200 once, with the customer grumbling about being able to do it all in software for free anyway.

And good controllers are $600+ and at that price can't compete with the software in performance. Mining or graphics use ASICs or GPUs for very special case math making the special hardware valuable. RAID doesn't do special math, it does basic math and mostly just IO. So the reasons that ASICs are good for mining don't exist with RAID, at all.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Databases don't (or rather shouldn't) need storage replication in 2019. There are plenty of native tools for that, which are safer, cheaper and more efficient.

Absolutely. So having the storage be local, not remote, carries the real benefits. HCI doesn't imply replication any more than SAN does. Most do, of course, and if you want FT that's generally how you do it.

So databases, when done correctly, generally make the most sense on standalone boxes with local storage - a one node HCI setup.

For databases that do need the platform, rather than the application, to handle HA or FT, then HCI with more than one node is the best option.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

The reason software RAID outperforms hardware these days is much simpler - hardware RAID ASICs never got as much investment and boosting as regular CPUs, so what we have is modern massive CPUs vs RAID controllers that haven't seen much progress since the late 90s. And since nobody cares enough to invest in them or make them cheaper, they simply die out, which is well and proper.

Exactly, there is really no benefit to anyone to make hardware RAID faster. The cost would be enormous, the benefits nominal. It's just not important. Even if you had gobs of money to throw at it, you can't make it enough faster to ever justify the cost. If you need something that fast, you pretty much can't be on RAID anyway. You'd be spending hundreds of thousands to get an essentially immeasurable performance gain when for less money you could blow it away with some high performance NVMe setup that doesn't use RAID at all.

So while, in theory, hardware RAID could be built at some crazy cost to be faster, it can't be in practical terms. And anything that you did do would waste money that could have been used to make the overall system faster in some way.

Bottom line... RAID performance itself is a nearly worthless pursuit. The difference between RAID 6 and RAID 10 might be big, but the difference between software RAID 10 and hardware RAID 10 and MD and ZFS and Adaptec and LSI is all "background noise."
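
As a rough illustration of why the RAID level matters more than who does the math, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted rule-of-thumb write penalties (2 back-end IOs per write for RAID 10, 6 for RAID 6) and a made-up array of eight drives at 150 IOPS each with a 50/50 read/write mix. None of these numbers come from the thread, and real arrays with caching will behave differently.

```python
def effective_iops(raw_iops, write_fraction, write_penalty):
    """Estimate front-end IOPS given a rule-of-thumb RAID write penalty."""
    read_fraction = 1 - write_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)

# Hypothetical array: eight drives at 150 IOPS each (assumed figures).
RAW_IOPS = 8 * 150

# Assumed rule-of-thumb write penalties: RAID 10 = 2, RAID 6 = 6.
raid10 = effective_iops(RAW_IOPS, write_fraction=0.5, write_penalty=2)
raid6 = effective_iops(RAW_IOPS, write_fraction=0.5, write_penalty=6)

print(f"RAID 10: {raid10:.0f} IOPS, RAID 6: {raid6:.0f} IOPS")
```

Under these assumptions the choice of RAID level changes the estimate by more than a factor of two, and that gap comes from the extra disk IOs per write, not from where the parity is calculated.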

dyasny

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Actually, it basically is. Because HCI is essentially just "logical design". It's not some magic, it's just the obvious, logical way to build systems of any scale. One can easily show that every standalone server is HCI, too. Basically HCI encompasses everything that isn't IPOD or just overbuilt SAN infrastructure, which has a place, but is incredibly niche.

HCI is the only logical approach to 95% of the world's workloads. Just loads and loads of people either get by with terrible systems, or use HCI and don't realize it.

But the real issue is that HCI alternatives come with massive caveats and have only niche use cases that make sense.

Thanks for proving my point. When all you have is a hammer, everything starts looking like a nail, eh?

Absolutely. So having the storage be local, not remote, carries the real benefits. HCI doesn't imply replication any more than SAN does. Most do, of course, and if you want FT that's generally how you do it.

Now you are confusing basic local storage with HCI. If I install a bunch of ESXi servers using their local disks, with local-only VMs, am I running an HCI setup?

For databases that do need the platform, rather than the application, to handle HA or FT, then HCI with more than one node is the best option.

No, for those, it definitely makes more sense to use an addon that enables replication, sharding and other horizontal scaling techniques.

DustinB3403

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Now you are confusing basic local storage with HCI. If I install a bunch of ESXi servers using their local disks, with local-only VMs, am I running an HCI setup?

If you install any hypervisor onto a single server with compute, storage and network, that is hyperconverged. Everything is contained in 1 physical box.

HCI is everything contained in one big virtual box: a bunch of individual physical boxes provide the resources, each can run a portion of the entire workload, and all of them get put into that virtual box.

So no, installing ESXi on a bunch of individual servers and having nothing "box them together" is not HCI. You'd need to use ESXi's vSAN or hyperconverged product.

DustinB3403

Hell your desktop or laptop is hyperconverged.

Everything is self contained.

DustinB3403

And the ESXi vSAN product is the tool that ESXi promotes, but it requires at least 3 physical boxes (ideally), though they'll let it slide if you only have 2 servers and a single VM to act as a quorum witness.
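
The reason for the third box or the witness VM is plain majority voting. The sketch below is a generic strict-majority check in Python, not VMware's actual algorithm. The function name `has_quorum` and the one-vote-per-member model are assumptions made for illustration.

```python
def has_quorum(votes_alive, votes_total):
    """Return True when the surviving members hold a strict majority of votes."""
    return votes_alive * 2 > votes_total

# Two hosts plus a witness: three votes in total.
# Losing one host leaves two votes, which is still a majority.
two_hosts_with_witness = has_quorum(votes_alive=2, votes_total=3)

# Two hosts with no witness: two votes in total.
# Losing one host leaves one vote, which is a tie and not a majority.
two_hosts_no_witness = has_quorum(votes_alive=1, votes_total=2)
```

With only two voters, a surviving host cannot tell a dead partner from a broken link, so the tie is treated as no quorum. The witness exists to break that tie.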

dyasny

@DustinB3403 said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Hell your desktop or laptop is hyperconverged.

Everything is self contained.

Yup, this is all just marketing hype. In the real world, a standalone host is just a standalone host, it was before HCI was a thing and will be after.
Also note, I always use the term HCI, not just HC, and I always mean it to be exactly what it is being sold as - a way of building virtualized infrastructure so that the shared storage in use, is provided by the same machines that host the workloads, off of their internal drives. I could get into the networking aspect of things, but that will only make my point stronger - mixing everything on a single host is a bad idea.

DustinB3403

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

so that the shared storage in use

HCI isn't just shared storage. It's shared everything.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

In the real world, a standalone host is just a standalone host, it was before HCI was a thing and will be after.

HC was always a thing, though, that's the thing. That it got buzz is different. We've had HC all along, just people didn't call it anything.

DustinB3403

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

mixing everything on a single host is a bad idea.

What do you mean, mixing everything? The magic sauce is what makes tools like StarWind's vSAN amazing. It works with the hypervisor to manage all of your hosts from a single interface. Should any host go down, those resources are offline, but the VMs that may have been on there are moved to the remaining members of the HCI environment (of multiple physical hosts).

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

mixing everything on a single host is a bad idea.

No, it's separating it that is the bad idea. Separate means less performance and more points of failure. It's just like hardware and software RAID... when tech is new you need unique hardware to offload it; over time, that goes away. This has happened, at this point, with the whole stack. And it did long ago, there was just so much money in gouging people with SANs that every vendor clung to that as long as they could.

But putting those workloads outside of the server makes them slower, costlier, and riskier. There are really no benefits.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Also note, I always use the term HCI, not just HC, and I always mean it to be exactly what it is being sold as - a way of building virtualized infrastructure so that the shared storage in use, is provided by the same machines that host the workloads, off of their internal drives.

That's fine, but that's not HC or HCI. That's one vendor's product of it (or several). HC is not the property of a vendor, it's an architecture, and an old one that has been battle tested and logically is the only primary way to build systems.

DustinB3403

The easiest way I can think to explain your rationale @dyasny is to pretend I'm building a server, but because I don't trust the RAID controller that I can purchase for my MB, I purchase a bunch of external disks, plug those into another MB and then attach that storage back to my server via iSCSI over the network.

How is this safer, more reliable and cheaper than just adding all of the physical resources into a single server? Then combining 2, 3 or however many of the identical servers together with some magic sauce and managing it from a single interface?

dyasny

@DustinB3403 said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

HCI isn't just shared storage. It's shared everything.

Great, so we are also running the SDN controllers on all the hosts. Even an OVN controller is a huge resource hog. A Neutron controller in Openstack is even worse. And then the big boys come in, have you tried to build an Arista setup?

I am not talking theory here, I'm talking implementation, as someone who built datacenters and both public and private clouds at scale. Running the entire stack on each host, along with the actual workload is a horrible idea.

What do you mean, mixing everything? The magic sauce is what makes tools like StarWind's vSAN amazing.

Sounds like marketing BS to me, sorry. Magic sauce? Really?

It works with the hypervisor to manage all of your hosts from a single interface. Should any host go down, those resources are offline, but the VMs that may have been on there are moved to the remaining members of the HCI environment (of multiple physical hosts).

Sounds like any decently built virtualized DC solution, from proxmox to ovirt to vcenter and xenserver. How is it "magic" exactly?

The easiest way I can think to explain your rationale @dyasny is to pretend I'm building a server, but because I don't trust the RAID controller that I can purchase for my MB, I purchase a bunch of external disks, plug those into another MB and then attach that storage back to my server via iSCSI over the network.

This is a ridiculous example. What you describe is that, instead of having a server with a disk controller, disks, GPU and NICs, I'd install a single card that is a NIC, a GPU and can store data. So instead of the PCI bus accessing each controller separately with better bandwidth, all the IO and different workloads are driven through a single PCI bus channel. And then I'd use "magic" to install several of those hybrid monster cards in the hopes of making them work better.

How is this safer, more reliable and cheaper than just adding all of the physical resources into a single server? Then combining 2, 3 or however many of the identical servers together with some magic sauce and managing it from a single interface?

There you go with the magic sauce koolaid again.