StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far)
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
The only viable use case is having those VMs accessing blocks stored locally
Not the only one, but the assumption is that any HC system worth its salt does this essentially all of the time. It's not technically a requirement for being HC, but it would be downright idiotic for it to do anything else (except for in a failover state.) The problem with HC alternatives is that they all do the thing that would be idiotic for HC to do as their only option.
-
Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?
-
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?
It is for sure, they talked about it at MangoCon
-
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?
It is for sure, they talked about it at MangoCon
OK..wonder how Starwind HCA/vSAN compares to VMware vSAN!
-
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?
It is for sure, they talked about it at MangoCon
OK..wonder how Starwind HCA/vSAN compares to VMware vSAN!
Only requires two nodes, is available for free, has some really breakthrough tech, is cross platform, Network RAID vs RAIN, etc.
-
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?
It is for sure, they talked about it at MangoCon
OK..wonder how Starwind HCA/vSAN compares to VMware vSAN!
Only requires two nodes, is available for free, has some really breakthrough tech, is cross platform, Network RAID vs RAIN, etc.
Forgetting the number of nodes (for a minute), are you saying it performs better than VMware's vSAN?
-
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
That's not why. They are flunking because their product sucks, doesn't compete with anyone else in the market, is on shaky legal ground, and they threaten their customers and are an evil organization that threatens lawsuits against anyone that exposes them.
Yeah, I am quite aware of how bad they are as a company, but their marketing is basically "we are HCI" (just like Mirantis used to be "we are OpenStack") and HCI is, well, not there. It is yet another niche approach to doing specific things, not the solution to everything under the sun like the people pushing it claim.
HC is without a doubt the best thing out there, the concept is straightforward and obvious and nothing comes close, it's just the market has calmed down and people care about products, not marketing hype.
No, it is just another tool and nothing more. In a perfect world, all applications would be distributed, and SANs or HCI would not be required: all we'd have is a bunch of servers with local storage, running local workloads that can multi-master and replicate across those hosts nicely. That is the ideal workload for all the modern stuff managed by k8s/dcos/mesos/swarm/etc. For everything else, in some cases you are much better off running on a massive SAN or a distributed SDS, and in some you can benefit from using replicated local storage. However, replicated local storage will always consume resources that are taken away from the actual workloads. It also adds a lot of complexity to the overall system: if a host goes down you get both a migration/restart storm and a storage rebalancing storm at the same time, hitting the same blocks of data and the same machines that use them.
If you choose to swear by HCI and see it as the one and only solution for everything, then either you can only see the very narrow set of infrastructure tasks that fit your world view, or you are taking marketing too much to heart.
The workloads are almost entirely separate, so having them separate actually requires more work from both components, plus introduces latency
Thanks, that's another problem with HCI, I agree.
Storage doesn't eat up RAM or CPU
Let's take a look at the CPU and RAM requirements for, say, ZFS. How many cores per node, and how much RAM per core, are required? All that for a local dumb server, before we even start dealing with replication, self-healing, all the madness behind Raft, etc.
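To put rough numbers on that, here is a back-of-the-envelope sketch using common community rules of thumb for ZFS memory sizing (the constants are illustrative assumptions, not official requirements):

```python
# Back-of-the-envelope ZFS memory sizing using common community
# rules of thumb (illustrative only -- not official minimums):
#   - ~1 GB of RAM per 1 TB of pool capacity for a healthy ARC
#   - ~5 GB of RAM per 1 TB of *deduplicated* data for the dedup table
def zfs_ram_estimate_gb(pool_tb, dedup_tb=0, base_gb=8):
    """Estimate RAM a ZFS host should reserve for storage alone."""
    return base_gb + 1.0 * pool_tb + 5.0 * dedup_tb

# A modest HCI box with 20 TB of local pool and no dedup:
print(zfs_ram_estimate_gb(20))
# Same pool with 4 TB of deduplicated data:
print(zfs_ram_estimate_gb(20, 4))
```

Whatever the exact constants, that memory comes out of the same budget the VMs would otherwise use, which is the point being made above.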
Think about a normal server: the reason that software RAID outperforms hardware RAID is because the overhead of RAID got so tiny that extra hardware for it made things slower, not faster, and that was by 2000 with the Pentium III-S processor. Today's system performance takes that gap many orders of magnitude higher.
The reason software RAID outperforms hardware these days is much simpler - hardware RAID ASICs never got as much investment and boosting as regular CPUs, so what we have is modern massive CPUs vs RAID controllers that haven't seen much progress since the late 90s. And since nobody cares enough to invest in them or make them cheaper, they simply die out, which is well and proper.
Virtualization (and containers too) came about because servers were getting too big for a single workload and people wanted to actually utilize their hardware better. Which led to massive workloads running on single machines, maxing them out. This is still the case: a normal hypervisor will easily see 100% utilization, and will have to do all the usual resource-sharing tricks and juggle tasks as the VMs and containers compete for CPU time and RAM pages. And now you come in and dump yet another massive workload on that same machine, and tell me it will have no impact? Don't be ridiculous.
Here are some figures for commonly used distributed storage:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/red_hat_ceph_storage_hardware_selection_guide/recommended-minimum-hardware-requirements-for-the-red-hat-ceph-storage-dashboard-hardware
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/quick_start_guide/system_requirements
https://storpool.com/system-requirements
Works for all workloads, VDI is one of the worst cases for HC. Still shows how HC is better every time, but it's where it is "better the least." VDI has almost no storage dependency, whereas something like a database has a huge storage dependency.
VDI using a pool of stateless desktop VMs temporarily snapshotted from a single base image is the perfect use case for HCI. If you have the base image replicated across the cluster, all the VMs will be doing their reads locally.
It's a database that shows where HC isn't just cheaper and safer, but way faster and reduces total hardware needs.
Databases don't (or rather shouldn't) need storage replication in 2019. There are plenty of native tools for that, which are safer, cheaper and more efficient.
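As a concrete example of database-native replication, PostgreSQL streaming replication takes only a few lines of configuration on the primary plus one clone command on the standby - no block-level storage replication involved (hostnames, user names, and paths below are made up):

```
# postgresql.conf on the primary (illustrative values)
wal_level = replica
max_wal_senders = 5

# pg_hba.conf on the primary: allow the standby to pull WAL
host  replication  replicator  192.0.2.10/32  scram-sha-256

# On the standby, clone the primary and start in standby mode:
#   pg_basebackup -h primary.example.com -U replicator -D /var/lib/pgsql/data -R
```

The database replicates committed transactions, not raw blocks, so it can do consistency checks and failover natively.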
Not the only one, but the assumption is that any HC system worth its salt does this essentially all of the time. It's not technically a requirement for being HC, but it would be downright idiotic for it to do anything else (except for in a failover state.) The problem with HC alternatives is that they all do the thing that would be idiotic for HC to do as their only option.
Not the only one, but it's an obvious example. In any case, there is plenty of tech out there that makes network latency a non-issue, if need be, and the added complexity and risk of HCI is usually not worth it.
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
The reason software RAID outperforms hardware these days is much simpler - hardware RAID ASICs never got as much investment and boosting as regular CPUs, so what we have is modern massive CPUs vs RAID controllers that haven't seen much progress since the late 90s. And since nobody cares enough to invest in them or make them cheaper, they simply die out, which is well and proper.
I've been wondering about this very point. Clearly the CPUs in systems have gotten better and better - hell, we know because of crypto mining that ASICs are getting better and better (job specific). So why is hardware RAID slower than software?
The only thing I can come up with is trace length latency in the system. Assuming the storage is local in both cases, I would expect a modern, currently developed RAID ASIC would match or trash a CPU doing the same task - the difference then being that the RAID controller has to then hand the data off to the RAM and CPU for actual processing - so there 'might' be a step saving by having the CPU doing it all.
-
@Dashrender said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
I've been wondering about this very point. Clearly the CPUs in systems have gotten better and better - hell, we know because of crypto mining that ASICs are getting better and better (job specific). So why is hardware RAID slower than software?
Because these ASICs aren't priority - mining ASICs and speed trading ASICs make money, it's a worthwhile investment. A RAID controller ASIC does a job and sells a controller for $200 once, with the customer grumbling about being able to do it all in software for free anyway.
The only thing I can come up with is trace length latency in the system. Assuming the storage is local in both cases, I would expect a modern, currently developed RAID ASIC would match or trash a CPU doing the same task - the difference then being that the RAID controller has to then hand the data off to the RAM and CPU for actual processing - so there 'might' be a step saving by having the CPU doing it all.
Not really. Depending on the RAID level, there are few things to do - mirror writes and balance reads for RAID 1(+N), and calculating parity for striped arrays. None of this is very specialized, and none of it would be much better in a separate ASIC, given a powerful enough generic CPU. The operations happen below the driver level in any case, transparently to the IO-issuing layer.
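To illustrate how basic that parity math is, here is a toy sketch (nothing like real md or controller code, which adds caching, alignment, and parity rotation): RAID 5 parity is a byte-wise XOR across the data stripes, and a lost stripe is rebuilt by XORing the survivors with the parity:

```python
# Toy RAID 5 parity: byte-wise XOR across equal-sized data stripes.
# Real implementations add caching, alignment, and rotate the parity
# stripe across disks, but the arithmetic itself is exactly this.
def parity(stripes):
    p = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            p[i] ^= b
    return bytes(p)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data stripes
p = parity(data)

# Lose stripe 1; rebuild it from the survivors plus parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
```

XOR is about the cheapest operation a CPU can do, which is why a dedicated chip buys nothing here.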
-
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
@FATeknollogee said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Curious question...what happened to Starwind vSAN for Linux (KVM), is that not a thing anymore?
It is for sure, they talked about it at MangoCon
OK..wonder how Starwind HCA/vSAN compares to VMware vSAN!
Only requires two nodes, is available for free, has some really breakthrough tech, is cross platform, Network RAID vs RAIN, etc.
Forgetting the number of nodes (for a minute), are you saying it performs better than VMware's vSAN?
It performs better than anyone. It's insanely fast.
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
It is yet another niche approach to doing specific things and not the solution to everything under the sun, like the people pushing it claim
Actually, it basically is. Because HCI is essentially just "logical design". It's not some magic, it's just the obvious, logical way to build systems of any scale. One can easily show that every standalone server is HCI, too. Basically HCI encompasses everything that isn't an IPOD or an overbuilt SAN infrastructure, which has a place but is incredibly niche.
HCI is the only logical approach to 95% of the world's workloads. Just loads and loads of people either get by with terrible systems, or use HCI and don't realize it.
But the real issue is that HCI alternatives come with massive caveats and have only niche use cases that make sense.
-
@Dashrender said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
So why is hardware RAID slower than software?
Because:
- It's an insanely low needs function so there is no benefit to investing there. There is essentially "no work" being done.
- It's extremely basic IO, not something that an ASIC can do better than a CPU that is already designed for exactly that task.
- The spare capacity of the CPU is so large that there is no cost effective way to duplicate that power.
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Because these ASICs aren't priority - mining ASICs and speed trading ASICs make money, it's a worthwhile investment. A RAID controller ASIC does a job and sells a controller for $200 once, with the customer grumbling about being able to do it all in software for free anyway.
And good controllers are $600+ and at that price can't compete with the software in performance. Mining or graphics use ASICs or GPUs for very special case math making the special hardware valuable. RAID doesn't do special math, it does basic math and mostly just IO. So the reasons that ASICs are good for mining don't exist with RAID, at all.
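A quick, crude way to see that the math is not the bottleneck (a rough sketch; Python's bigint XOR here is a stand-in for the SIMD parity loops real implementations use, and results vary by machine):

```python
# Measure single-core XOR throughput as a stand-in for parity math.
import time

def xor_throughput_mb_s(block_mb=16, rounds=8):
    a = bytes(block_mb * 1024 * 1024)   # zero-filled test buffers
    b = bytes(block_mb * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(rounds):
        _ = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    elapsed = time.perf_counter() - start
    return block_mb * rounds / elapsed

# Even this naive pure-Python loop moves parity math quickly; compiled
# SIMD code in md or a filesystem is far faster still.
print(round(xor_throughput_mb_s()), "MB/s")
```

That is the "basic math, mostly IO" point: there is no exotic computation left for an ASIC to win at.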
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Databases don't (or rather shouldn't) need storage replication in 2019. There are plenty of native tools for that, which are safer, cheaper and more efficient.
Absolutely. So having the storage be local, not remote, carries the real benefits. HCI doesn't imply replication any more than SAN does. Most do, of course, and if you want FT that's generally how you do it.
So databases, when done correctly, generally make the most sense on standalone boxes with local storage - a one node HCI setup.
For databases that do need the platform, rather than the application, to handle HA or FT, then HCI with more than one node is the best option.
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
The reason software RAID outperforms hardware these days is much simpler - hardware RAID ASICs never got as much investment and boosting as regular CPUs, so what we have is modern massive CPUs vs RAID controllers that haven't seen much progress since the late 90s. And since nobody cares enough to invest in them or make them cheaper, they simply die out, which is well and proper.
Exactly, there is really no benefit to anyone in making hardware RAID faster. The cost would be enormous, the benefits nominal. It's just not important. Even if you had gobs of money to throw at it, you couldn't make it enough faster to ever justify it. If you need something that fast, you pretty much can't be on RAID anyway. You'd be spending hundreds of thousands to get essentially immeasurable gains when, for less, you could blow it away with a high performance NVMe setup that doesn't use RAID at all.
So while, in theory, hardware RAID could be built at some crazy cost to be faster, it can't be in practical terms. And anything that you did do would waste money that could have been used to make the overall system faster in some way.
Bottom line... RAID performance itself is a nearly worthless pursuit. The difference between RAID 6 and RAID 10 might be big, but the difference between software RAID 10 and hardware RAID 10, or between MD, ZFS, Adaptec and LSI, is all "background noise."
-
@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Actually, it basically is. Because HCI is essentially just "logical design". It's not some magic, it's just the obvious, logical way to build systems of any scale. One can easily show that every stand alone server is HCI, too. Basically HCI encompasses everything that isn't IPOD or just overbuild SAN infrastructure which has a place, but is incredibly niche.
HCI is the only logical approach to 95% of the world's workloads. Just loads and loads of people either get by with terrible systems, or use HCI and don't realize it.
But the real issue is that HCI alternatives come with massive caveats and have only niche use cases that make sense.
Thanks for proving my point. When all you have is a hammer, everything starts looking like a nail, eh?
Absolutely. So having the storage be local, not remote, carries the real benefits. HCI doesn't imply replication any more than SAN does. Most do, of course, and if you want FT that's generally how you do it.
Now you are confusing basic local storage with HCI. If I install a bunch of ESXi servers using their local disks, with local-only VMs, am I running an HCI setup?
For databases that do need the platform, rather than the application, to handle HA or FT, then HCI with more than one node is the best option.
No, for those, it definitely makes more sense to use an addon that enables replication, sharding and other horizontal scaling techniques.
-
@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Now you are confusing basic local storage with HCI. If I install a bunch of ESXi servers using their local disks, with local-only VMs, am I running an HCI setup?
If you install any hypervisor onto a single server with compute, storage and network, that is hyperconverged. Everything is contained in 1 physical box.
HCI is everything contained in 1 big virtual box, with a bunch of individual physical boxes providing resources, each able to run a portion of the entire workload, that get put into that virtual box.
So no, installing ESXi on a bunch of individual servers and having nothing "box them together" is not HCI. You'd need to use ESXi's vSAN or hyperconverged product.
-
Hell your desktop or laptop is hyperconverged.
Everything is self contained.
-
And the ESXi vSAN product is the tool that VMware promotes, but it requires at least 3 physical boxes (ideally), though they'll let it slide if you only have 2 servers and a single witness VM to provide quorum.
-
@DustinB3403 said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):
Hell your desktop or laptop is hyperconverged.
Everything is self contained.
Yup, this is all just marketing hype. In the real world, a standalone host is just a standalone host; it was before HCI was a thing and will be after.
Also note, I always use the term HCI, not just HC, and I always mean it to be exactly what it is being sold as: a way of building virtualized infrastructure so that the shared storage in use is provided by the same machines that host the workloads, off of their internal drives. I could get into the networking aspect of things, but that would only make my point stronger - mixing everything on a single host is a bad idea.