StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far)

dyasny

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

This is they myth. In most HCI it adds no appreciable load. As long as you believe that things like storage and networking are going to create a lot of load, yes, this is going to seem like a point of risk, although even then things like RAID cards fixed that in the era where that was true.

But since it doesn't add load, and actually adds less load than splitting it out, this logic is backwards.

I already answered that above. Just because you say it doesn't add any load, doesn't mean it doesn't.

SDS isn't part of HC. This might be a root of your confusion. This is why some HC, like the one for whom the thread is about doesn't do this and just does RAID. Overhead is ridiculously low.

How exactly do they deal with the HA side of things? With RAID, and a host going down, all the VMs using that host go down, RAID or not.

DustinB3403

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@DustinB3403 said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

That's a joke?

You should have seen my British friend tell it

I don't really get it, but okay.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

even if it's going to be some opensource SDN like Calico and not a suitcase of money sent to cisco, you should dedicate the correctly spec'd hardware to that, the same goes for the storage stack - you want to run on commodity hardware using opensource SDS software - be my guest, but dedicate those hosts to SDS and spec them out to fit the task, and the same goes for the workload-bearing machines,

This just doesn't hold up in the real world. Are there cases where you'd want this? Absolutely. But in general? No. Most companies and workloads are not trying to do things where this makes sense at all. Giant storage pools, on host SDS, etc. These are the things that, while they have their place, is almost exclusively just vendors raping customers who don't realize that this stuff isn't helping them.

All this SDS like this is doing is making new SAN pools, right back to the complexity, costs, and risks that we had before.

If your design creates all this overhead, whether you address it in one box or many, chances are that design itself is the flaw. Not always, but generally. But whether you have RAID or RAIN, the overhead to do this stuff just isn't there when implemented well. Now sure, if we only look at totally garbage solutions, we can make any design seem like a problem. but you have to separate good design from good products. Bad products exist even in well designed solutions.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

I already answered that above. Just because you say it doesn't add any load, doesn't mean it doesn't.

Right, but because we measure it and there isn't means that there isn't. Just because you claim a load that no one has, which is all you are doing, doesn't make it real. You have created problems that no one faces and are acting like we are all impacted by them.

You are literally claiming that contrary to all evidence, common sense, and industry knowledge, that software RAID is not just a huge load, but so large that we now need not only hardware RAID cards to do it, but entire hardware RAID servers!

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

How exactly do they deal with the HA side of things? With RAID, and a host going down, all the VMs using that host go down, RAID or not.

Um, RAID makes a copy. So both machines have identical copies, RAID 1 is just mirroring - they even mirror the RAM cache. There's no magic. I think you are not aware of what HC products are like in the real world and are making loads of assumptions. My guess is you've seen Nutanix, the one god awful total failure of an HC product (that didn't even exist until I was consulting on HC for many years) and assuming that what they do badly or wrong is part of HC when it is just a Nutanix thing.

Many HC products use network RAID, not RAIN, and have none of the issues you are picturing. And many RAIN implementations, like SCRIBE, don't have them either. No one is arguing that Nutanix is bad or has these problems, we are explaining to you that your limited interpretation of HC to mean something different than it means to anyone else, is making it seem like these problems are endemic when, in fact, they aren't even likely. You are basically defining HC to mean Nutanix, when to everyone else, Nutanix isn't even a player, just a complete joke that exists only for marketers to screw the unprepared.

You should really research the field and products before making these wild claims. It's trivial to show that your assumptions can't be true, because we can demonstrate it. No one is denying that bad products are bad, but you are claiming that good products can't be good because you once saw a bad one. That's like thinking all hamburgers are bad because you once ate at a McDonald's, but that's hardly the only hamburger, let alone the reference example.

DustinB3403

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

but entire hardware RAID servers!

Not only Hardware RAID Servers, but separate dedicated network stacks, and compute pools as well.

scottalanmiller

Good reference HC systems would be....

Scale HC3, StarWind, Vmware with VSAN, Simplivity (for hardware offload), and XO. This gives a range of packaged HC solutions that show how HC can't be what you imagine it to be, all are well made and none of the issues you are stuck on.

Then you should look at the more traditional HC examples that we've had for even longer, like KVM and Xen with DRBD or BHive with HASP where you build your own HC and also, don't have these issues.

dyasny

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

This just doesn't hold up in the real world.

HCI doesn't hold up in the real world. See, I can keep doing this too

Most companies and workloads are not trying to do things where this makes sense at all.

Maybe in the SMB space there is less planning and more "lets just deliver something and let someone else support it", but it doesn't work in the enterprise.

All this SDS like this is doing is making new SAN pools, right back to the complexity, costs, and risks that we had before.

Exactly. SDS or SANs - you can pick whatever you want and suits you best. But if you are going to be running a replicated distributed storage service on hardware that is already quite busy running VMs, you'll end up in trouble as soon as you go anywhere near capacity.

If your design creates all this overhead, whether you address it in one box or many, chances are that design itself is the flaw. Not always, but generally. But whether you have RAID or RAIN, the overhead to do this stuff just isn't there when implemented well.

How exactly can there be no overhead, when you are synchronizing large chunks of blocks over the network? Especially with high replication factors? Add encryption to that, add all the extra logic for local tiering, and there's no wonder the minimal sizing for SDS is so high.

Now sure, if we only look at totally garbage solutions, we can make any design seem like a problem. but you have to separate good design from good products. Bad products exist even in well designed solutions.

Right, but because we measure it and there isn't means that there isn't. Just because you claim a load that no one has, which is all you are doing, doesn't make it real. You have created problems that no one faces and are acting like we are all impacted by them.

More sophistry. Can you be more specific please, instead?

You are literally claiming that contrary to all evidence, common sense, and industry knowledge, that software RAID is not just a huge load, but so large that we now need not only hardware RAID cards to do it, but entire hardware RAID servers!

I'm not talking about simple software RAID, I'm talking about keeping a distributed storage system in sync, and then on top of that keep constantly rebalancing to satisfy the local tiering requirements. And while doing all that juggling, also ensure the system remains resilient to node failure. This is a lot of work, unless like @Dashrender says there is magic at play. I don't believe in magic.

So to skip the rest of your quotes, in general, what you are saying is that a system which is essentially something like a simple KVM with DRBD, is the perfect solution. I am saying sure, for two nodes. How about 200?

Do you think AWS/GCP/Azure are running HCI solutions for example?

scottalanmiller

If you look at any one product, from any field, you might easily assume that the problems that that product has might be the result of the design or approach, rather than the product itself.

Example: If you use Cisco as the example of networking, you might assume that all networking equipment is expensive, slow, and riddled with security problems.

If you only look at Ubiquiti you might think that all networking has limited support, limited selection, and can't do SDN.

If you only look at Meraki, you might think that all networking uses a cloud controller.

And so forth. Or with databases, tons of people do this today and believe that all databases must have the limitations of Oracle and SQL Server because those two share a lot of licensing and tech similarities and without looking further, they will often associate their problems or caveats with databases as a category, rather than realizing that they are just similarities shared between those two.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

HCI doesn't hold up in the real world. See, I can keep doing this too

Except it does, tons of places are using it, and show me ANY example where it didn't shine... any. WHile somewhere, one must exist, dollars to donuts you can't find one.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Maybe in the SMB space there is less planning and more "lets just deliver something and let someone else support it", but it doesn't work in the enterprise.

Actually, that's exactly what loads and loads of enterprises do. But regardless of that, this has nothing to do with HC whatsoever and is neither here nor there.

dyasny

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

HCI doesn't hold up in the real world. See, I can keep doing this too

Except it does, tons of places are using it, and show me ANY example where it didn't shine... any. WHile somewhere, one must exist, dollars to donuts you can't find one.

I've worked with several Openstack and K8s clusters where the storage was local to the hypervisors, served from Ceph/Gluster/AFS/SheepDog. Horrible experience each time.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Exactly. SDS or SANs - you can pick whatever you want and suits you best. But if you are going to be running a replicated distributed storage service on hardware that is already quite busy running VMs, you'll end up in trouble as soon as you go anywhere near capacity.

Maybe you are running into these problems due to bad products, planning or design, but you are hoisting problems you've had onto everyone else where we aren't experiencing these problems.

You are "projecting". I'm not saying you or people you know having implemented HC and had it not work well. But you are confusing a bad implementation or planning with the architecture being bad. The two are different things.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

HCI doesn't hold up in the real world. See, I can keep doing this too

Except it does, tons of places are using it, and show me ANY example where it didn't shine... any. WHile somewhere, one must exist, dollars to donuts you can't find one.

I've worked with several Openstack and K8s clusters where the storage was local to the hypervisors, served from Ceph/Gluster/AFS/SheepDog. Horrible experience each time.

Right, that's my point, doing things badly doesn't mean that design is wrong. Your logic doesn't make sense... no matter how many times you fail at something doesn't mean that the task can't be done. It just means that you did it poorly, or wrong, or something.

Ceph and Gluster are both known to be bad for that, that should have been known going into the project. That someone didn't head you off at the pass shows that the mistakes and oversights were happening early on. We could easily have warned you that that wasn't meant to have good local performance.

So you are proving my point over and over again... you are using clearly wrong tech and looking at failures of one thing and misassociating them with another.

DustinB3403

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

The two are different things.

Like hugging your girlfriend while driving your car.

Both can be done poorly.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

I'm not talking about simple software RAID, I'm talking about keeping a distributed storage system in sync, and then on top of that keep constantly rebalancing to satisfy the local tiering requirements.

Okay... so thebig question is... since this is not part of HC... why? Stop doing that. You can never have a rational, useful discussion about HC until you talk about HC and not something else.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

So to skip the rest of your quotes, in general, what you are saying is that a system which is essentially something like a simple KVM with DRBD, is the perfect solution. I am saying sure, for two nodes. How about 200?

No one anywhere recommends 200 nodes in a single cluster. If you think this is a good idea, SAN, SDS, HC, or otherwise, we are on different pages. That's a scale that literally no one, not MS, not StarWind, not VMware, not RedHat recommends as a single failure domain. DO they support it? Yup. Do they think you are crazy? Yup.
Even in the enterprise, which you claim to know, they often use workloads scopes of this size for performance and safety. The larger the pool, the bigger the problems.
If you want to get into giant pools you have to pick your battles. Let's talk reasonable size, like 10-80 nodes. If you need screaming performance that no SAN can match, then you are looking at StarWind network RAID which does that. If you just want cheap pooled storage, you look at CEPH (and there are accelerators for that to make it fast,if you need to). If you want a blend of things you might look at VMware, Scale, or HPE Simplivity depending on your mix of needs. But good solutions exist at pretty much any scale, that meet or exceed the alternatives. Nothing is perfect, as with all things, the goal is to find what is best.

At truly giant scale, the only real benefit to totally external storage, is when speed and reliability are so unimportant that you are willing to sacrifice them to a huge degree to save a few dollars. But with HC's low cost today, even that is approaching the impossible.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

How exactly can there be no overhead, when you are synchronizing large chunks of blocks over the network? Especially with high replication factors? Add encryption to that, add all the extra logic for local tiering, and there's no wonder the minimal sizing for SDS is so high.

Because you don't have all those things. Large chunks over the network takes like no resources. No idea where you think the overhead comes from, but for most of us, things like copying data is not a high overhead activity. It's a dedicated network in most cases, with offload engines on the NICs, and things like tiering and such take extremely little overhead (if done well.) These just aren't CPU or RAM intensive activities.

scottalanmiller

@dyasny said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Do you think AWS/GCP/Azure are running HCI solutions for example?

Those aren't HA, so not applicable to the discussion. HCI is assumed to be replicating to other nodes, something those providers don't provide. They are stand alone compute nodes. Very different animal.

One minute you are assuming tiering, HA, and all kinds of things in your definition of HCI. Then the next you are skipping all of that and talking about players like these. You are jumping around with your definitions.

dyasny

@scottalanmiller said in StarWind HCA is one of the 10 coolest HCI systems of 2019 (so far):

Maybe you are running into these problems due to bad products, planning or design, but you are hoisting problems you've had onto everyone else where we aren't experiencing these problems.

You are "projecting". I'm not saying you or people you know having implemented HC and had it not work well. But you are confusing a bad implementation or planning with the architecture being bad. The two are different things.

Or maybe I'm just speaking from experience, and there's plenty of it. Local storage that is not shared is great, but it doesn't scale and pretty much kills all the nice features you can have in a virtualized DC - live migration, HA, all those things you don't care about in SMBs I suppose. Getting that storage replicated in a scalable fashion is hard, simple CBT pushed over network (what the folks at Linbit basically do) does not scale. And hard tasks require resources, those resoutces have to come from somewhere.

Ceph and Gluster are both known to be bad for that, that should have been known going into the project. That someone didn't head you off at the pass shows that the mistakes and oversights were happening early on. We could easily have warned you that that wasn't meant to have good local performance.

Mixing any distributed storage solution with any other workload is known to be bad, this is exactly what I'm saying. I've come into those projects when they were already implemented and got things working by breaking up those overloaded hosts into hardware that was doing one job and doing it well on either side.

Okay... so thebig question is... since this is not part of HC... why? Stop doing that. You can never have a rational, useful discussion about HC until you talk about HC and not something else.

But I am, at least at scale. DRBD and any similar system does not scale. When things are small (SMB level again) this is peanuts, we can do anything because our tasks are smaller than the hardware we can get. What happens at scale though?

No one anywhere recommends 200 nodes in a single cluster. If you think this is a good idea, SAN, SDS, HC, or otherwise, we are on different pages. That's a scale that literally no one, not MS, not StarWind, not VMware, not RedHat recommends as a single failure domain. DO they support it? Yup. Do they think you are crazy? Yup.

200 nodes is small for the scale I typically deal with. Red Hat has solutions that can deal with this kind of scale easily. I know of a few other companies that do. MS, VMW and probably StarWind do not, because of the nature of their clustering implementation, but that's basically all about how you manage locking.

Even in the enterprise, which you claim to know, they often use workloads scopes of this size for performance and safety. The larger the pool, the bigger the problems.

Not really. In a large pool, a dead node simply gets easily replaced. The effect is very small.

If you want to get into giant pools you have to pick your battles.

I usually am in those numbers, but ok

Let's talk reasonable size, like 10-80 nodes. If you need screaming performance that no SAN can match, then you are looking at StarWind network RAID which does that.

OK, so we have a network RAID, a bunch of blocks get streamed to other nodes when writes occur on one. When all there is is pushing block across, things are simple. What happens when a node dies, and I have to suddenly rebalance the data distribution? How is consistency kept? How does the system decide which blocks get streamed where? Even in a 10 node cluster, it would be plain out stupid to keep all the data replicated to everywhere, 10x the data on local disks would be too expensive

If you just want cheap pooled storage, you look at CEPH (and there are accelerators for that to make it fast,if you need to).

Here we have a distributed system, which starts at a least a core per RBD, and 32Gb or RAM to even get started properly. In SMB, I doubt you see many monstrous hypervisors with hundreds of cores, so what is there left to run your actual VMs?

At truly giant scale, the only real benefit to totally external storage, is when speed and reliability are so unimportant that you are willing to sacrifice them to a huge degree to save a few dollars. But with HC's low cost today, even that is approaching the impossible.

Only having a good storage fabric can give you excellent speed and very low latencies, and as for reliability - you can build whatever you want on the SAN side depending on your requirements. The only good thing about HC is local storage access, and it isn't really that far ahead of any decent fabric anyway, if at all.

Because you don't have all those things. Large chunks over the network takes like no resources. No idea where you think the overhead comes from, but for most of us, things like copying data is not a high overhead activity. It's a dedicated network in most cases, with offload engines on the NICs, and things like tiering and such take extremely little overhead (if done well.) These just aren't CPU or RAM intensive activities.

That is simply not true. Pushing large amount of data over the network is not cheap, and that is in the case of simple streaming. When you start running synchronizations and tiering stuff gets harder. And when you have to rebalance (which ceph does often) you need even more resources. Yes, you can dedicate NICs to just that (and those NICs will not be there to provide more bandwidth to the workload traffic) but in order to push large amounts of data into the NICs you also need CPU cycles and and RAM. It's CS 101, there are no free rides.

Those aren't HA, so not applicable to the discussion. HCI is assumed to be replicating to other nodes, something those providers don't provide. They are stand alone compute nodes. Very different animal.

My point exactly. If HC was so great, why wouldn't they be using it?