Infrastructure Needed for Hypervisor Cluster

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Some method for the physical nodes to communicate with the storage device.

Also not needed, unless again, you want the IPOD design.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

As I'm writing this, I realize I'm describing an IPOD, so an additional goal of this lab is to learn how to create my desired cluster while avoiding an IPOD.

What your company has described, whether or not there is redundancy, and what you are describing IS an IPOD. IPOD doesn't imply a single point of failure; it describes a top-heavy architecture where the risk point and lack of robustness is the critical "point" of storage on the bottom.

The only way to avoid an IPOD is not to build one.

EddieJennings

@scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Some storage device(s), which will provide the storage of the VMs for the physical nodes

No need for that. That's only if you want an IPOD design. Which for a lab is fine, to practice bad design.

That's what I'm having a hard time wrapping my head around: making shared storage available without the IPOD. Unless my initial premise is wrong: "To do clustering you must have some form of shared storage."

scottalanmiller

So, if the purpose is to build an IPOD (to learn how the people messed up at the office) then there is nothing to be done but to build one.

At a minimum you need two servers and a DAS unit. To do it more like the office, presumably you need two servers, a SAN, and a switch. To do it better, you would need redundant switches, and redundant SANs. All depends on your goals.
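To make the three layouts concrete, here is a rough sketch that just counts components per layer and flags any layer with only one of something. The layout names and counts are my reading of the paragraph above, not a vendor bill of materials, and `single_points_of_failure` is a made-up helper.

```python
# Hypothetical component counts for the three layouts described above.
designs = {
    "two servers + DAS": {"servers": 2, "storage": 1},
    "two servers + switch + SAN": {"servers": 2, "switches": 1, "storage": 1},
    "redundant switches and SANs": {"servers": 2, "switches": 2, "storage": 2},
}


def single_points_of_failure(layers):
    """Return the names of layers that contain exactly one component."""
    return [name for name, count in layers.items() if count == 1]


for label, layers in designs.items():
    spofs = single_points_of_failure(layers)
    print(label, "->", spofs or "none at the component level")
```

Keep in mind this only counts boxes. A layout with no single box to lose is still an IPOD if everything depends on consolidated external storage.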

Because of their "vendor" based approach at the office, though, doing this at home isn't going to teach you a lot because you will likely use Hyper-V and a custom open SAN that you build. That won't teach you anything about VMware or 3PAR. Only way to learn those is to have those.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

That's what I'm having a hard time wrapping my head around: Making shared storage available without the IPOD.

IPOD isn't the natural way to have shared storage. In your mind, as in many people's because of marketing, the idea that storage is consolidated and external is just assumed, and that naturally leads you to an IPOD. Stop trying to consolidate and externalize as part of your sharing, and magically you go to hyperconvergence.

scottalanmiller

Watch this video that covers this...

Youtube Video

EddieJennings

@scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

So, if the purpose is to build an IPOD (to learn how the people messed up at the office) then there is nothing to be done but to build one.

The goal is specifically to not do that. The short-term goal is to learn to build a cluster without an IPOD. The long-term goal is to learn and understand $concepts to be able to make intelligent decisions about when / how / why to use clustering (of hypervisors) in the real world.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

The goal is specifically to not do that. The short-term goal is to learn to build a cluster without an IPOD.

That would be hyperconverged.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

The long-term goal is to learn and understand $concepts to be able to make intelligent decisions about when / how / why to use clustering (of hypervisors) in the real world.

Clustering is done when the cost of clustering is low versus the risk of not clustering.

EddieJennings

@scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

The goal is specifically to not do that. The short-term goal is to learn to build a cluster without an IPOD.

That would be hyperconverged.

Is this how that would look with a two-node setup?

Both node A and B would be running their hypervisors (for my lab, it's going to be Hyper-V or KVM)
Both node A and B would have enough power to be able to handle all of the deployed VMs. The thought behind this is when Node A needs to be rebooted, you evacuate Node A's VMs to Node B. This line of thought would not address how to handle the sudden loss of Node A, unless Node A and B are somehow constantly in sync.
Communication between Node A and B would be done through a switch.

EddieJennings

@scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

The long-term goal is to learn and understand $concepts to be able to make intelligent decisions about when / how / why to use clustering (of hypervisors) in the real world.

Clustering is done when the cost of clustering is low versus the risk of not clustering.

I agree. I think of a healthcare environment such as a hospital being something where the likely cost of clustering is less than the cost of the risk of not having clustering -- since likely they can't be in a situation where say the EMR app isn't available.

Perhaps I'm misunderstanding the real meaning of having a cluster. I see it like RAID. In your RAID 1, you have the one drive that fails, but you're not immediately down, nor are you waiting for something to fail over.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Is this how that would look with a two-node setup?

Node count only matters in a uni-node vs. multi-node perspective. And even then, not really.

You can hyperconverge with one node, two, three, four, ten, one thousand. Doesn't matter.

You can IPOD at any size, including one node, two, three, four, ten, one thousand.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Both node A and B would have enough power to be able to handle all of the deployed VMs. The thought behind this is when Node A needs to be rebooted, you evacuate Node A's VMs to Node B. This line of thought would not address how to handle the sudden loss of Node A, unless Node A and B are somehow constantly in sync.

This is a fine way to look at it. Just remember that your capacity planning here is based on high availability, not on hyperconvergence. HC doesn't require you to provide that level of capacity, HA does. If you want HC + HA, then this is the right way to capacity plan.
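As a rough sketch of that HA capacity check: the VMs have to fit on whatever is left after the biggest node goes away. This assumes RAM is the only constraint, which is a simplification, and the function name and all the numbers are made up for illustration.

```python
def survives_node_loss(node_ram_gb, vm_ram_gb):
    """Return True if all VMs still fit after losing the largest node.

    Assumption: RAM is the only resource that matters. Real planning
    would also look at CPU, storage, and network.
    """
    remaining = sum(node_ram_gb) - max(node_ram_gb)
    return sum(vm_ram_gb) <= remaining


vms = [16, 16, 8]  # hypothetical VM sizes in GB, 40 GB total

# Two equal nodes, each able to hold everything: sized for HA.
print(survives_node_loss([64, 64], vms))

# Mismatched nodes: fine for HC alone, but losing the big node strands VMs.
print(survives_node_loss([32, 64], vms))
```

The second case is still a perfectly valid hyperconverged setup. It just isn't sized for HA.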

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Communication between Node A and B would be done through a switch.

In a two node setup, that would not make sense. The switch is an unnecessary point of failure, cost, and point of latency. Just connect the two nodes directly together for a faster, more robust, cheaper solution.

Even with three nodes, you often direct connect all three. Four and larger, it's impractical to do anything but a switch.
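The arithmetic behind that, as a small sketch: connecting every node directly to every other node takes n(n-1)/2 cables and n-1 dedicated ports per node. The function name is hypothetical, and this counts a single link per pair with no redundant paths.

```python
def direct_connect_requirements(nodes):
    """Return (cables, ports_per_node) for a full mesh of `nodes` servers.

    Assumes exactly one link between each pair of nodes.
    """
    cables = nodes * (nodes - 1) // 2
    ports_per_node = nodes - 1
    return cables, ports_per_node


for n in (2, 3, 4, 8):
    cables, ports = direct_connect_requirements(n)
    print(f"{n} nodes: {cables} cables, {ports} ports per node")
```

Two nodes need one cable, three need three, four need six, and eight need twenty-eight with seven ports in every server, which is why a switch takes over as the node count grows.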

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

I think of a healthcare environment such as a hospital being something where the likely cost of clustering is less than the cost of the risk of not having clustering -- since likely they can't be in a situation where say the EMR app isn't available.

Actually a hospital often can be without EMR. Not that it isn't good to cluster there, but the needs of an EMR rarely require HA. EMR isn't like the medical equipment itself, which can't fail or people die. EMR being down just delays a doctor somewhat. Bad, yes, but not life threatening.

EddieJennings

@scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Communication between Node A and B would be done through a switch.

In a two node setup, that would not make sense. The switch is an unnecessary point of failure, cost, and point of latency. Just connect the two nodes directly together for a faster, more robust, cheaper solution.

Even with three nodes, you often direct connect all three. Four and larger, it's impractical to do anything but a switch.

I was thinking of the connections being made over Ethernet through a switch and forgot to consider just using crossover cables to connect the nodes directly.

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Perhaps I'm misunderstanding the real meaning of having a cluster. I see it like RAID. In your RAID 1, you have the one drive that fails, but you're not immediately down, nor are you waiting for something to fail over.

Yes, but unlike RAID which is ridiculously cheap compared to what it protects against, clustering is very expensive compared to what it protects against.

Examples to come...

scottalanmiller

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

I was thinking of the connections being made over Ethernet through a switch and forgot to consider just using crossover cables to connect the nodes directly.

We haven't used crossover cables in decades. They went out with hubs. Just use normal cables.

scottalanmiller

RAID's primary function is to protect against data loss, not availability loss. The latter is generally seen as a byproduct, not a goal. Data loss, for a normal business, has a massive cost and risk compared to availability loss. One hour of lost productivity is often trivial to absorb and can often even be made up. Losing one hour of customer information could be a pretty tragic loss. And RAID tends to protect against a lot of data loss, and a little downtime. Also, RAID cost starts around $100, and the average is probably around $800 to implement. But it protects against huge data loss in most cases.

Clustering does not protect against data loss (and can actually contribute to data loss if we aren't careful). Clustering only (under normal conditions) protects against availability loss, the lesser factor with RAID. So we have to justify clustering based solely on improved uptime, not on avoiding data loss. That makes it much harder to justify and tips the scales from "always do it" to "almost never do it." The difference is that dramatic.

And the starting cost of clustering is generally several thousand dollars with the average likely being in the tens of thousands.

Also, RAID requires essentially zero IT skills. You can get it as simply as checking a box when ordering a server. Clustering, however, requires a lot of complex interactions, includes a bit of risk, and normally a huge amount of either cost or expertise or both.

So basically... RAID is a few hundred dollars to protect against some of the worst issues you can face, with zero overhead. Clustering costs tens of thousands to protect against something generally trivial with loads of overhead.
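To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The $800 and $30,000 price tags follow the ballpark figures above; every probability, loss amount, and the five-year horizon are invented purely for illustration, and the helper names are hypothetical.

```python
def expected_annual_loss(probability_per_year, cost_per_event):
    """Expected yearly loss from an event with the given odds and cost."""
    return probability_per_year * cost_per_event


def worth_it(protection_cost, annual_loss_avoided, years=5):
    """True if the loss avoided over `years` exceeds the up-front cost."""
    return annual_loss_avoided * years > protection_cost


# RAID: about $800, guarding against a (made-up) 10% yearly chance of
# a drive failure that would destroy $100,000 worth of data.
raid_avoided = expected_annual_loss(0.10, 100_000)
print("RAID worth it:", worth_it(800, raid_avoided))

# Clustering: about $30,000, guarding against a (made-up) 20% yearly
# chance of an 8 hour outage costing $500 per hour.
cluster_avoided = expected_annual_loss(0.20, 8 * 500)
print("Clustering worth it:", worth_it(30_000, cluster_avoided))
```

With these assumed inputs RAID pays for itself many times over and clustering does not. Plug in your own outage costs to see where the line falls for a given business.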

EddieJennings

@scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

@EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

Both node A and B would have enough power to be able to handle all of the deployed VMs. The thought behind this is when Node A needs to be rebooted, you evacuate Node A's VMs to Node B. This line of thought would not address how to handle the sudden loss of Node A, unless Node A and B are somehow constantly in sync.

This is a fine way to look at it. Just remember that your capacity planning here is based on high availability, not on hyperconvergence. HC doesn't require you to provide that level of capacity, HA does. If you want HC + HA, then this is the right way to capacity plan.

So if HA isn't necessary, you could potentially have nodes with various hardware -- such as in my lab where I've accumulated two different servers with different hardware specs: a Dell R310 and a Dell T420. You would then need software to manage the cluster. I assume this is where applications like oVirt or Failover Cluster Manager come into play. If true, then you'd have a VM running on one of the nodes whose purpose is to run the management application.