Infrastructure Needed for Hypervisor Cluster



  • My next big project for my home lab is to practice deploying a cluster of hypervisors and managing them. This endeavor is inspired by my work environment, which from what I understand is basically this:

    We have Cisco UCS blades which have a hypervisor installed on them (some VMware, some Hyper-V; the reason for that is beyond the scope of my project). Our chassis talk to the fabric interconnects (we have two FIs), which then talk to our $SAN. I know we have an HP 3Par as well as some other hardware, so I assume there's some redundancy with the storage piece, so we're not sitting on an IPOD; however, there's a separate team that deals with storage, so there are limits to what I know about our storage infrastructure.

    From the systems perspective, we manage the above (except the Hyper-V hypervisors) with vCenter and the UCS management console for the actual hardware. I'm not sure how the storage piece is managed.

    So back to the lab goal. What I want to do is to have two nodes on which I'll install a hypervisor, which could be expanded to $someNumber of nodes should I have the need as well as win the lottery / rob a bank. With these nodes, I want to be able to practice migrating VMs back and forth between them and simulate the failure of a node and learn how to deal with such a failure.

    Here is what I think I need to build the above.

    1. Two servers, which would be the physical nodes.
    2. Some storage device(s), which will provide the storage of the VMs for the physical nodes
    3. Some method for the physical nodes to communicate with the storage device.

    As I'm writing this, I realize I'm describing an IPOD, so an additional goal of this lab is to learn how to create my desired cluster while avoiding an IPOD.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    We have Cisco UCS blades

    Bwahahaha.... not just UCS and not just blades, but both!



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    fabric interconnect (we have two FIs), which then talk to our $SAN.

    Quite the environment there 😉



  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    We have Cisco UCS blades

    Bwahahaha.... not just UCS and not just blades, but both!

    Yeah. . . I know. This was done long before I arrived, and will be there long after I leave 😛



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    I know we have an HP 3Par as well as some other hardware, so I assume there's some redundancy with the storage piece, so we're not sitting on an IPOD

    Almost anything can offer redundancy. Most people choose things like 3PAR because they are sold on the idea that they are magic and don't need redundancy. It should be there, but the fact that it is a 3PAR makes it less likely to be redundant, rather than more, I would wager.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Some storage device(s), which will provide the storage of the VMs for the physical nodes

    No need for that. That's only if you want an IPOD design. Which for a lab is fine, to practice bad design.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Some method for the physical nodes to communicate with the storage device.

    Also not needed, unless again, you want the IPOD design.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    As I'm writing this, I realize I'm describing an IPOD, so an additional goal of this lab is to learn how to create my desired cluster while avoiding an IPOD.

    What your company has described, whether or not there is redundancy, IS an IPOD, and so is what you are describing. IPOD doesn't imply a single point of failure; it describes a top-heavy architecture where the risk point and lack of robustness is the critical "point" of storage at the bottom.

    The only way to avoid an IPOD is not to build one.



  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Some storage device(s), which will provide the storage of the VMs for the physical nodes

    No need for that. That's only if you want an IPOD design. Which for a lab is fine, to practice bad design.

    That's what I'm having a hard time wrapping my head around: making shared storage available without the IPOD. Unless my initial premise, "To do clustering you must have some form of shared storage," is wrong.



  • So, if the purpose is to build an IPOD (to learn how the people messed up at the office) then there is nothing to be done but to build one.

    At a minimum you need two servers and a DAS unit. To do it more like the office, presumably you need two servers, a SAN, and a switch. To do it better, you would need redundant switches, and redundant SANs. All depends on your goals.
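    A rough way to compare those three builds is a series/parallel availability model. Layers stacked on top of each other multiply, and redundant copies only fail together. The sketch below assumes independent failures and uses invented availability figures (`SERVER`, `DAS`, `SAN`, `SWITCH` are placeholders, not measured values), so only the relative ordering means anything. The point it shows is that a single storage box under the servers caps availability no matter how many servers sit on top.

```python
# Invented per-component availabilities, for illustration only.
SERVER = 0.99
DAS = 0.995
SAN = 0.995
SWITCH = 0.999


def parallel(a: float, n: int) -> float:
    """Availability of n redundant copies where any single copy is enough."""
    return 1 - (1 - a) ** n


def series(*parts: float) -> float:
    """Availability of a stack where every layer must be up."""
    result = 1.0
    for p in parts:
        result *= p
    return result


# Two servers on one DAS unit.
minimum = series(parallel(SERVER, 2), DAS)
# Two servers, one SAN, one switch.
office_like = series(parallel(SERVER, 2), SAN, SWITCH)
# Two servers, redundant SANs, redundant switches.
better = series(parallel(SERVER, 2), parallel(SAN, 2), parallel(SWITCH, 2))

for name, value in (("minimum", minimum), ("office-like", office_like), ("better", better)):
    print(f"{name}: {value:.5f}")
```

    Adding the switch in the "office-like" build makes it slightly worse than the minimum one, because it is one more layer that must be up.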

    Because of their "vendor" based approach at the office, though, doing this at home isn't going to teach you a lot because you will likely use Hyper-V and a custom open SAN that you build. That won't teach you anything about VMware or 3PAR. Only way to learn those is to have those.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    That's what I'm having a hard time wrapping my head around: Making shared storage available without the IPOD.

    IPOD isn't the natural way to have shared storage. In your mind, as in many people's because of marketing, the idea that storage is consolidated and external is just assumed, and that naturally leads you to an IPOD. Stop trying to consolidate and externalize as part of your sharing, and magically you end up at hyperconvergence.



  • Watch this video that covers this...

    Youtube Video



  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    So, if the purpose is to build an IPOD (to learn how the people messed up at the office) then there is nothing to be done but to build one.

    The goal is specifically to not do that. The short-term goal is to learn to build a cluster without an IPOD. The long-term goal is to learn and understand $concepts to be able to make intelligent decisions about when / how / why to use clustering (of hypervisors) in the real world.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    The goal is specifically to not do that. The short-term goal is to learn to build a cluster without an IPOD.

    That would be hyperconverged.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    The long-term goal is to learn and understand $concepts to be able to make intelligent decisions about when / how / why to use clustering (of hypervisors) in the real world.

    Clustering is done when the cost of clustering is low versus the risk of not clustering.
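    Stated as a rough expected-value rule, with symbols introduced here only for illustration: let C_cluster be the total cost of clustering over some period (hardware, licensing, expertise), p_outage the probability that the outage clustering would prevent happens in that period, and L_outage the business loss from that outage. This is a simplification that ignores the added complexity risk clustering itself brings.

```latex
\text{cluster if} \quad C_{\text{cluster}} < p_{\text{outage}} \cdot L_{\text{outage}}
```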



  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    The goal is specifically to not do that. The short-term goal is to learn to build a cluster without an IPOD.

    That would be hyperconverged.

    Is this how that would look with a two-node setup?

    • Both node A and B would be running their hypervisors (for my lab, it's going to be Hyper-V or KVM)
    • Both node A and B would have enough power to be able to handle all of the deployed VMs. The thought behind this is that when Node A needs to be rebooted, you evacuate Node A's VMs to Node B. This line of thought would not address how to handle the sudden loss of Node A, unless Node A and B are somehow constantly in sync.
    • Communication between Node A and B would be done through a switch.
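    To make "constantly in sync" concrete, one common approach is synchronous replication of each VM's disk between the two nodes' local storage. With it, a write only counts as done once both nodes hold it, so the surviving node already has everything when the other dies. The toy model below assumes that approach. `Node`, `MirroredVolume`, and `fail_over` are hypothetical names, not any product's API. It deliberately skips the hard parts real replication software handles: resync when a node returns, split-brain, and network partitions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypervisor host whose local storage is modelled as a block -> bytes map."""
    name: str
    disk: dict[int, bytes] = field(default_factory=dict)
    up: bool = True


class MirroredVolume:
    """Toy synchronous mirror of one VM disk across two nodes' local storage."""

    def __init__(self, active: Node, peer: Node) -> None:
        self.active = active  # node currently running the VM
        self.peer = peer      # node holding the replica

    def write(self, block: int, data: bytes) -> None:
        if not self.active.up:
            raise RuntimeError(f"{self.active.name} is down; fail over first")
        self.active.disk[block] = data
        if self.peer.up:
            # Synchronous: the write is not acknowledged until the peer has it too.
            self.peer.disk[block] = data
        # If the peer is down, keep running but unprotected (degraded).

    def read(self, block: int) -> bytes:
        return self.active.disk[block]

    def fail_over(self) -> None:
        """Sudden loss of the active node: restart the VM on the peer's copy."""
        self.active.up = False
        self.active, self.peer = self.peer, self.active


# Usage: node A runs the VM, node B holds the mirror.
a, b = Node("A"), Node("B")
vol = MirroredVolume(a, b)
vol.write(0, b"vm data")
vol.fail_over()  # simulate the sudden loss of node A
print(vol.active.name, vol.read(0))
vol.write(1, b"more data")  # B carries on alone until A is repaired
```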


  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    The long-term goal is to learn and understand $concepts to be able to make intelligent decisions about when / how / why to use clustering (of hypervisors) in the real world.

    Clustering is done when the cost of clustering is low versus the risk of not clustering.

    I agree. I think of a healthcare environment such as a hospital as one where the likely cost of clustering is less than the cost of the risk of not clustering, since they likely can't be in a situation where, say, the EMR app isn't available.

    Perhaps I'm misunderstanding the real meaning of having a cluster. I see it like RAID. In RAID 1, one drive can fail, but you're not immediately down, nor are you waiting for something to fail over.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Is this how that would look with a two-node setup?

    Node count only matters in a uni-node vs. multi-node perspective. And even then, not really.

    You can hyperconverge with one node, two, three, four, ten, one thousand. Doesn't matter.

    You can IPOD at any size, including one node, two, three, four, ten, one thousand.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Both node A and B would have enough power to be able to handle all of the deployed VMs. The thought behind this is that when Node A needs to be rebooted, you evacuate Node A's VMs to Node B. This line of thought would not address how to handle the sudden loss of Node A, unless Node A and B are somehow constantly in sync.

    This is a fine way to look at it. Just remember that your capacity planning here is based on high availability, not on hyperconvergence. HC doesn't require you to provide that level of capacity, HA does. If you want HC + HA, then this is the right way to capacity plan.
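    The HA capacity rule can be checked mechanically: after losing any one node, the remaining nodes must still hold every VM. This is a minimal sketch with three assumptions: RAM is the constraining resource, VMs can be packed freely (it compares totals only), and hypervisor overhead is ignored. The function name and all figures are hypothetical.

```python
def survives_any_single_failure(node_ram_gb: dict[str, int], vm_ram_gb: list[int]) -> bool:
    """True if total VM RAM still fits after losing any one node.

    Compares totals only: ignores per-VM placement (bin packing) and
    hypervisor overhead, so a True result is optimistic.
    """
    demand = sum(vm_ram_gb)
    capacity = sum(node_ram_gb.values())
    # Try losing each node in turn; every scenario must still fit.
    return all(capacity - lost >= demand for lost in node_ram_gb.values())


# Hypothetical two-node lab with equal RAM.
pair = {"A": 64, "B": 64}
print(survives_any_single_failure(pair, [16, 16, 8, 8]))  # 48 GB fits on one node
print(survives_any_single_failure(pair, [32, 32, 16]))    # 80 GB needs both nodes

# Mismatched hardware: losing the bigger node is what decides it.
mixed = {"A": 32, "B": 96}
print(survives_any_single_failure(mixed, [16, 16, 8]))
```

    The second workload is fine for HC without HA, because it fits across both nodes. It just can't all run on one.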



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Communication between Node A and B would be done through a switch.

    In a two node setup, that would not make sense. The switch is an unnecessary point of failure, cost, and point of latency. Just connect the two nodes directly together for a faster, more robust, cheaper solution.

    Even with three nodes, you often direct connect all three. Four and larger, it's impractical to do anything but a switch.
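    Direct-connecting every node to every other node is a full mesh, and the cabling grows quadratically. That is why a switch wins somewhere past three nodes. Here is a quick count, assuming one link per pair of nodes (double it if you want redundant links):

```python
def full_mesh(nodes: int) -> tuple[int, int]:
    """Return (cables needed, NIC ports needed per node) to direct-connect every pair."""
    cables = nodes * (nodes - 1) // 2
    ports_per_node = nodes - 1
    return cables, ports_per_node


for n in (2, 3, 4, 10):
    cables, ports = full_mesh(n)
    print(f"{n} nodes: {cables} cables, {ports} ports per node")
```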



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    I think of a healthcare environment such as a hospital as one where the likely cost of clustering is less than the cost of the risk of not clustering, since they likely can't be in a situation where, say, the EMR app isn't available.

    Actually, a hospital can often function without EMR. Not that it isn't good to cluster there, but the needs of an EMR rarely require HA. EMR isn't like the medical equipment itself, which can't fail or people die. EMR being down just delays a doctor somewhat. Bad, yes, but not life threatening.



  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Communication between Node A and B would be done through a switch.

    In a two node setup, that would not make sense. The switch is an unnecessary point of failure, cost, and point of latency. Just connect the two nodes directly together for a faster, more robust, cheaper solution.

    Even with three nodes, you often direct connect all three. Four and larger, it's impractical to do anything but a switch.

    I was picturing the connections going over switched Ethernet and forgot to consider just using crossover cables to connect the nodes directly.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Perhaps I'm misunderstanding the real meaning of having a cluster. I see it like RAID. In RAID 1, one drive can fail, but you're not immediately down, nor are you waiting for something to fail over.

    Yes, but unlike RAID which is ridiculously cheap compared to what it protects against, clustering is very expensive compared to what it protects against.

    Examples to come...



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    I was picturing the connections going over switched Ethernet and forgot to consider just using crossover cables to connect the nodes directly.

    We haven't used crossover cables in decades. They went out with hubs 🙂 Just normal cables.



  • RAID's primary function is to protect against data loss, not availability loss. The latter is generally seen as a byproduct, not a goal. Data loss, for a normal business, has a massive cost and risk compared to availability. One hour of lost productivity is often trivial to absorb and can often even be made up. Losing one hour of customer information could be pretty tragic. And RAID tends to protect a lot against data loss and a little against downtime. Also, RAID costs start around $100, and the average is probably around $800 to implement, yet it protects against huge data loss in most cases.

    Clustering does not protect against data loss (and can actually contribute to data loss if we aren't careful). Clustering only (under normal conditions) protects against availability loss, the lesser factor with RAID. So we have to justify clustering based solely on improved uptime, not on loss of data. That makes it much harder to justify and tips the scales from "always do it" to "almost never do it." The difference is that dramatic.

    And the starting cost of clustering is generally several thousand dollars with the average likely being in the tens of thousands.

    Also, RAID requires essentially zero IT skills. You can get it as simply as checking a box when ordering a server. Clustering, however, requires a lot of complex interactions, includes a bit of risk, and normally a huge amount of either cost or expertise or both.

    So basically... RAID is a few hundred dollars to protect against some of the worst issues you can face, with zero overhead. Clustering costs tens of thousands to protect against something generally trivial with loads of overhead.
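    Here is the same comparison in numbers, as a simple expected-value model. Each protection is worth (chance per year of the event it guards against) × (loss if unprotected) × (years in service), minus what it costs. The $800 RAID figure comes from the post. $20,000 is one point in its "tens of thousands" range for clustering. The probabilities, losses, and five-year service life are made-up placeholders. The model also ignores the skill and operational overhead mentioned above, which would only widen the gap.

```python
from dataclasses import dataclass


@dataclass
class Protection:
    """One protective measure, valued with a simple expected-loss model."""
    name: str
    cost: float                 # dollars to implement
    annual_event_prob: float    # chance per year of the event it guards against
    loss_if_unprotected: float  # dollars lost if that event hits unprotected
    years: int = 5              # assumed service life

    def expected_loss_avoided(self) -> float:
        # Expected value only; reasonable when annual_event_prob is small.
        return self.annual_event_prob * self.loss_if_unprotected * self.years

    def net_benefit(self) -> float:
        return self.expected_loss_avoided() - self.cost


# $800 (RAID) is the post's estimate; every other figure is a placeholder.
raid = Protection("RAID", cost=800, annual_event_prob=0.05,
                  loss_if_unprotected=100_000)   # losing business data
cluster = Protection("Cluster", cost=20_000, annual_event_prob=0.10,
                     loss_if_unprotected=4_000)  # e.g. 8 hours down at $500/hour

for p in (raid, cluster):
    print(f"{p.name}: avoids about ${p.expected_loss_avoided():,.0f}, "
          f"net {p.net_benefit():+,.0f} dollars")
```

    Clustering's "loss" here is downtime only, since it doesn't protect data. That is why even a doubled outage probability barely moves its result.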



  • @scottalanmiller said in Infrastructure Needed for Hypervisor Cluster:

    @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    Both node A and B would have enough power to be able to handle all of the deployed VMs. The thought behind this is that when Node A needs to be rebooted, you evacuate Node A's VMs to Node B. This line of thought would not address how to handle the sudden loss of Node A, unless Node A and B are somehow constantly in sync.

    This is a fine way to look at it. Just remember that your capacity planning here is based on high availability, not on hyperconvergence. HC doesn't require you to provide that level of capacity, HA does. If you want HC + HA, then this is the right way to capacity plan.

    So if HA isn't necessary, you could potentially have nodes with various hardware -- such as in my lab where I've accumulated two different servers with different hardware specs: a Dell R310 and a Dell T420. You would then need software to manage the cluster. I assume this is where applications like oVirt or Failover Cluster Manager come into play. If true, then you'd have a VM running on one of the nodes whose purpose is to run the management application.



  • Maybe a dumb question, but what is it that makes it a hyperconverged solution, compared to "just a bunch of hypervisors" with local storage that are managed together?

    Is it vSAN (or equivalent)?



  • @Pete-S said in Infrastructure Needed for Hypervisor Cluster:

    Maybe a dumb question

    Hey now! Only I get to be t3h n00b in this thread 😛



  • @Pete-S said in Infrastructure Needed for Hypervisor Cluster:

    Maybe a dumb question, but what is it that makes it a hyperconverged solution, compared to "just a bunch of hypervisors" with local storage that are managed together?

    Is it vSAN (or equivalent)?

    Hyperconverged just means everything is in the same box: storage, compute, network... vSAN has nothing to do with the name.



  • @EddieJennings said in Infrastructure Needed for Hypervisor Cluster:

    So if HA isn't necessary, you could potentially have nodes with various hardware -- such as in my lab where I've accumulated two different servers with different hardware specs: a Dell R310 and a Dell T420.

    Sure. People do that all of the time.



  • @Pete-S said in Infrastructure Needed for Hypervisor Cluster:

    Maybe a dumb question, but what is it that makes it a hyperconverged solution, compared to "just a bunch of hypervisors" with local storage that are managed together?

    Nothing; that's all hyperconverged is, assuming the local storage piece is shared. All the pieces together in a single layer, managed together.

    Just like an IPOD is nothing more than the opposite 🙂