ESXi cluster, advice needed

scottalanmiller

@pmoncho said in ESXi cluster, advice needed:

I feel I can save up to $200 during ESXi setup alone. Plus, $300 for vCenter is an easy bonus for further configuration and management.

There are no cost savings here; I'm not sure where you are finding those savings. Hyper-V doesn't cost $200 more than ESXi to deploy.

And ESXi is quite expensive once you get to any size. It just keeps costing more and more. And you pay for every little feature.

Bottom line, it costs more, and it doesn't bring benefits. It's not faster, it's not easier, it doesn't do anything beneficial.

You are arguing from the position of "it's not THAT bad", but you haven't said anything that's better, only "not that bad." That you have to use "not that bad" as the position tells us that to you, too, it doesn't feel like a good product, but you aren't ready to articulate it yet. What we are hearing is you agreeing with us, because saying that it's only "less bad" makes it very clear that it's still "worse".

scottalanmiller

@travisdh1 said in ESXi cluster, advice needed:

It even includes a web based management platform (Cockpit).

This is huge. Whether it's Cockpit for small deployments, virt-manager, or Proxmox or whatever, VMware is so far behind in good management tools.

The majority of our deployments are single server shops, and VMware is far and away the worst option. Hyper-V is awful compared to Xen and KVM, and still worlds ahead of VMware. The ease of being able to quickly and easily manage these systems remotely is a killer feature in the tiniest of shops.

scottalanmiller

@stacksofplates said in ESXi cluster, advice needed:

This is not acceptable for 80 VMs. You can't clone through it. You're going to manually install 80 systems without a base template? Virt-Manager is still the best way to manage.

Not for 80, no. I think he was saying that when you get to the tiny end of the scale, KVM offers advantages in a different way.

pmoncho

@scottalanmiller said in ESXi cluster, advice needed:

@pmoncho said in ESXi cluster, advice needed:

In a single server, it seems the free alternatives lose their feature power and it all comes down to $500 (actually $576).

I have no idea why it seems that way, but it's simply not true. More features, more flexibility... all still there.

In a single server situation, what flexibility or feature exists? I'm only seeing the benefits of free hypervisors kick in in multi-server situations.

stacksofplates

@scottalanmiller said in ESXi cluster, advice needed:

@stacksofplates said in ESXi cluster, advice needed:

This is not acceptable for 80 VMs. You can't clone through it. You're going to manually install 80 systems without a base template? Virt-Manager is still the best way to manage.

Not for 80, no. I think he was saying that when you get to the tiny end of the scale, KVM offers advantages in a different way.

Not sure why he would use that case. The case we are talking about is 80 VMs on a single node.

scottalanmiller

@pmoncho said in ESXi cluster, advice needed:

@scottalanmiller said in ESXi cluster, advice needed:

@pmoncho said in ESXi cluster, advice needed:

In a single server, it seems the free alternatives lose their feature power and it all comes down to $500 (actually $576).

I have no idea why it seems that way, but it's simply not true. More features, more flexibility... all still there.

In a single server situation, what flexibility or feature exists? I'm only seeing the benefits of free hypervisors kick in in multi-server situations.

Not as many, surely. And it varies by the product, but...

No licensing or license center or logins needed to get downloads and updates (this is a killer feature IMHO, as an MSP we deal with companies that got screwed by this constantly - often when IT pros move on to another company and no one remembers or knows what licenses to manage.)
Easier remote access, generally without needing to deploy and manage another virtual machine just for that.
Simpler overall management. VMware is the only platform we are regularly called in to manage because it is "too hard" for the IT teams. We manage everything because people are busy, but VMware is the only one that is regularly "too hard" and people are having issues with it.
More reliable updates. We see more issues per-VMware install than all other platforms combined. It's not bad, it's just not on par with the market.
Broader hardware support and options. Software RAID being the massive feature here in the single server space (things like VSANs in the bigger spaces.)
Included features like built in backups are common in other packages.
Even in stand alone environments, the ability to move workloads to another machine is a huge deal. Stand alone doesn't imply a lack of hardware to send workloads to. It might be that way, but not necessarily.
Preparation for growth. The single server environment today is the two server environment tomorrow. Flexibility to protect against the unknown always matters.

pmoncho

@scottalanmiller said in ESXi cluster, advice needed:

@pmoncho said in ESXi cluster, advice needed:

@scottalanmiller said in ESXi cluster, advice needed:

@pmoncho said in ESXi cluster, advice needed:

In a single server, it seems the free alternatives lose their feature power and it all comes down to $500 (actually $576).

I have no idea why it seems that way, but it's simply not true. More features, more flexibility... all still there.

In a single server situation, what flexibility or feature exists? I'm only seeing the benefits of free hypervisors kick in in multi-server situations.

Not as many, surely. And it varies by the product, but...

No licensing or license center or logins needed to get downloads and updates (this is a killer feature IMHO, as an MSP we deal with companies that got screwed by this constantly - often when IT pros move on to another company and no one remembers or knows what licenses to manage.)

Easier remote access, generally without needing to deploy and manage another virtual machine just for that.

Simpler overall management. VMware is the only platform we are regularly called in to manage because it is "too hard" for the IT teams. We manage everything because people are busy, but VMware is the only one that is regularly "too hard" and people are having issues with it.

More reliable updates. We see more issues per-VMware install than all other platforms combined. It's not bad, it's just not on par with the market.

Broader hardware support and options. Software RAID being the massive feature here in the single server space (things like VSANs in the bigger spaces.)

Included features like built in backups are common in other packages.

Even in stand alone environments, the ability to move workloads to another machine is a huge deal. Stand alone doesn't imply a lack of hardware to send workloads to. It might be that way, but not necessarily.

Preparation for growth. The single server environment today is the two server environment tomorrow. Flexibility to protect against the unknown always matters.

Thanks a lot for this. Will keep these in mind.

rtfm

hi everybody!
first of all thank you for your contribution. Keep it simple...
However, i did not mention (intentionally, no offence, i will explain myself) the following facts:

we already have these VMs hosted in a 4-node flexpod environment (if vmware enterprise plus is an overkill for us, then how would you judge flexpod???).
our organization is rich in terms of money but poor in terms of IT intellectual capital. Therefore we need outsourced support. in our place it is hard to find that, so we usually address ourselves to certified solutions.
we have invested time and money on vmware hypervisor and our poor IT would not like to throw this away.
the initial question was supposed to refer to a DRS solution based on vmware SRM (VM based replication, not array based), however years after studying your recommendations i would like to try something more simple.

i understand that all above mentioned arguments are usually trivial in competitive environments, but unfortunately this is not our case.

sorry for wasting your time. i am thankful to you for your recommendations.
BTW what happens if a single node, or a node with local storage is lost? isn't that a potential cause for filesystem corruption?
Moreover, how do i put the host in maintenance mode (Hmmm, and why should i do that if i only have one host, especially with let's say free esxi?)?

travisdh1

@rtfm said in ESXi cluster, advice needed:

hi everybody!
first of all thank you for your contribution. Keep it simple...
However, i did not mention (intentionally, no offence, i will explain myself) the following facts:

we already have these VMs hosted in a 4-node flexpod environment (if vmware enterprise plus is an overkill for us, then how would you judge flexpod???).

our organization is rich in terms of money but poor in terms of IT intellectual capital. Therefore we need outsourced support. in our place it is hard to find that, so we usually address ourselves to certified solutions.

we have invested time and money on vmware hypervisor and our poor IT would not like to throw this away.

the initial question was supposed to refer to a DRS solution based on vmware SRM (VM based replication, not array based), however years after studying your recommendations i would like to try something more simple.

sorry for wasting your time. i am thankful to you for your recommendations. BTW what happens if a single node, or a node with local storage is lost? isn't that a potential cause for filesystem corruption?

Just a waste of money
What region of the world are you located in?
This is just plain bad thinking. IT by its nature is always changing. Learning something different should be very quick; resisting change just because you already know something is the opposite of what IT should be doing.
Also just a waste of money

Your statement about competitive environments doesn't make any sense. Many of the solutions mentioned are open source and available to anyone with an internet connection.

Assuming you are running with at least the 3-node minimum and a single node is lost, nothing happens. Once the node is put back online, everything is automatically handled in the background for you (Starwind, Gluster, Ceph, Scale).

No need for a "maintenance mode". Updates are handled without the need of a reboot, but we still recommend power cycling everything on a regular basis.

IRJ

First of all, you need to move every service you can to SaaS and PaaS solutions. Get rid of your on-prem database servers and put them on a PaaS solution. Why do you want to manage infrastructure, especially if you are short staffed in IT?

Dashrender

@rtfm said in ESXi cluster, advice needed:

hi everybody!
first of all thank you for your contribution. Keep it simple...
However, i did not mention (intentionally, no offence, i will explain myself) the following facts:

we already have these VMs hosted in a 4-node flexpod environment (if vmware enterprise plus is an overkill for us, then how would you judge flexpod???).

our organization is rich in terms of money but poor in terms of IT intellectual capital. Therefore we need outsourced support. in our place it is hard to find that, so we usually address ourselves to certified solutions.

we have invested time and money on vmware hypervisor and our poor IT would not like to throw this away.

the initial question was supposed to refer to a DRS solution based on vmware SRM (VM based replication, not array based), however years after studying your recommendations i would like to try something more simple.

i understand that all above mentioned arguments are usually trivial in competitive environments, but unfortunately this is not our case.

sorry for wasting your time. i am thankful to you for your recommendations.
BTW what happens if a single node, or a node with local storage is lost? isn't that a potential cause for filesystem corruption?
Moreover, how do i put the host in maintenance mode (Hmmm, and why should i do that if i only have one host, especially with let's say free esxi?)?

?
Why is your organization poor on IT capital? Why not hire consultants to do it? No reason to have them be on staff, is there?
This is the sunk cost fallacy. That money is already spent, consider it gone and move forward with the most cost-effective solution - that said, sometimes where you already are is the most cost-effective option when all aspects are considered - at least until a full overhaul is required.
?

stacksofplates

@travisdh1 said in ESXi cluster, advice needed:

@rtfm said in ESXi cluster, advice needed:

hi everybody!
first of all thank you for your contribution. Keep it simple...
However, i did not mention (intentionally, no offence, i will explain myself) the following facts:

we already have these VMs hosted in a 4-node flexpod environment (if vmware enterprise plus is an overkill for us, then how would you judge flexpod???).

our organization is rich in terms of money but poor in terms of IT intellectual capital. Therefore we need outsourced support. in our place it is hard to find that, so we usually address ourselves to certified solutions.

we have invested time and money on vmware hypervisor and our poor IT would not like to throw this away.

the initial question was supposed to refer to a DRS solution based on vmware SRM (VM based replication, not array based), however years after studying your recommendations i would like to try something more simple.

sorry for wasting your time. i am thankful to you for your recommendations. BTW what happens if a single node, or a node with local storage is lost? isn't that a potential cause for filesystem corruption?

Just a waste of money

What region of the world are you located in?

This is just plain bad thinking. IT by its nature is always changing. Learning something different should be very quick; resisting change just because you already know something is the opposite of what IT should be doing.

Also just a waste of money

Your statement about competitive environments doesn't make any sense. Many of the solutions mentioned are open source and available to anyone with an internet connection.

Assuming you are running with at least the 3-node minimum and a single node is lost, nothing happens. Once the node is put back online, everything is automatically handled in the background for you (Starwind, Gluster, Ceph, Scale).

No need for a "maintenance mode". Updates are handled without the need of a reboot, but we still recommend power cycling everything on a regular basis.

You're comparing three different things there. Starwind is a VSA, Gluster and Ceph are replicated storage, and Scale is a hyperconverged product.

Scale isn't open source and is more expensive than the VMware licensing he has mentioned (and has a ton fewer features).

I'm guessing you haven't managed Gluster with KVM. If you have any sizeable images at all it will take forever to sync when you lose a node. And it's going to be a full manual setup with whatever KVM management tool you're using (libvirt, Proxmox, etc).

Starwind is also not open source and no longer supported for KVM or Hyper-V (https://www.starwindsoftware.com/resource-library/starwind-virtual-storage-appliance-installation-guide-with-kvm/). So why even mention it?

The only one that makes any sense is Ceph because it has somewhat of an automated setup in Proxmox. And we've had people in the community say things like this:

CEPH is so slow that there are whole products, like from Starwind, built just to make it fast enough to use for virtualization.

And I agree. I don't think Ceph is a great option for that.

IMO the only way that KVM makes sense is if you use local storage on different servers and have the replication between the VMs themselves and not the hosts. But this takes a lot of work. You'll want to automate the deployment and creation of each VM, so no Cockpit. And you won't want to do that by hand.

My point is, you can't just say "this is available through open source tools" and expect people to be able to do a setup like that from scratch with no experience.
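
To make the "you won't want to do that by hand" point concrete, here is a minimal sketch of scripting clones from a base template instead of installing 80 guests manually. It only builds `virt-clone` command lines and prints them; nothing is executed. The template name `base-template`, the `vm` prefix, and the helper `clone_commands` are made up for illustration and are not from this thread. It assumes a libvirt host where `virt-clone` is installed and the template guest is shut off.

```python
import shlex


def clone_commands(template, count, prefix="vm"):
    """Build one virt-clone command line per guest to create from a template.

    'template' and 'prefix' are hypothetical names used for illustration.
    """
    commands = []
    for i in range(1, count + 1):
        name = f"{prefix}{i:02d}"
        commands.append([
            "virt-clone",
            "--original", template,  # existing, shut-off guest used as the base
            "--name", name,          # name of the new guest
            "--auto-clone",          # let virt-clone pick new disk paths and MACs
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in clone_commands("base-template", 80):
        print(shlex.join(cmd))
```

This is only the cloning step. Per-guest hostnames, addresses, and application setup would still need something like cloud-init or a configuration management tool, which is part of why a setup like this takes real work.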

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

our organization is rich in terms of money but poor in terms of IT intellectual capital. Therefore we need outsourced support. in our place it is hard to find that, so we usually address ourselves to certified solutions.

This is a misunderstanding of markets. There is no such thing as a place with hard to get IT. IT has no location and there are essentially unlimited numbers of available excellent resources ready to assist any business. Businesses simply choose not to look for or hire them and instead hire sales people who screw them and hide their costs in "products" rather than honest or qualified advice. Every business should have outsourced support; almost no company is big enough to have all the right people internally. But no business is in a location or situation where it can't get good people.

Certified solutions is really just a way to say "expensive products that are focused on resellers" or, another way, bad solutions that cost you far more to operate. They are channel products designed beginning to end to take advantage of this mindset and to get as much money out of companies that believe this as possible. It's an extremely common and effective game that they play.

If your company doesn't know how to achieve this, then the first thing that they need is a real outsourced CIO. A good CIO will save you a fortune in hours. Running without one is financially reckless.

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

we have invested time and money on vmware hypervisor and our poor IT would not like to throw this away.

This is a business fallacy called the "sunk cost fallacy." Learning a hypervisor is a trivial matter. And moving to simple solutions is a long term investment. IT shouldn't "want" to do anything, they should simply be focused on what's best for the business. That, alone, defines what IT's job is.

https://en.wikipedia.org/wiki/Sunk_cost

Learning another product is just a few hours for those that don't know virtualization, and often "zero" time because it is so easy.

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

i understand that all above mentioned arguments are usually trivial in competitive environments, but unfortunately this is not our case.

It really is. You can't be stuck in a case where good options aren't available. They might be refused because of politics, but there is a difference between choosing one thing, and being stuck with only one thing as an option.

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

BTW what happens if a single node, or a node with local storage is lost?

This is a bad way to think about risk. You need to look at the whole, not "what if" scenarios to understand risk. Looking at a "what if" makes you do really bad things. "What if a meteor hits?" would make you put backups on Mars, for example. That's not realistic.

This is what backups are for. Having a standalone node doesn't mean you have no backups nor that you don't have something to restore to. So what happens if a single node is lost? You keep running on another node, restored from backup. This is how most companies in the world handle it, and they do it because it's an extremely cost-effective and safe pattern. It requires the least investment and the least IT knowledge, and has the least chance of failing due to complexity.

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

isn't that a potential cause for filesystem corruption?

Any filesystem can get corruption. But these days, that's rare. That's mostly a 1990s and 2000s problem. By 2005, production filesystems, even on Windows, had become so stable that we don't really see this. Not that it can't happen. But it used to be common, now it's something most pros won't see in a lifetime.

That said, having combined storage like VSAN, CEPH, etc. makes this far more likely because there is so much more complexity in the storage layer. Standalone, again, protects you (just a tiny bit) here by lowering the complexity and making the basics more reliable.

Remember, a brick is simple and almost never fails. It's hard to engineer any structure that, through redundancy or complexity, is more reliable than a brick because even though a brick is simple and singular, it's just insanely reliable. Standalone systems are more like a brick than anything. Also, like bricks, stand alone approaches are cheap.

Bricks with backups are hard to beat.
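
A rough way to see the brick argument in numbers is a toy availability model. This is only a sketch under made-up assumptions: the availability figures below are invented for illustration, not measurements from this thread, and the model assumes independent failures and that a cluster needs both "at least one node up" and "the shared storage/cluster layer up". The function names are hypothetical.

```python
def standalone_availability(node):
    """A single standalone node: the system is up whenever the node is up."""
    return node


def clustered_availability(node, shared_layer, nodes=2):
    """Toy model of a cluster that depends on a shared storage/cluster layer.

    The cluster is up if at least one node is up AND the shared layer is up.
    Failures are assumed independent, which is a simplification.
    """
    all_nodes_down = (1 - node) ** nodes
    return (1 - all_nodes_down) * shared_layer


if __name__ == "__main__":
    node = 0.999          # assumed availability of one simple server
    shared_layer = 0.998  # assumed availability of the complex shared layer
    print("standalone:", standalone_availability(node))
    print("clustered: ", clustered_availability(node, shared_layer))
```

With these assumed numbers the clustered design comes out slightly less available than the single node, because the added layer fails more often than the redundancy saves you. If the shared layer were perfect (availability 1.0) the cluster would win, so the outcome depends entirely on how reliable the added complexity really is.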

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

Moreover, how do i put the host in maintenance mode (Hmmm, and why should i do that if i only have one host, especially with let's say free esxi?)?

Simply put, you don't. Let's use KVM as an example. And a stand alone setup, as well. Very few businesses, essentially none, have zero potential downtime options. Downtime is actually very cheap and an important part of protecting against the real problem - unplanned downtime. Regular, small planned downtime is generally free or essentially free, heck even Wall St. does it this way to save money, and you don't need maintenance modes. You just reboot (or whatever.)

If you truly need zero downtime, then that cannot be addressed by the virtualization layer and this conversation is moot, because that means you need full redundancy at the application layer (otherwise you can't patch your operating systems, databases, applications, etc.). At that point you can do maintenance modes the only viable way, at the application layer, and stand alone hypervisors make no difference.

So if you need non-stop 24x7 operations, or you don't, in both cases you can do it effectively with stand alone nodes.

scottalanmiller

@rtfm said in ESXi cluster, advice needed:

we already have these VMs hosted in a 4-node flexpod environment (if vmware enterprise plus is an overkill for us, then how would you judge flexpod???).

This is unfortunate. Kind of the "worst of the worst". There is no way to really sugar coat this. It's the worst hardware on the market (Cisco), with a rather poor storage layer (NetApp), with an unnecessarily expensive hypervisor (VMware) where you end up with really, really high cost and hardware/setup that many of us would want to just throw in the trash.

Does it work? Well, kinda. Chances are the cost of this one purchase alone would have paid to hire an IT department to solve the bigger problems and implement a simpler, more efficient, more reliable solution that addresses your needs, rather than just empties your coffers.

This is, unfortunately, a setup designed specifically to prey on companies that think that they have to buy "products" instead of expertise and that they can skip IT. But it doesn't work that way. To quote VMware themselves "high availability is something you do, not something that you buy." Even the companies that sell this product don't believe that this is in any way a substitute for getting access to IT resources that are going to look at your needs and engineer a solution based around them.

We all understand that this means that you have now already purchased all of this and that there is no way to fix that. The only thing you can do now is look at the scale of this mistake and use it as a learning exercise to go back to your company and try to address the broken thought processes that brought them to what should have been an obvious "never do this" scenario. This suggests that they likely do the textbook "never do this in business" things of engaging sales people, resellers and the vendors asking how they can spend money rather than getting business experts to actually figure out what the needs are, and what would address them. The goal had to be "how do we spend money", not "how do we solve a business need." There is a massive opportunity for improvement here, but it won't help until "next time." But there will be a next time, so this lesson is insanely important to learn.

That said, though, you are asking how to fix this. You've figured out that this setup isn't good. That's the first step. You know that you need reliable storage instead of the RAID 4 NAS device single point of failure, that's good. I would step all the way back and consider all of it a waste and look at "how best to move forward" based on what you own, and try to remove IT's emotions from it because it is what it is, and those emotions will only hurt the company (and IT itself) long term.

scottalanmiller

@IRJ said in ESXi cluster, advice needed:

First of all, you need to move every service you can to SaaS and PaaS solutions. Get rid of your on-prem database servers and put them on a PaaS solution.

These are good things to consider. But we have no way to know if they have any option to do these things. There is never a possibility of "never have on prem databases." Prem vs. hosted is always based on business needs, and neither "always on prem" nor "always hosted" will ever be as simple as one way being the right way. This is always, no exceptions, something that has to be analyzed and determined.

Sure, there is excellent potential for hosted to be a good option, but there is every possibility that it's not even viable. As an MSP with hundreds of customers, the average cannot use a hosted database or any hosted data store, it's not even a possible option, let alone a viable one.