Windows Failover Clustering... what are your views and why?
-
@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:
It's just hard to justify not using all the CSV space 'as it's there'.... When the reason the space is there... Is for expected growth.
Just ask that people sign off on removing the investment in expected growth. Clearly someone approved a budget for that. Now something else is deemed more important. So get whoever signed off on investing in growth to approve their budget being taken by someone else.
-
The thing is, they are VMs, you can move shit around whenever, to wherever, should the need arise... and without any downtime if needed. I think each server/service/system should have its own "SLA" (it's almost midnight, can't think of the word now) and should be placed appropriately. Only you can answer whether or not it needs HA. You can do the math to figure out exactly what each GB of SSD capacity, vCPU, memory, etc. costs for HA placement versus non-HA and decide appropriately where to put the VM. I really don't think hard drive life is a concern here. You'll pull through the lifespan of the drive easily, or you won't because it's defective, in which case it doesn't matter to your decision anyway. So I don't think that's a factor here. It all comes down to math regarding costs vs what is being considered for placement.
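As a rough illustration of that math, here is a minimal Python sketch comparing what a VM's disk costs on replicated (HA) storage versus local storage. The price per raw GB and the function names are made up for illustration, and the three-way replica count is an assumption taken from the three-host setup discussed in this thread. Plug in your own numbers.

```python
def cost_per_usable_gb(raw_cost_per_gb: float, replicas: int) -> float:
    """Cost of one usable GB when every block is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_cost_per_gb * replicas


def placement_cost(vm_disk_gb: float, raw_cost_per_gb: float, replicas: int) -> float:
    """Storage cost of placing one VM, given how many copies are kept."""
    return vm_disk_gb * cost_per_usable_gb(raw_cost_per_gb, replicas)


# Hypothetical figures, not from the thread: 0.50 per raw GB of SSD.
RAW_SSD_COST_PER_GB = 0.50
VM_DISK_GB = 50

ha_cost = placement_cost(VM_DISK_GB, RAW_SSD_COST_PER_GB, replicas=3)     # on the CSV
local_cost = placement_cost(VM_DISK_GB, RAW_SSD_COST_PER_GB, replicas=1)  # local disk

print(f"HA placement:    {ha_cost:.2f}")
print(f"Local placement: {local_cost:.2f}")
print(f"HA premium:      {ha_cost - local_cost:.2f}")
```

The same shape works for vCPU and memory: price the resource, multiply by whatever the HA design reserves for failover, and compare.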
-
@Obsolesce said in Windows Failover Clustering... what are your views and why?:
You can do the math to figure out exactly what each GB of SSD capacity, vCPU, memory, etc. costs for HA placement versus non-HA and decide appropriately where to put the VM.
I did this at my last employer and you'd be surprised what "the business" deems worthy to put on an expensive setup and what they don't when you give them the numbers. My guess is that a lot of things won't end up having HA.
-
The way we have used StarWind VSAN is that we set it up on a Windows Failover Cluster with the VSAN storage as a CSV, synced across a minimum of two nodes. Then all the VMs are placed on the CSV, unless you have a VM that is not important at all and you want it outside that CSV. With the VMs on the CSV you can have one or more servers down and your environment will continue to work without interrupting the VMs, as long as the remaining host or hosts have enough memory and CPU power to run all the VMs.
-
@dbeato said in Windows Failover Clustering... what are your views and why?:
The way we have used StarWind VSAN is that we set it up on a Windows Failover Cluster with the VSAN storage as a CSV, synced across a minimum of two nodes. Then all the VMs are placed on the CSV, unless you have a VM that is not important at all and you want it outside that CSV. With the VMs on the CSV you can have one or more servers down and your environment will continue to work without interrupting the VMs, as long as the remaining host or hosts have enough memory and CPU power to run all the VMs.
That's what we pretty much have here. What's happening though, is some want to add ALL virtual machines to the CSV, even when they don't need HA. Like you, I want those to not be on CSV.
-
@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:
@scottalanmiller said in Windows Failover Clustering... what are your views and why?:
@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:
@scottalanmiller said in Windows Failover Clustering... what are your views and why?:
@Dashrender said in Windows Failover Clustering... what are your views and why?:
@scottalanmiller said in Windows Failover Clustering... what are your views and why?:
d business isn't always a goal, but it seems like someone has a bee in their bonnet about somethings that it might no
Would you? or would you move it down to a 2 node setup (or the 3 node setup you mentioned above) and use the extra host for something else?
I don't know about the compute needs. I'm giving the benefit of the doubt that the CPU and RAM were properly sized.
We can lose 1/3 of the hosts and the remaining VMs have plenty of room to run on the other 2/3 hosts. Tested.
But could you lose 1/2 and keep running, that's the question.
Probably, actually. The HA VMs total around 600 GB of RAM used. Each server has 768 GB RAM. So, in theory, yes... We could go down to 1/2 hosts and be up. Not sure how the CPU would cope though, and there would not be much room for growth of the HA VMs over the next 3 years or so.
Those VM spread over 2/3 have plenty of RAM available, and CPU, plenty of room for growth which is forecast.
If we lost those non-important VMs on the 3rd host, and Dell couldn't get the part to fix the 3rd for, say, a week or two, we would even have room on the 2/3 that are up to restore the non-critical VMs from Veeam.
We do back them all up. It's just my position is we lose a lot adding them to the cluster. Technically, we could start the failed VM from Veeam directly using that instant recovery feature.
It's just hard to justify not using all the CSV space 'as it's there'.... When the reason the space is there... Is for expected growth. If we use it for these VMs we don't really care about... We lose the ability to grow the workload that actually needs HA.
I think it'll be fine. Just trying to get more logical reasoning why adding to the CSV is silly where HA is not a requirement.
The VSAN CSV space is already used by the server, so really there is no other growth to make. Basically, what you have on the CSV is part of the server and you should take that into account.
Also, for @scottalanmiller: StarWind prices the storage total by tiers, which is a little bit annoying as well.
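The RAM figures quoted above (roughly 600 GB used by the HA VMs, 768 GB per host) can be sanity-checked with a small Python sketch. It is a simplification under stated assumptions: it looks at RAM only, ignores hypervisor overhead and CPU, and the function names are hypothetical.

```python
def ram_headroom_gb(host_ram_gb: float, surviving_hosts: int, ha_vm_ram_gb: float) -> float:
    """RAM left over after all HA VMs are packed onto the surviving hosts.

    A negative result means the HA VMs no longer fit.
    Assumption: RAM only; hypervisor overhead and CPU are ignored.
    """
    return surviving_hosts * host_ram_gb - ha_vm_ram_gb


def survives(host_ram_gb: float, surviving_hosts: int, ha_vm_ram_gb: float) -> bool:
    """True if the HA VMs fit in the RAM of the hosts that are still up."""
    return surviving_hosts >= 1 and ram_headroom_gb(host_ram_gb, surviving_hosts, ha_vm_ram_gb) >= 0


HOST_RAM_GB = 768   # per host, from the thread
HA_VM_RAM_GB = 600  # approximate HA VM total, from the thread

for up in (2, 1):
    fits = survives(HOST_RAM_GB, up, HA_VM_RAM_GB)
    spare = ram_headroom_gb(HOST_RAM_GB, up, HA_VM_RAM_GB)
    print(f"{up} host(s) up: fits={fits}, spare RAM={spare} GB")
```

On a single surviving host the HA VMs fit on paper, but the spare RAM is small, which matches the concern about growth over the next few years.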
-
@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:
@dbeato said in Windows Failover Clustering... what are your views and why?:
The way we have used StarWind VSAN is that we set it up on a Windows Failover Cluster with the VSAN storage as a CSV, synced across a minimum of two nodes. Then all the VMs are placed on the CSV, unless you have a VM that is not important at all and you want it outside that CSV. With the VMs on the CSV you can have one or more servers down and your environment will continue to work without interrupting the VMs, as long as the remaining host or hosts have enough memory and CPU power to run all the VMs.
That's what we pretty much have here. What's happening though, is some want to add ALL virtual machines to the CSV, even when they don't need HA. Like you, I want those to not be on CSV.
So then it is a business decision and it should not be an issue. Let them know the pros and cons and that's it.
-
@dbeato said in Windows Failover Clustering... what are your views and why?:
@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:
@dbeato said in Windows Failover Clustering... what are your views and why?:
The way we have used StarWind VSAN is that we set it up on a Windows Failover Cluster with the VSAN storage as a CSV, synced across a minimum of two nodes. Then all the VMs are placed on the CSV, unless you have a VM that is not important at all and you want it outside that CSV. With the VMs on the CSV you can have one or more servers down and your environment will continue to work without interrupting the VMs, as long as the remaining host or hosts have enough memory and CPU power to run all the VMs.
That's what we pretty much have here. What's happening though, is some want to add ALL virtual machines to the CSV, even when they don't need HA. Like you, I want those to not be on CSV.
So then it is a business decision and it should not be an issue. Let them know the pros and cons and that's it.
I agree, this is all a business decision, not IT. Business pays for it, business decides if people can alter the budget after the fact. Either way is fine, it's up to whoever writes the checks.
-
@Obsolesce said in Windows Failover Clustering... what are your views and why?:
The thing is, they are VMs, you can move shit around whenever, to wherever, should the need arise... and without any downtime if needed. I think each server/service/system should have its own "SLA" (it's almost midnight, can't think of the word now) and should be placed appropriately. Only you can answer whether or not it needs HA. You can do the math to figure out exactly what each GB of SSD capacity, vCPU, memory, etc. costs for HA placement versus non-HA and decide appropriately where to put the VM. I really don't think hard drive life is a concern here. You'll pull through the lifespan of the drive easily, or you won't because it's defective, in which case it doesn't matter to your decision anyway. So I don't think that's a factor here. It all comes down to math regarding costs vs what is being considered for placement.
Wait a second - this whole box was likely built to be 100% HA - so anything running on it, or planned to be running on it, was likely scoped with the expectation of being on HA, whether or not HA was needed - at least that's my expectation.
The VMs being put on it now that the OP is talking about likely were never originally on the board for this cluster/hardware - were they? I mean - where/why are these VMs a thing now and what was the plan for their placement? Did whoever wanted these VMs get the sign-off from the ones that paid for the cluster? (devil's advocate)
-
@Dashrender said in Windows Failover Clustering... what are your views and why?:
@Obsolesce said in Windows Failover Clustering... what are your views and why?:
The thing is, they are VMs, you can move shit around whenever, to wherever, should the need arise... and without any downtime if needed. I think each server/service/system should have its own "SLA" (it's almost midnight, can't think of the word now) and should be placed appropriately. Only you can answer whether or not it needs HA. You can do the math to figure out exactly what each GB of SSD capacity, vCPU, memory, etc. costs for HA placement versus non-HA and decide appropriately where to put the VM. I really don't think hard drive life is a concern here. You'll pull through the lifespan of the drive easily, or you won't because it's defective, in which case it doesn't matter to your decision anyway. So I don't think that's a factor here. It all comes down to math regarding costs vs what is being considered for placement.
Wait a second - this whole box was likely built to be 100% HA - so anything running on it, or planned to be running on it, was likely scoped with the expectation of being on HA, whether or not HA was needed - at least that's my expectation.
The VMs being put on it now that the OP is talking about likely were never originally on the board for this cluster/hardware - were they? I mean - where/why are these VMs a thing now and what was the plan for their placement? Did whoever wanted these VMs get the sign-off from the ones that paid for the cluster? (devil's advocate)
I think he said that the original engineering plan was NOT this. It was designed to be non-HA for some or most of the workloads. Super high HA just for select workloads.
-
@scottalanmiller said in Windows Failover Clustering... what are your views and why?:
@Dashrender said in Windows Failover Clustering... what are your views and why?:
@Obsolesce said in Windows Failover Clustering... what are your views and why?:
The thing is, they are VMs, you can move shit around whenever, to wherever, should the need arise... and without any downtime if needed. I think each server/service/system should have its own "SLA" (it's almost midnight, can't think of the word now) and should be placed appropriately. Only you can answer whether or not it needs HA. You can do the math to figure out exactly what each GB of SSD capacity, vCPU, memory, etc. costs for HA placement versus non-HA and decide appropriately where to put the VM. I really don't think hard drive life is a concern here. You'll pull through the lifespan of the drive easily, or you won't because it's defective, in which case it doesn't matter to your decision anyway. So I don't think that's a factor here. It all comes down to math regarding costs vs what is being considered for placement.
Wait a second - this whole box was likely built to be 100% HA - so anything running on it, or planned to be running on it, was likely scoped with the expectation of being on HA, whether or not HA was needed - at least that's my expectation.
The VMs being put on it now that the OP is talking about likely were never originally on the board for this cluster/hardware - were they? I mean - where/why are these VMs a thing now and what was the plan for their placement? Did whoever wanted these VMs get the sign-off from the ones that paid for the cluster? (devil's advocate)
I think he said that the original engineering plan was NOT this. It was designed to be non-HA for some or most of the workloads. Super high HA just for select workloads.
Correct. We had to get HA for a subset of our workload, so this had to be built. Extending the storage, RAM and CPU on these three machines to carry the non-HA workload needed less budget than building entirely separate machines for it.
That was purely the plan. In my mind, it still is. It's just that some folk are pushing to make everything HA. Even, for example, PDQ. We do not need PDQ to be HA. It's a small VM, sure. But it 100% does not need HA. Even if it's only 50 GB, we don't need that replicated three times! If it is on a host that does die, I can start the backup on my Veeam box if need be. Once the host is fixed, I can migrate it to the fixed hardware.
Even our web servers don't need HA. Stick one on each host and use HAProxy so that when one is down, that web server is taken out of the pool. Sure, make the HAProxy HA, or roll out a HAProxy cluster (I'm sure that would have built-in application-level HA of some form), if need be.
We have proprietary software on top of Windows that isn't made with application-level HA. That's why we need the failover clustering. Those VMs for sure need to migrate should a physical host die. Outside of that, we just don't need it. But for that special case, we do. That's why we have it.
Even domain controllers don't need to be on the CSV. Have one local to each server, with the DC in the cluster but not on shared storage. If a host does die, you still have two DCs online and can migrate the FSMO roles (if the holder died). No need to be in the CSV.
But some folk want everything in the CSV, and want to waste lots of CSV storage on things that don't need it.
The point of the CSV was that as our proprietary tool grows over the next few years, there is room to do so. If that space is full of data that doesn't need HA, we have lost the opportunity to use it.
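To put numbers on that argument when taking it to the stakeholders, a minimal Python sketch like the one below can show how much of the growth reserve is left once non-HA VMs land on the CSV. Every figure here is hypothetical (the thread does not give the CSV size or the growth forecast), and the function names are made up for illustration.

```python
def remaining_growth_gb(csv_capacity_gb: float, ha_used_gb: float, non_ha_gb: float) -> float:
    """Usable CSV space left after HA data and any non-HA VMs placed on it."""
    return csv_capacity_gb - ha_used_gb - non_ha_gb


def growth_fits(csv_capacity_gb: float, ha_used_gb: float,
                non_ha_gb: float, forecast_growth_gb: float) -> bool:
    """True if the forecast HA growth still fits in what is left."""
    return remaining_growth_gb(csv_capacity_gb, ha_used_gb, non_ha_gb) >= forecast_growth_gb


# Hypothetical numbers for illustration only.
CSV_CAPACITY_GB = 10_000
HA_USED_GB = 4_000
FORECAST_GROWTH_GB = 4_000

for non_ha_gb in (0, 3_000):
    left = remaining_growth_gb(CSV_CAPACITY_GB, HA_USED_GB, non_ha_gb)
    ok = growth_fits(CSV_CAPACITY_GB, HA_USED_GB, non_ha_gb, FORECAST_GROWTH_GB)
    print(f"non-HA on CSV: {non_ha_gb} GB -> {left} GB left, forecast growth fits: {ok}")
```

With these made-up figures, keeping non-HA VMs off the CSV leaves room for the forecast growth, while moving 3,000 GB of them onto it does not.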
-
Yes, the end solution is to make our application have a form of application level HA, but that's up to development to make, not IT.
-
@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:
The point of the CSV was that as our proprietary tool grows over the next few years, there is room to do so. If that space is full of data that doesn't need HA, we have lost the opportunity to use it.
I'm in camp 2 on this.
- It is a waste to use the space when it is not needed.
- It was not engineered to be used any other way.
- Just because you can do something (option 1) does not mean you should.
- People wanting option 1 are just being lazy. Fuck that.
Do as @scottalanmiller suggested and get the stakeholders to officially sign off on changing the budget expansion cost to be used now instead.
Do not let this just be a conversation. Put it in writing. The lazy people will back down.