Xenserver and Storage
-
That's a lot of interesting technical stuff, but that's not how it works when you build a product. Let me explain.
Market context
Here, we are in the hyperconvergence world. In this world, users want some advantages over the traditional model (storage separated from compute). So the first question to answer before building a solution is "what do users want?". We did some research and found that, in general, people want, in decreasing order of priority:
- Cost/ROI (in short, simpler infrastructure to manage will reduce cost)
- HA features
- Ease of scaling and an adequate level of performance (in short: they don't want to be blocked, or to get worse performance than with existing solutions, or performance so poor that it outweighs the cost/features/flexibility/security advantages).
These "priorities" came from our studies but also from Gartner.
Technical context
Then, we are addressing the XenServer world. When you build a hyperconverged solution there, you have some "limitations":
- a shared SR can only be used within a pool (so 1 to 16 hosts max)
- the more you modify the Dom0, the worse it is
So in this context, you won't scale beyond 16 hosts MAX.
Real life usage
So we decided to run some benchmarks, and despite prioritizing something safe/flexible, we got pretty nice performance, as you can see in our multiple benchmarks.
In short: performance is adequate. If it weren't, we would have stopped the project (or switched to another technology).
Regarding the "cluster goes boom": no, it goes read-only (RO) for your VMs, so it won't erase/corrupt your data.
-
@jrc said in Xenserver and Storage:
So currently I have 2 HP servers that are being used as XenServer hosts. The shared storage is on an HP MSA1040 SAN, connected via 8Gb/s Fiber.
The servers have worked flawlessly since I got them, not a single issue, and have only been rebooted for updates and upgrades. I cannot say the same for the SAN. It has gone down about 4 or 5 times, and these outages have highlighted the fragility of my setup.
The HP servers have 24 2.5" drive bays. So I am contemplating filling them with drives and moving away from the SAN, but in order to do that I would need the space to be shared between the two hosts.
How can I do that? What would that look like? What kind of cost would it be (outside of buying the drives) and is it a good idea?
Someone mentioned VSAN to me while I was talking about this, but I am not that clued up about VSANs and how they work or how they are put together.
Any advice on this would be greatly appreciated. But please don't lecture me on how bad a SAN is, and that my setup is doomed or that I am an idiot for doing it this way. I am looking for a path forward and not a beratement for things that have long since passed.
You can mount some NVMe performers and some high-capacity spindles in your Xen hosts to run VMs from, and leave the aging HP SAN for backup purposes only. Two hosts with a SAN make zero sense really...
-
Having local storage is good for perfs, but you can't live migrate without moving the disks or HA on the other host.
I did a recap on local vs (non hyperconverged) shared storage in XS:
-
Note, XOSAN is just Gluster under the hood. You do NOT WANT TO RUN GLUSTER WITH 2 nodes. IT IS NOT SUPPORTED. (You can run a 3rd metadata-only node, but you need SOMETHING out there to provide quorum.)
It requires a proper stateful quorum of a 3rd node. Also, for maintenance, you really want 4 nodes at a minimum so you can do patching and still take a failure. You'll also need to have enough free capacity on the cluster to maintain healthy slack on the bricks (20-30%) AND take a failure, so factor that math into your overhead. Also, for reasons I'll get into in a moment, you REALLY want to run local RAID on Gluster nodes.
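To make the sizing argument above concrete, here is a rough sketch in Python. It is a generic strict-majority model plus back-of-the-envelope capacity arithmetic, not Gluster's or XOSAN's actual quorum or placement logic. The function names, the replica count, and the example figures (4 nodes of 10 TB each, replica 2, 25% slack) are assumptions chosen for illustration only.

```python
def has_quorum(total_voters: int, voters_up: int) -> bool:
    """Generic strict-majority rule (an assumption, not Gluster's code):
    more than half of all voters must still be reachable."""
    return voters_up > total_voters / 2


def usable_capacity_tb(
    nodes: int,
    raw_tb_per_node: float,
    replica_count: int,
    slack_fraction: float,
    node_failures: int = 1,
) -> float:
    """Rough usable capacity that still fits after losing `node_failures`
    nodes while keeping `slack_fraction` of the bricks free.

    Simplified model: every node contributes the same raw capacity and
    data is spread evenly across the surviving nodes.
    """
    surviving_raw_tb = (nodes - node_failures) * raw_tb_per_node
    return surviving_raw_tb * (1 - slack_fraction) / replica_count


if __name__ == "__main__":
    # Two nodes: losing one leaves 1 of 2 voters, which is not a majority.
    print("2 nodes, 1 up:", has_quorum(2, 1))
    # Three voters (e.g. two data nodes plus a third quorum-only node).
    print("3 voters, 2 up:", has_quorum(3, 2))
    # Hypothetical cluster: 4 x 10 TB nodes, replica 2, 25% slack, 1 failure.
    print("usable TB:", usable_capacity_tb(4, 10, 2, 0.25))
```

Under these assumptions, a 40 TB raw cluster ends up with roughly 11 TB you can safely fill, which is the kind of overhead the post is warning about.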
Also note, Gluster's local drive failure handling is very... binary... Red Hat (who owns Gluster) refuses to issue a general support statement for JBOD mode with their HCI product, and directs you to use RAID 6 for 7.2K drives (no RAID 10). Given the unpredictable latency issues with SSDs (garbage collection triggering failure detection, etc.), their deployment guide completely skips SSDs (as I would expect until they can fix the failure detection code to be more dynamic, or they can build an HCL). Because of these risks, JBOD is a "Contact your Red Hat representative for details." (Code for "we think this is a bad idea, but might do a narrowly tested RPQ-type process".)
Gluster single-node performance is very... encouraging. You definitely need more nodes for reasonable performance (even with an NVMe back end).
That's another story... Software Defined Storage on top of hardware RAID isn't something many companies do, for good reason (we do, but we're moving away from that approach for anything beyond 2 or maybe 3 node configurations). You want raw device access (better still with the firmware bypassed) or... nobody can guarantee that a just-confirmed write has actually been 100% completed.
-
@olivier said in Xenserver and Storage:
Having local storage is good for perfs, but you can't live migrate without moving the disks or HA on the other host.
I did a recap on local vs (non hyperconverged) shared storage in XS:
Most of the "budget" SMB customers shouldn't care about that.
-
@kooler said in Xenserver and Storage:
@olivier said in Xenserver and Storage:
Having local storage is good for perfs, but you can't live migrate without moving the disks or HA on the other host.
I did a recap on local vs (non hyperconverged) shared storage in XS:
Most of the "budget" SMB customers shouldn't care about that.
This is not my point of view. E.g., even for my small production setup, hosted in a DC, it's not trivial to migrate big VMs on a local SR from one host to another to avoid service interruption.
edit: I'm using XOSAN for my own production setup, best way to sell a product
-
@olivier said in Xenserver and Storage:
Having local storage is good for perfs, but you can't live migrate without moving the disks or HA on the other host.
I did a recap on local vs (non hyperconverged) shared storage in XS:
That's not really a sensible statement. You can't live migrate the STORAGE of the VMs without moving the storage. If you want to move your VMs without moving storage, you stay in the same boat as with any external storage. If you need to move the storage live with external storage, you have the same issues.
You have to treat the two things differently to give any advantage to external storage on dedicated hardware. Literally, anything that looks like an advantage is always expecting it to "deliver less" than the local disks and therefore not asking as much of it. Like we expect the local disks to live migrate, but completely ignore asking the external storage to do that.
How does security improve by having more points to attack?
-
@olivier said in Xenserver and Storage:
@kooler said in Xenserver and Storage:
@olivier said in Xenserver and Storage:
Having local storage is good for perfs, but you can't live migrate without moving the disks or HA on the other host.
I did a recap on local vs (non hyperconverged) shared storage in XS:
Most of the "budget" SMB customers shouldn't care about that.
This is not my point of view. E.g., even for my small production setup, hosted in a DC, it's not trivial to migrate big VMs on a local SR from one host to another to avoid service interruption.
edit: I'm using XOSAN for my own production setup, best way to sell a product
SMBs should not be worried, in 99% of cases, about migrating VMs around. That's not an SMB need.
-
I consider myself an SMB (3 sockets!) and I need live migration; it's really useful. It's also used a LOT by our customers. Maybe that's a XenServer user bias, but it's real there.
-
@olivier said in Xenserver and Storage:
I consider myself an SMB (3 sockets!) and I need live migration; it's really useful. It's also used a LOT by our customers. Maybe that's a XenServer user bias, but it's real there.
Used by, and should be used by are not the same things. SMBs are famous for wasting money where it is pointless, doing complex things because it makes them feel good, and not spending more (or effort) where it actually matters. Why would SMBs need to live migrate services around?
-
Because it allows an abstraction of the hardware, for replacing/patching/rebooting stuff without even losing service (or to avoid doing so on a weekend, for instance).
-
@olivier said in Xenserver and Storage:
Because it allows an abstraction of the hardware, for replacing/patching/rebooting stuff without even losing service (or to avoid doing so on a weekend, for instance).
Right, that's a thing SMBs don't need. People sell them that, but I'd be pretty pissed if I found my techs spending money on that. It is only useful for patching the underlying hypervisors. How long does that take? And you have to have longer outages for patching the individual VMs on top already. So just align the patching time. SMBs don't have many workloads and rarely critical ones. Every SMB hates being an SMB and claims big money losses or high criticality for services, but when it comes down to it, it's all bluster 99.99% of the time.
There are cases where this matters, but good luck actually finding one. It's simply a need that the SMB doesn't have in reality. Patching is a trivial process, easily scheduled. There is a reason that even the Wall St. banks don't need to do this for their biggest workloads. It's an almost completely fabricated business need.
-
For any shop that actually needs this functionality, you normally need it higher in the stack, at the application level. So when needed, you already have it and don't need the platform to provide it across the board. So in most of the rare cases where the need does exist, you already have the capability.
-
So I should be an exception then
edit: in the end, your perception doesn't really matter if the "market" thinks otherwise.
-
@olivier said in Xenserver and Storage:
So I should be an exception then
What business need creates it for you? What service do you run that is so critical that you have no greenzones all week long?
And then isn't properly mitigated through application level high availability?
-
E.g., XS patching for critical security reasons. I don't have the resources to make our apps redundant at their level, so I rely on virt (and live migration) to avoid outages.
-
@olivier said in Xenserver and Storage:
E.g., XS patching for critical security reasons. I don't have the resources to make our apps redundant at their level, so I rely on virt (and live migration) to avoid outages.
Sure, but what service is so critical that you can't reboot? SMBs basically never have any service that needs to stay up. That's the thing. I get why services will go down without an HA solution in place, but what no one ever explains to me is why going down is a problem. How many users are impacted and in what way and for how long?
-
Think priorities. It will impact some users (because of the updater in XOA). That's not dramatic for the business, but it's better to avoid it. So the cost of having it is negligible (we're already using virt). And I don't have the resources to make the service app HA (because live migration is free…)
edit: in the end, if I follow your arguments, virtualization is also useless for SMBs.
-
@olivier said in Xenserver and Storage:
edit: in the end, if I follow your arguments, virtualization is also useless for SMBs.
Nope, not in the least. This would imply a misunderstanding of the purpose of virtualization. Virtualization is free and makes things safer. Everyone benefits from virtualization, every time.
HA is not free, adds its own risks (that are very high) and provides uncommon benefits. Most shops are hurt by HA, not helped by it.
Same logic, totally different results.
-
I'm not speaking about HA right now, I'm speaking about live migration
HA is another beast. I agree it should be used only after weighing the benefits/problems.