Gluster and RAID question
-
@biggen said in Gluster and RAID question:
Let's say as a hypothetical one wanted to build out a 500TB Gluster cluster to be used as a backup target for VMs.
That's the first problem. Bottom line is: you don't.
And by that I don't mean that the tech is wrong. I mean the approach is wrong. Gluster is JUST a filesystem, it's not a target for VMs or anything like that. It's like you want to transport fruit from your farm to the market, and you are asking "okay, so I want to use these Goodyear tires, how do I do it?"
You just don't. You look at the job holistically: "How do I store backups of VMs?" Then you answer it at the high level "I store them on an SMB file server!"
Then in the process of answering "How do I host an SMB file server?" you come up with "On a hypervisor."
Eventually you get "under the hood enough" that maybe, MAYBE, for the question of "on what storage platform do I run my VMs?" the answer becomes "Gluster". But the Gluster piece is not connected at all to the "backup target for VMs".
Just like the tires aren't connected to the fruit. Sure, there is a decent chance that the vehicle that hauls your fruit will use tires, and maybe even Goodyear tires, but it's an under the hood detail that has nothing directly to do with the fact that higher up the chain you are hauling fruit.
Gluster doesn't solve the kind of problem you are trying to solve. So the question doesn't make sense. And it is making you really confused.
-
@biggen said in Gluster and RAID question:
It looks like you need at least 3 nodes to build out the Gluster Cluster. Then, of course, you need an additional node for the hypervisor - so 4 nodes minimum.
No, the hypervisor would never be on a different node. It would almost always be on the same cluster as the Gluster storage. If you separate it out, you break the vast majority of the value.
You are trying to use Gluster as if it were a SAN. Gluster can be used underneath a SAN. But a SAN would have no role to play in this kind of setup.
I think you are trying to ask good questions, but are adding so many assumptions by accident that you are floundering.
Start with your goal: "How do I make a backup target for VMs, it likely needs to be 500TB?"
And let's go from there. Absolutely nowhere should Gluster be involved until after loads and loads of other things have been figured out. And then, maybe, Gluster will come into the picture. But if Gluster is an option or not depends on lots of other things.
-
@biggen said in Gluster and RAID question:
So your VMs are running off the Gluster?
Gluster is generally used for that, yes. Because backup storage rarely can leverage the advantages of Gluster, it just doesn't make sense. But for VMs, that's Gluster's bread and butter.
VMs really "never" should be running off of a SAN. That's exactly the least likely option to make sense.
-
@biggen said in Gluster and RAID question:
I know you can buy single boxes that have 2 - 4 nodes inside of them to reduce the footprint.
It reduces the physical size, but limits your storage in horrific ways and is still a lot of nodes that aren't serving a purpose.
That you put a box around four servers doesn't stop them being four servers.
-
@Dashrender said in Gluster and RAID question:
Can Gluster run on the same boxes as the hypervisor like in a hyperconverged setup?
That's the intended use case. Really, the only intended use case. It's never intended to be used remotely, it has no accommodation for that. You can, of course, by building a SAN on top of it. But how silly is that?
Gluster only natively works when the hypervisor is local. Loads of solutions, like Proxmox, have this all baked in.
-
@biggen said in Gluster and RAID question:
On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?
Physical servers effectively have no purpose. It is extremely safe to say that you won't ever be the exception to that rule. There are exceptions, but we should never mention them as they are so rare as to be ridiculous to contemplate in real world discussions. If you see any case and ask "doesn't this make a physical install make sense?", that's the time to stop, because something must be very wrong. There is no way that that will come up.
In this case, it feels that way because you are misunderstanding hyperconvergence and the storage of VMs and missing the essential "99% majority use case" model. There are only two models that any normal company needs to consider: stand alone boxes (SA) and hyperconvergence (HC). That's it. Don't even worry about anything else. It's so insanely rare that it's not reasonable to think about, and if you get it wrong, defaulting to SA/HC is a "safe" mistake, while skipping them and doing something else when you shouldn't is generally insanely expensive, complex, and risky.
One of the biggest mistakes nearly every IT person makes (and people in general, check the movie "He's Just Not That Into You") is believing that they are the one exception to the rules. Everyone thinks it, but no one is. You aren't the exception, you are the rule (direct quote from the movie). Same for me. I'm no exception. Same for everyone here. We all feel like we should be the exception, but none of us are. We are all the rule. Not on the edge, either. We are all very much exactly the rule.
-
@Dashrender said in Gluster and RAID question:
Now I'm guessing this can't be done with Hyper-V, since that can't run inside Linux (as far as I know)
No T1 hypervisor can, by definition. T1 Hypervisors have to run on the bare metal, it's part of the definition.
Hyper-V needs a different product that is compatible with it to replace Gluster. This is why you define the pieces higher up before you even talk about the storage tech. Because the hypervisor determines that storage. Starting with Gluster is the cart driving the horse.
-
@Pete-S said in Gluster and RAID question:
Yes, for instance the 2U 4-node servers from Supermicro.
Each node has 6 hot swap bays, dual CPUs, PCIe slot etc. So 4 complete servers in one.
Problem there is, that's a very limited amount of storage. All that compute power, and almost no storage. But what he actually needs is the opposite.
-
@biggen so here is what I'm thinking....
Goal: 500TB of storage for backups.
Proposed solution: Use MinIO in LXC or Docker
Looks absolutely nothing like what you were thinking, but approaches the problem from a "how do we solve the problem" perspective. Rather than "I have this technology, what problem might it solve."
Why use Samba for backup storage when you could use distributed object storage?
-
@scottalanmiller said in Gluster and RAID question:
Problem there is, that's a very limited amount of storage. All that compute power, and almost no storage. But what he actually needs is the opposite.
In this case yes, but you could for instance use the exact same model with 3.5" drives instead and get 3x16=48TB per node. Or if that is not enough, just go with hooking up external disk enclosures to those nodes that need lots of storage - assuming that a cluster of nodes might be doing more than just backup.
For instance the 847 from Supermicro is pretty common. Gives you 44x3.5" hotswap bays in 4U. So 44x16=704TB raw storage.
You can use one enclosure to provide disk space for 4 nodes if you wanted. Hundred different options of course. You could also do three 2U servers, each with 16x3.5" bays giving you 256TB per server. It all depends on how high density and how flexible you need it to be.
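The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 16TB drive size and the 3 bays per node are what the figures in the post imply, the 500TB target comes from the original question, and the function names are made up for the example. It counts raw capacity only and ignores any RAID or replication overhead.

```python
import math

DRIVE_TB = 16      # drive size implied by the figures in the post
TARGET_TB = 500    # backup target size from the original question

def raw_capacity_tb(bays, drive_tb=DRIVE_TB):
    """Raw capacity of one box: number of 3.5" bays times drive size."""
    return bays * drive_tb

def boxes_needed(bays, target_tb=TARGET_TB, drive_tb=DRIVE_TB):
    """Boxes needed to reach the raw target, before any redundancy overhead."""
    return math.ceil(target_tb / raw_capacity_tb(bays, drive_tb))

# Bay counts taken from (or implied by) the post; labels are illustrative.
options = {
    "multi-node server, 3 bays per node": 3,
    "2U server, 16 bays": 16,
    "4U enclosure, 44 bays": 44,
}

for name, bays in options.items():
    print(f"{name}: {raw_capacity_tb(bays)}TB raw, "
          f"{boxes_needed(bays)} needed for {TARGET_TB}TB")
```

With these numbers a single 44-bay enclosure already exceeds 500TB raw, two 16-bay servers get there, and the 3-bay nodes would take 11.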
-
@Pete-S said in Gluster and RAID question:
You can use one enclosure to provide disk space for 4 nodes if you wanted. Hundred different options of course. It also depends on how high density and how flexible you need it to be.
The thing is, you only need one node. There's no purpose to the other three nodes.
The enclosures with all the drives are great and make sense. The additional compute nodes don't, they aren't really serving any purpose at all here. This is really either a single node standalone system. Or it is a scale out system. Regardless of which approach you take, multiple nodes in one enclosure don't make sense for this kind of use case.
-
I think it's safe to say I'll probably never be asked (or even want to) design a system where Gluster may be used, as it seems it's waaaay over my head at this stage of the game.
I don't even know what @scottalanmiller's solutions are!
-
@scottalanmiller said in Gluster and RAID question:
The thing is, you only need one node. There's no purpose to the other three nodes
The enclosures with all the drives are great and make sense. The additional compute nodes don't, they aren't really serving any purpose at all here. This is really either a single node standalone system. Or it is a scale out system. Regardless of which approach you take, multiple nodes in one enclosure don't make sense for this kind of use case.
Yes, the multi-node servers only make sense when you need multiple servers.
It's actually cheaper both running them and buying them that way, compared to individual servers with the same specs.
-
@biggen said in Gluster and RAID question:
I think it's safe to say I'll probably never be asked (or even want to) design a system where Gluster may be used, as it seems it's waaaay over my head at this stage of the game.
I don't even know what @scottalanmiller's solutions are!
You don't know @scottalanmiller's solutions? Then I wonder how you wandered into Gluster in the first place? It's so far beyond his solutions, it's almost crazy to even consider Gluster when considering the solutions Scott spoke of.
-
@Pete-S said in Gluster and RAID question:
Yes, the multi-node servers only make sense when you need multiple servers.
It's actually cheaper both running them and buying them that way, compared to individual servers with the same specs.
Is it? I suppose maybe - if you need a ton of compute and almost no storage, sure, but that rarely seems to be the case.
Considering the density of VMs you can get on a dual socket VM host these days, compute is normally the last resource to run out - RAM or storage are much more likely.
-
@Dashrender said in Gluster and RAID question:
Is it? I suppose maybe - if you need a ton of compute and almost no storage, sure, but that rarely seems to be the case.
Considering the density of VMs you can get on a dual socket VM host these days, compute is normally the last resource to run out - RAM or storage are much more likely.
Well, you have to realize that storage density has increased and multi-node servers come in a lot of different models. The ratio between compute and storage entirely depends on which model you pick.
For instance this one with 4 nodes:
If we spec' it with readily available components today each node would have:
2x28 core Intel Platinum CPUs, 3TB RAM, 128GB SSD boot, 4TB NVMe SSD storage, 48TB HDD storage, 100 Gigabit ethernet.
Isn't 48TB per node enough storage?
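For a rough sense of what that spec adds up to across the whole chassis, here is a small Python sketch. The per-node figures are taken from the spec above; the `NodeSpec` class and `chassis_totals` function are names invented for this example, not anything from a vendor tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures taken from the spec quoted above."""
    cpus: int
    cores_per_cpu: int
    ram_tb: int
    nvme_tb: int
    hdd_tb: int

def chassis_totals(node: NodeSpec, nodes: int = 4) -> dict:
    """Multiply the per-node figures by the number of nodes in the chassis."""
    return {
        "cores": nodes * node.cpus * node.cores_per_cpu,
        "ram_tb": nodes * node.ram_tb,
        "nvme_tb": nodes * node.nvme_tb,
        "hdd_tb": nodes * node.hdd_tb,
    }

node = NodeSpec(cpus=2, cores_per_cpu=28, ram_tb=3, nvme_tb=4, hdd_tb=48)
totals = chassis_totals(node)
print(totals)
```

That works out to 224 cores, 12TB of RAM and 192TB of HDD per chassis, so reaching 500TB raw this way would take three chassis, or 12 nodes.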
-
@Pete-S said in Gluster and RAID question:
Well, you have to realize that storage density has increased and multi-node servers come in a lot of different models. The ratio between compute and storage entirely depends on which model you pick.
For instance this one with 4 nodes:
If we spec' it with readily available components today each node would have:
2x28 core Intel Platinum CPUs, 3TB RAM, 128GB SSD boot, 4TB NVMe SSD storage, 48TB HDD storage, 100 Gigabit ethernet.
Isn't 48TB per node enough storage?
Still no more storage than a standard 2u server, without the SPOF "feature". The only time these things make any sense is if you need eight 64-core/128-thread EPYC CPUs in a single 2u chassis. And seriously, who needs more than 256 threads per 2u of rack space?
Yes, I know the workloads exist, but they're not common!
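The thread-density comparison can be written out as a quick Python sketch. It assumes a "standard" 2u server is a single dual-socket node and that each node in the 4-node chassis is also dual socket; the 128 threads per CPU comes from the EPYC figure above, and the function name is made up for illustration.

```python
def threads_per_chassis(nodes, sockets_per_node, threads_per_cpu):
    """Total hardware threads in one chassis."""
    return nodes * sockets_per_node * threads_per_cpu

THREADS_PER_CPU = 128  # 64-core/128-thread EPYC, as mentioned above

# Assumption: a standard 2u server is one dual-socket node.
standard_2u = threads_per_chassis(nodes=1, sockets_per_node=2,
                                  threads_per_cpu=THREADS_PER_CPU)

# Assumption: the 4-node 2u chassis has dual-socket nodes (8 CPUs total).
four_node_2u = threads_per_chassis(nodes=4, sockets_per_node=2,
                                   threads_per_cpu=THREADS_PER_CPU)

print(f"standard 2u: {standard_2u} threads")
print(f"4-node 2u:   {four_node_2u} threads")
```

Under those assumptions the standard box tops out at 256 threads per 2u and the 4-node chassis at 1024, which is the gap being described.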
-
@biggen said in Gluster and RAID question:
I don't even know what @scottalanmiller's solutions are!
I'm not sure that I do, either. LOL
-
@biggen said in Gluster and RAID question:
I think it's safe to say I'll probably never be asked (or even want to) design a system where Gluster may be used, as it seems it's waaaay over my head at this stage of the game.
Not necessarily, but it's not as likely as you were thinking, I think.
You might easily be in a situation where you need a huge virtualization cluster that's, other than being big, relatively low performance and pretty simple. Gluster might be just perfect for that. And something like Proxmox will potentially do that automatically for you making it that much simpler.
Gluster is a great tool with lots of applicability. But you will likely approach it from a different perspective.
-
@travisdh1 said in Gluster and RAID question:
Still no more storage than a standard 2u server, without the SPOF "feature".
It's a neat box, and I like them a lot. But they do retain a light SPOF of the chassis itself. When people talk about the "forklift failure", for example, this doesn't protect against it. Or water leaks. Things of that nature.
Sure, two boxes directly next to each other help very little, the idea of a cluster is that the nodes should be at least in adjacent racks, not adjacent rack slots, but.... it's something.