Gluster and RAID question



  • Trying to expand my knowledge a bit on clustered file systems. I've never had any experience working with them so I figured spinning up a couple test VMs would be a good starting point.

    Anyway, one of the first questions that popped into my head is how RAID fits into the picture with Gluster. I understand that Gluster has three different types of architectures: Distributed (non-redundant), Replicated (redundant), and Distributed Replicated (an amalgamation of the two).

    Am I understanding correctly that the Distributed type is similar to JBOD? Wouldn't RAID be necessitated under this type of architecture if one cared about resiliency? I'm trying to understand why one would use Distributed? Would its use case be for utilizing the maximum amount of underlying storage available similar to JBOD? Are Distributed Gluster deployments typically in Production?

    Secondly, the Replicated architecture type seems to be quite interesting. Am I correct in assuming that RAID wouldn't be required in this scenario since we are essentially creating copies of the same data on every brick in the pool? To me this sounds essentially like a RAID mirror which seems pretty neat. Can the GlusterFS be built upon LVM? If so, adding storage to each node in order to expand the bricks at a later time would be pretty easy.

    Finally, from doing some reading, it appears Gluster isn't known for its speed. Of course a lot of that depends on the network infrastructure it's built on, but I don't see how it could ever be as fast as local disks. What is the typical use case for something like Gluster? Massive storage requirements?



  • @biggen said in Gluster and RAID question:

    Anyway, one of the first questions that popped into my head is how RAID fits into the picture with Gluster.

    Gluster is a layer, RAID is a layer. You can make RAID on Gluster, you can make Gluster on RAID. But you shouldn't; the idea is that you use Gluster instead of RAID. If you want RAID, you don't want Gluster, and vice versa.



  • @biggen said in Gluster and RAID question:

    Am I understanding correctly that the Distributed type is similar to JBOD?

    No, it's the polar opposite. As opposite as it can be. JBOD means the disks have no association with each other, Distributed means all disks are associated.



  • @biggen said in Gluster and RAID question:

    Wouldn't RAID be necessitated under this type of architecture if one cared about resiliency?

    No, the purpose of Gluster is to provide a level of resilience that traditional RAID cannot do. RAID has no necessary role here.



  • @biggen said in Gluster and RAID question:

    Would its use case be for utilizing the maximum amount of underlying storage available similar to JBOD?

    JBOD does not have a purpose like that.



  • I guess I've been confusing the concept of JBOD with RAID 0 for the last 20 years then...



  • Gluster is a RAIN (Redundant Array of Independent Nodes) technology, the logical replacement for RAID. It's easier if you think at the same levels.

    RAIN and RAID are basic approaches to managing groups of disks. JBOD is when you lack RAID, RAIN, LVM, etc. You can never have JBOD when you have anything else, JBOD means a lack of all disk management.

    RAID is designed as a simple approach to single-node storage that views storage at the logical disk level. RAIN is designed as a complex approach to multi-node storage that views storage at a lower level so that it is not limited to logical disk abstractions.

    RAIN and RAID provide the same concepts of redundancy, resiliency, scalability, and performance, but do so using different mechanisms.

    Gluster is one RAIN implementation, but is just RAIN. So thinking in the big, general sense will make things clearer.



  • @biggen said in Gluster and RAID question:

    I guess I've been confusing the concept of JBOD with RAID 0 for the last 20 years then...

    Oh yeah, very different. RAID 0 is still RAID. JBOD can't do anything that RAID 0 does, because each disk is completely independent.



  • @biggen said in Gluster and RAID question:

    Are Distributed Gluster deployments typically in Production?

    Yes, that's where they get used.



  • @biggen said in Gluster and RAID question:

    Am I correct in assuming that RAID wouldn't be required in this scenario since we are essentially creating copies of the same data on every brick in the pool?

    Correct. Because RAIN does the same basic things as RAID, just at a brick level rather than at a disk level. That's all. RAIN is more flexible, but takes more underlying technology to implement because of this.
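    As a rough illustration of replication at the brick level, here is a minimal Python sketch that only builds (and does not run) the `gluster volume create` command for a replicated volume. The volume name `gv0`, the host names `node1` to `node3`, and the brick paths are hypothetical placeholders, and the helper `replicated_volume_create` is made up for this example.

```python
from typing import List


def replicated_volume_create(volume: str, bricks: List[str], replica: int) -> List[str]:
    """Build the argv for `gluster volume create` with a replica count.

    When replica == len(bricks), every brick holds a full copy of the data
    (a pure Replicated volume).
    """
    if replica < 2:
        raise ValueError("a replicated volume needs a replica count of at least 2")
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return ["gluster", "volume", "create", volume, "replica", str(replica), *bricks]


# Hypothetical hosts and brick paths, for illustration only.
bricks = [f"node{i}:/data/brick1/gv0" for i in (1, 2, 3)]
command = replicated_volume_create("gv0", bricks, replica=3)
print(" ".join(command))
```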



  • @biggen said in Gluster and RAID question:

    Can the GlusterFS be built upon LVM? If so, adding storage to each node in order to expand the bricks at a later time would be pretty easy.

    Can it? Yes. But there's no reason to. LVM doesn't provide any additional functionality here. Gluster can add bricks already without needing LVM for it. LVM would be to grow what is already there. But Gluster has no need for that.
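    To make "Gluster can add bricks" concrete, a small Python sketch follows that assembles the two CLI invocations typically involved: `gluster volume add-brick` and then `gluster volume rebalance ... start`. It only builds the command lists and does not execute them. The helper name, the volume `gv0`, and the host `node4` are assumptions for illustration. Note that on a replicated volume, bricks have to be added in multiples of the replica count.

```python
from typing import List


def expand_volume_commands(volume: str, new_bricks: List[str]) -> List[List[str]]:
    """Return the CLI invocations that grow a volume by adding bricks."""
    if not new_bricks:
        raise ValueError("at least one new brick is required")
    return [
        # Attach the new bricks to the existing volume.
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        # Start spreading existing files onto the new bricks.
        ["gluster", "volume", "rebalance", volume, "start"],
    ]


# Hypothetical volume name and brick, for illustration only.
commands = expand_volume_commands("gv0", ["node4:/data/brick1/gv0"])
for argv in commands:
    print(" ".join(argv))
```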



  • Ok, yeah, I don't know why I've been thinking of JBOD as RAID 0. Thanks for setting me straight.

    How can Distributed Gluster provide any fault tolerance if data isn't replicated across bricks (nodes)?



  • @biggen said in Gluster and RAID question:

    Finally, from doing some reading, it appears Gluster isn't known for its speed. Of course a lot of that depends on the network infrastructure it's built on, but I don't see how it could ever be as fast as local disks. What is the typical use case for something like Gluster? Massive storage requirements?

    Local disks are always fastest, that's just physics. You can't beat that. Generally RAID beats RAIN on performance, because of the simplicity.

    Blinding speed is rarely a priority today, it sounds good but it's one of those myths. Everyone says they need all this speed, no one can actually tell you why they do.

    Gluster is typically used for general-purpose clusters where you want to build low cost, large scale anything, from standard virtualization to giant file servers.



  • @biggen said in Gluster and RAID question:

    How can Distributed Gluster provide any fault tolerance if data isn't replicated across bricks (nodes)?

    Why would there be something not replicated across the bricks?

    RAID can't provide fault tolerance when data isn't written. RAIN can't either. That's why everything gets replicated.



  • @scottalanmiller said in Gluster and RAID question:

    @biggen said in Gluster and RAID question:

    How can Distributed Gluster provide any fault tolerance if data isn't replicated across bricks (nodes)?

    Why would there be something not replicated across the bricks?

    RAID can't provide fault tolerance when data isn't written. RAIN can't either. That's why everything gets replicated.

    https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/

    Distributed doesn't appear to replicate across bricks. It "distributes" files across bricks variously.



  • @biggen said in Gluster and RAID question:

    @scottalanmiller said in Gluster and RAID question:

    @biggen said in Gluster and RAID question:

    How can Distributed Gluster provide any fault tolerance if data isn't replicated across bricks (nodes)?

    Why would there be something not replicated across the bricks?

    RAID can't provide fault tolerance when data isn't written. RAIN can't either. That's why everything gets replicated.

    https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/

    Distributed doesn't appear to replicate across bricks. It "distributes" files across bricks variously.

    Oh sorry, you mean the Gluster term distributed. There is no redundancy. Same as RAID 0.

    Just like with RAID, RAIN has features like redundancy as an option, but not a requirement. There is nothing about any technology like this that automatically implies any safety.
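    A toy Python sketch of why a purely distributed layout has no redundancy: each file lands on exactly one brick, so a failed brick takes its files with it. This is a deliberate simplification (a hash of the file name modulo the brick count) and not Gluster's actual placement algorithm; the brick names and file names are made up.

```python
import hashlib
from typing import Dict, List


def pick_brick(filename: str, bricks: List[str]) -> str:
    """Choose one brick per file from a hash of its name (simplified)."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


def place_files(filenames: List[str], bricks: List[str]) -> Dict[str, List[str]]:
    """Map each brick to the files stored on it; every file is stored once."""
    layout: Dict[str, List[str]] = {brick: [] for brick in bricks}
    for name in filenames:
        layout[pick_brick(name, bricks)].append(name)
    return layout


# Hypothetical bricks and files, for illustration only.
bricks = ["node1:/brick", "node2:/brick", "node3:/brick"]
files = [f"report-{i}.txt" for i in range(12)]
layout = place_files(files, bricks)

# Losing one brick loses exactly the files that lived on it.
failed = "node2:/brick"
lost = layout[failed]
surviving = [f for brick, names in layout.items() if brick != failed for f in names]
print(f"lost {len(lost)} of {len(files)} files, {len(surviving)} still readable")
```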



  • @biggen said in Gluster and RAID question:

    How can Distributed Gluster provide any fault tolerance if data isn't replicated across bricks (nodes)?

    The simple answer is, it doesn't. The link that you posted has this in bold: "Hence there is no data redundancy."



  • @scottalanmiller No problem. So I'm guessing if one really wanted to use the "distributed" type, then RAID would be required if you wanted redundancy. I think I'm wrapping my head around this now.



  • @biggen said in Gluster and RAID question:

    @scottalanmiller No problem. So I'm guessing if one really wanted to use the "distributed" type, then RAID would be required if you wanted redundancy. I think I'm wrapping my head around this now.

    I think you are thinking about this all wrong.

    First, you never use RAIN and RAID together. So anything that's making you think of using RAID with Gluster means you are thinking about it fundamentally wrong. It's not that it's physically impossible, but that it makes no sense.

    Second, you never choose distributed if you want redundancy. So never would there be a case where you'd have the distributed type AND want redundancy. You'd choose the redundancy option instead.



  • Using RAID to provide the resilience for a RAIN system would be like buying a Ferrari but then deciding to have a Ford tow it around instead of driving the Ferrari that you already paid for. It'll work, but it won't work as well and it makes having bought the Ferrari make no sense.

    Gluster can do resiliency so much more advanced than RAID can. That's the primary reason you'd be looking at it. Why would you want Gluster, but then not want to use it?

    RAID can't do a fraction of what RAIN can do. So in this case you'd keep all of the performance impact of RAIN, and the resilience of the RAID would not be nearly as good as what RAID could do alone, nor anything like what RAIN can do. It would be a very crappy level of resiliency that no one would be okay with.



  • @scottalanmiller said in Gluster and RAID question:

    @biggen said in Gluster and RAID question:

    @scottalanmiller No problem. So I'm guessing if one really wanted to use the "distributed" type, then RAID would be required if you wanted redundancy. I think I'm wrapping my head around this now.

    I think you are thinking about this all wrong.

    First, you never use RAIN and RAID together. So anything that's making you think of using RAID with Gluster means you are thinking about it fundamentally wrong. It's not that it's physically impossible, but that it makes no sense.

    Second, you never choose distributed if you want redundancy. So never would there be a case where you'd have the distributed type AND want redundancy. You'd choose the redundancy option instead.

    So this takes me all the way back to my OP:

    Are Distributed Gluster deployments typically in Production?

    I guess if one didn't care about redundancy, that would be the only use case for that specific architecture. The only way to provide redundancy there would be with RAID, and you say that running RAID under RAIN isn't the way to ever run RAIN to begin with. So using the "distributed" type of Gluster with RAID to provide redundancy, like I was thinking, would be a poor choice.



  • @biggen said in Gluster and RAID question:

    I guess if one didn't care about redundancy that would be the only use case.

    Same as with RAID 0. You only skip redundancy when you have no need for it. But there are plenty of cases where there is no need for it.



  • Ok. Great thanks Scott. Gives me something to think about. I think I'll play around with a couple VMs today using Gluster and see how it goes.

    I have no use case for it. But I figure just experimenting with it for a bit can't hurt.



  • @biggen said in Gluster and RAID question:

    I have no use case for it. But I figure just experimenting with it for a bit can't hurt.

    It's cool tech, for sure.



  • @biggen said in Gluster and RAID question:

    @scottalanmiller said in Gluster and RAID question:

    @biggen said in Gluster and RAID question:

    @scottalanmiller No problem. So I'm guessing if one really wanted to use the "distributed" type, then RAID would be required if you wanted redundancy. I think I'm wrapping my head around this now.

    I think you are thinking about this all wrong.

    First, you never use RAIN and RAID together. So anything that's making you think of using RAID with Gluster means you are thinking about it fundamentally wrong. It's not that it's physically impossible, but that it makes no sense.

    Second, you never choose distributed if you want redundancy. So never would there be a case where you'd have the distributed type AND want redundancy. You'd choose the redundancy option instead.

    So this takes me all the way back to my OP:

    Are Distributed Gluster deployments typically in Production?

    I guess if one didn't care about redundancy, that would be the only use case for that specific architecture. The only way to provide redundancy there would be with RAID, and you say that running RAID under RAIN isn't the way to ever run RAIN to begin with. So using the "distributed" type of Gluster with RAID to provide redundancy, like I was thinking, would be a poor choice.

    You can do distributed and replicated for a volume. It's not just one or the other.



  • @stacksofplates said in Gluster and RAID question:

    You can do distributed and replicated for a volume. It's not just one or the other.

    They call the options Distributed, Replicated, and Distributed Replicated.

    It's a bit like having RAID 0, RAID 1, and RAID 10.
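    The RAID 0 / RAID 1 / RAID 10 analogy can be put into numbers with a small Python sketch. It assumes equally sized bricks and only counts guaranteed tolerance against data loss in the worst case; it ignores quorum, arbiter bricks, and dispersed volumes. The `summarize` helper and the 2 TB brick size are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class VolumeSummary:
    kind: str
    usable_tb: float
    brick_failures_tolerated: int  # guaranteed, worst case


def summarize(brick_count: int, brick_tb: float, replica: int = 1) -> VolumeSummary:
    """Classify a volume and estimate usable capacity from its replica count."""
    if brick_count < 1 or replica < 1 or brick_count % replica != 0:
        raise ValueError("brick_count must be a positive multiple of replica")
    replica_sets = brick_count // replica
    if replica == 1:
        kind = "Distributed"             # like RAID 0: all capacity, no copies
    elif replica_sets == 1:
        kind = "Replicated"              # like RAID 1: every brick is a copy
    else:
        kind = "Distributed Replicated"  # like RAID 10: files spread over mirrored sets
    return VolumeSummary(kind, replica_sets * brick_tb, replica - 1)


# Four hypothetical 2 TB bricks arranged three different ways.
for replica in (1, 4, 2):
    print(summarize(brick_count=4, brick_tb=2.0, replica=replica))
```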



  • Played around with it a bit today. Sharing it out via Samba seems a little complicated since you also need to layer CTDB. Is that the standard way to share it out to Windows clients?



  • @biggen said in Gluster and RAID question:

    Sharing it out via Samba seems a little complicated since you also need to layer CTDB.

    Why do you need that?



  • @biggen said in Gluster and RAID question:

    Is that the standard way to share it out to Windows clients?

    No, that would not be common. The common way is to have Samba in a VM that uses Gluster as a backing share.



  • @scottalanmiller Ok, it seems most of the tutorials show it being done with CTDB. I’ve found a couple that just create a standard Samba share and export it. I’ll play with that route.

    So would Samba be installed on each node and then shared out? Which Samba node do the clients connect to?

