    Announcing the Death of RAID

    IT Discussion
    Tags: raid, rain, storage
    53 Posts 11 Posters 8.1k Views
      scottalanmiller @Net Runner

      @Net-Runner said in Announcing the Death of RAID:

      I would treat RAID as a kind of hardware offload, since RAIN is known to consume more resources and thus results in less performance from the storage array. That is probably one of the major reasons why vendors like StarWind keep using hardware RAID, especially on smaller deployments (storage capacities).

      Network RAID on top of local RAID has huge disadvantages in performance, reliability and overhead (in many cases). Overhead is unique to each implementation, but here is an example of the overhead issues...

      If you have a three-node cluster using local RAID on each node, with 24TB on each node and network RAID connecting them, you have some tough choices for your local RAID.

      If you use RAID 0 on each node, then losing a single drive means the entire node is lost, and you need to fully rebuild that node's entire 24TB data set over the network before the node is restored. That could easily bring your cluster down from overhead alone, and might leave you waiting days or weeks for the cluster to be really usable again. During that time your risk gets very high, because any other node losing a single disk means that entire node is lost too. If you do mirroring across the nodes, the only reasonable choice there, this is RAID 01 and not nearly as safe as RAID 10. So we'd be looking at a system that is insanely risky compared to just a normal, local RAID array.

        scottalanmiller

        If we took the same example but moved to RAID 5 for the local disks, we are still so risky that we are roughly equal in risk to using RAID 0 locally. This is RAID 51. This consumes more than 70% of your disks for parity or mirroring while still being too risky to consider. If you moved to expensive enterprise drives, it might approach "safe-ish", but it's still very risky and now also expensive. If you were using nine-drive arrays of 4TB disks on each node, you would be giving up 19 out of 27 drives in your cluster and getting something that isn't all that fast and is not very safe. The example before was giving up 16 out of 24 drives, a little better, but not much. The loss number is huge; giving up that much capacity should buy a lot of protection.

          scottalanmiller

          To make this work at all, RAID 61 is the riskiest level that we can really consider. With RAID 61 we need to give up 22 out of 30 drives, and we still need to consider the severe impact on one of the nodes if we lose a disk and need to rebuild. We might lose 80-99% of our storage capabilities on that one node while the RAID 6 repairs itself. Losing a node entirely would become unlikely, but even without losing a node the impact might be really big. And even if we can absorb a lengthy, intensive rebuild, the storage cost is very large. We would likely use hardware RAID at a cost of some $2100 in hardware for that, plus incredibly low utilization of the disks.

            scottalanmiller

            To get truly safe and fast, we'd need RAID 101. This method would require 40 out of 48 drives to be lost to mirroring. This would protect us from the intensive rebuilds and mean that individual nodes are essentially never lost (at the storage level, anyway), making the system reasonably safe, but the cost just keeps escalating.
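The drive counts quoted in these posts (16 of 24, 19 of 27, 22 of 30, 40 of 48) are all consistent with one reading: 8 data drives' worth of usable capacity, a local RAID level on each of three nodes, and a network mirror that keeps a full copy on every node. That reading is an assumption inferred from the numbers, not something stated outright, and the function names below are made up for illustration. A minimal sketch of the arithmetic:

```python
def drives_per_node(level: str, data_drives: int) -> int:
    """Physical drives one node needs to hold `data_drives` worth of data."""
    if level == "RAID 0":
        return data_drives          # no redundancy
    if level == "RAID 5":
        return data_drives + 1      # one drive of parity
    if level == "RAID 6":
        return data_drives + 2      # two drives of parity
    if level == "RAID 10":
        return data_drives * 2      # every data drive is mirrored
    raise ValueError(f"unknown RAID level: {level}")


def cluster_overhead(level: str, data_drives: int = 8, nodes: int = 3):
    """Return (drives given up, total drives) for local RAID plus a network mirror.

    Assumption: the network mirror stores a full copy on every node, so the
    whole cluster only yields `data_drives` of usable capacity.
    """
    total = drives_per_node(level, data_drives) * nodes
    lost = total - data_drives
    return lost, total


for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10"):
    lost, total = cluster_overhead(level)
    print(f"{level} + network mirror: {lost} of {total} drives "
          f"({lost / total:.0%}) given up")
```

Changing `data_drives` or `nodes` shows how quickly the share of raw capacity spent on redundancy grows as more local protection is stacked under the network mirror.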

              scottalanmiller

              Now we could, in theory, not use mirroring across nodes but use parity instead. This can reduce the amount of storage that we need to purchase to make the system work, but it comes at a staggering cost to performance and risk. Imagine something like RAID 5 working over a network. A network based node reconstruction could be very, very bad.

                scottalanmiller

                We also have to remember that in a network RAID model we carry a risk of node failure from something other than storage. Loss of a CPU, motherboard, fans, memory or whatever would cause an entire node to fail. In a local RAID scenario this is not a huge deal since our storage is intact; we simply replace the failed part and the server comes back online. Not so with network RAID. If we have a bad memory stick and a node goes down, then when it comes back online, no matter how intact its local RAID is, the array itself has failed and has to be reconstructed as if it were new. The data stored on it is useless. So events that would not normally affect storage reliability in a single-node view of RAID become devastating to network RAID. This is why network RAID is rarely used beyond two nodes, and generally only for mirroring. Accidentally reboot two nodes in network RAID 5 before all rebuilds are complete and... all is lost.

                  scottalanmiller

                  When we use network RAID, a node is useless (from a storage perspective) until it is fully restored, which can take an incredibly long time. Even a dedicated 10GigE storage link is a little slower than a 12Gb/s SAS link. And there is a lot of overhead to be considered.
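To put a rough floor under that restore time, here is a back-of-the-envelope sketch. It assumes decimal terabytes and a link running at its full nominal rate with no protocol overhead, no disk bottleneck, and no production traffic competing for bandwidth, so real rebuilds will take longer, often far longer. The function name is hypothetical.

```python
def min_transfer_hours(capacity_tb: float, link_gbps: float) -> float:
    """Best-case hours to move `capacity_tb` terabytes over a `link_gbps` link.

    Assumptions: 1 TB = 10**12 bytes, the link is fully saturated the whole
    time, and nothing else (protocol overhead, disks, live workload) slows it.
    """
    bits = capacity_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


# Full 24TB node over 10GigE versus the same data over a 12Gb/s link,
# and a single 4TB drive over 10GigE for comparison.
for label, tb, gbps in [
    ("24TB node over 10GigE", 24, 10),
    ("24TB over a 12Gb/s link", 24, 12),
    ("one 4TB drive over 10GigE", 4, 10),
]:
    print(f"{label}: at least {min_transfer_hours(tb, gbps):.1f} hours")
```

Even this idealized floor is measured in hours for a full node, before any of the overhead mentioned above is added.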

                  RAIN has other advantages, though. RAIN is both node and drive aware, so RAIN has many options. Of course, different implementations will do different things, but one of the most common approaches is to use block mirroring. Using block mirroring, a cluster like the one that we describe needs only to rebuild a single drive if one fails, not restore an entire node. Once the drive is replaced, only that one drive must change. And because the restore is at the block level, not the drive level, individual restored blocks are protected and able to be used for their IOPS the moment that they are restored, so the drive begins to be marginally useful in seconds and scales up to full utilization quickly.

                  Some systems, like several that we use, also do automatic rebalancing should a drive fail. In our 24TB three-node example from above, if this were a reasonably common RAIN implementation and the full array was not already at maximum capacity, then the "at risk" blocks from the lost drive would automatically be duplicated elsewhere in the RAIN cluster to restore part or all of the protection that was there before and to minimize performance impact until the disk is replaced. This means that in the time that it takes for a human to run to a server and replace a failed drive with a new one, even if just fifteen minutes, the data that had been on that drive might, in most cases, already be fully protected again! That is something that RAID cannot do at all.
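The block mirroring and automatic rebalancing described above can be sketched as a toy model. This is not any vendor's implementation: the placement rule, the function names, and the "least-loaded drive on another node" policy are all assumptions chosen for illustration, and free capacity is assumed to be available everywhere.

```python
from collections import Counter


def place_blocks(num_blocks, nodes, drives_per_node):
    """Toy placement: two copies of every block, always on two different nodes."""
    placement = {}
    for block in range(num_blocks):
        first_node = block % nodes
        second_node = (block + 1) % nodes
        drive = (block // nodes) % drives_per_node
        placement[block] = {(first_node, drive), (second_node, drive)}
    return placement


def fail_drive(placement, failed):
    """Drop the failed drive's copies and return the blocks left with one copy."""
    at_risk = []
    for block, copies in placement.items():
        if failed in copies:
            copies.remove(failed)
            at_risk.append(block)
    return at_risk


def rebalance(placement, at_risk, failed, nodes, drives_per_node):
    """Re-mirror each at-risk block onto the least-loaded surviving drive
    that sits on a different node than the block's remaining copy."""
    load = Counter(drive for copies in placement.values() for drive in copies)
    for block in at_risk:
        (survivor,) = placement[block]  # exactly one copy is left
        candidates = [
            (node, drive)
            for node in range(nodes)
            for drive in range(drives_per_node)
            if (node, drive) != failed and node != survivor[0]
        ]
        target = min(candidates, key=lambda d: load[d])
        placement[block].add(target)
        load[target] += 1


NODES, DRIVES_PER_NODE = 3, 8
placement = place_blocks(240, NODES, DRIVES_PER_NODE)
failed = (0, 0)  # node 0, drive 0
at_risk = fail_drive(placement, failed)
rebalance(placement, at_risk, failed, NODES, DRIVES_PER_NODE)
print(f"{len(at_risk)} blocks lost a copy and were re-mirrored")
print("every block has two copies again:",
      all(len(copies) == 2 for copies in placement.values()))
```

Only the blocks that lived on the failed drive are touched, and each one is protected again as soon as its own copy lands, which is the contrast with rebuilding a whole node under network RAID.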

                    scottalanmiller @Dashrender

                    @Dashrender said in Announcing the Death of RAID:

                    @Net-Runner said in Announcing the Death of RAID:

                    I would treat RAID as a kind of hardware offload, since RAIN is known to consume more resources and thus results in less performance from the storage array. That is probably one of the major reasons why vendors like StarWind keep using hardware RAID, especially on smaller deployments (storage capacities).

                    I wonder if this flies in the face of what @scottalanmiller has been saying that hardware RAID isn't needed for performance reasons?

                    Well, it's incorrect. Good RAIN can dramatically outperform RAID at a fraction of the resources. And RAIN vs RAID is not software vs hardware. Both come in both forms. And just like how RAID 7 uses a huge amount of resources and RAID 10 uses almost none, some RAIN uses a ton of resources and some uses almost none. It's not something that can be directly compared in that way.

                      scottalanmiller @Dashrender

                      @Dashrender said in Announcing the Death of RAID:

                      In other words, I think that @scottalanmiller has been saying that SMBs so rarely tax their systems so much that the performance drain put on the system by software RAID would barely be noticed. So the use of RAID as a hardware offload for RAIN wouldn't make sense ...

                      This is true in the enterprise as well. It is universal that software RAID uses so few resources that you essentially never care, while you almost always benefit from the extra speed (even if you don't need it or notice it).

                        scottalanmiller @Dashrender

                        @Dashrender said in Announcing the Death of RAID:

                        @KOOLER said in Announcing the Death of RAID:

                        There are many ways to skin a cat and there are some things you can't do w/out hardware components: f.e. SAS in HBA mode can't allow write cache enabled on the disks and can't enable aggressive write-back battery-protected cache because... HBA has none 😉 This means either you acknowledge writes in DRAM (synchronized with some other hosts) or you have to use Enterprise-grade SSDs. Guys like VMware and Microsoft who claim they don't rely on hardware and you can throw away RAID cards are... cheating you! Because now you have to swap RAID cards -> Enterprise grade SSDs they can use as a cache. Pay money to save money. Sweet!

                        Wait a second - are you advocating not using enterprise class drives? I'm pretty sure I read somewhere where @scottalanmiller specifically said, if you plan to have any warranty/support you need to have enterprise drives - sure, the vendor has to support the parts that are under warranty, but can skip the ones that aren't - i.e. you purchase a Dell server and install Samsung SSD, you're on your own for the SSDs.

                        No, I'm a huge advocate of consumer drives. Enterprise drives are mostly just a marketing thing. The only thing that makes them special, in the Winchester world anyway, is the change in URE rates and only parity RAID is really affected by those. No mirroring RAID is (at least not in known real world implementations) and no known RAIN is. They both have effectively no URE concerns. So the need or utility of enterprise drives in the post-parity RAID world is very, very low.

                        What you are thinking of is my recommendation for supported drives that are part of the system itself if you are going for a warranty supported system like from Dell or HPE. Bringing your own drives would push you to vendors like SuperMicro where you can mix and match for the best performance, cost and features.

                          scottalanmiller @travisdh1

                          @travisdh1 said in Announcing the Death of RAID:

                          @scottalanmiller said in Announcing the Death of RAID:

                          @travisdh1 said in Announcing the Death of RAID:

                          You're in the same boat whether you use enterprise class drives or not if you're putting non-Dell drives in a Dell server.

                          WD Red (not Red Pro) drives are consumer class stuff, but they do RAID10 perfectly fine. Yet I'd never run them in a parity RAID array because of their consumer-grade read error rate.

                          Red Pro are consumer, too. Only difference is spindle speed.

                          Did they change that again? They had "discontinued" their low end enterprise line (I forget what they were called before) and just rebranded them Red Pro. So for at least a while, the read error rate on the Reds was lower than on the Red Pros.

                          Not sure, but the Red Pro are rebranded SE, which were consumer.

                            scottalanmiller @Dashrender

                            @Dashrender said in Announcing the Death of RAID:

                            @scottalanmiller said in Announcing the Death of RAID:

                            Red Pro are consumer, too. Only difference is spindle speed.

                            I was thinking the only difference was spindle speed between these - calling it Pro is very misleading.

                            Pro != Enterprise.

                            Pro implies "an engineer sitting at his high-end workstation." In drive terms, Pro would indicate "prosumer" or "high-end end user."

                              scottalanmiller @Dashrender

                              @Dashrender said in Announcing the Death of RAID:

                              Why are these consumer class? What makes the Gold drives Enterprise?

                              All vendors currently use the same guidelines and it is now an industry standard, even if no one talks about it. Consumer always means high URE rates and enterprise always means low. Nothing else is a direct factor. Enterprise drives might "tend" to be many things, but URE rates alone are the hard divide between the two.

                                Dashrender @scottalanmiller

                                Where is the cut off? ^15?

                                  scottalanmiller @Dashrender

                                  @Dashrender said in Announcing the Death of RAID:

                                  Where is the cut off? ^15?

                                  ^14 is consumer, ^15 is enterprise, meaning one unrecoverable read error (URE) per 10^14 or 10^15 bits read. Those are about the only options on the market. There is no ^13, as that would be totally useless, and I'm not aware of anyone having made a ^16 yet.
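To see why that one order of magnitude matters for parity RAID, where a rebuild has to read the entire remaining array, here is a deliberately naive sketch. It treats the rated URE figure as an independent per-bit error probability, which is a simplifying assumption (real drives often behave better than their spec sheet), and the function name is hypothetical.

```python
import math


def p_at_least_one_ure(read_tb: float, bits_per_ure: float) -> float:
    """Naive chance of hitting at least one URE while reading `read_tb` terabytes.

    Assumption: every bit fails independently with probability 1 / bits_per_ure.
    Computes 1 - (1 - 1/bits_per_ure) ** bits_read in a numerically stable way.
    """
    bits_read = read_tb * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_ure))


for label, rate in [("consumer (10^14)", 1e14), ("enterprise (10^15)", 1e15)]:
    p = p_at_least_one_ure(24, rate)
    print(f"{label}: about {p:.0%} chance of a URE while reading 24TB")
```

Under this model the consumer rating makes a full 24TB read more likely than not to hit an error, while the enterprise rating does not. Mirrored RAID and RAIN do not depend on one flawless end-to-end read in the same way.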

                                    Dashrender @scottalanmiller

                                    @scottalanmiller said in Announcing the Death of RAID:

                                    What you are thinking of is my recommendation for supported drives that are part of the system itself if you are going for a warranty supported system like from Dell or HPE. Bringing your own drives would push you to vendors like SuperMicro where you can mix and match for the best performance, cost and features.

                                    I want to ask why we can't/shouldn't use consumer class drives in a Dell or HPE server, but I think the answer might be - because if you're paying for that level of support, why are you not going all in?

                                    Is that right?

                                    i.e. if you want to run your own performance/cost factors, you're better off starting with a SuperMicro, is that what you're saying?

                                    Followup question - should SMBs really be looking to more SuperMicros and less Dell/HPE because of said cost savings using their own provided drives? I realize this is a very general question, and of course can't be applied across the entirety of SMB.

                                      scottalanmiller @Dashrender

                                      @Dashrender said in Announcing the Death of RAID:

                                      I want to ask why we can't/shouldn't use consumer class drives in a Dell or HPE server, but I think the answer might be - because if you're paying for that level of support, why are you not going all in?

                                      Is that right?

                                      i.e. if you want to run your own performance/cost factors, you're better off starting with a SuperMicro, is that what you're saying?

                                      Yes. That's what I mean.
