    What Makes Parity RAID Safe on SSDs

    IT Discussion
    raid storage ssd parity raid raid 5 raid 6
    • scottalanmillerS
      scottalanmiller @Obsolesce
      last edited by

      @Tim_G said in What Makes Parity RAID Safe on SSDs:

      Everyone says a RAID 5 rebuild of high-terabyte drives will take FOREVER (with HDDs) and the chances of a failure go up. But I'm thinking to myself... wait, who cares about the time... isn't it about the amount of bits moving? Isn't it the massive number of bits moving that increases the chance of a URE or whatever... not necessarily the time it takes to move them?

      It's both. This is where you can tell that so many Spiceheads are just repeating things without taking time to understand them, even when posting so-called answers. I see this almost daily.

      People keep conflating UREs with disk failures, no matter how many times this is corrected. They are different things: a URE does not imply a failed disk; the disk is fine and still running. Winchester disks are always spinning and will fail even when not being read. So the time to recover a failed array matters because:

      • Winchester drives die over time, even when not being read
      • HDD array rebuilds can grow from hours to literally months
      • Rebuilding puts extra strain on the array

      So in the Winchester world, time to recovery does matter because rebuild times are long and the array is exposed for that entire period. SSDs don't fail over time when idle; they wear significantly from write operations and only trivially from read operations. For SSDs, then, speed itself doesn't matter, only the amount and type of access happening during the rebuild. And since the rebuilds are fast, that access is a fraction of what Winchester disks see.

      So speed matters, but not for the reasons that they keep stating.
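
To make the time-window part concrete, here is a minimal sketch in Python. It assumes independent drives with a constant failure rate, which is a simplification. The 3% annual failure rate, the 7 surviving drives, and the function name `second_failure_risk` are hypothetical illustration values, not figures from this thread. It only models a whole-drive failure during the rebuild window, not UREs.

```python
import math

def second_failure_risk(surviving_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving drive fails during the rebuild window.

    Assumes independent drives with a constant (exponential) failure rate.
    Real drives under rebuild stress may well do worse than this.
    """
    hours_per_year = 24 * 365
    # Convert the annual failure probability into a per-hour hazard rate.
    hazard_per_hour = -math.log(1.0 - annual_failure_rate) / hours_per_year
    return 1.0 - math.exp(-surviving_drives * hazard_per_hour * rebuild_hours)

# Hypothetical array: 7 surviving drives, 3% annual failure rate each.
for label, hours in [("8 hours", 8), ("1 week", 24 * 7), ("2 months", 24 * 60)]:
    print(f"{label:>9}: {second_failure_risk(7, 0.03, hours):.3%}")
```

Under those assumptions the risk grows almost linearly with rebuild time, which is why a rebuild that runs for months is a very different animal from one that finishes in hours.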

      • stacksofplatesS
        stacksofplates
        last edited by stacksofplates

        This is also why I have a couple of older servers with RAID 5 that I will be decommissioning this year but haven't removed the RAID 5 from. They are 10K 146GB SAS drives, so the chances of hitting a URE during a rebuild are low enough.

        • DashrenderD
          Dashrender @stacksofplates
          last edited by

          @stacksofplates said in What Makes Parity RAID Safe on SSDs:

          This is also why I have a couple of older servers with RAID 5 that I will be decommissioning this year but haven't removed the RAID 5 from. They are 10K 146GB SAS drives, so the chances of hitting a URE during a rebuild are low enough.

          LOL - I'm in this same boat. I removed an old server from service with 300 GB drives about a month ago. I wasn't worried about it for the same reason - two RAID 5 arrays, each with three drives (yeah, don't ask, I didn't build it).

          • M
            marcinozga @Dashrender
            last edited by

            @Dashrender said in What Makes Parity RAID Safe on SSDs:

            OK I had to do some digging before posting.

            I recall seeing @scottalanmiller post many times that SSDs don't suffer UREs, but perhaps he was meaning that they are so unlikely in current use patterns we can ignore them.

            http://www.theregister.co.uk/2015/05/07/flash_banishes_the_spectre_of_the_unrecoverable_data_error/

            This post talks about this topic exactly. But I'll shorten it for this post.
            https://i.imgur.com/JfTT7E0.png

            You asked whether it's all about the bits. Yep, it sure is. As you can see in that graphic, even consumer SSDs are 10x less likely to hit a URE than an enterprise HDD, and there are two levels better than that. You can see where Scott got the idea that 12 TB basically means a 100% likelihood that a resilver will fail. You'll notice that a 600 TB array has about a 50% chance of failure, 300 TB has 25%, and 100 TB is around 8%.

            I think the new question we need to ask ourselves is: what level of risk are we willing to accept? That isn't new at all; it's something we should have been asking ourselves all along, and should continue to ask.

            Don't take those numbers for granted; there are plenty of consumer SSDs rated at one URE per 10^14 bits. So if you happen to put 6x 2TB SSDs in RAID 5, you can still run into it.
            Another thing: most people misunderstand what a URE rate is and assume it's a set number, so that you always hit a URE after reading 12.5 TB. It's not. It's just the probability of running into a sector on disk that cannot be read. It's similar to playing Russian roulette: if you're lucky, you can keep pulling the trigger indefinitely without blowing your brains out, and you can have RAID 5 rebuilding PBs of data without ever hitting a URE.
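
A rough sketch of that probability in Python, for the 6x 2TB RAID 5 case. It assumes the spec-sheet rate means each bit read is an independent trial with a 1-in-10^14 chance of being unreadable, and that a rebuild reads all five surviving drives in full. The function name `ure_risk` is made up for illustration.

```python
import math

def ure_risk(bytes_read, bits_per_ure=1e14):
    """Probability of at least one URE while reading bytes_read bytes.

    Treats every bit as an independent trial with probability
    1 / bits_per_ure, which is an assumption about how to read the spec.
    """
    bits = bytes_read * 8
    per_bit = 1.0 / bits_per_ure
    # 1 - (1 - p)^n, written with log1p/expm1 to keep precision for tiny p.
    return -math.expm1(bits * math.log1p(-per_bit))

TB = 10**12
# 6x 2TB in RAID 5 with one drive lost: the rebuild reads the 5 survivors.
rebuild_bytes = 5 * 2 * TB

for exponent in (14, 15, 16):
    risk = ure_risk(rebuild_bytes, 10.0**exponent)
    print(f"1 URE per 10^{exponent} bits: {risk:.1%} chance of hitting one")
```

With those assumptions the 10^14 case comes out at roughly 55%: far from guaranteed, but far from ignorable either.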

            • scottalanmillerS
              scottalanmiller @marcinozga
              last edited by

              @marcinozga said in What Makes Parity RAID Safe on SSDs:

              Another thing: most people misunderstand what a URE rate is and assume it's a set number, so that you always hit a URE after reading 12.5 TB. It's not. It's just the probability of running into a sector on disk that cannot be read. It's similar to playing Russian roulette: if you're lucky, you can keep pulling the trigger indefinitely without blowing your brains out, and you can have RAID 5 rebuilding PBs of data without ever hitting a URE.

              That's why it's a percentage risk number. Otherwise an array of 12.5TB would have a 100% failure rate, but that never quite happens. There is never 100% success nor 100% failure no matter how big or small the array. But it gets really high, really quickly.
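
To put rough numbers on "never quite 100%", this sketch inverts the usual calculation: how much data has to be read before the chance of at least one URE reaches a given level. It assumes independent per-bit errors at one URE per 10^14 bits, and `tb_read_for_risk` is a hypothetical helper name, not anything from a real tool.

```python
import math

def tb_read_for_risk(target_risk, bits_per_ure=1e14):
    """Terabytes that must be read before the chance of at least one URE
    reaches target_risk, assuming independent per-bit errors."""
    if not 0.0 < target_risk < 1.0:
        # A risk of exactly 100% is never reached, no matter how much is read.
        raise ValueError("target_risk must be strictly between 0 and 1")
    bits = math.log1p(-target_risk) / math.log1p(-1.0 / bits_per_ure)
    return bits / 8 / 1e12

for risk in (0.10, 0.50, 0.90, 0.99):
    print(f"{risk:.0%} risk after about {tb_read_for_risk(risk):.1f} TB read")
```

Under this model the 50% mark lands below 12.5 TB, and each extra "nine" of near-certain failure costs a lot more reading, but the curve only approaches 100% and never touches it.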

              • dafyreD
                dafyre @scottalanmiller
                last edited by dafyre

                @scottalanmiller said in What Makes Parity RAID Safe on SSDs:

                That's why it's a percentage risk number. Otherwise an array of 12.5TB would have a 100% failure rate, but that never quite happens. There is never 100% success nor 100% failure no matter how big or small the array. But it gets really high, really quickly.

                I can honestly say in the dozen or so RAID-5 rebuilds that I've seen in the last 6 years, literally half of them were successful, and the other half were not. This was in arrays of 6 x 250GB drives to 8 x 1TB drives (all Winchester / spinning rust).

                • M
                  marcinozga @dafyre
                  last edited by

                  @dafyre How many of the failed ones were actually caused by URE? I'd be interested to see the %.

                  • dafyreD
                    dafyre @marcinozga
                    last edited by

                    @marcinozga said in What Makes Parity RAID Safe on SSDs:

                    @dafyre How many of the failed ones were actually caused by URE? I'd be interested to see the %.

                    That I can't tell you. Drive "help me" lights came on, and we replaced drives.

                    At one point, with one of our LeftHand SANs, I was sent 8 brand new drives, as the techs saw severe errors on all 8 of them, lol. I had to re-install the LeftHand OS from scratch and let that thing sync with our other unit across the way. Only took a week to sync 7TB... The upshot is that nothing went down or slowed to a crawl.

                    • DashrenderD
                      Dashrender
                      last edited by

                      If I'm thinking about this correctly, a help me light is a failed/failing drive, not usually a URE. Single UREs happen all the time but don't take the drive down/offline.

                      This kinda tells me that the lights indicate drive failures.

                      • scottalanmillerS
                        scottalanmiller @dafyre
                        last edited by

                        @dafyre said in What Makes Parity RAID Safe on SSDs:

                        I can honestly say in the dozen or so RAID-5 rebuilds that I've seen in the last 6 years, literally half of them were successful, and the other half were not. This was in arrays of 6 x 250GB drives to 8 x 1TB drives (all Winchester / spinning rust).

                        That's a rough fail rate.

                        • DashrenderD
                          Dashrender
                          last edited by

                          I've been lucky I guess - I've never lost an array when resilvering a RAID 5 array.

                          • scottalanmillerS
                            scottalanmiller @marcinozga
                            last edited by

                            @marcinozga said in What Makes Parity RAID Safe on SSDs:

                            @dafyre How many of the failed ones were actually caused by URE? I'd be interested to see the %.

                            Yeah. RAID fails from so many factors. Often no one records what affected what. We were hit with URE losses on an array that lost zero disks! We never used RAID 5 again 🙂

                            • M
                              marcinozga @Dashrender
                              last edited by

                              @Dashrender said in What Makes Parity RAID Safe on SSDs:

                              I've been lucky I guess - I've never lost an array when resilvering a RAID 5 array.

                              Ha, I've been lucky to never use RAID 5 🙂

                              • scottalanmillerS
                                scottalanmiller @Dashrender
                                last edited by

                                @Dashrender said in What Makes Parity RAID Safe on SSDs:

                                If I'm thinking about this correctly, a help me light is a failed/failing drive, not usually a URE. Single UREs happen all the time but don't take the drive down/offline.

                                This kinda tells me that the lights indicate drive failures.

                                A URE would never cause a drive-failed light, as the drive is fine. Light indicators always mean failed or failing drives. But a single drive loss would not be what killed the machine; it's a drive loss PLUS a URE that would do it.

                                • dafyreD
                                  dafyre @scottalanmiller
                                  last edited by

                                  @scottalanmiller said in What Makes Parity RAID Safe on SSDs:

                                  @dafyre said in What Makes Parity RAID Safe on SSDs:

                                  I can honestly say in the dozen or so RAID-5 rebuilds that I've seen in the last 6 years, literally half of them were successful, and the other half were not. This was in arrays of 6 x 250GB drives to 8 x 1TB drives (all Winchester / spinning rust).

                                  That's a rough fail rate.

                                  There for a few years, we had a crazy time with stuff crapping out. Even a new drive array went crazy. After the LeftHand, the RAID 5 rebuilds got better, but still lost far more of them than we should have.

                                  • scottalanmillerS
                                    scottalanmiller @Dashrender
                                    last edited by

                                    @Dashrender said in What Makes Parity RAID Safe on SSDs:

                                    I've been lucky I guess - I've never lost an array when resilvering a RAID 5 array.

                                    I'm not actually 100% sure I've ever seen one recover. Seems like I must have, but I can't actually remember it.

                                    • M
                                      marcinozga @Dashrender
                                      last edited by

                                      @Dashrender Some RAID controllers have a limit on the number of UREs they will encounter before marking a drive as failed. I had that happen on some LSI controllers, but the numbers were really high, perhaps in the thousands.

                                      • DashrenderD
                                        Dashrender @marcinozga
                                        last edited by

                                        @marcinozga said in What Makes Parity RAID Safe on SSDs:

                                        @Dashrender said in What Makes Parity RAID Safe on SSDs:

                                        I've been lucky I guess - I've never lost an array when resilvering a RAID 5 array.

                                        Ha, I've been lucky to never use RAID 5 🙂

                                        How long have you been in the game?

                                        • M
                                          marcinozga @Dashrender
                                          last edited by

                                          @Dashrender said in What Makes Parity RAID Safe on SSDs:

                                          How long have you been in the game?

                                          Long enough. 17 years if my math is correct. I always managed to push RAID 10 over 5.

                                          • DashrenderD
                                            Dashrender @marcinozga
                                            last edited by

                                            @marcinozga said in What Makes Parity RAID Safe on SSDs:

                                            Long enough. 17 years if my math is correct. I always managed to push RAID 10 over 5.

                                            Wow - so back in the late 90's you were able to always justify the price of RAID 10, great. I wasn't - and really it wasn't needed either.
