
    RAID5 SSD Performance Expectations

    IT Discussion
    Tags: raid, raid 10, performance, ssd, ssd raid5
    • 1337 @1337

      @Pete-S said in RAID5 SSD Performance Expectations:

      @scottalanmiller said in RAID5 SSD Performance Expectations:

      @Pete-S said in RAID5 SSD Performance Expectations:

      Having a drive fail will become as unusual as having a RAID controller, a motherboard, or a CPU fail. You'd just replace it and restore the entire thing from backup.

      I think drives already fail less often than RAID controllers. From working in giant environments, the thing that fails more than mobos or CPUs is RAM. That's the worst one, as it does the most damage and is hard to mitigate.

      The difference, though, is that mobos, controllers, and PSUs are stateless to the system, but drives are stateful. So their failure has a different type of impact, regardless of frequency.

      Well, the statefulness of the drives is not something we can fully count on, hence the saying "RAID is not backup".

      What I'm proposing is that when it becomes very unlikely that a drive fails, we could rethink our strategy and go for single drives instead of RAID arrays. In the very unlikely event that a failure did occur, we would restore from backup, which we are prepared to do anyway.

      With HDDs, the failure rate is too high, but enterprise SSDs are starting to get into the "will not fail" category.

      As an example, assume we have 4 servers, each with a RAID10 array of 4 x 2TB drives. The annual failure rate of HDDs is a few percent; say 3% for argument's sake. With 16 drives in total, there is roughly a 40% chance each year that at least one drive will fail (1 - 0.97^16 ≈ 0.39), or about 0.5 expected failures per year. So over the lifespan of the servers it's very likely that we will see one or more drive failures.

      Now assume the same 4 servers with a single enterprise 4TB NVMe drive in each. Annual failure rate is 0.4% (actual number a few years back). With 4 drives in total, every year there is less than 2% chance that any drive will fail. So over the lifespan of the server it's very unlikely that we will ever see a drive failure at all. Sure, if it does happen anyway, we are restoring from backup instead of rebuilding the array.
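
      The failure odds in the two scenarios can be checked with a short sketch. It assumes every drive fails independently at the same constant annual failure rate (AFR) and that failed drives are replaced; real fleets only approximate this. The 5-year service life and the function names are hypothetical, chosen for illustration only.

```python
def prob_any_failure(afr: float, drives: int, years: int = 1) -> float:
    """Chance that at least one drive fails within `years`.

    Assumes each drive fails independently with the same annual
    failure rate `afr` (0.03 means 3%).
    """
    return 1.0 - (1.0 - afr) ** (drives * years)


def expected_failures(afr: float, drives: int, years: int = 1) -> float:
    """Expected number of drive failures over `years`, assuming failed drives are replaced."""
    return afr * drives * years


# 4 servers x 4 HDDs in RAID10, using the illustrative 3% AFR
hdd_fleet = {"afr": 0.03, "drives": 16}
# 4 servers x 1 enterprise NVMe drive, using the 0.4% AFR
nvme_fleet = {"afr": 0.004, "drives": 4}

SERVICE_YEARS = 5  # hypothetical server lifespan, not taken from the post

for name, fleet in (("HDD RAID10", hdd_fleet), ("single NVMe", nvme_fleet)):
    per_year = prob_any_failure(**fleet)
    lifetime = prob_any_failure(**fleet, years=SERVICE_YEARS)
    print(f"{name}: {per_year:.1%} per year, {lifetime:.1%} over "
          f"{SERVICE_YEARS} years, {expected_failures(**fleet):.3f} failures/yr expected")
```

      With these inputs the HDD fleet comes out at about a 39% chance per year of at least one failure (0.48 expected failures), and the NVMe fleet at about 1.6% per year. Over the assumed 5-year life those compound to roughly 91% and 7.7% respectively.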

      • biggen

        @Pete-S said in RAID5 SSD Performance Expectations:

        Now assume the same 4 servers with a single enterprise 4TB NVMe drive in each. Annual failure rate is 0.4% (actual number a few years back). With 4 drives in total, every year there is less than 2% chance that any drive will fail. So over the lifespan of the server it's very unlikely that we will ever see a drive failure at all. Sure, if it does happen anyway, we are restoring from backup instead of rebuilding the array.

        As long as you can justify the downtime in the event that a single drive failure takes an entire server down (albeit with a low statistical chance).

        If that isn't a concern, there's no use running RAID anyway.

        • 1337 @biggen

          @biggen said in RAID5 SSD Performance Expectations:

          As long as you can justify the downtime in the event that a single drive failure takes an entire server down (albeit with a low statistical chance).

          If that isn't a concern, there's no use running RAID anyway.

          That makes sense. But with or without RAID, there are always things that can take the entire server down, for instance a motherboard failure. So that risk is always there.

          I think you can multiply the probability by the downtime to get the average (expected) downtime, and multiply that by the cost per hour if you want to put it in $$$.

          So if something is 2% likely to happen and causes 10 hours of downtime, you get 0.2 hours (12 minutes) of downtime on average. If that downtime is going to cost $10K per hour, then it's $2K.

          If that downtime is unacceptable, you need more servers or more reliable servers. 12 minutes of downtime per year is about 99.998% availability; 10 hours of downtime per year is about 99.89%.
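
          The same expected-downtime arithmetic as a small sketch. It treats a single failure mode in isolation and uses a 365-day year; the function names are illustrative, not from any tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored


def expected_downtime_hours(probability: float, outage_hours: float) -> float:
    """Average yearly downtime from one failure mode: chance it happens x outage length."""
    return probability * outage_hours


def expected_cost(downtime_hours: float, cost_per_hour: float) -> float:
    """Put the average downtime in dollars."""
    return downtime_hours * cost_per_hour


def availability(downtime_hours_per_year: float) -> float:
    """Fraction of the year the service is up."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


avg = expected_downtime_hours(0.02, 10)  # 2% chance of a 10-hour outage
cost = expected_cost(avg, 10_000)        # at $10K per hour of downtime

print(f"average downtime: {avg * 60:.0f} min/yr, expected cost: ${cost:,.0f}/yr")
print(f"availability at {avg * 60:.0f} min/yr: {availability(avg):.4%}")
print(f"availability at 10 h/yr: {availability(10):.3%}")
```

          This gives 12 minutes and $2K per year on average; the exact availabilities come out at about 99.9977% and 99.886%.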

          • scottalanmiller @biggen

            @biggen said in RAID5 SSD Performance Expectations:

            As long as you can justify the downtime in the event that a single drive failure takes an entire server down (albeit with a low statistical chance).

            In business it is rare, but possible, that downtime is what matters most; usually it's the data loss. If losing a few hours of data would cripple you to the tune of millions of dollars, for example, then you take steps to protect against losing the data written since the last backup.

            • scottalanmiller @1337

              @Pete-S said in RAID5 SSD Performance Expectations:

              That makes sense. But with or without RAID, there are always things that can take the entire server down, for instance a motherboard failure. So that risk is always there.

              Hence my point about controller rates. In our giant environment on Wall St., RAID controller failures were the top cause of downtime, then RAM, then mobos. PSUs and drives failed more often, but they were hot-swappable and almost never turned into downtime.

              • zachary715

                Quick update: on Server 2 (the one with the SSDs), I changed the RAID cache policy from Write Through to Write Back, and from No Read Ahead to Read Ahead. This appears to have made a drastic improvement: live vMotions of a 55GB Windows VM to Server 2 now complete in about 1.5 minutes versus 4 minutes previously, and the network monitor shows performance on par with what I was seeing on Server 3. Now on to getting all 3 servers in direct-connect mode for vMotion and backups over 10Gb/s. Thanks.
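
                To put the vMotion timings in throughput terms, a rough sketch. It assumes 55GB means 55 x 10^9 bytes actually moved and ignores compression, protocol overhead, and any extra work vMotion does beyond copying data, so these are only average effective rates.

```python
def transfer_rate(size_gb: float, seconds: float) -> tuple[float, float]:
    """Average rate for moving `size_gb` decimal gigabytes in `seconds`.

    Returns (megabytes per second, gigabits per second).
    """
    size_bytes = size_gb * 1e9
    mb_per_s = size_bytes / seconds / 1e6
    gbit_per_s = size_bytes * 8 / seconds / 1e9
    return mb_per_s, gbit_per_s


for label, seconds in (("before (4 min)", 4 * 60), ("after (1.5 min)", 90)):
    mb, gbit = transfer_rate(55, seconds)
    print(f"{label}: {mb:.0f} MB/s, about {gbit:.2f} Gb/s")
```

                That works out to roughly 229 MB/s (about 1.8 Gb/s) before the change and roughly 611 MB/s (about 4.9 Gb/s) after.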

                • Obsolesce @zachary715

                  @zachary715 said in RAID5 SSD Performance Expectations:

                  I modified Server 2 with the SSDs RAID cache policy from Write Through to Write Back, and No Read Ahead to Read Ahead

                  Why was it write-through to begin with? I've only done that in some very niche instances.

                  • zachary715 @Obsolesce

                    @Obsolesce said in RAID5 SSD Performance Expectations:

                    Why was it write-through to begin with? I've only done that in some very niche instances.

                    I've always configured Write Back in the past, but didn't know if using SSDs changed that. I did some reading initially that led me to believe Write Through was the better choice for both performance and avoiding data loss. I probably should have done a little more research before deciding.

                    • scottalanmiller @zachary715

                      @zachary715 said in RAID5 SSD Performance Expectations:

                      I did some reading initially that led me to believe Write Through was the better choice for both performance and avoiding data loss.

                      Write Through is, in theory, better for reliability, but that isn't a real concern with a well-maintained controller. But it kills performance by bypassing the cache.
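
                      The performance point can be illustrated with a deliberately simplified toy model (not how a PERC or any real controller is implemented): with one write outstanding at a time, write-back acknowledges once the data is in protected cache, while write-through makes the caller wait for the drives. The class name and latency numbers are hypothetical placeholders, and the model ignores cache exhaustion under sustained writes, queue depth, and RAID5 parity work.

```python
from dataclasses import dataclass


@dataclass
class ToyController:
    """Toy model of when a RAID controller acknowledges a write.

    Latencies are hypothetical inputs, not measurements of real hardware.
    """
    cache_latency_us: float  # time to land a write in controller cache
    drive_latency_us: float  # time to commit that write to the array
    write_back: bool

    def ack_latency_us(self) -> float:
        if self.write_back:
            # Acknowledged once the data is in (battery/NV-protected) cache;
            # the flush to the drives happens later, off the caller's path.
            return self.cache_latency_us
        # Write-through: the caller waits until the drives have the data.
        return self.cache_latency_us + self.drive_latency_us

    def serial_writes_per_second(self) -> float:
        # With one write outstanding at a time, rate = 1 / latency.
        return 1_000_000 / self.ack_latency_us()


# Hypothetical numbers: cache ~5 us, SSD array ~50 us per write.
for mode in (False, True):
    ctl = ToyController(cache_latency_us=5, drive_latency_us=50, write_back=mode)
    name = "write-back" if mode else "write-through"
    print(f"{name}: {ctl.serial_writes_per_second():,.0f} writes/s")
```

                      The exact ratio depends entirely on the assumed latencies; the sketch only shows that write-through puts drive latency on every acknowledgement.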

                      • zachary715 @scottalanmiller

                        @scottalanmiller said in RAID5 SSD Performance Expectations:

                        Write Through is, in theory, better for reliability, but that isn't a real concern with a well-maintained controller. But it kills performance by bypassing the cache.

                        That's part of the reason I created this thread: so that someone might look at my current setup and tell me that. I wasn't aware of how much the cache impacted SSD performance. I know now 😜

                        • Dashrender @scottalanmiller

                          @scottalanmiller said in RAID5 SSD Performance Expectations:

                          Write Through is, in theory, better for reliability, but that isn't a real concern with a well-maintained controller. But it kills performance by bypassing the cache.

                          We assume your controller has either a non-volatile cache or a battery backup.

                          • zachary715 @Dashrender

                            @Dashrender said in RAID5 SSD Performance Expectations:

                            We assume your controller has either a non-volatile cache or a battery backup.

                            The PERC H730p Mini has 2GB of NV cache.

                            • Dashrender @zachary715

                              @zachary715 said in RAID5 SSD Performance Expectations:

                              That's part of the reason I created this thread: so that someone might look at my current setup and tell me that. I wasn't aware of how much the cache impacted SSD performance. I know now 😜

                              It's not so much that it's affecting SSDs; it affects ANY array behind that controller.

                              Do that to your HDD array and see how badly its performance crashes.

                              • scottalanmiller @zachary715

                                @zachary715 said in RAID5 SSD Performance Expectations:

                                That's part of the reason I created this thread: so that someone might look at my current setup and tell me that. I wasn't aware of how much the cache impacted SSD performance. I know now 😜

                                As to "why", think of it this way: the best standard SSD does a little over 100K IOPS. The best NVMe is pushing towards a million. Even a little cache is pushing millions. RAM is crazy fast, even compared to NVMe drives.

                                • Obsolesce

                                  This is why drive testing is such a deep topic. You need to try to match the real load and consider everything involved. CrystalDisk does not do that.

                                  You can set up some really good tests with Iometer. (I think that's what it's called; it's been a long time and I can't look it up at the moment.)
