    Large or small Raid 5 with SSD

    IT Discussion
    97 Posts 7 Posters 7.8k Views
    • DonahueD
      Donahue @scottalanmiller
      last edited by

      @scottalanmiller said in Large or small Raid 5 with SSD:

      @Donahue said in Large or small Raid 5 with SSD:

      @scottalanmiller said in Large or small Raid 5 with SSD:

      @Donahue said in Large or small Raid 5 with SSD:

      I know you said earlier that with raid 5, you may as well add that 5th drive to the array and make it a raid 6 as opposed to sitting on the shelf.

      Not "might as well", but "had better make sure you do." The difference in risk is astronomical. If you are even thinking a hot spare is an option, we've not explained adequately how it works.

      I was thinking cold spare, not hot spare. I don't want the array rebuilding automatically before I have time to make a conscious decision to do it. But the situation is similar: I would still have a spare that is not helping the array at all, just sitting on the shelf.

      This isn't a good idea. You should have an array stable enough that you want it rebuilt. If you have this fear, you need a safer array.

      Having never personally used a raid 5, all I have to go on is information that is presented online through mediums like ML. Some, perhaps even most, of the information I find is either out of date or pertains to the use of raid 5 with spinners. I know that in the last 4 years I have had two or three spinners fail in raid 10 arrays, and a few single drives fail in desktops, both spinners and SSDs. So in my mind, a drive failure is a reasonable thing to expect in the next 5 years. But we have also never had drives with warranties, so that changes the cost equation too.

      I am not sure that my fear is rational, because my understanding of the actual risk is limited.

      scottalanmillerS 1 Reply Last reply Reply Quote 0
      • scottalanmillerS
        scottalanmiller @Donahue
        last edited by

        @Donahue said in Large or small Raid 5 with SSD:

        Having never personally used a raid 5, all I have to go on is information that is presented online through mediums like ML. Some, perhaps even most, of the information I find is either out of date or pertains to the use of raid 5 with spinners. I know that in the last 4 years I have had two or three spinners fail in raid 10 arrays, and a few single drives fail in desktops, both spinners and SSDs. So in my mind, a drive failure is a reasonable thing to expect in the next 5 years. But we have also never had drives with warranties, so that changes the cost equation too.

        I am not sure that my fear is rational, because my understanding of the actual risk is limited.

        The MORE you fear a drive failure, the MORE you would fear not rebuilding instantly, automatically. Your fear does not match your response.

        1 Reply Last reply Reply Quote 0
        • scottalanmillerS
          scottalanmiller
          last edited by

          That a drive might fail is not in question. In five years, there is a good chance of a drive failing.

          What you need to do is apply that to your thinking and say "If I fear drives failing, what protects me from that?"

          1 Reply Last reply Reply Quote 0
          • DonahueD
            Donahue
            last edited by

            Am I wrong to think that the probability of two drives failing is much less than the probability of just one drive failing? And while, say, a 24-48 hour decision window plus rebuild time is a lot more exposure than an instant rebuild, isn't it still quite low?

            scottalanmillerS 2 Replies Last reply Reply Quote 0
            • scottalanmillerS
              scottalanmiller @Donahue
              last edited by

              @Donahue said in Large or small Raid 5 with SSD:

              Am I wrong to think that the probability of two drives failing is much less than the probability of just one drive failing?

              You are correct, but no one is disagreeing with that. It's how you are using this info that is incorrect.

              1 Reply Last reply Reply Quote 0
              • scottalanmillerS
                scottalanmiller @Donahue
                last edited by

                @Donahue said in Large or small Raid 5 with SSD:

                And while say a 24-48 hour decision window plus rebuild time is a lot more exposure than an instant rebuild time, it is still quite low?

                So that the first drive has failed is unrelated. Once we hit this window, it is, say, 48 hours of "decision" and say 8 hours of rebuilding. During which time, there is no protection.

                1. Why would you add 48 hours of exposure with NO RAID at all, for no reason?
                2. There is only one possible outcome of the decision, to replace the drive. There is no condition under which you would not replace the drive, so why introduce a two day risk window without potential benefit?
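To put rough numbers on the two points above, here is a minimal sketch of the exposure math. It assumes independent drives with a constant failure rate, and the 2% annualized failure rate and three surviving drives are hypothetical inputs, not figures from this thread. Only the 48 hour and 8 hour windows come from the post.

```python
import math

HOURS_PER_YEAR = 8760


def p_any_failure(drives: int, afr: float, window_hours: float) -> float:
    """Chance that at least one of `drives` fails within the window.

    Assumes independent drives with a constant failure rate derived from
    the annualized failure rate (AFR). Both are simplifying assumptions.
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-drives * rate_per_hour * window_hours)


# Hypothetical inputs: 3 surviving drives in a degraded 4-drive RAID 5, 2% AFR.
SURVIVORS = 3
AFR = 0.02

p_immediate = p_any_failure(SURVIVORS, AFR, window_hours=8)     # rebuild only
p_delayed = p_any_failure(SURVIVORS, AFR, window_hours=48 + 8)  # decision + rebuild

print(f"immediate rebuild:  {p_immediate:.6f}")
print(f"48h wait + rebuild: {p_delayed:.6f}")
print(f"ratio: {p_delayed / p_immediate:.1f}x")
```

Under these assumptions both probabilities are small, but the delayed case is roughly seven times the immediate one, and the extra exposure buys nothing because the outcome of the decision is always to replace the drive.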
                DonahueD 1 Reply Last reply Reply Quote 0
                • DonahueD
                  Donahue @scottalanmiller
                  last edited by

                  @scottalanmiller said in Large or small Raid 5 with SSD:

                  @Donahue said in Large or small Raid 5 with SSD:

                  And while say a 24-48 hour decision window plus rebuild time is a lot more exposure than an instant rebuild time, it is still quite low?

                  So that the first drive has failed is unrelated. Once we hit this window, it is, say, 48 hours of "decision" and say 8 hours of rebuilding. During which time, there is no protection.

                  1. Why would you add 48 hours of exposure with NO RAID at all, for no reason?
                  2. There is only one possible outcome of the decision, to replace the drive. There is no condition under which you would not replace the drive, so why introduce a two day risk window without potential benefit?

                  perhaps that comes from what I have read, and perhaps what I have read would have made sense with spinners and initializing the rebuild inducing the second drive failure. Presumably that extra time would be to make sure all my ducks are in a row with fresh backups and such, but perhaps that is where my error is, and I should know my ducks were in a row long before the first failure.

                  scottalanmillerS 1 Reply Last reply Reply Quote 0
                  • scottalanmillerS
                    scottalanmiller @Donahue
                    last edited by

                    @Donahue said in Large or small Raid 5 with SSD:

                    perhaps that comes from what I have read, and perhaps what I have read would have made sense with spinners and initializing the rebuild inducing the second drive failure. Presumably that extra time would be to make sure all my ducks are in a row with fresh backups and such, but perhaps that is where my error is, and I should know my ducks were in a row long before the first failure.

                    With spinners, resilvering can take weeks or months of time, rather than hours, and generally has 6TB+ to resilver with high URE rates. SSDs take hours to resilver, with generally under 1TB of capacity, with low URE rates. So the factors of one apply poorly to the other.
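As a rough illustration of why URE rates matter so much more at spinner capacities, the sketch below estimates the chance of hitting at least one unrecoverable read error while reading the surviving drives. The per-bit error rates (1e-14 for a spinner, 1e-17 for an SSD) and the three-survivor array are hypothetical spec-sheet style assumptions, and treating every bit as an independent trial is a crude model that real drives often beat.

```python
import math


def p_ure_during_rebuild(read_tb: float, ure_per_bit: float) -> float:
    """Chance of at least one unrecoverable read error while reading read_tb.

    Treats every bit as an independent trial, which is a crude model.
    """
    bits = read_tb * 1e12 * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding to zero.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Hypothetical figures, not taken from the thread:
spinner = p_ure_during_rebuild(read_tb=3 * 6, ure_per_bit=1e-14)  # 3 survivors x 6TB
ssd = p_ure_during_rebuild(read_tb=3 * 1, ure_per_bit=1e-17)      # 3 survivors x 1TB

print(f"spinner array: {spinner:.3f}")
print(f"SSD array:     {ssd:.6f}")
```

With these inputs the spinner case comes out well above a coin flip while the SSD case is a small fraction of a percent, which is the gap the post is pointing at.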

                    DashrenderD 1 Reply Last reply Reply Quote 1
                    • DashrenderD
                      Dashrender @scottalanmiller
                      last edited by

                      @scottalanmiller said in Large or small Raid 5 with SSD:

                      With spinners, resilvering can take weeks or months of time, rather than hours, and generally has 6TB+ to resilver with high URE rates. SSDs take hours to resilver, with generally under 1TB of capacity, with low URE rates. So the factors of one apply poorly to the other.

                      why only 1 TB of capacity?

                      scottalanmillerS 1 Reply Last reply Reply Quote 0
                      • scottalanmillerS
                        scottalanmiller
                        last edited by

                        With spinners, you take a backup first because your resilver is often expected to fail. Or the risk is super high, at least.

                        The backup might take two hours, while the rebuild might take two weeks.

                        With SSD, the backup might take longer than the rebuild. So the factors of that alone change a lot, too.

                        1 Reply Last reply Reply Quote 1
                        • scottalanmillerS
                          scottalanmiller @Dashrender
                          last edited by

                          @Dashrender said in Large or small Raid 5 with SSD:

                          why only 1 TB of capacity?

                          How big do you expect SSDs to be when you have many in an array realistically?

                          DashrenderD 1 Reply Last reply Reply Quote 0
                          • DashrenderD
                            Dashrender @scottalanmiller
                            last edited by

                            @scottalanmiller said in Large or small Raid 5 with SSD:

                            How big do you expect SSDs to be when you have many in an array realistically?

                            so you're talking about the single drive, not the array. Got it.

                            Though when resilvering, you still read the entire array worth.

                            scottalanmillerS 1 Reply Last reply Reply Quote 0
                            • DonahueD
                              Donahue
                              last edited by

                              For the sake of this thread, I am probably going to use 3.84TB SSDs, but the point remains.

                              1 Reply Last reply Reply Quote 1
                              • scottalanmillerS
                                scottalanmiller @Dashrender
                                last edited by

                                @Dashrender said in Large or small Raid 5 with SSD:

                                so you're talking about the single drive, not the array. Got it.

                                Though when resilvering, you still read the entire array worth.

                                Correct, the time to resilver is primarily based on the size of the drive being rebuilt. That's the bottleneck: the time to write data back to the one drive.

                                So if 4x 10TB drives take 2 days to replace a drive, 8x 5TB drives would take 1 day to replace a drive.

                                It's not exact, but it is really close.
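The scaling can be sketched as drive capacity divided by sustained write rate. This assumes the replacement drive's write speed is the only bottleneck, as described above. The roughly 58 MB/s rate is simply what the "10TB in 2 days" figure implies, not a measured number.

```python
def rebuild_hours(drive_tb: float, write_mb_per_s: float) -> float:
    """Hours to rewrite one replacement drive at a sustained write rate."""
    seconds = (drive_tb * 1e12) / (write_mb_per_s * 1e6)
    return seconds / 3600


# Rate implied by the "10TB in 2 days" figure (about 58 MB/s).
implied_rate = (10 * 1e12) / (2 * 24 * 3600) / 1e6

hours_10tb = rebuild_hours(10, implied_rate)
hours_5tb = rebuild_hours(5, implied_rate)

print(f"implied rate: {implied_rate:.1f} MB/s")
print(f"10TB drive: {hours_10tb:.0f} h, 5TB drive: {hours_5tb:.0f} h")
```

Halving the drive size halves the rebuild time regardless of how many drives are in the array, which is why the estimate is close but not exact: parity reads and controller overhead are ignored here.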

                                DonahueD 1 Reply Last reply Reply Quote 0
                                • DonahueD
                                  Donahue @scottalanmiller
                                  last edited by

                                  @scottalanmiller said in Large or small Raid 5 with SSD:

                                  Correct, the time to resilver is primarily based on the size of the drive being rebuilt. That's the bottleneck: the time to write data back to the one drive.

                                  So if 4x 10TB drives take 2 days to replace a drive, 8x 5TB drives would take 1 day to replace a drive.

                                  It's not exact, but it is really close.

                                  but with twice the chance of having to rebuild.

                                  scottalanmillerS 1 Reply Last reply Reply Quote 0
                                  • DonahueD
                                    Donahue
                                    last edited by Donahue

                                    TANSTAAFL

                                    1 Reply Last reply Reply Quote 0
                                    • scottalanmillerS
                                      scottalanmiller @Donahue
                                      last edited by

                                      @Donahue said in Large or small Raid 5 with SSD:

                                      but with twice the chance of having to rebuild.

                                      Correct, that you need to rebuild happens roughly twice as often.

                                      1 Reply Last reply Reply Quote 0
                                      • DonahueD
                                        Donahue
                                        last edited by

                                        So I got a quote from xbyte, but it includes this. I had been expecting that I would just load the hypervisor on the raid 5 array and that having a separate R1 array for the OS was an old way of approaching this. Thoughts?

                                        ObsolesceO scottalanmillerS 2 Replies Last reply Reply Quote 2
                                        • ObsolesceO
                                          Obsolesce @Donahue
                                          last edited by

                                          @Donahue said in Large or small Raid 5 with SSD:

                                          So I got a quote from xbyte, but it includes this. I had been expecting that I would just load the hypervisor on the raid 5 array and that having a separate R1 array for the OS was an old way of approaching this. Thoughts?

                                          How much money does that add?

                                          scottalanmillerS 1 Reply Last reply Reply Quote 1
                                          • scottalanmillerS
                                            scottalanmiller @Donahue
                                            last edited by

                                            @Donahue said in Large or small Raid 5 with SSD:

                                            So I got a quote from xbyte, but it includes this. I had been expecting that I would just load the hypervisor on the raid 5 array and that having a separate R1 array for the OS was an old way of approaching this. Thoughts?

                                            Cost is the big factor. As a device, I like it.

                                            1 Reply Last reply Reply Quote 0