RAID 5 URE Clarity Question



  • @scottalanmiller
    I'm referring to this article here: http://www.smbitjournal.com/2012/05/when-no-redundancy-is-more-reliable/

    In it is mentioned that a common SATA drive has a URE rate of one error per 10^14 bits read, or roughly once every 12TB of read operations.

    The article also says that "a small six terabyte RAID 5 array using 10^14 URE SATA drives, if we were to lose a single drive, we have only a fifty percent chance that the array will recover assuming the drive is replaced immediately."

    What I need clarity on is this: if the URE rate is per drive, then how can a 6TB RAID 5 have a 50% chance of a URE? If you are using 6x 1TB SATA drives, and a drive dies, a single drive could potentially only have to read 1TB of data for the rebuild. 1TB is not 50% of 12TB, which is what you said a 10^14 URE rate SATA drive comes out to.

    Did I read it wrong or overlook something?
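
For reference, the "10^14 is about 12TB" conversion in the question can be checked with a few lines. This is a minimal sketch that assumes the spec means one unrecoverable error per 10^14 bits read; the function names are made up for illustration.

```python
# Assumption: the drive spec reads "one URE per 10^14 bits read".
BITS_PER_URE = 10**14


def bits_to_tb(bits):
    """Convert a bit count to decimal terabytes (10^12 bytes)."""
    return bits / 8 / 10**12


def bits_to_tib(bits):
    """Convert a bit count to binary tebibytes (2^40 bytes)."""
    return bits / 8 / 2**40


tb_per_ure = bits_to_tb(BITS_PER_URE)
tib_per_ure = bits_to_tib(BITS_PER_URE)

print(f"{tb_per_ure} TB read per URE on average")
print(f"{tib_per_ure:.2f} TiB read per URE on average")
```

Depending on whether decimal or binary units are used, the figure lands a little above or below the rounded 12TB used in this thread.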



  • But the remaining 5 drives aren't brand new. They have presumably seen a similar amount of activity as the failed drive, and therefore are more likely to fail during the rebuild.


  • Service Provider

    URE rate is by drive. URE risk is by the working set size. 10^14 is a frequency, not your total risk.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    If you are using 6x 1TB SATA drives, and a drive dies, a single drive could potentially only have to read 1TB of data for the rebuild. 1TB is not 50% of 12TB, which is what you said a 10^14 URE rate SATA drive comes out to.

    You are thinking of RAID 10. RAID 10 only needs to read in the size of a single drive for a recovery. RAID 5 or RAID 6 must read in the entire array to recover any single lost drive.


  • Service Provider

    So example... 10 drives, 1TB each. One drive fails, and is replaced.

    RAID 10: Restore set size is 1TB. URE risk is to a single file, no array level exposure.
    RAID 5: Restore set size is 9TB. URE risk is to the entire array.

    So in the case of RAID 10, there is roughly 1/9th the chance of hitting a URE at all compared with RAID 5, and when you do, the impact is to potentially corrupt a single file (or two, if they share one block of data.)

    In the case of RAID 5, not only is the risk of hitting a URE astronomically higher (basically a whole order of magnitude higher) but when it happens everything is lost, not just corruption to one file.
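
The restore set sizes in the 10-drive example above can be sketched as a small function. This is only an illustration of the arithmetic in the post: `rebuild_read_tb` is a hypothetical helper, and it assumes a single failed drive that is replaced immediately.

```python
def rebuild_read_tb(level, n_drives, drive_tb):
    """Data (in TB) that must be read to rebuild one failed drive.

    Assumes exactly one failed drive. The names here are illustrative only.
    """
    if level == "raid10":
        # Only the failed drive's mirror partner has to be read.
        return drive_tb
    if level == "raid5":
        # Every surviving drive has to be read in full.
        return (n_drives - 1) * drive_tb
    raise ValueError(f"unsupported level: {level}")


raid10_tb = rebuild_read_tb("raid10", n_drives=10, drive_tb=1)
raid5_tb = rebuild_read_tb("raid5", n_drives=10, drive_tb=1)

print(f"RAID 10 restore set: {raid10_tb} TB")
print(f"RAID 5 restore set:  {raid5_tb} TB")
print(f"RAID 5 reads {raid5_tb / raid10_tb:.0f}x as much data")
```

The 9x difference in data read is where the "1/9th" and "order of magnitude" figures in the post come from.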


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    1TB is not 50% of 12TB, which is what you said a 10^14 URE rate SATA drive comes out to.

    12TB is how often that type of drive being read will encounter a URE. 6TB is half of 12TB, and 6TB is the set size in the example. So it's the easiest case, where we can readily see a 50% chance of hitting a URE during the rebuild. It's the spot on the curve where the math is simplest.

    This is ignoring URE dispersion, of course, and in reality when you have lost a drive, your URE risk is higher than in a healthy array. So in reality the risk is over 50%, maybe way over. No one knows how much higher it is, only that it is commonly observed to be much higher, which is demonstrable mathematically and makes sense logically.

    That risk, at 50%+, is from URE alone, of course. All the "normal" risks, like secondary drive failure and write hole, remain as additional risks on top of the URE.
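
To see the "curve" mentioned above, here is a rough sketch that compares two ways of turning a read volume into a risk figure. Both are assumptions for illustration: the first is the simple ratio used in this thread (set size divided by 12TB), the second treats UREs as independent random events at the same average rate. Neither models the dispersion or degraded-array effects the post describes.

```python
import math

# The thread's rounded figure for a 10^14 drive; an assumption, not a spec value.
TB_PER_URE = 12.0


def ratio_estimate(read_tb):
    """Thread-style estimate: fraction of the average URE interval that is read."""
    return min(1.0, read_tb / TB_PER_URE)


def independent_estimate(read_tb):
    """Chance of at least one URE if errors are independent (Poisson assumption)."""
    return 1.0 - math.exp(-read_tb / TB_PER_URE)


print(f"{'read TB':>8} {'ratio':>8} {'independent':>12}")
for read_tb in (1, 5, 6, 8, 10, 12):
    print(
        f"{read_tb:>8} "
        f"{ratio_estimate(read_tb):>8.0%} "
        f"{independent_estimate(read_tb):>12.0%}"
    )
```

Under the independence assumption the figures come out lower than the plain ratio, because "once every 12TB" is an average rather than a fixed interval.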



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    If you are using 6x 1TB SATA drives, and a drive dies, a single drive could potentially only have to read 1TB of data for the rebuild. 1TB is not 50% of 12TB, which is what you said a 10^14 URE rate SATA drive comes out to.

    You are thinking of RAID 10. RAID 10 only needs to read in the size of a single drive for a recovery. RAID 5 or RAID 6 must read in the entire array to recover any single lost drive.

    I still don't understand. The only thing a hard drive can read, is the data on itself. A 1TB hard drive can only hold 1 TB of possible data. If I have 6x 1TB drives in a RAID 5, and a drive dies, how/why would another single drive read more than 1 TB of data?

    When a drive is being rebuilt, the 1TB of data on a working drive is only read once isn't it?... to help rebuild the data that should be on the new drive being rebuilt... Maybe this is where my thinking is wrong.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    I still don't understand. The only thing a hard drive can read, is the data on itself. A 1TB hard drive can only hold 1 TB of possible data. If I have 6x 1TB drives in a RAID 5, and a drive dies, how/why would another single drive read more than 1 TB of data?

    Each drive has to read 1TB. There are five drives. That's 5x 1TB. So 5TB of data in that example. URE rate is once every 12TB of reads. That's nearly 50% of 12TB.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    When a drive is being rebuilt, the 1TB of data on a working drive is only read once isn't it?... to help rebuild the data that should be on the new drive being rebuilt... Maybe this is where my thinking is wrong.

    1TB is being written, and then read. But if a URE occurs during the write/read portion, it can be corrected as we have the source data for that moment. So this is not a risk.

    The risk is in the 5TB that has to be read to restore that data, not in the 1TB of writing.


  • Service Provider

    That's why we refer to the 5TB size, in that example, as the risk domain. The parity overhead is not within the risk domain, nor is the "read back" from immediate write because that is protected by a redundant in-memory block. Only the exposed, remaining 5TB is at risk. But all of the data that you need is in that 5TB.



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    I still don't understand. The only thing a hard drive can read, is the data on itself. A 1TB hard drive can only hold 1 TB of possible data. If I have 6x 1TB drives in a RAID 5, and a drive dies, how/why would another single drive read more than 1 TB of data?

    Each drive has to read 1TB. There are five drives. That's 5x 1TB. So 5TB of data in that example. URE rate is once every 12TB of reads. That's nearly 50% of 12TB.

    You said the URE of a single drive is 12TB... or did you mean of a 12TB RAID 5 array as a whole, regardless of drive size and quantity?


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    I still don't understand. The only thing a hard drive can read, is the data on itself. A 1TB hard drive can only hold 1 TB of possible data. If I have 6x 1TB drives in a RAID 5, and a drive dies, how/why would another single drive read more than 1 TB of data?

    Each drive has to read 1TB. There are five drives. That's 5x 1TB. So 5TB of data in that example. URE rate is once every 12TB of reads. That's nearly 50% of 12TB.

    You said the URE of a single drive is 12TB... or did you mean of a 12TB RAID 5 array as a whole, regardless of drive size and quantity?

    URE rate is 12TB (10^14). A rate is the same for a drive or an array, because it is a rate. But the rate is based on the drives that you buy, because that's what sets the failure rate. Obviously it is physical drives that fail, so it is the quality of the drives you select that determines the rate of the UREs. But UREs are a risk on all of the drives, and they all have to be read, so the risk is cumulative.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    ...or did you mean of a 12TB RAID 5 array as a whole, regardless of drive size and quantity?

    A URE rate is the same regardless of what you are looking at. So both.

    It's like your shoes fail, on average, every 100 miles that you walk. Whether you walk 100 miles in one-mile-a-day increments or 100 miles all at once doesn't change the rate. On average, at 100 miles, your shoes will fail regardless of how you segment the increments under the hood.
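
The shoe analogy can be checked numerically. This is a sketch under the assumption that UREs are independent events at an average of one per 12TB read (the thread's rounded figure); `p_clean` is a hypothetical helper name. It compares reading 5TB from one source against reading 1TB from each of five drives.

```python
import math

# Thread's rounded average for a 10^14 drive; an assumption for illustration.
TB_PER_URE = 12.0


def p_clean(read_tb):
    """Chance of reading `read_tb` terabytes with no URE (independence assumed)."""
    return math.exp(-read_tb / TB_PER_URE)


# One 5TB read in a single pass.
one_pass = p_clean(5.0)

# Five separate drives, each read for 1TB; all five reads must come back clean.
five_segments = math.prod(p_clean(1.0) for _ in range(5))

print(f"single 5TB read clean:      {one_pass:.4f}")
print(f"five 1TB reads, all clean:  {five_segments:.4f}")
```

The two results match, which is the point of the analogy: the rate applies to the total amount read, however that total is split across drives.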



  • Okay, let's use 6x 2TB drives to avoid confusion, and so the RAID 5 array = 12TB to match your math of a 12TB RAID 5 having a near 100% chance of experiencing a URE.

    Let's say drive E needs to be rebuilt.

    You said each drive has a URE of 10^14.

    How much data that matters needs to be read from drive D in order to help rebuild drive E? I would think a maximum of 400GB needs to be read from drive D. The data that was on drive E, is spread throughout the other 5 drives. So there is ~400GB of data on drive D that needs to be read so it can help rebuild the data that was on drive E. And each other drive will do the same thing.

    Being that drive D's URE rate is 10^14, which you said comes out to equal about 12TB of reads, I would think that the chance of a URE happening on drive D would be 3%. So isn't drive D only needed for the 400GB it contains of drive E to help rebuild it? That's 3% of the drive's URE rate.

    [Attached image: Untitled.jpg]


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    How much data that matters needs to be read from drive D in order to help rebuild drive E?

    That you are asking this means you don't understand the issue. URE is a rate of failure. That alone, I think, should explain everything.

    Or to state it another way, 400% of D has to be read to rebuild E.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    I would think a maximum of 400GB needs to be read from drive A.

    No, 2TB from EACH drive. Every drive has to be read 100% to recreate the data when all parity is lost. With RAID 6 it is more complex to explain if only one drive has failed, but the 400% number remains the same, it is just split over five drives instead of four. But the URE rate never changes. But only when two drives are lost do you have URE exposure.

    So the only scenario that matters is 400% of one drive for 8TB of risk domain.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    So isn't drive D only needed for the 400GB it contains of drive E to help rebuild it?

    No, D doesn't contain ANYTHING of drive E. That's likely the root of confusion. At no point in parity RAID does any drive contain the contents of any other drive. That's mirroring, and mirroring doesn't have this risk at all.



  • I wanted to change drive D to drive A so they don't get confused... but changed it back.... for anyone else wondering.



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    So isn't drive D only needed for the 400GB it contains of drive E to help rebuild it?

    No, D doesn't contain ANYTHING of drive E. That's likely the root of confusion. At no point in parity RAID does any drive contain the contents of any other drive. That's mirroring, and mirroring doesn't have this risk at all.

    That's not how I mean it... it contains 400GB of parity data that is used to help reconstruct the data in drive E, doesn't it?



  • But no matter what, there's a good 400GB of crap on drive D that is needed to help rebuild the data that was on drive E...


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    So isn't drive D only needed for the 400GB it contains of drive E to help rebuild it?

    No, D doesn't contain ANYTHING of drive E. That's likely the root of confusion. At no point in parity RAID does any drive contain the contents of any other drive. That's mirroring, and mirroring doesn't have this risk at all.

    That's not how I mean it... it contains 400GB of parity data that is used to help reconstruct the data in drive E, doesn't it?

    No, it contains 2TB of parity data, every block of which is necessary for reconstructing the lost drive(s).
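
Why every block on every surviving drive is needed can be shown with a toy single-parity stripe. This is a simplified sketch, not how any particular controller is implemented: it uses plain XOR parity on one stripe with the parity block in a fixed position (real RAID 5 rotates parity across drives), and all names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_stripe(data_blocks):
    """Return one stripe: the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost):
    """Recreate the block at index `lost` from ALL of the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


# Six "drives": five data blocks plus one parity block for this stripe.
stripe = build_stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"])

# Drive index 4 (the "E" drive) fails; the five survivors recreate it.
print(rebuild(stripe, lost=4))

# If drive index 3 also returns a URE for this stripe, the remaining four
# blocks no longer XOR back to the lost data.
partial = xor_blocks([stripe[0], stripe[1], stripe[2], stripe[5]])
print(partial == b"EEEE")
```

The same calculation is repeated for every stripe on the array, so each surviving drive ends up being read from start to finish during the rebuild.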



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    So isn't drive D only needed for the 400GB it contains of drive E to help rebuild it?

    No, D doesn't contain ANYTHING of drive E. That's likely the root of confusion. At no point in parity RAID does any drive contain the contents of any other drive. That's mirroring, and mirroring doesn't have this risk at all.

    That's not how I mean it... it contains 400GB of parity data that is used to help reconstruct the data in drive E, doesn't it?

    No, it contains 2TB of parity data, every block of which is necessary for reconstructing the lost drive(s).

    Oh I see... I had it wrong the whole time.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    But no matter what, there's a good 400GB of crap on drive D that is needed to help rebuild the data that was on drive E...

    No, parity RAID is like a single file, when it corrupts, it is lost. Doesn't matter how many good blocks there are.



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    But no matter what, there's a good 400GB of crap on drive D that is needed to help rebuild the data that was on drive E...

    No, parity RAID is like a single file, when it corrupts, it is lost. Doesn't matter how many good blocks there are.

    So then it means the entire 2TB of EVERY drive needs to be READ to reconstruct the 2TB that was on the bad drive.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    So isn't drive D only needed for the 400GB it contains of drive E to help rebuild it?

    No, D doesn't contain ANYTHING of drive E. That's likely the root of confusion. At no point in parity RAID does any drive contain the contents of any other drive. That's mirroring, and mirroring doesn't have this risk at all.

    That's not how I mean it... it contains 400GB of parity data that is used to help reconstruct the data in drive E, doesn't it?

    No, it contains 2TB of parity data, every block of which is necessary for reconstructing the lost drive(s).

    Oh I see... I had it wrong the whole time.

    I figured that out :) So it is 2TB, from every working drive in the array (4), for 8TB total. Which gives us somewhere around a 60% chance of hitting a URE. That's because 12TB is an average, not a guarantee. If it was exactly every 12TB, it would be a 67% chance of loss.


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    But no matter what, there's a good 400GB of crap on drive D that is needed to help rebuild the data that was on drive E...

    No, parity RAID is like a single file, when it corrupts, it is lost. Doesn't matter how many good blocks there are.

    So then it means the entire 2TB of EVERY drive needs to be READ to reconstruct the 2TB that was on the bad drive.

    Correct



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    So it is 2TB, from every working drive in the array (4), for 8TB total,

    to avoid confusion, do you mean (5), for 10TB total? Because there's 6 total, one went bad, 5 working ones left?



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    t you buy, because that's what sets the failure rate. Obviously it is physical drives that fail, so it is the quality of the drives you

    you guys are bouncing between RAID 5 and 6 conversations..


  • Service Provider

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    So it is 2TB, from every working drive in the array (4),

    to avoid confusion, do you mean (5), for 10TB total? Because there's 6 total, one went bad, 5 working ones left?

    No, because URE risk only matters when two drives are lost in RAID 6. If you had five drives, you have no URE risk.



  • @scottalanmiller said in RAID 5 URE Clarity Question:

    @tim_g said in RAID 5 URE Clarity Question:

    @scottalanmiller said in RAID 5 URE Clarity Question:

    So it is 2TB, from every working drive in the array (4),

    to avoid confusion, do you mean (5), for 10TB total? Because there's 6 total, one went bad, 5 working ones left?

    No, because URE risk only matters when two drives are lost in RAID 6. If you had five drives, you have no URE risk.

    I'm talking about 6x 2TB drives in a RAID 5. One of those drives goes bad, so you hot-swap it out with a good one and the rebuilding starts. At this point, URE matters because if a 2nd drive dies before the rebuild is complete, game over.

    I'm not asking or saying anything at all about RAID 6.
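
The "4 drives / 8TB" versus "5 drives / 10TB" figures in the last few posts depend on which RAID level is assumed. A small sketch of that bookkeeping follows, assuming (as stated earlier in the thread) that a URE only threatens the array once all parity redundancy is used up. `rebuild_exposure` and `PARITY_DRIVES` are hypothetical names for illustration.

```python
# Number of drive failures each level can absorb.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def rebuild_exposure(level, n_drives, drive_tb, failed):
    """Return (exposed_to_ure, surviving_drives, tb_on_survivors).

    `exposed_to_ure` is True when no redundancy is left, so a single
    unreadable sector on a surviving drive cannot be reconstructed.
    """
    parity = PARITY_DRIVES[level]
    if failed > parity:
        raise ValueError("more failures than the level tolerates: array lost")
    survivors = n_drives - failed
    exposed = failed == parity
    return exposed, survivors, survivors * drive_tb


cases = [
    ("raid5", 1),  # the scenario in the final post
    ("raid6", 1),
    ("raid6", 2),
]
for level, failed in cases:
    exposed, survivors, tb = rebuild_exposure(level, 6, 2, failed)
    print(f"{level}, {failed} failed: exposed={exposed}, "
          f"{survivors} drives, {tb} TB to read")
```

For 6x 2TB drives, RAID 5 with one failed drive gives five survivors and 10TB of exposed reads, while the 8TB figure corresponds to RAID 6 after two failures.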

