How to Market RAID 6 When Customers Need Safety

coliver

@Dashrender said in How to Market RAID 6 When Customers Need Safety:

@scottalanmiller said in How to Market RAID 6 When Customers Need Safety:

@Dashrender said in How to Market RAID 6 When Customers Need Safety:

and second - write hole in ZFS?

ZFS uses variable stripe widths to overcome the write hole. Why no one else has implemented this, I am not sure (backward compatibility concerns, perhaps?). It's been a decade since Sun solved the write hole problem, but still today no one has solved it except for the ZFS implementation of parity RAID. Now, most people avoid it by having batteries, flash cache or insane UPS systems, so it does not come up that often. But the risk is real.

But what is a write hole?

It's when two disks in a RAID 6 array don't match the other members of the array. RAID 1 and RAID 5 have this issue as well, but with a single drive.
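To make the ZFS point concrete: conventional parity RAID updates part of an existing stripe in place (read-modify-write), which is where a crash can leave data and parity out of step. RAID-Z instead makes each block its own stripe, as wide as that block needs (its data columns plus parity), writes it to fresh space, and only then switches the block pointer (copy-on-write). The toy model here is a sketch, assuming single parity, 4-byte chunks and blocks that fit in one row; `write_block`, `pool` and `pointers` are hypothetical names, not the ZFS on-disk format.

```python
from functools import reduce

def xor_parity(chunks):
    """Single (RAID 5 style) parity: byte-wise XOR across equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def write_block(pool, pointers, name, block, n_disks, chunk=4):
    """Toy RAID-Z style write: the block becomes its own full stripe in fresh space.

    pool     -- list of stripes already on disk (never modified in place)
    pointers -- maps a block name to the index of its current stripe in pool
    """
    # Split the block into chunk-sized data columns, zero-padding the last one.
    data = [block[i:i + chunk].ljust(chunk, b"\0") for i in range(0, len(block), chunk)]
    if len(data) > n_disks - 1:
        raise ValueError("toy model only handles blocks that fit in one row")
    stripe = data + [xor_parity(data)]  # width = data columns + 1 parity column
    pool.append(stripe)                 # 1. write the complete stripe somewhere new
    pointers[name] = len(pool) - 1      # 2. then repoint the block (the commit step)

pool, pointers = [], {}
write_block(pool, pointers, "small", b"hi", n_disks=5)          # 1 data + 1 parity
write_block(pool, pointers, "big", b"0123456789ab", n_disks=5)  # 3 data + 1 parity
write_block(pool, pointers, "small", b"yo", n_disks=5)          # update lands in new space
print([len(stripe) for stripe in pool])  # [2, 4, 2]: stripe width follows block size
print(pool[0][0], pointers["small"])     # b'hi\x00\x00' 2: old version left untouched
```

In this model, if power fails between steps 1 and 2, the pointer still names the old, fully consistent stripe, so there is never a live stripe that is half updated.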

Dashrender

@coliver said in How to Market RAID 6 When Customers Need Safety:

It's when two disks in a RAID 6 array don't match the other members of the array. RAID 1 and RAID 5 have this issue as well, but with a single drive.

If that happens in RAID 1/10 as well, then how is it solved?

coliver

@Dashrender said in How to Market RAID 6 When Customers Need Safety:

If that happens in RAID 1/10 as well, then how is it solved?

From my understanding, it doesn't happen often on RAID 1, only when there is a drive/array misconfiguration. However, it is common on RAID 5/6. I'm not sure of the exact mechanism, but it has something to do with built-in drive caching.

scottalanmiller

@Dashrender said in How to Market RAID 6 When Customers Need Safety:

Can you go into more detail on the two failures - RAID 10 losing a second drive in the same pair vs. potential loss of a third drive in RAID 6, whatever the comparison is that makes RAID 10 still safer in the two-drive-loss scenario...

This one gets complex because there are so many factors involved. I'll start with a list:

At the same usable capacity, RAID 10 is more likely to experience the first drive failure because it has more disks than RAID 6 (except in the four-disk scenario, where they are even). So RAID 10 starts with more "recovery events" than RAID 6 does. Surprisingly, even the pro-RAID 6 people always skip this.
Once a single drive has been lost, we have a degraded array. During this time there is lost performance, but negligible impact on the array in terms of risk. There is, however, exposure until the failed drive is replaced.
Once a drive is replaced, RAID 10 rapidly mirrors back to that drive and returns the array to healthy. The time frame here is extremely small and the operation is simple. The reliability of this process is so close to 100% that it cannot be measured on any real-world system (80,000 array-years sampled, zero failures, so no way to gain statistical knowledge). RAID 6, on the other hand, begins a very complex rebuild operation that takes more time. How much more you have to determine for your own system, but it is always longer than RAID 10. In the real world, it is typical for the rebuild to take days or weeks instead of hours. The difference can be staggering. This provides a many times larger window for a second drive to fail. That alone only raises the risk by a few hundred percent in most cases; many times a near-zero risk is still pretty low. What is significant is that parity RAID arrays have been shown, and are well known, to induce additional drive failures during the rebuild operation (it is believed because of the increased wear and tear from a long-running, highly intensive operation). So the chances of a secondary drive failure skyrocket from "essentially impossible" to "not at all unlikely."
If a second drive fails on RAID 10, there is only an impact if the second drive is a member of the same mirrored pair. This takes the already incredibly low chance of secondary drive failure and reduces it dramatically. (Mirrored pair testing: 160,000 array-years, no dual drive failures!) So, for all intents and purposes, dual-disk failure on RAID 1 does not exist when there is no external damaging actor and the failed drive is replaced promptly.
If multiple drives fail on RAID 10 that are not in the same mirrored pair, they rebuild concurrently and independently and do not contribute to a general increase in array-level risk: the repair window remains tiny, each pair heals on its own, and one failing does not trigger another.
If a second drive fails in RAID 6, all of the risks that led to the second drive failure increase again. Now the burden on the remaining disks takes another jump beyond the original burden of a single disk failure, and the rebuild window increases dramatically, typically to about double. So the array then has an even longer repair window with an ever-increasing chance of yet another disk failing. If any additional disk fails before one of the failed disks has been rebuilt, the array is lost completely. If any additional disk fails after one, but not both, of the failed disks has been rebuilt, the lengthy and risky process of rebuilding begins again. In the real world, on a moderately large array with a triple disk failure where one disk had been repaired before the third failed, we could literally see rebuild times creeping past the three-month mark!
A bigger risk than a third drive failing is hitting a URE (unrecoverable read error) during the lengthy dual-disk-failure rebuild. Standard parity RAID implementations treat this no differently than a failed disk: the stripe is bad, so they drop the entire array, resulting in total loss. Even low-URE enterprise drives become extremely susceptible to this in a large RAID 6 array rebuild, and if we end up in the triple failure scenario, the URE risk nearly doubles again.
The largest risk with RAID 6, and the one that is totally ignored, is that in most cases performance becomes unacceptably slow, or the array even disconnects entirely, during a rebuild operation. There are many factors involved here, so we cannot say this across all cases, but very few people measure their environment to see what the impact would be. Having a RAID array offline or nearly offline for days, weeks or, in the triple failure example, as much as an entire season likely means that giving up on the array immediately and restoring from backup would have been a few-hour outage with minimal data loss, rather than a scenario where the system is offline for 90 days, hits a URE on day 89.9, and all of that time is lost.
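Some of the arithmetic behind the list above can be sketched in a few lines. This is a simplified model under stated assumptions: disks fail independently (the point about rebuilds inducing failures means real parity arrays do worse than this), a second RAID 10 failure is equally likely to hit any surviving disk, and every bit read is an independent trial at the drive's spec-sheet URE rate. The drive size, per-disk failure probability and URE rate in the usage lines are hypothetical inputs, not measurements.

```python
import math

def p_first_failure(n_disks, p_disk):
    """Chance at least one of n_disks fails in some window, if each disk
    independently fails in that window with probability p_disk."""
    return 1.0 - (1.0 - p_disk) ** n_disks

def p_partner_hit(n_disks):
    """RAID 10 with one disk already failed: chance that a second failure,
    striking a random surviving disk, lands on the failed disk's mirror partner."""
    if n_disks < 4 or n_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return 1.0 / (n_disks - 1)

def p_ure(bytes_read, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading bytes_read.
    Computes 1 - (1 - r)**bits in a numerically stable way for tiny r."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 10**12
disk = 8 * TB  # hypothetical 8 TB drives
ure = 1e-15    # hypothetical enterprise spec: 1 error per 10^15 bits read

# Six disks' worth of usable space: RAID 10 needs 12 disks, RAID 6 needs 8,
# so RAID 10 sees more first failures (0.02 is a hypothetical per-disk chance).
print(p_first_failure(12, 0.02) > p_first_failure(8, 0.02))  # True
# A second RAID 10 failure is only fatal if it hits the one partner disk.
print(p_partner_hit(12))  # 1/11, about 0.09
# Rebuild reads: RAID 10 copies one partner disk; an 8-disk RAID 6 with two
# failed disks must read all six survivors, where a URE cannot be corrected.
print(round(p_ure(disk, ure), 2), round(p_ure(6 * disk, ure), 2))  # 0.06 0.32
```

The independence assumption flatters RAID 6 here: correlated failures during a long rebuild would push its numbers higher, not lower.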

scottalanmiller

@Dashrender said in How to Market RAID 6 When Customers Need Safety:

But what is a write hole?

From Sun's 2005 paper addressing it: "RAID-5 (and other data/parity schemes such as RAID-4, RAID-6, even-odd, and Row Diagonal Parity) never quite delivered on the RAID promise -- and can't -- due to a fatal flaw known as the RAID-5 write hole. Whenever you update the data in a RAID stripe you must also update the parity, so that all disks XOR to zero -- it's that equation that allows you to reconstruct data when a disk fails. The problem is that there's no way to update two or more disks atomically, so RAID stripes can become damaged during a crash or power outage."

RAID Z and the Write Hole
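The mechanism in the Sun quote can be reproduced with a toy RAID 5 stripe. This is a minimal sketch, assuming three 4-byte data blocks plus one XOR parity block; the names are hypothetical. The crash lands between the data write and the parity write, and the damage shows up later, in a block that was never being written, when a disk fails and the array reconstructs from parity.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# A hypothetical 3+1 RAID 5 stripe: three data blocks plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
stripe = {"d0": d0, "d1": d1, "d2": d2, "p": xor_blocks(d0, d1, d2)}
# Healthy stripe: all members XOR to zero, as the quote says.
assert xor_blocks(*stripe.values()) == bytes(4)

# Read-modify-write of d1: the new data and the new parity go to two
# different disks, and nothing makes those two writes atomic.
new_d1 = b"XXXX"
stripe["d1"] = new_d1  # the data write reaches its disk...
# ...power fails here, before the new parity xor_blocks(d0, new_d1, d2) is written.

# Later the disk holding d0 dies, and the array rebuilds d0 from the survivors.
rebuilt_d0 = xor_blocks(stripe["d1"], stripe["d2"], stripe["p"])
print(rebuilt_d0 == d0)  # False: d0 was never written, yet it comes back wrong
```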

scottalanmiller

@coliver said in How to Market RAID 6 When Customers Need Safety:

From my understanding, it doesn't happen often on RAID 1, only when there is a drive/array misconfiguration. However, it is common on RAID 5/6. I'm not sure of the exact mechanism, but it has something to do with built-in drive caching.

Its full name is the RAID 5 Write Hole. It does not exist in mirrored RAID; it is a parity-RAID-only risk.
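For contrast, here is a toy two-way mirror hit by the same kind of crash (a hypothetical sketch, not any particular implementation). Mirroring stores every block whole on each disk, so nothing is reconstructed from other blocks, and an interrupted write can only affect the block being written.

```python
# Hypothetical two-way mirror: every block is stored whole on both disks.
mirror = [{"b0": b"AAAA", "b1": b"BBBB"}, {"b0": b"AAAA", "b1": b"BBBB"}]

# Update b1, but power fails after only the first disk has the new copy.
mirror[0]["b1"] = b"XXXX"

# b0 was not being written and nothing on either disk is derived from b1,
# so b0 reads back correctly from either side, even after losing a disk.
assert all(disk["b0"] == b"AAAA" for disk in mirror)

# The interrupted block itself is old on one side and new on the other.
print(sorted({disk["b1"] for disk in mirror}))  # [b'BBBB', b'XXXX']
```

The interrupted block can still differ between the two sides, but each side holds a complete old or new version, the same exposure any in-flight write has on a single disk. What the mirror never does is corrupt a block that was not being written.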

coliver

@scottalanmiller said in How to Market RAID 6 When Customers Need Safety:

Its full name is the RAID 5 Write Hole. It does not exist in mirrored RAID; it is a parity-RAID-only risk.

That's good to know. So it has to do with the parity bit in parity RAID devices. I'll have to look at it more.

DustinB3403

So the RAID 5 Write Hole is active on all parity arrays?

Which means any parity array should be avoided at all cost... doesn't it?

scottalanmiller

@coliver said in How to Market RAID 6 When Customers Need Safety:

That's good to know. So it has to do with the parity bit in parity RAID devices. I'll have to look at it more.

Yeah, has to do with the way that it writes.

scottalanmiller

@DustinB3403 said in How to Market RAID 6 When Customers Need Safety:

So the RAID 5 Write Hole is active on all parity arrays?

Which means any parity array should be avoided at all cost... doesn't it?

No, because, like losing multiple disks in RAID 10, it's just not a real-world risk. I've been involved in an awful lot of array failures over the years and never once was it because of the write hole. Write holes are rare even when the circumstances allow them to happen, and almost no enterprise system allows those circumstances. Any enterprise-class hardware RAID protects against the write hole; that's why we have battery-backed cache and NVRAM caches on them. ZFS protects against this in the Solaris, FreeBSD and OpenIndiana worlds.

The risk really only exists with Linux MD RAID, non-ZFS RAID on BSD, Windows Software RAID, FakeRAID controllers and other situations. The big enterprise software RAID vendors have stated that they assume you will maintain power to your system, in which case the write hole cannot happen. If you want to use parity software RAID without ZFS, then you need to either accept the write hole risk or ensure continuous power to the box, the same as the battery does for a hardware RAID cache.
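Roughly how a battery-backed or NVRAM cache closes the hole: the controller keeps the whole pending stripe update (data and parity together) in memory that survives a power cut, and on power-up finishes writing it before the stripe is used. `ToyController` and `stripe_xor` are hypothetical names in a heavily simplified sketch of that idea, not any vendor's firmware; blocks are single byte values to keep it short.

```python
class ToyController:
    """Hypothetical RAID controller with a cache that survives power loss."""

    def __init__(self, disks):
        self.disks = disks    # member name -> block value (the on-disk state)
        self.nv_cache = None  # pending stripe update; kept across a power cut

    def write_stripe(self, updates, crash_after=None):
        """Stage data and parity together, then write members one at a time.
        crash_after simulates power failing after that many member writes."""
        self.nv_cache = dict(updates)
        for written, (member, block) in enumerate(updates.items()):
            if written == crash_after:
                return            # power lost: the cache still holds the update
            self.disks[member] = block
        self.nv_cache = None      # every member is on disk; nothing pending

    def recover(self):
        """At power-up, replay any staged update so no stripe stays half-written."""
        if self.nv_cache:
            self.disks.update(self.nv_cache)
            self.nv_cache = None

def stripe_xor(disks):
    """XOR of every member; 0 means data and parity agree."""
    result = 0
    for block in disks.values():
        result ^= block
    return result

disks = {"d0": 0x41, "d1": 0x42, "d2": 0x43}
disks["p"] = disks["d0"] ^ disks["d1"] ^ disks["d2"]
ctl = ToyController(disks)

new_d1 = 0x58
ctl.write_stripe({"d1": new_d1, "p": disks["d0"] ^ new_d1 ^ disks["d2"]}, crash_after=1)
print(stripe_xor(ctl.disks))  # 26: nonzero, data and parity disagree
ctl.recover()
print(stripe_xor(ctl.disks))  # 0: the replay closed the hole
```

Without the cache (or with the power held up long enough to finish the write), the crashed state above is exactly what software parity RAID is left with.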

bbigford

I once asked a vendor who was pitching an appliance that supported RAID0+1 and RAID1+0, "what would you recommend between the two, to a potential customer?" They said it didn't matter as they are both the same thing.

We didn't go with that vendor.
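Why "the same thing" is wrong, in numbers: enumerate every possible two-disk failure for both layouts and count how many leave the data intact. This sketch assumes the textbook layouts (RAID 10 as a stripe of mirrored pairs, RAID 0+1 as a mirror of two stripes, where any failure takes its whole stripe offline); the disk numbering and function names are arbitrary.

```python
from fractions import Fraction
from itertools import combinations

def raid10_survives(failed, n_disks):
    # Disks 2k and 2k+1 form mirrored pair k; data survives while every pair keeps a member.
    return all(not {2 * k, 2 * k + 1} <= failed for k in range(n_disks // 2))

def raid01_survives(failed, n_disks):
    # Disks 0..n/2-1 form stripe A, the rest stripe B, and A mirrors B.
    # A failure takes its whole stripe offline, so data survives while one stripe is untouched.
    half = n_disks // 2
    stripe_a, stripe_b = set(range(half)), set(range(half, n_disks))
    return not (failed & stripe_a) or not (failed & stripe_b)

def survival_fraction(survives, n_disks, n_failed=2):
    cases = [set(c) for c in combinations(range(n_disks), n_failed)]
    return Fraction(sum(survives(c, n_disks) for c in cases), len(cases))

for n in (4, 8):
    print(n, survival_fraction(raid10_survives, n), survival_fraction(raid01_survives, n))
# 4 2/3 1/3
# 8 6/7 3/7
```

At both sizes RAID 10 survives exactly twice as many of the possible two-disk failures as RAID 0+1.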

scottalanmiller

@BBigford said in How to Market RAID 6 When Customers Need Safety:

I once asked a vendor who was pitching an appliance that supported RAID0+1 and RAID1+0, "what would you recommend between the two, to a potential customer?" They said it didn't matter as they are both the same thing.

We didn't go with that vendor.

Amazing. Now that's just stupid. Losing a sale over not knowing your own product is ridiculous.

DustinB3403

@BBigford said in How to Market RAID 6 When Customers Need Safety:

I once asked a vendor who was pitching an appliance that supported RAID0+1 and RAID1+0, "what would you recommend between the two, to a potential customer?" They said it didn't matter as they are both the same thing.

We didn't go with that vendor.

RAID10 vs RAID0+1

scottalanmiller

@DustinB3403 said in How to Market RAID 6 When Customers Need Safety:

RAID10 vs RAID0+1

Or, you know...

http://www.smbitjournal.com/2014/07/comparing-raid-10-and-raid-01/

DustinB3403

@scottalanmiller said in How to Market RAID 6 When Customers Need Safety:

Or, you know...

http://www.smbitjournal.com/2014/07/comparing-raid-10-and-raid-01/

TL;DR: pictures are prettier