Matching Drives for RAID

scottalanmiller

Q: Is it necessary for drives in a single RAID array to be identical?
A: No. But they should be.

RAID arrays do not care what kind of devices are given to them; they simply consume block devices. There is no requirement for the drives or devices to match, but there are important practical reasons to match them.

*Under the Hood: RAID arrays effectively use all of their devices in lock step. Whether you have two drives or eighty in your array, all of them go and look for one block of data together and they all wait for the slowest drive in the array to return its block before continuing on. When all drives are identical, they all read and write at the same time and we basically get full performance from every device.

Mismatched drives have different performance characteristics. Drives that are slower cause the entire array to wait for them, so even one slow drive in an array of scores of drives will cause every drive to lose performance. A fast drive added to an array of slower drives is completely wasted as the extra speed cannot be utilized.
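A rough way to picture the lock-step effect is a toy model in which a stripe operation finishes only when its slowest member returns. This is a sketch, not how any particular controller schedules I/O; the function name and the latency figures are made up for illustration.

```python
def stripe_latency_ms(drive_latencies_ms):
    """Toy model: a stripe operation completes when the slowest drive returns."""
    if not drive_latencies_ms:
        raise ValueError("an array needs at least one drive")
    return max(drive_latencies_ms)


# Hypothetical per-operation latencies in milliseconds.
matched = [8.0] * 8             # eight identical drives
one_slow = [8.0] * 7 + [14.0]   # one slower drive drags everyone down
one_fast = [8.0] * 7 + [4.0]    # one faster drive buys nothing

print(stripe_latency_ms(matched))
print(stripe_latency_ms(one_slow))
print(stripe_latency_ms(one_fast))
```

In this model the array with one slow drive runs at the slow drive's pace, and the array with one fast drive is no quicker than the matched one.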

Part of the complexity is that drives are not simply "faster" or "slower". Different drives, even ones that sound similar, like two different 7,200 RPM SATA drives, will have different caches and mechanical designs, and different operations will run at different speeds on each. If one drive is fast at one moment and another is fast at another moment, those "fast" moments are lost, as the array waits for whichever drive is slowest at that moment. The more similar the drives are, the better things will work, but identical is what you want whenever possible.
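The "fast at different moments" point can be sketched with the same kind of toy model, extended over several operations. The two drives below are hypothetical and have the same total time on their own, yet the pair is slower than either, because every operation pays for whichever drive is slower at that moment.

```python
def array_time_ms(per_drive_op_latencies_ms):
    """Total time when every operation waits for the slowest drive on that operation."""
    # zip(*...) groups the latencies of all drives for the same operation.
    return sum(max(op) for op in zip(*per_drive_op_latencies_ms))


# Hypothetical latencies (ms) for four consecutive operations.
drive_a = [5.0, 9.0, 5.0, 9.0]  # quick on operations 0 and 2
drive_b = [9.0, 5.0, 9.0, 5.0]  # quick on operations 1 and 3

alone_a = sum(drive_a)
alone_b = sum(drive_b)
together = array_time_ms([drive_a, drive_b])

print(alone_a, alone_b, together)
```

With two identical drives the per-operation maximum would equal each drive's own latency, so nothing would be lost.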

Matched drives also see lower wear and tear for the same usage, which means better reliability when the array is set up that way.

When we have no choice, we do what we need to do. But when we have the option, we want identical, matched drives for our RAID arrays to get the best performance and best investment value in our storage systems.

Q: The drives have to be the same capacity?
A: RAID uses the size of the smallest drive across all devices in the array.

This means that if you have five mismatched drives in an array, the capacity of the smallest one is what is used on each of them. So, for example, if you have 1.2TB, 1.5TB, 2TB, 2TB and 3TB drives in an array, the array is calculated as if it were 5x 1.2TB drives. In RAID 5 this gives you 4.8TB of usable capacity (four drives' worth of data, one of parity) and in RAID 6 it gives you 3.6TB (three drives' worth of data, two of parity).
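The arithmetic in that example can be written out as a short sketch. The function name is hypothetical, and sizes are in whole gigabytes (1.2TB written as 1200GB) simply to keep the numbers exact.

```python
def usable_capacity_gb(drive_sizes_gb, parity_drives):
    """Usable capacity when every member is treated as the smallest drive.

    parity_drives is 1 for RAID 5 and 2 for RAID 6.
    """
    data_drives = len(drive_sizes_gb) - parity_drives
    if data_drives < 1:
        raise ValueError("not enough drives for this parity level")
    return min(drive_sizes_gb) * data_drives


sizes_gb = [1200, 1500, 2000, 2000, 3000]  # the five mismatched drives above

raid5_gb = usable_capacity_gb(sizes_gb, parity_drives=1)
raid6_gb = usable_capacity_gb(sizes_gb, parity_drives=2)

print(raid5_gb, raid6_gb)
```

Everything above 1200GB on the four larger drives goes unused in this layout.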

Q: What about capacity when replacing a drive and they do not match?
A: A new drive replacing a failed one in an array must be at least as large in capacity as the drive it replaces, or it cannot be used.

Be careful: different drives round up or down, or calculate their stated sizes differently. If you are working with identical drives, you have nothing to be concerned about. If you use a new drive that is clearly larger than the old drives in the array, you should be fine, apart from the mismatched performance and reliability notes above. The big danger is using a different but "same size" drive as a replacement. Different drives report their sizes differently, and a "matched" drive that is even one block smaller than the existing ones in the array will be unusable as a replacement.
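The safe habit is to compare exact block counts rather than the size printed on the label. Here is a minimal sketch of that check, assuming the array uses the full size of its smallest member. The function name and the sector counts are illustrative only; on Linux the real count of 512-byte sectors can be read from /sys/block/<device>/size.

```python
def can_replace(member_sectors, replacement_sectors):
    """True if the replacement is at least as large as the smallest array member."""
    # Assumption: the array consumes exactly the smallest member's sector count.
    return replacement_sectors >= min(member_sectors)


# Hypothetical sector counts for three matched "1TB" members.
members = [1953525168, 1953525168, 1953525168]

same_model = 1953525168      # identical drive: fits
other_brand = 1953525167     # also sold as "1TB", but one sector short
clearly_larger = 3907029168  # a "2TB" drive: fits, extra space is unused

print(can_replace(members, same_model))
print(can_replace(members, other_brand))
print(can_replace(members, clearly_larger))
```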

dafyre

"and they all way for the slowest drive"... they all wait... ?

Dashrender

Should we toss this in there?

we want identical, matched, different batch produced drives for our RAID ...

scottalanmiller

@dafyre said in Matching Drives for RAID:

"and they all way for the slowest drive"... they all wait... ?

They wait all the way.

scottalanmiller

Added some additional capacity details to this today.

MattSpeller

@scottalanmiller said in Matching Drives for RAID:

*Under the Hood: RAID arrays effectively use all of their devices in lock step. Whether you have two drives or eighty in your array, all of them go and look for one block of data together and they all way for the slowest drive in the array to return its block before continuing on. When all drives are identical, they all read and write at the same time and we basically get full performance from every device.

A minor change to be more accurate

*Under the Hood: RAID arrays effectively use all of their devices in lock step. Whether you have two drives or eighty in your array, all of them go and look for one block of data together and they all wait for the slowest drive in the array to return its block before continuing on. When all drives are identical makes and models, the differences are much smaller between them and we get closer to full performance from every device.