Understanding Hybrid RAIDs

scottalanmiller

You may have heard vendors, mostly in the SMB NAS and SAN storage space, talking about how you can mix any combination of hard drives, even ones of different sizes, and their proprietary RAID or RAID-like technology will accommodate them and make use of whatever storage space is available. This is often called Hybrid RAID or a similar marketing term like Drobo's BeyondRAID or ReadyNAS' FlexRAID. This sounds amazing, finally overcoming that pesky need of RAID for all drives to be uniform.

Except, wait, RAID never had that limitation. Let's go back and see what was really going on in the 1990s.

While the name RAID suggests that it is made of disks, we know from basic storage concepts that nothing in storage actually uses a disk itself; it uses a drive appearance or interface. This means that anything that looks like a hard drive can be used in RAID, which might include things like other RAID devices, SANs, DASs, partitions, volumes and so forth. In fact, many software RAID systems, like those used in Windows and Linux, not only do not require consuming the entire disk, doing so is not even officially recommended in most cases. Instead, the recommendation is to partition the disks with a small non-RAID partition at the beginning from which to boot, which is replicated in a non-RAID fashion between the drives used in the array. So the expectation is generally that RAID will be on something "less" than a full hard drive.

RAID systems will use whatever is given to them and in a single RAID array the capacity used from each "drive" must be equal to that of the smallest one in the array. So if we have partitions and nine of them are 600GB and one is 590GB, only 590GB from each can be used and a total of 90GB across the array is just "lost." This does not mean that the sizes must be identical, only that the array will use them identically. This is the source of the first misconception about RAID that fuels the marketing of hybrid RAID systems.
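The arithmetic above can be sketched in a few lines. This is only an illustration of the "smallest member wins" rule, not how any particular RAID implementation reports capacity; the function name `member_usage` is made up for this example.

```python
def member_usage(member_sizes_gb):
    """Return (capacity used per member, total capacity left unused).

    Illustrative helper: a single array uses the same amount from every
    member, and that amount is the size of the smallest member.
    """
    per_member = min(member_sizes_gb)
    unused = sum(size - per_member for size in member_sizes_gb)
    return per_member, unused


# Nine 600GB partitions and one 590GB partition, as in the text.
members = [600] * 9 + [590]
per_member, unused = member_usage(members)
print(f"used per member: {per_member}GB, unused across array: {unused}GB")
```

With the sizes from the text this reports 590GB used per member and 90GB unused across the array.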

Because it is just a drive appearance, RAID systems have no concern for drives to be identical, or even similar, in nature. A single array could be made up of local SATA, local SAS, LUNs from different SANs, and a RAM drive. Media does not matter.

Because of this, if we have drives of multiple sizes that we want to consume in a single array, there is a trick that can be done. And this is very important: this "trick" is so standard and well known that entities like CompTIA and Microsoft required knowledge of it (and why not to do it) going back to the mid-1990s. It was a requirement on certification exams and considered so commonly attempted as to warrant regular warning. This is not obscure in any way.

What can be done is to take the largest partition size that every drive can provide (that is, the size of the smallest drive) and make a RAID array from that set of partitions. Then repeat with the leftover space on the remaining drives, and keep doing so until all possible space has been consumed. Then, to make this appear as a single array, a logical volume manager simply concatenates or spans these RAID arrays together so that it appears as a single array to the end user.

Here is an example:

You have two 10TB drives, three 8TB drives and two 6TB drives, for 56TB of raw capacity. If we were to combine these requiring a minimum of single parity or better in all cases, we would get 6TB x 7 drives of capacity. Under RAID 5, that would be 36TB usable. That is a lot of wasted capacity.

If we set aside the space consumed by the original array of 6TB x 7 partitions, we have 2x 4TB and 3x 2TB left over. With this we can partition again and make a second RAID 5 array of 2TB x 5 drives. This gives us another 8TB of capacity, bringing us up to 44TB usable from our array.

But wait, we are not done. We still have 2TB x 2 remaining. What do we do? A RAID 1 array, of course, which will provide another 2TB of capacity and bring us up to 46TB.

These three RAID arrays under the hood are then spanned together so that the system believes it to be a single array of 46TB. A handy trick to provide maximum capacity from disparate drives. And, of course, a ridiculously bad idea.
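The layering in this example can be sketched as a small loop. This is a minimal sketch of the manual partitioning trick described above, not any vendor's actual algorithm; the function name `hybrid_layout` is hypothetical, and the rule it assumes (RAID 5 when three or more drives still have space, RAID 1 when exactly two do) is simply the one used in the example.

```python
def hybrid_layout(drive_sizes_tb):
    """Peel off stacked arrays from mixed-size drives.

    Returns a list of (level, members, slice_tb, usable_tb) tuples.
    Assumption for illustration: RAID 5 for 3+ members, RAID 1 for 2.
    Space left on a single drive cannot be made redundant and is skipped.
    """
    remaining = list(drive_sizes_tb)
    layers = []
    while True:
        # Only drives that still have unallocated space can join a layer.
        remaining = [r for r in remaining if r > 0]
        if len(remaining) < 2:
            break
        slice_tb = min(remaining)      # smallest remaining space sets the slice
        members = len(remaining)
        if members >= 3:
            level, usable = "RAID 5", slice_tb * (members - 1)
        else:
            level, usable = "RAID 1", slice_tb
        layers.append((level, members, slice_tb, usable))
        remaining = [r - slice_tb for r in remaining]
    return layers


drives = [10, 10, 8, 8, 8, 6, 6]   # the drives from the example, in TB
layers = hybrid_layout(drives)
for level, members, slice_tb, usable in layers:
    print(f"{level}: {members} x {slice_tb}TB -> {usable}TB usable")
print("spanned total:", sum(layer[3] for layer in layers), "TB")
```

For the drives in the example this produces the same three arrays: RAID 5 over 7 x 6TB (36TB), RAID 5 over 5 x 2TB (8TB) and RAID 1 over 2 x 2TB (2TB), spanned to 46TB.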

The resulting array isn't just slow, it is unpredictably slow: certain block reads will get less than 33% of the performance of others, in a way that the system will be unable to determine ahead of time. An operation that is fast one moment might be slow the next, even with no conflicts from other processes.

Drive replacements get extremely complex, as a single drive failure might impact many arrays that each need to be repaired. Those arrays all try to repair simultaneously using the same mechanical resources, causing heavy contention that is invisible higher up the stack.
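To make the replacement problem concrete, here is a rough sketch that counts how many of the stacked arrays each drive belongs to, using the drive sizes from the example above. The helper name `arrays_per_drive` is invented for illustration, and it assumes the layers were carved exactly as described: one layer per distinct drive size, as long as at least two drives reach that size.

```python
def arrays_per_drive(drive_sizes_tb):
    """Map each distinct drive size to the number of arrays it participates in.

    Assumption for illustration: a layer ends at each distinct drive size,
    and a layer only exists if at least two drives are large enough for it.
    """
    boundaries = sorted(set(drive_sizes_tb))
    valid = [
        b for b in boundaries
        if sum(1 for size in drive_sizes_tb if size >= b) >= 2
    ]
    return {size: sum(1 for b in valid if size >= b) for size in boundaries}


drives = [10, 10, 8, 8, 8, 6, 6]   # TB, from the earlier example
for size, count in arrays_per_drive(drives).items():
    print(f"losing a {size}TB drive degrades {count} array(s)")
```

In the example, losing a 6TB drive degrades one array, an 8TB drive two, and a 10TB drive all three, and every one of those rebuilds has to write to the same replacement drive.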

Hybrid arrays of this nature make RAID impossible to fully understand or predict for end users and systems. And, of course, asymmetric reads make reliability decrease as well, beyond the complex resilvering problems. This approach has always been available to us, and has always been warned against. This is all that is happening under the hood with these so-called magic "use any sized disk" hybrid RAID marketing gimmicks. Don't confuse what RAID allows you to do with how it should be used.

scottalanmiller

Of course, in more modern systems, the use of advanced LVMs instead of older partitions makes this a little more flexible so that more control over the process can exist. But all of the core problems still exist.

Some vendors try to market this mechanism as "RAID virtualization", which isn't a completely crazy name due to the layers of abstraction, but it makes it sound valuable when, in reality, it is not. RAID virtualization when used for the purpose of enabling hot or live RAID array growth is generally a good idea. Used as a kludge to enable bad ideas, it remains bad.