Of course, in more modern systems, the use of advanced LVMs instead of older partitions makes this a little more flexible so that more control over the process can exist. But all of the core problems still exist.
Some vendors try to market this mechanism as "RAID virtualization", which isn't a completely crazy name due to the layers of abstraction, but it makes it sound valuable when, in reality, it is not. RAID virtualization when used for the purpose of enabling hot or live RAID array growth is generally a good idea. Used as a kludge to enable bad ideas, it remains bad.
Someone looking at this thread had a hard time figuring out how to open the registry editor after booting to CLI mode from the 2008 R2 media. From the command line, type regedit and hit Enter, and it will load.
What you are thinking of is my recommendation for supported drives that are part of the system itself if you are going for a warranty supported system like from Dell or HPE. Bringing your own drives would push you to vendors like SuperMicro where you can mix and match for the best performance, cost and features.
I want to ask why we can't/shouldn't use consumer class drives in a Dell or HPE server, but I think the answer might be - because if you're paying for that level of support, why are you not going all in?
Is that right?
i.e. if you want to run your own performance/cost factors, you're better off starting with a SuperMicro, is that what you're saying?
One of the confusing pieces here is that Linux actually does things more clearly but the Windows world is so confusing that if you carry that confusion into the Linux world, it makes things harder. Windows rarely uses or discloses the names of their product components. So Windows Software RAID is used to describe part of the Windows OS. But what if you have software RAID on Windows that is not Windows Software RAID? Windows Admins typically have no good terminology to discuss this, even though it is common. They just.... don't know what's going on and don't document it. But in Linux, we have the terms on hand all of the time (MD, ZFS, whatever.) So the Linux side isn't as bad as it seems, but if you are used to a weird blend of generic names being used as if they are specifics from the Windows world and assume that the Linux world is just as crazy, then it seems crazy.
That list makes hardware RAID sound safer than ZFS, which is probably not quite true. But what is the case is that the average implementation of hardware RAID is quite a bit safer than the average implementation of ZFS software RAID. Hardware RAID "handles everything for you," protecting you from most bad decisions. ZFS leaves all the nitty gritty details up to you, which makes it super, duper easy to mess something up and leave yourself vulnerable. This is exacerbated by the Cult of ZFS problem and loads of misinformation swirling about its use. So the average person using ZFS is not even remotely prepared for what is needed to use it safely.
Some problems that we see people have when using ZFS without fully understanding storage:
Believing that ZFS doesn't use RAID (this is extremely common.)
Believing that RAIDZ is magic, rather than a brand name, and that normal RAID concerns do not apply. So we often see people implement RAID 5 in reckless, insane situations using "it's RAIDZ" as an excuse as if RAIDZ isn't just RAID 5 - literally just a brand name for RAID 5.
Treating features common to all RAID systems as "unique" and believing that ZFS has feature after feature of protection that makes protecting against storage failure unnecessary.
Not understanding hot swap and blind swap differences and creating systems that they do not know how to address should a drive fail.
Believing that ZFS, being magic, is not at risk from power loss, and failing to protect caches from power issues - something that they are not normally used to dealing with, as hardware RAID does this for you.
Not understanding the CPU and memory needs of ZFS, especially with features like dedupe and RAIDZ3.
Ignoring common RAID knowledge and thinking that using ZFS means not using mirroring technologies.
The most common RAIN approach that I see is taking all disks in the pool, noting their nodal presence and using mirroring to distribute the data so that data mirrors never go to the same disk and/or the same node. So a little like a networked RAID 1E but with more flexibility and the option to add nodal separation and performance testing so that data moves to where it is used.
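To make that placement rule concrete, here is a rough sketch of the idea, not how any particular RAIN product does it. The function name place_mirrors, the (node, disk) tuples and the round-robin rotation are all made up for illustration. It only shows the "never the same disk, and a different node whenever one is available" part, not the performance-based data movement.

```python
def place_mirrors(chunk_ids, disks, copies=2):
    """Assign each chunk to `copies` distinct disks, spreading across nodes.

    `disks` is a list of hypothetical (node, disk) tuples. This is an
    illustrative sketch, not any vendor's actual placement algorithm.
    """
    if copies > len(disks):
        raise ValueError("need at least as many disks as copies")
    placement = {}
    for i, chunk in enumerate(chunk_ids):
        # Rotate the starting disk per chunk so load spreads over the pool.
        start = i % len(disks)
        order = disks[start:] + disks[:start]
        chosen = []
        while len(chosen) < copies:
            used_nodes = {node for node, _ in chosen}
            remaining = [d for d in order if d not in chosen]
            # Prefer a disk on a node that holds no copy of this chunk yet;
            # fall back to any other disk (the single-node case).
            fresh = [d for d in remaining if d[0] not in used_nodes]
            chosen.append((fresh or remaining)[0])
        placement[chunk] = chosen
    return placement


pool = [("node1", "sda"), ("node1", "sdb"), ("node2", "sda"), ("node2", "sdb")]
for chunk, copies in place_mirrors(range(4), pool).items():
    print(chunk, copies)
```

With two nodes in the pool, both copies of every chunk land on different nodes. With a single node, the copies still land on different disks, which is the networked-RAID-1E-like behavior.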
Are you aware of any open source RAIN systems?
Gluster and Swift
I think Ceph and Lustre may be two others.
Lustre is RAIN, but is closed. Gluster was the open replacement for Lustre.
Just a quick search showed that Lustre was GPL 2.0, not sure if that is new or not.
Oh wow, must be new. It was crazy expensive in 2006 when we were really investigating it. That's awesome.
Ah looks like it went open source in 2010.
Oh cool, so I remember things well then. I'm just out of date. Gluster probably forced their hand, why would anyone consider Lustre when it was closed source? The answer was probably... they wouldn't and didn't.
Yep, I'd assume that was the case. Especially when it is such a specific, and at the time, niche market.
And when Gluster went directly after them, even in name.
Since the Dell fanbois hijacked this thread (jk) I'm going to start another thread.
Actually have some good info for those of us non-Dell users.
I use SuperMicro.
Is there anything about Dell on here? Dell was mentioned, but nothing Dell specific was said. All of the info applies to SuperMicro as well. IPMI is used instead of iDRAC. And the PERC software and the LSI MegaRAID software are the same. The only difference is branding.
So is it done? Does Matt understand and agree to the point that Scott was making?
Yes I believe so.
TL;DR attempt #1 #2 #3 #4 (counting edits)
RAID10 does not need hot spares
If you have spare slots you'd be better served by a larger array with more IOPS
The corner case (the one raised by the OP's question?) is whether hot spares would reduce the risk of array failure. The answer is 100% absolutely yes, they will reduce the risk of failure.
The disagreement (I think..?) was whether that's necessary. We agreed that it isn't necessary to have any hot spares for RAID10 unless there are mitigating factors (examples: remote COLO with horrific access issues, extremely risk averse use case).
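For the "larger array with more IOPS" point, a back-of-the-envelope sketch using the usual RAID 10 rule of thumb: reads scale with every drive, and writes cost two physical writes each. The 150 IOPS per drive figure and the function name raid10_iops are assumptions for illustration only, not measurements from any real drive.

```python
def raid10_iops(drives, per_drive_iops):
    """Rule-of-thumb RAID 10 estimate: returns (read_iops, write_iops).

    Reads can be served by any drive; each logical write hits both
    members of a mirror pair, so the write penalty is 2.
    """
    if drives < 4 or drives % 2:
        raise ValueError("conventional RAID 10 needs an even number of drives, 4 or more")
    read_iops = drives * per_drive_iops
    write_iops = (drives // 2) * per_drive_iops
    return read_iops, write_iops


PER_DRIVE = 150  # assumed figure for a hypothetical spinning disk

# Eight slots: six drives in the array plus two idle hot spares...
print("6 drives + 2 spares:", raid10_iops(6, PER_DRIVE))
# ...versus all eight drives doing work in the array.
print("8 drives, no spares:", raid10_iops(8, PER_DRIVE))
```

Under those assumptions the two spare slots are worth roughly a third more IOPS when used as array members instead of sitting idle.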
Thanks again to everyone who replied and gave feedback on this. It's great to know that there's a solid community of knowledgeable people who are willing to share their expertise - I really appreciate it!!
Sadly we didn't find your solution. But happily you found it on your own!
Seems like the perfect case to use RAIN, even if it's within a single system enclosure. @StarWind_Software LSFS, I'm looking at you. @KOOLER Am I right in thinking this is the sort of thing LSFS could handle?
RAIN in a single enclosure rarely does anything that RAID 10 does not. It's effectively all the same at that point (more or less.) If RAID 10 doesn't work, RAIN isn't going to work either (normally.) The issue here is "single enclosure."
Wouldn't a properly configured single RAIN node make it easier to grow when it's time to add more storage?
I've seen this with Exablox and it was a nice feature!
Yes, if you are preparing for scale out. But if you are just doing it within the context of a single node, it doesn't change anything.
I added a header into the main topic list for that. But it is going to be later, in the Advanced Topics section. Oddly, I know of pretty much no standard Linux Administration tomes that cover DRBD. It's so core, it's very odd that it so often gets missed.
Could it be that most Linux Admins don't know about it until they go searching for it? ...That's how I found out about it.
You would hope that the people writing the books would know, though!
As an interesting side note, the Linux md RAID system also implements the Intel Matrix RAID and DDF (Disk Data Format) software RAID formats commonly used by consumer FakeRAID systems. Because of this, Linux md can sometimes convert FakeRAID into enterprise md RAID if you really know what you are doing.
Remember that this is backup. So if the backup system fails you have options like...
Taking a new backup from the live systems.
Offlining the limping array and taking a full backup of it before attempting a restore.
Doing a backup/restore rather than an array recovery.
All of these things make RAID 6's risks minimal. This isn't the only copy of anything, it's a backup. And it is not subject to availability risks (at least not in the way that live data is) so things that cause availability issues are not significant.
I once asked a vendor who was pitching an appliance that supported RAID0+1 and RAID1+0, "What would you recommend between the two, to a potential customer?" They said it didn't matter as they are both the same thing.
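For anyone wondering why that answer is wrong, here is a small sketch that counts which two-drive failures each layout survives. It assumes the textbook layouts: RAID 1+0 as a stripe over mirror pairs, and RAID 0+1 as a mirror of two stripe sets where a stripe set is dead as soon as any member fails. The function names are made up for illustration, and real controllers may behave differently.

```python
from itertools import combinations


def survives_raid10(failed, drives):
    # Mirror pairs are (0,1), (2,3), ...; the array dies only when
    # both members of the same pair have failed.
    return all(not {a, a + 1} <= failed for a in range(0, drives, 2))


def survives_raid01(failed, drives):
    # Two stripe sets: the first half and the second half of the drives.
    # Losing any drive kills its whole stripe set; the array dies when
    # both stripe sets are dead.
    half = drives // 2
    first_dead = any(d < half for d in failed)
    second_dead = any(d >= half for d in failed)
    return not (first_dead and second_dead)


def count_survivable(survives, drives=4, failures=2):
    combos = combinations(range(drives), failures)
    return sum(survives(set(c), drives) for c in combos)


print("RAID 1+0:", count_survivable(survives_raid10), "of 6")  # 4 of 6
print("RAID 0+1:", count_survivable(survives_raid01), "of 6")  # 2 of 6
```

With four drives, the mirror-first layout survives four of the six possible double failures and the stripe-first layout only two, and the gap widens as drives are added.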
No, it doesn't support this. You are correct for RAID 1, but for parity RAID 5 or 6 it does not. The OS needs to be up and running to be able to manage the parity RAID, so you can't use it for the system install, only for extra data volumes.