Announcing the Death of RAID
-
@scottalanmiller nice article
Something to add to the learnings for this year -
@scottalanmiller said in Announcing the Death of RAID:
@coliver said in Announcing the Death of RAID:
Is StarWind vSAN considered RAIN?
We'd have to dig in under the hood. I think that they are mostly focused on network RAID, just really advanced.
StarWind uses local reconstruction codes (for now, stand-alone software or hardware RAID on every node; it can be RAID 0, 1, 5, 6 or 10) plus n-way replication between the nodes, which can be considered network RAID 1. There's no network parity RAID like HPE (ex-LeftHand) or Ceph does.
P.S. We're working on our own local reconstruction codes now, so local protection (SimpliVity style) won't be required soon. FYI.
-
...and then you have companies that cluster servers, with each server having RAID configured. Sacrificing some usable storage there.
-
@BBigford said in Announcing the Death of RAID:
...and then you have companies that cluster servers, with each server having RAID configured. Sacrificing some usable storage there.
That's not uncommon, and that's kind of what Kooler is talking about: they often use RAID on individual nodes as a local means of avoiding full rebuilds under most conditions.
-
I would treat RAID as a kind of hardware offload, since RAIN is known to consume more resources and thus deliver less performance from the storage array. That is probably one of the major reasons why vendors like StarWind keep using hardware RAID, especially on smaller deployments (storage capacities).
-
@Net-Runner said in Announcing the Death of RAID:
I would treat RAID as a kind of hardware offload, since RAIN is known to consume more resources and thus deliver less performance from the storage array. That is probably one of the major reasons why vendors like StarWind keep using hardware RAID, especially on smaller deployments (storage capacities).
I wonder if this flies in the face of what @scottalanmiller has been saying that hardware RAID isn't needed for performance reasons?
-
@Dashrender said in Announcing the Death of RAID:
I wonder if this flies in the face of what @scottalanmiller has been saying that hardware RAID isn't needed for performance reasons?
There are many ways to skin a cat, and there are some things you can't do without hardware components. For example, a SAS controller in HBA mode can't allow the write cache to be enabled on the disks and can't provide an aggressive, battery-protected write-back cache because... an HBA has none. This means you either acknowledge writes in DRAM (synchronized with some other hosts) or you have to use enterprise-grade SSDs. Guys like VMware and Microsoft who claim they don't rely on hardware and that you can throw away RAID cards are... cheating you! Because now you have to swap RAID cards for enterprise-grade SSDs they can use as a cache. Pay money to save money. Sweet!
-
In other words, I think that @scottalanmiller has been saying that SMBs so rarely tax their systems that the performance drain of software RAID would barely be noticed. So the use of RAID as a hardware offload for RAIN wouldn't make sense, even more so because RAIN itself is completely dependent upon the system CPU and NIC/network resources, not the RAID controller.
As mentioned by @scottalanmiller above,
@scottalanmiller said in Announcing the Death of RAID:
... they use RAID often on individual nodes as a local means of avoiding full rebuilds under most conditions.
This makes sense, but will offer no performance gains on the RAIN side of the house.
Unless I've misunderstood something.
-
@KOOLER said in Announcing the Death of RAID:
There are many ways to skin a cat, and there are some things you can't do without hardware components. For example, a SAS controller in HBA mode can't allow the write cache to be enabled on the disks and can't provide an aggressive, battery-protected write-back cache because... an HBA has none. This means you either acknowledge writes in DRAM (synchronized with some other hosts) or you have to use enterprise-grade SSDs. Guys like VMware and Microsoft who claim they don't rely on hardware and that you can throw away RAID cards are... cheating you! Because now you have to swap RAID cards for enterprise-grade SSDs they can use as a cache. Pay money to save money. Sweet!
Wait a second - are you advocating not using enterprise-class drives? I'm pretty sure I read somewhere that @scottalanmiller specifically said that if you plan to have any warranty/support, you need to have enterprise drives. Sure, the vendor has to support the parts that are under warranty, but can skip the ones that aren't - e.g. if you purchase a Dell server and install a Samsung SSD, you're on your own for the SSDs.
-
@Dashrender said in Announcing the Death of RAID:
Wait a second - are you advocating not using enterprise-class drives? I'm pretty sure I read somewhere that @scottalanmiller specifically said that if you plan to have any warranty/support, you need to have enterprise drives. Sure, the vendor has to support the parts that are under warranty, but can skip the ones that aren't - e.g. if you purchase a Dell server and install a Samsung SSD, you're on your own for the SSDs.
You're in the same boat whether you use enterprise-class drives or not if you're putting non-Dell drives in a Dell server.
WD Red (not Red Pro) drives are consumer-class stuff, but they do RAID 10 perfectly fine. Yet I'd never run them in a parity RAID array because of their poor unrecoverable read error (URE) rating.
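A rough sketch of why the read error rating matters so much for parity RAID: a parity rebuild has to read every surviving drive end to end, so the chance of hitting at least one unrecoverable read error grows with the amount of data read. The `1e-14` per-bit figure below is an assumption (a number commonly quoted for consumer drives, so check the actual datasheet), and treating every bit as an independent trial is a simplification.

```python
import math

TB = 1e12  # bytes, decimal terabytes as drive vendors count them


def p_ure_during_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at `ure_per_bit`.

    ure_per_bit is an ASSUMED datasheet-style figure, not a measured one.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to stay numerically stable
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for surviving_tb in (4, 12, 24):
        for rate in (1e-14, 1e-15):
            p = p_ure_during_read(surviving_tb * TB, rate)
            print(f"read {surviving_tb:>2} TB at URE {rate:.0e}/bit: {p:.1%}")
```

With mirroring, a rebuild only reads the one partner drive, which is why the same drives can be acceptable in RAID 10 and a bad idea in RAID 5.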
-
@Net-Runner said in Announcing the Death of RAID:
I would treat RAID as a kind of hardware offload, since RAIN is known to consume more resources and thus deliver less performance from the storage array. That is probably one of the major reasons why vendors like StarWind keep using hardware RAID, especially on smaller deployments (storage capacities).
RAIN often consumes fewer resources, not more. But as RAIN is not a single algorithm like RAID levels are, this varies by implementation. And RAIN does not imply software. SimpliVity does RAIN with custom hardware, for example.
-
@travisdh1 said in Announcing the Death of RAID:
You're in the same boat whether you use enterprise-class drives or not if you're putting non-Dell drives in a Dell server.
WD Red (not Red Pro) drives are consumer-class stuff, but they do RAID 10 perfectly fine. Yet I'd never run them in a parity RAID array because of their poor unrecoverable read error (URE) rating.
Red Pro are consumer, too. Only difference is spindle speed.
-
@scottalanmiller said in Announcing the Death of RAID:
Red Pro are consumer, too. Only difference is spindle speed.
Did they change that again? They had "discontinued" their low-end enterprise line (I forget what they were called before) and just rebranded them Red Pro. So for at least a while, the read error rate on the Reds was lower than on the Red Pros.
-
@scottalanmiller said in Announcing the Death of RAID:
Red Pro are consumer, too. Only difference is spindle speed.
I was thinking the only difference was spindle speed between these - calling it Pro is very misleading.
Why are these consumer class? What makes the Gold drives Enterprise?
-
@Net-Runner said in Announcing the Death of RAID:
I would treat RAID as a kind of hardware offload, since RAIN is known to consume more resources and thus deliver less performance from the storage array. That is probably one of the major reasons why vendors like StarWind keep using hardware RAID, especially on smaller deployments (storage capacities).
Network RAID on top of local RAID has huge disadvantages in performance, reliability and (in many cases) overhead. Overhead is totally unique to each implementation, but here is an example of overhead issues...
If you have a three node cluster using local RAID on each node, with 24TB on each node and network RAID to connect them, you have some tough choices for your RAID.
If you use RAID 0 on each node, then losing a single drive means you need to fully rebuild that node's entire 24TB data set over the network before the node is restored. The overhead alone could easily bring your cluster down, and it might leave you waiting days or weeks for the cluster to be really usable again. During that time your risk gets super high, because any other node losing a single disk means that entire node is lost too. If you do mirroring across the nodes, the only reasonable choice there, this is RAID 01 and not nearly as safe as RAID 10. So we'd be looking at a system that is insanely risky compared to just a normal, local RAID array.
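To put a rough number on "days or weeks", here is a minimal back-of-the-envelope sketch. The link speeds and the `efficiency` factor are assumptions for illustration, not figures from this thread; a real rebuild also competes with production I/O and is limited by disk speed, so treat the results as a best case.

```python
def rebuild_hours(capacity_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy `capacity_tb` decimal terabytes over a link.

    link_gbps  -- nominal link speed in gigabits per second (assumed value)
    efficiency -- fraction of the link the rebuild actually gets (assumed value)
    """
    bits = capacity_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # One node's 24TB data set, rebuilt in full over the network.
    for gbps, eff in ((1, 1.0), (1, 0.25), (10, 1.0), (10, 0.25)):
        hours = rebuild_hours(24, gbps, eff)
        print(f"{gbps:>2} Gb/s at {eff:.0%} of the link: {hours:7.1f} h ({hours / 24:.1f} days)")
```

Even at full line rate on a dedicated 1 Gb/s link this is more than two days, and every hour of it is time spent with reduced redundancy.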
-
If we took the same example but moved to RAID 5 for the local disks, we are still so risky that we are roughly equal in risk to using RAID 0 locally. This is RAID 51. It consumes more than 70% of your disks for parity or mirroring while still being too risky to consider. If you moved to expensive enterprise drives, it might approach "safe-ish", but it's still pretty crazy risky and then pretty expensive. If you were using a 9x 4TB array on each node, you would be giving up 19 out of 27 drives in your cluster and getting something that isn't all that fast and is not very safe. The example before was giving up 16 out of 24 drives, a little better, but not much. The loss number is huge; giving up that much capacity should buy a lot of protection.
-
To make this work at all, RAID 61 is the riskiest level that we can really consider. With RAID 61 we need to give up 22 out of 30 drives and we still will need to consider the unbelievable impact to one of the nodes that might happen if we lose a disk and need to rebuild. We might lose 80-99% of our storage capabilities on that one node while the RAID 6 repairs itself. Losing a node entirely would become unlikely, but even without losing a node our impacts might be really big. But even if we can absorb a lengthy, intensive rebuild, the storage cost is very large. We would likely use hardware RAID at a cost of some $2100 in hardware for that, plus incredibly low utilization possibilities on the disks.
-
To get truly safe and fast, we'd need RAID 101. This method would require 40 out of 48 drives to be lost to mirroring operations. This would protect us from the intensive rebuilds and mean that individual nodes are essentially never lost (at the storage level, anyway) making the system reasonably safe, but the cost just keeps escalating.
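The drive counts in these examples can be reproduced with a short sketch. It assumes a three-way network mirror and eight data drives' worth of usable capacity per node, which is what the 16-out-of-24 figure for local RAID 0 implies; the function names are made up for illustration.

```python
def drives_per_node(local_level: str, data_drives: int) -> int:
    """Physical drives one node needs to present `data_drives` of usable space."""
    if local_level == "RAID 10":
        return 2 * data_drives  # every data drive has a mirror partner
    parity_drives = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}
    return data_drives + parity_drives[local_level]


def cluster_overhead(local_level: str, data_drives: int = 8, nodes: int = 3):
    """Return (drives_given_up, total_drives) for local RAID under a network mirror.

    Assumes every node holds a full copy, so usable capacity for the whole
    cluster is just one node's worth of data drives.
    """
    total = drives_per_node(local_level, data_drives) * nodes
    usable = data_drives
    return total - usable, total


if __name__ == "__main__":
    names = {"RAID 0": "RAID 01", "RAID 5": "RAID 51", "RAID 6": "RAID 61", "RAID 10": "RAID 101"}
    for level, combined in names.items():
        lost, total = cluster_overhead(level)
        print(f"{combined:>8}: give up {lost} of {total} drives ({lost / total:.0%})")
```

Changing `data_drives` or `nodes` shows how quickly the overhead grows as mirroring is stacked on top of local redundancy.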
-
Now we could, in theory, not use mirroring across nodes but use parity instead. This can reduce the amount of storage that we need to purchase to make the system work, but it comes at a staggering cost to performance and risk. Imagine something like RAID 5 working over a network. A network based node reconstruction could be very, very bad.
-
We also have to remember that in a network RAID model we carry a risk of node failure from something other than storage. Loss of a CPU, motherboard, fans, memory or whatever would cause an entire node to fail. In a local RAID scenario this is not a huge deal since our storage is intact; we simply replace the failed part and the server comes back online. Not so with network RAID. If we have a bad memory stick and a node goes down, then when it comes back online, no matter how intact its local RAID is, that node's member of the network array has failed and has to be reconstructed as if it were new. The data stored on it is useless. So events that would not normally affect storage reliability in a single node view of RAID become devastating to network RAID. This is why network RAID is rarely used beyond two nodes, and generally only with mirroring. Accidentally reboot two nodes in network RAID 5 before all rebuilds are complete and... all is lost.