Single PCIe SSD vs HDD RAID Reliability
-
@scottalanmiller said:
@Francesco-Provino said:
I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! Replicating every VM to the SAN, I can power on the VMs of a failed node almost instantly, directly from the SAN.
There are three ways to handle this replication:
- Full synchronous replication
- Asynchronous replication
- Backup mechanisms
Each of these comes with its own impacts and tradeoffs:
Full Sync: This is a form of network RAID 1. You will need to wait for the SAN to respond that it has written a copy of the data. While your read performance will be as fast as the Intel PCIe SSD can go, writes will only be as fast as the SAN can handle. So while this is safe and allows for storage failover without data loss or downtime, the impact on writes is enormous.
Async: Data is only crash consistent. You can have "nearly every byte" that you had before, but data can and sometimes does become corrupted. It cannot be tested, as corruption only happens some of the time and typically happens under load. So there is a risk that your SAN copy would be corrupted and useless in the event of the PCIe SSD failing.
Backup/Restore: Needs quiescence to be safe, which inflicts a performance penalty on its own. In the event of a PCIe SSD failure you are doing a DR scenario and facing some data loss.
So there are options, each with different caveats. Which one makes sense for you depends on your business needs.
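As a rough illustration of how the acknowledgement path differs between the first two modes, here is a minimal sketch; the latency figures are assumptions for illustration, not measurements from this setup:

```python
# Back-of-envelope write latency under each replication mode.
# All latency numbers are illustrative assumptions, not measurements.

LOCAL_SSD_WRITE_MS = 0.05   # assumed PCIe/NVMe SSD write latency
SAN_WRITE_MS = 5.0          # assumed spindle-based SAN write latency
NETWORK_RTT_MS = 0.5        # assumed round trip to the SAN

def sync_write_latency_ms():
    # Full sync: the write is not acknowledged until the SAN confirms its copy,
    # so the slower leg (SAN plus network) dominates every write.
    return max(LOCAL_SSD_WRITE_MS, SAN_WRITE_MS + NETWORK_RTT_MS)

def async_write_latency_ms():
    # Async: the write is acknowledged once the local SSD has it; the SAN copy
    # lags behind, which is why the replica is only crash consistent.
    return LOCAL_SSD_WRITE_MS

print(f"full sync replication: ~{sync_write_latency_ms():.2f} ms per write")
print(f"async replication:     ~{async_write_latency_ms():.2f} ms per write")
```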
Thanks for the clarification on replication, I really appreciate it.
We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives). -
@Francesco-Provino said:
I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.
How quickly does Intel do replacements? Intel is not an enterprise supplier like HP, Dell or Fujitsu.
-
@scottalanmiller said:
@Francesco-Provino said:
I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.
How quickly does Intel do replacements? Intel is not an enterprise supplier like HP, Dell or Fujitsu.
As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.
So, we can wait a few days for Intel.
-
@Francesco-Provino said:
We mainly do VDI and database stuff… it's not that we require such a high IOPS count, but… what are the alternatives? Buy IBM spindles in 2015, at a higher price than the SSD? Double the price for 1/100th the IOPS? Does it really make sense?
That's what I would call a "leap alternative." The two are not comparable. The Intel board has more IOPS, but does that matter? I feel like that is a red herring here, definitely for VDI. Not that it is bad, just that the fact that it is 100x higher is pointless (and incorrect; my desktop SSD is quite old and only 1/4th the speed of these, so you should be able to get into the ballpark.)
You are jumping from "third party unsupported SSD" in one case to "primary OEM fully warranted and supported" in the other. Of course one is drastically more cost effective. But all that you are showing is that full enterprise support on hard drives is costly. You are comparing apples to oranges.
If you want to see a reasonable alternative to a third party, unsupported PCIe SSD, you would compare against third party, unsupported SATA SSDs. In which case you would find that you could be doing RAID 10 with hundreds of thousands of IOPS for around $400, or RAID 5 for around $300. Suddenly the cost per IOPS is pretty similar.
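As a very rough sketch of how that RAID math plays out, assuming a consumer SATA SSD in the ~90K read / ~80K write IOPS class (the per-drive numbers and drive counts here are assumptions for illustration, not benchmarks):

```python
# Back-of-envelope array IOPS for small SATA SSD arrays.
# Per-drive IOPS are assumed figures for a consumer SATA SSD, not benchmarks.
DRIVE_READ_IOPS = 90_000
DRIVE_WRITE_IOPS = 80_000

def raid10_iops(drives):
    # RAID 10: reads come from every drive; each logical write costs 2 physical writes.
    return drives * DRIVE_READ_IOPS, (drives * DRIVE_WRITE_IOPS) // 2

def raid5_iops(drives):
    # RAID 5: reads come from every drive; each logical write costs ~4 physical I/Os
    # (read data, read parity, write data, write parity).
    return drives * DRIVE_READ_IOPS, (drives * DRIVE_WRITE_IOPS) // 4

print("RAID 10 with 4 drives (read, write IOPS):", raid10_iops(4))
print("RAID 5 with 3 drives  (read, write IOPS):", raid5_iops(3))
```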
-
@Francesco-Provino said:
As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.
What is the manner of replication?
-
@Francesco-Provino said:
We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).
So the failover to the SAN is risky in that data could be lost because it is only crash consistent and the filesystem and/or databases might be corrupted when attempting to use it.
What is the window of data loss if you need to go to the QNAP to do a restore?
-
So the real question is this....
What makes 400K IOPS without RAID worth $600 - $800 when 300K IOPS with RAID is just $300 for this specific use case?
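Put in cost-per-IOPS terms, using the approximate prices and IOPS counts quoted in this thread (the midpoint price and round numbers below are assumptions):

```python
# Cost per IOPS using the approximate figures discussed in this thread.
options = {
    "Intel PCIe SSD, no RAID": {"cost_usd": 700, "iops": 400_000},  # midpoint of $600-$800
    "SATA SSD RAID 5":         {"cost_usd": 300, "iops": 300_000},
}

for name, opt in options.items():
    dollars_per_100k_iops = opt["cost_usd"] / (opt["iops"] / 100_000)
    print(f"{name}: ~${dollars_per_100k_iops:.0f} per 100K IOPS")
```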
-
And, it should be pointed out, that a $300 RAID 5 array here is likely safer (both in terms of continuous uptime and in terms of data loss) than the PCIe SSD plus the SAN replication. If it were me, and I had to choose between the RAID array and the async replication to an external SAN, I'd take the SSD RAID 5 array because it is fully consistent, not just crash consistent.
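To illustrate why the RAID 5 array tends to win on exposure, here is a toy probability sketch; the annualized failure rate and rebuild window are assumed values, not vendor figures:

```python
# Toy estimate of the chance of a downtime/data-loss event in one year.
# The annualized failure rate (AFR) and rebuild window are illustrative assumptions.
AFR = 0.005                 # assumed annualized failure rate per SSD
REBUILD_WINDOW = 1 / 365    # assumed one-day rebuild window, as a fraction of a year

# Single PCIe SSD: any failure means downtime plus failing over to a copy
# that is only crash consistent.
p_single_ssd = AFR

# 3-drive RAID 5: an outage needs a first failure plus a second failure among the
# two survivors during the rebuild window (first-order approximation).
p_raid5 = (3 * AFR) * (2 * AFR * REBUILD_WINDOW)

print(f"single PCIe SSD, event probability per year: ~{p_single_ssd:.3%}")
print(f"3-drive SSD RAID 5, event probability/year:  ~{p_raid5:.6%}")
```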
-
@scottalanmiller said:
@Francesco-Provino said:
As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.
What is the manner of replication?
VMware Replication to the SAN, Veeam to the NAS.
-
@scottalanmiller said:
@Francesco-Provino said:
We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).
So the failover to the SAN is risky in that data could be lost because it is only crash consistent and the filesystem and/or databases might be corrupted when attempting to use it.
What is the window of data loss if you need to go to the QNAP to do a restore?
That's always true with async replication. The QNAP is in the same building, connected via gigabit Ethernet. In my tests, I can retrieve the backup of our biggest VM in about an hour and a half. Totally OK for us.
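As a sanity check on that number, here is a rough transfer-time estimate over gigabit Ethernet (the VM size and efficiency factor below are hypothetical, chosen only for illustration):

```python
# Rough restore-time estimate over gigabit Ethernet.
# The VM size and efficiency factor are hypothetical illustration values.
LINK_MBPS = 1000            # gigabit link
EFFICIENCY = 0.9            # assumed protocol and disk overhead
VM_SIZE_GB = 600            # hypothetical size of the "biggest VM"

throughput_mb_s = LINK_MBPS / 8 * EFFICIENCY          # ~112 MB/s effective
restore_minutes = VM_SIZE_GB * 1000 / throughput_mb_s / 60
print(f"~{restore_minutes:.0f} minutes to pull {VM_SIZE_GB} GB at ~{throughput_mb_s:.0f} MB/s")
```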
-
@scottalanmiller said:
So the real question is this....
What makes 400K IOPS without RAID worth $600 - $800 when 300K IOPS with RAID is just $300 for this specific use case?
@scottalanmiller said:
And, it should be pointed out, that a $300 RAID 5 array here is likely safer (both in terms of continuous uptime and in terms of data loss) than the PCIe SSD plus the SAN replication. If it were me, and I had to choose between the RAID array and the async replication to an external SAN, I'd take the SSD RAID 5 array because it is fully consistent, not just crash consistent.
Unfortunately, this is not my case: non-OEM SSDs aren't supported with the RAID cards in our servers, and VMware can't do software RAID (apart from, well, sort of, uhm, VSAN).
IBM's SAS SSDs are still incredibly expensive. -
If you compare Samsung drives like this one: http://www.amazon.com/Samsung-2-5-Inch-Internal-MZ-75E500B-AM/dp/B00OBRE5UE/ref=sr_1_1?ie=UTF8&qid=1447070361&sr=8-1&keywords=samsung+ssd+500GB
And the details on the Intel PCIe card: http://www.thessdreview.com/our-reviews/intel-ssd-dc-p3700-nvme-ssd-enthusiasts-report/ (that's the p3700, not the p3500)
It seems like the PCIe card is hard to justify in this case. You can still get more IOPS for less money, and more protection, from the SATA SSDs.
-
@Francesco-Provino said:
Unfortunately, this is not my case: non-OEM SSDs aren't supported with the RAID cards in our servers, and VMware can't do software RAID (apart from, well, sort of, uhm, VSAN).
IBM's SAS SSDs are still incredibly expensive.
Ah... the devil is in the details. You are using VMware and lack enterprise software RAID options, so you can't do super-high-performance SSD without having a RAID card that supports it. Yet another VMware caveat. They screw you at every turn. So many limitations that you would never guess would be there.
Are you sure that "unsupported" is the case, though? Of course it is not supported by IBM, but neither is the Intel PCIe board. So both cases are equally without support. The question is: do they work?
-
@scottalanmiller said:
@Francesco-Provino said:
Unfortunately, this is not my case: non-OEM SSDs aren't supported with the RAID cards in our servers, and VMware can't do software RAID (apart from, well, sort of, uhm, VSAN).
IBM's SAS SSDs are still incredibly expensive.
Ah... the devil is in the details. You are using VMware and lack enterprise software RAID options, so you can't do super-high-performance SSD without having a RAID card that supports it. Yet another VMware caveat. They screw you at every turn. So many limitations that you would never guess would be there.
Are you sure that "unsupported" is the case, though? Of course it is not supported by IBM, but neither is the Intel PCIe board. So both cases are equally without support. The question is: do they work?
This consideration is interesting, but I don't even know if it's possible to put non-OEM disks in those hot-swap slots…
-
@Francesco-Provino said:
This consideration is interesting, but I don't even know if it's possible to put non-OEM disks in those hot-swap slots…
It should be; people do it all of the time. It's very standard. There are occasional problems, and the RAID monitoring tools mostly don't work. But blocking non-OEM drives is illegal in many countries (like the US, and I presume the EU), so they normally work.
-
Putting a non-OEM disk into a hot-swap bay is no different than putting one into a PCIe slot.
-
@scottalanmiller said:
Putting a non-OEM disk into a hot-swap bay is no different than putting one into a PCIe slot.
Errr, no, it will lack the caddy, and I don't think they sell those as spare parts.
-
Oh, I see. I don't use IBM servers (especially now that they don't make them anymore, but even before that, as even IBM doesn't use their own servers) and forgot that they might be pulling the caddy trick on you. HP does this as well; Dell and SuperMicro do not.
You are right, you might be stuck. In the future, I would use this as a solid reason to avoid both IBM and VMware (IBM is gone now, so it matters little) as both are causing you to:
- Have to spend extra to get less.
- Avoid standard best practices.
- Work around basic system limitations.
- Go to unsupported designs.
I can see why you are interested in the PCIe SSD approach. It isn't because it is cheap or fast or reliable - it is a workaround for the IBM and VMware decisions. I think, when we look at it from that perspective, it starts to make a lot more sense. From purely a technology standpoint, I don't think that it makes sense.
-
I think, with all of that info, that the PCIe SSD approach makes sense. It will be seriously fast and pretty easy to use. And with the replication and backup options you are pretty decently protected. If you can handle the associated downtime to flip over to the SAN while waiting for the SSD to be replaced, you will be fine.
-
Assuming the IBM RAID controller will allow the use of non-OEM drives, I'd buy a bunch of tiny drives on eBay for the caddies, rip out the old drives, mount the SSDs, and you should be good.
If you have to go with no RAID card (@scottalanmiller - wouldn't this mean he'd have to install a SAS/SATA controller? I'm guessing the system doesn't have onboard support), that's yet another reason to move to Hyper-V now.