Single SSD PCIe vs HDD RAID Reliability



  • Hello, I'm about to move one of the infrastructures I administer from an IPOD (one SAN with 12 spindles) to local PCIe SSDs replicated to the SAN. I'd like to hear some opinions on the reliability of a single PCIe SSD vs. an array of spindles…

    Or, better, should the PCIe card be as reliable as a hardware RAID card? NAND wear aside, of course.


  • Service Provider

    This is obviously a tough question to answer as there are so many factors involved. First there is a move from a modular system (drives and RAID controller separate), then from spinning rust traditional hard drives (aka Winchester drives) to SSDs (solid state drives), and then from traditional RAID arrays to whatever is being done under the hood on the PCIe SSD system (generally we assume erasure encoding across multiple devices attached to a single card.) So there is a lot going on here to consider.


  • Service Provider

    One of the biggest factors here is the specific PCIe SSD card in question. The big reason for moving to the PCIe SSD model is speed, unmitigated, blinding speed by removing the SAS and SATA bottlenecks and by getting all of the components integrated together. By doing this the controller and drives can work together in a very intelligent way providing for better tuning, lower latency and way more bandwidth. It is a great design.

    Top end card makers like FusionIO build cards with extreme degrees of reliability (and cost.) They are extremely reliable and used in very high end, high criticality businesses for the most extreme workloads. So there is little concern about them from a general reliability standpoint.

    One would generally expect a PCIe SSD array to be more reliable than a traditional, and much lower cost, hardware RAID card.

    The big advantage to the traditional RAID card, RAIDed disks and hot swap model is that while the drives wear out presumably more often they are also designed to be effectively disposable and it is trivial to replace them, on the fly, with little impact to the running workloads. Hot swapping drives is a big benefit that should not be overlooked either.

    Generally PCIe SSD is considered when extreme IOPS are needed beyond what hardware RAID or even software RAID plus SATA SSDs can deliver.



  • What percentage of the wall street company's servers used Fusion IO devices?


  • Service Provider

    @Dashrender said:

    What percentage of the wall street company's servers used Fusion IO devices?

    Percentage of companies probably approaches 100%. FusionIO is just where it is at.

    Percentage of servers is probably 1-5%.



  • @scottalanmiller said:

    @Dashrender said:

    What percentage of the wall street company's servers used Fusion IO devices?

    Percentage of servers is probably 1-5%.

    I was asking to show how little of the company's infrastructure is actually using PCIe SSD systems instead of SSD or HDD systems. Basically to show that most of us probably don't need PCIe systems. That's not to say none of us do, just the majority don't.


  • Service Provider

    Definitely only a very tiny percentage of workloads in the SMB need SSD, let alone PCIe class SSD solutions. That said, enterprise usage might be misleadingly low, because enterprises are often using massive high performance SAN systems for storage consolidation, at a price the SMB cannot get, which in most cases justifies not running PCIe SSD. So it is entirely possible that the SMB might have an easier time justifying a PCIe SSD based system than a typical enterprise would. They have different scales and needs. SMBs would essentially never use things like Pure SSD systems or 96Gb/s SAN that enterprises have little issue justifying. So the balance of who might choose the ultra fast PCIe SSD approach might be skewed in that way.



  • When do you start to consider PCIe SSD rather than SSD drives connected to a normal RAID controller?



  • @scottalanmiller Specifically, I'm talking about Intel p3500 or p3600 in any server VS IBM FC SAN DS3500 direct attached (so, DAS in truth) with 12x15k spindles. Or, considering HDD local storage, a m5110 RAID card on each server.



  • @Reid-Cooper As of today, IOPS/€-wise the NVMe PCIe SSDs are actually way cheaper than SAS SSDs!

    The price of a set of SAS SSDs from IBM (the only ones supported by the RAID controller that I have in my servers!) that matches the IOPS and capacity of an Intel (or other brand, of course) PCIe SSD is roughly three times as high…

    We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…


  • Service Provider

    @Francesco-Provino said:

    @scottalanmiller Specifically, I'm talking about Intel p3500 or p3600 in any server VS IBM FC SAN DS3500 direct attached (so, DAS in truth) with 12x15k spindles. Or, considering HDD local storage, a m5110 RAID card on each server.

    Those are super cheap PCIe boards from a manufacturer with a horrific track record in this space (their SSDs are generally good but their overall mobos and storage systems are some of the worst.) I would be very wary relying on an Intel board for server usage. Intel seems to lack an "enterprise mindset" and sees the storage world as one of desktops and disposable storage.

    These are new boards, only first released this year. As they appear to just be a single SSD strapped to a PCIe board and priced as such - what are your concerns around reliability?


  • Service Provider

    @Francesco-Provino said:

    We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…

    Traditional enterprise boards like FusionIO have very good reliability track records. Intel is new to the game and has a good reputation in the SSD space and a bad one in "non-drive storage space." Put the two together and this would be a rather unknown scenario with them.


  • Service Provider

    @Francesco-Provino said:

    @Reid-Cooper As of today, IOPS/€-wise the NVMe PCIe SSDs are actually way cheaper than SAS SSDs!

    That comes from a combination of removing the SATA bottleneck and skipping the RAID.



  • @scottalanmiller So Intel's 5-year warranty has no value in this case? I'll be happy to replace them every 3-4 years…



  • @scottalanmiller said:

    @Francesco-Provino said:

    @Reid-Cooper As of today, IOPS/€-wise the NVMe PCIe SSDs are actually way cheaper than SAS SSDs!

    That comes from a combination of removing the SATA bottleneck and skipping the RAID.

    Exactly, I think this is definitely a win-win approach.



  • @scottalanmiller said:

    @Francesco-Provino said:

    We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…

    Traditional enterprise boards like FusionIO have very good reliability track records. Intel is new to the game and has a good reputation in the SSD space and a bad one in "non-drive storage space." Put the two together and this would be a rather unknown scenario with them.

    I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! By replicating every VM to the SAN, I can power on the VMs of a failed node in almost no time directly from the SAN.


  • Service Provider

    @Francesco-Provino said:

    @scottalanmiller So Intel's 5-year warranty has no value in this case? I'll be happy to replace them every 3-4 years…

    Warranties have little value when you are talking about your data and uptime. A warranty is to guarantee that you have equipment for the duration, not that the things that you store on that equipment continue to exist. If we are talking a desktop on which no critical data is stored and you have a spare desktop to use until Intel replaces the SSD, sure, the warranty has value. If we are talking about a server holding your critical data the warranty presumably has almost no value.

    When the PCIe SSD fails you will need to order the warranty replacement. What are the replacement terms - four hours, six hours, next business day, two weeks? Do you have to return the failed one first and wait for them to test it? Remember this is a complete storage system, not just one drive in a RAID array. When HP or Dell do a warranty replacement of a drive there is no downtime or dataloss. When Intel does a replacement of these drives, you are without storage for some amount of time and, once replaced, the data from the old SSD is gone.


  • Service Provider

    @Francesco-Provino said:

    Exactly, I think this is definitely a win-win approach.

    If the only goal is IOPS. What workload do you have that is that sensitive to IOPS? They exist, especially databases, but what is the place for downtime? Typically I would expect systems using these drives to have either a RAIN storage system so that storage is covered that way or be part of a network replicated system like a Hyper-V fault tolerant cluster with Starwind replicating between the nodes. That way if one node fails you can run from another which has a copy of the data until the first one is repaired.

    In a stand alone node I would only use these if data is highly static or does not need to generally be backed up. Those are rarely the case in systems that need extreme IOPS.


  • Service Provider

    @Francesco-Provino said:

    I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! By replicating every VM to the SAN, I can power on the VMs of a failed node in almost no time directly from the SAN.

    There are three ways to handle this replication:

    • Full Synchronization replication
    • Asynchronous replication
    • Backup mechanisms

    Of these you have these impacts or tradeoffs:

    Full Sync: This is a form of network RAID 1. You will need to wait for the SAN to respond that it has written a copy of the data. While your read performance will be as fast as the Intel PCIe SSD can go, the writes will be as slow as the SAN can do. So while this is safe and allows for storage failover without dataloss or downtime, the impact to writes is enormous.

    Async: Data is only crash consistent. You can have "nearly every byte" that you had before but data can and sometimes does corrupt. It cannot be tested as corruption only happens some of the time and typically happens under load. So there is a risk that your SAN would be corrupted and useless in the event of the PCIe SSD failing.

    Backup/Restore: Needs quiescence to be safe which inflicts a performance penalty on its own. In the event of a PCIe SSD failure you are doing a DR scenario and facing some dataloss.

    So there are options, each with different caveats. It would depend on what needs your business has as to which would make sense for you.
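    The write-path difference between full sync and async described above can be sketched with a toy model. This is only an illustration: the latency figures are hypothetical placeholders, not measurements of any Intel card or IBM SAN, and it assumes the local and remote writes are issued in parallel.

```python
def effective_write_latency_us(local_us: float, remote_us: float, mode: str) -> float:
    """Latency the application sees for one acknowledged write (toy model)."""
    if mode == "sync":
        # The write is acknowledged only once both copies are written,
        # so the slower device (here the SAN) sets the pace.
        return max(local_us, remote_us)
    if mode == "async":
        # The write is acknowledged as soon as the local copy lands;
        # the remote copy lags behind and is only crash consistent.
        return local_us
    raise ValueError(f"unknown replication mode: {mode!r}")


# Hypothetical figures, chosen only to show the shape of the tradeoff.
LOCAL_PCIE_SSD_US = 20.0
SAN_OVER_FC_US = 5000.0

for mode in ("sync", "async"):
    latency = effective_write_latency_us(LOCAL_PCIE_SSD_US, SAN_OVER_FC_US, mode)
    print(f"{mode:5s}: {latency:8.1f} us per acknowledged write")
```

    Reads are not modelled because in both modes they are served from the local PCIe SSD.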



  • @scottalanmiller said:

    @Francesco-Provino said:

    @scottalanmiller So Intel's 5-year warranty has no value in this case? I'll be happy to replace them every 3-4 years…

    Warranties have little value when you are talking about your data and uptime. A warranty is to guarantee that you have equipment for the duration, not that the things that you store on that equipment continue to exist. If we are talking a desktop on which no critical data is stored and you have a spare desktop to use until Intel replaces the SSD, sure, the warranty has value. If we are talking about a server holding your critical data the warranty presumably has almost no value.

    When the PCIe SSD fails you will need to order the warranty replacement. What are the replacement terms - four hours, six hours, next business day, two weeks? Do you have to return the failed one first and wait for them to test it? Remember this is a complete storage system, not just one drive in a RAID array. When HP or Dell do a warranty replacement of a drive there is no downtime or dataloss. When Intel does a replacement of these drives, you are without storage for some amount of time and, once replaced, the data from the old SSD is gone.

    I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.
    We mainly do VDI and database stuff… it's not that we require such a great IOPS count, but… what are the alternatives? Buy IBM spindles in 2015, at a higher price than the SSD? Double the price for 1/100 of the IOPS? Does it really make sense?



  • @scottalanmiller

    @scottalanmiller said:

    @Francesco-Provino said:

    I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! By replicating every VM to the SAN, I can power on the VMs of a failed node in almost no time directly from the SAN.

    There are three ways to handle this replication:

    • Full Synchronization replication
    • Asynchronous replication
    • Backup mechanisms

    Of these you have these impacts or tradeoffs:

    Full Sync: This is a form of network RAID 1. You will need to wait for the SAN to respond that it has written a copy of the data. While your read performance will be as fast as the Intel PCIe SSD can go, the writes will be as slow as the SAN can do. So while this is safe and allows for storage failover without dataloss or downtime, the impact to writes is enormous.

    Async: Data is only crash consistent. You can have "nearly every byte" that you had before but data can and sometimes does corrupt. It cannot be tested as corruption only happens some of the time and typically happens under load. So there is a risk that your SAN would be corrupted and useless in the event of the PCIe SSD failing.

    Backup/Restore: Needs quiescence to be safe which inflicts a performance penalty on its own. In the event of a PCIe SSD failure you are doing a DR scenario and facing some dataloss.

    So there are options, each with different caveats. It would depend on what needs your business has as to which would make sense for you.

    Thanks for the clarification on replication, I really appreciate it.
    We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).


  • Service Provider

    @Francesco-Provino said:

    I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.

    How quickly does Intel do replacements? Intel is not an enterprise supplier like HP, Dell or Fujitsu.



  • @scottalanmiller said:

    @Francesco-Provino said:

    I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.

    How quickly does Intel do replacements? Intel is not an enterprise supplier like HP, Dell or Fujitsu.

    As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.

    So, we can wait some days for Intel.


  • Service Provider

    @Francesco-Provino said:

    We mainly do VDI and database stuff… it's not that we require such a great IOPS count, but… what are the alternatives? Buy IBM spindles in 2015, at a higher price than the SSD? Double the price for 1/100 of the IOPS? Does it really make sense?

    That's what I would call a "leap alternative." The two are not comparable. The Intel board has more IOPS, but does that matter? I feel like that is a red herring here, definitely for VDI. Not that it is bad, just that the fact that it is 100x higher is pointless (and incorrect: my desktop SSD is quite old and only 1/4th the speed of these, so you should be able to get in the ballpark.)

    You are jumping from "third party unsupported SSD" in one case to "primary OEM fully warranted and supported" in the other. Of course one is drastically more cost effective. But all that you are showing is that full enterprise support on hard drives is costly. You are comparing apples to oranges.

    If you want to see a reasonable alternative to a third party, unsupported PCIe SSD you would compare against third party, unsupported SATA SSDs. In which case you would find that you could be doing RAID 10 with hundreds of thousands of IOPS for around $400 or RAID 5 for around $300. Suddenly the cost per IOPS is pretty similar.


  • Service Provider

    @Francesco-Provino said:

    As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.

    What is the manner of replication?


  • Service Provider

    @Francesco-Provino said:

    We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).

    So the failover to the SAN is risky in that data could be lost because it is only crash consistent and the filesystem and/or databases might be corrupted when attempting to use it.

    What is the time of dataloss if you need to go to the QNAP to do a restore?


  • Service Provider

    So the real question is this....

    What makes 400K IOPS without RAID worth $600 - $800 when 300K IOPS with RAID is just $300 for this specific use case?
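    To put that question in numbers, here is a rough cost-per-IOPS calculation using only the ballpark figures quoted above ($600 - $800 for 400K IOPS, $300 for 300K IOPS). The labels and the helper name are made up for illustration, and the inputs are forum estimates, not benchmarks.

```python
def dollars_per_kiops(price_usd: float, iops: int) -> float:
    """Price in USD per 1,000 IOPS."""
    return price_usd / (iops / 1000)


# (price in USD, IOPS) taken from the rough figures in this thread.
options = {
    "PCIe SSD, no RAID (low price)": (600, 400_000),
    "PCIe SSD, no RAID (high price)": (800, 400_000),
    "SATA SSD RAID 5": (300, 300_000),
}

for label, (price, iops) in options.items():
    print(f"{label:32s} ${dollars_per_kiops(price, iops):.2f} per 1K IOPS")
```

    On these figures the RAID 5 option comes out at $1.00 per 1K IOPS against $1.50 to $2.00 for the PCIe board, and that is before counting the redundancy the RAID array adds.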


  • Service Provider

    And, it should be pointed out, that a $300 RAID 5 array here is likely safer (both in terms of continuous uptime as well as in terms of dataloss) than the PCIe SSD + the SAN replication. If it were me, and I had to choose between the RAID array and the async replication to an external SAN I'd take the SSD RAID 5 array because it is fully consistent, not just crash consistent.



  • @scottalanmiller said:

    @Francesco-Provino said:

    As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.

    What is the manner of replication?

    VMware Replication to the SAN, Veeam to the NAS.



  • @scottalanmiller said:

    @Francesco-Provino said:

    We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).

    So the failover to the SAN is risky in that data could be lost because it is only crash consistent and the filesystem and/or databases might be corrupted when attempting to use it.

    What is the time of dataloss if you need to go to the QNAP to do a restore?

    That's always true with async replication. The QNAP is in the same building, connected over a gigabit network. In my tests, I can retrieve the backup of our biggest VM in about an hour and a half. Totally ok for us.
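    As a sanity check on a restore window like that, the upper bound on what a gigabit link can move in a given time is easy to work out. This is a back-of-the-envelope sketch: it assumes the link is the only bottleneck, and the `efficiency` parameter is a hypothetical fudge factor for protocol and disk overhead, not a measured value.

```python
def max_transfer_gb(link_gbps: float, minutes: float, efficiency: float = 1.0) -> float:
    """Upper bound, in decimal gigabytes, on data moved over a link in `minutes`."""
    bytes_per_second = link_gbps * 1e9 / 8  # link speed in bits/s converted to bytes/s
    return bytes_per_second * minutes * 60 * efficiency / 1e9


# Theoretical ceiling for a 90 minute restore over gigabit Ethernet.
print(f"{max_transfer_gb(1.0, 90):.0f} GB at line rate")
# The same window at an assumed 70% effective throughput.
print(f"{max_transfer_gb(1.0, 90, efficiency=0.7):.1f} GB at 70% efficiency")
```

    A VM much larger than the line-rate figure could not be restored in that window over a single gigabit link, however fast the NAS or the target storage is.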


