Erasure Coding



  • Hi all,

    Erasure coding - is it safe for use in production on an all-flash array? I'm specifically talking about VMware's vSAN here, however the question is fairly broad.

    The alternative is the one I'm most familiar and comfortable with: RAID1(/10) mirrored to other node(s) to provide simple/reliable fault tolerance. However, there are plenty of people talking up the new RAID5/RAID6 erasure coding features, as they substantially reduce capacity overhead. Apparently there is far less risk of failure due to the much lower URE rates in flash storage.

    I'm curious what you guys think? Is it risk averse? @scottalanmiller has posted up some passionate threads in the past about why R5/R6 is the devil (which I totally agree with) so where do you stand with erasure coding?

    Thanks



    I've no experience with erasure coding on VMware vSAN... but I know that it's production worthy and safe with S2D. It gives you the same resiliency with more efficient use of capacity. I believe it's nothing more than an algorithm... so I can't see it being any less safe/efficient when used with a different product.

    I do know that all flash = better efficiency.



    Because erasure coding has to calculate parity, RAID10 of course performs better. But if you really need the capacity and that outweighs the fairly slight performance loss, erasure coding is great.
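
To put rough numbers on the capacity side of that trade-off, here is a minimal sketch. The layouts (1+1 mirror, 3+1 RAID5, 4+2 RAID6) are assumptions chosen for illustration, and `usable_fraction` is a hypothetical helper, not anything from vSAN or S2D.

```python
from fractions import Fraction


def usable_fraction(data_fragments: int, redundancy_fragments: int) -> Fraction:
    """Fraction of raw capacity left for data in a data+redundancy layout."""
    return Fraction(data_fragments, data_fragments + redundancy_fragments)


# Assumed layouts, for illustration only.
layouts = {
    "RAID1 mirror (1 data + 1 copy)": (1, 1),
    "RAID5 (3 data + 1 parity)": (3, 1),
    "RAID6 (4 data + 2 parity)": (4, 2),
}

for name, (data, redundancy) in layouts.items():
    frac = usable_fraction(data, redundancy)
    raw_per_usable_tb = float(1 / frac)  # raw TB needed per usable TB
    print(f"{name}: {float(frac):.0%} usable, {raw_per_usable_tb:.2f} TB raw per usable TB")
```

Under these assumptions the mirror needs 2 TB of raw flash per usable TB, while the two parity layouts need about 1.33 TB and 1.5 TB. That is the saving people are talking up.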



  • RAID 5 is a basic form of erasure coding. It's safe to use with all flash.
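
As a rough illustration of why single-parity RAID counts as an erasure code, the sketch below builds one XOR parity block over three data blocks and then rebuilds a lost block from the survivors. The block contents and the `xor_blocks` helper are made up for the example; real arrays work on stripes across disks, not Python byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks and their single XOR parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing data[1]: XOR of the surviving blocks and the parity restores it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Any one missing block (data or parity) can be recovered this way, but a second simultaneous loss cannot, which is why the rebuild window matters so much.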


  • Service Provider

    @alboup said in Erasure Coding:

    Erasure coding - is it safe for use in production on an all-flash array?

    For all intents and purposes, erasure coding is a throwaway term. It's a huge umbrella term for nearly anything, including all traditional parity RAID. It's too general to ever be used in any serious way. Some erasure coding is insanely safe, some is insanely unsafe. It just means way too many different things.


  • Service Provider

    @alboup said in Erasure Coding:

    I'm curious what you guys think? Is it risk averse? @scottalanmiller has posted up some passionate threads in the past about why R5/R6 is the devil (which I totally agree with) so where do you stand with erasure coding?

    Even R5 (which is EC) is generally safe in all flash. It's not that R5 itself is so bad; it's that traditional spinning disks carry a significant URE risk that R5 exposes too easily, which makes R5 useless with modern spinning disks. SSDs have dramatically lower URE risks, so the risk profile changes dramatically.
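
A back-of-the-envelope sketch of that URE argument follows. The error rates (one unrecoverable read error per 1e14 bits for a spinning disk, one per 1e17 bits for an SSD) and the 12 TB rebuild read are assumed, spec-sheet-style figures for illustration, not measurements, and `p_ure_during_read` is a hypothetical helper. It also treats bit errors as independent, which real drives do not guarantee.

```python
import math


def p_ure_during_read(bytes_read: float, ure_per_bit: float) -> float:
    """Probability of at least one URE while reading bytes_read bytes,
    assuming independent bit errors at a fixed per-bit rate."""
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, written with log1p/expm1 to stay accurate for tiny rates.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


REBUILD_READ_BYTES = 12e12  # assumed: 12 TB read from the surviving disks
HDD_URE_PER_BIT = 1e-14     # assumed spinning-disk rate
SSD_URE_PER_BIT = 1e-17     # assumed SSD rate

p_hdd = p_ure_during_read(REBUILD_READ_BYTES, HDD_URE_PER_BIT)
p_ssd = p_ure_during_read(REBUILD_READ_BYTES, SSD_URE_PER_BIT)
print(f"spinning disk: {p_hdd:.1%} chance of a URE during the rebuild")
print(f"SSD:           {p_ssd:.3%} chance of a URE during the rebuild")
```

With these assumed numbers the spinning-disk rebuild is more likely than not to hit a URE, while the SSD rebuild sits around a tenth of a percent. Different drives and array sizes will shift both figures.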


  • Service Provider

    @alboup said in Erasure Coding:

    I'm specifically talking about VMware's vSAN here, however the question is fairly broad.

    It's pretty safe to assume that VMware releasing it means it is production ready. And normally EC is safer on SSD than on spinning disks. So all flash is where it is safest.


  • Vendor

    @alboup said in Erasure Coding:

    Hi all,

    Erasure coding - is it safe for use in production on an all-flash array? I'm specifically talking about VMware's vSAN here, however the question is fairly broad.

    The alternative is the one I'm most familiar and comfortable with: RAID1(/10) mirrored to other node(s) to provide simple/reliable fault tolerance. However, there are plenty of people talking up the new RAID5/RAID6 erasure coding features, as they substantially reduce capacity overhead. Apparently there is far less risk of failure due to the much lower URE rates in flash storage.

    I'm curious what you guys think? Is it risk averse? @scottalanmiller has posted up some passionate threads in the past about why R5/R6 is the devil (which I totally agree with) so where do you stand with erasure coding?

    Thanks

    1. VMware vSAN has software RAID5 and RAID6. These are XOR-based software parity RAID; I don't know why VMware decided to call them "erasure coding" (typically that means something like Reed-Solomon codes). Probably they decided "parity RAID" isn't cool and "erasure coding" is.

    https://pubs.vmware.com/vsphere-60/index.jsp?topic=%2Fcom.vmware.vsphere.virtualsan.doc%2FGUID-AD408FA8-5898-4541-9F82-FE72E6CD6227.html

    2. Microsoft's erasure coding does indeed come from Azure, and while it's OK, it's a speed freak because it uses GLOBAL parity, which means some regions will be updated more frequently than others. The FTL will take care of that, but their E/C was never designed to run on flash, for sure!

    https://www.usenix.org/conference/atc12/technical-sessions/presentation/huang

    Verdict: you can use whichever you want in production. Both solutions have many, many adopters, but neither of them was designed to run on flash (think about the Pure engine), because in that case erasure coding should be done within the FTL (flash translation layer) on so-called OpenSSDs (or their equivalent, whatever Pure is calling them).

    http://openssd-project.org

    Hope this helped 🙂
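
To illustrate the point about global parity being updated more often than other regions, here is a minimal counting sketch. It assumes a layout with 12 data fragments split into 2 local groups plus 2 global parities, with one write landing on each data fragment. Those numbers and the `parity_updates` helper are illustrative assumptions, not a description of how S2D actually places or updates parity.

```python
from collections import Counter


def parity_updates(data_fragments: int, local_groups: int, global_parities: int) -> Counter:
    """Count parity updates when each data fragment is written exactly once.

    A write touches the local parity of its own group and every global parity.
    """
    group_size = data_fragments // local_groups
    updates = Counter()
    for fragment in range(data_fragments):
        updates[f"local-{fragment // group_size}"] += 1
        for g in range(global_parities):
            updates[f"global-{g}"] += 1
    return updates


counts = parity_updates(data_fragments=12, local_groups=2, global_parities=2)
for name in sorted(counts):
    print(f"{name}: {counts[name]} updates")
```

In this toy model each global parity fragment takes twice as many updates as each local one, so the write load is uneven across fragments and wear leveling is left to the FTL.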


  • Vendor

    @Tim_G said in Erasure Coding:

    I've no experience with erasure coding on VMware vSAN... but I know that it's production worthy and safe with S2D. It gives you the same resiliency with more efficient use of capacity. I believe it's nothing more than an algorithm... so I can't see it being any less safe/efficient when used with a different product.

    I do know that all flash = better efficiency.

    It actually should do much better. For some reason MSFT decided to hamstring itself and stop at double parity, which is one linear parity sum and one global parity, while it was possible to make N => M e/c, the same way Azure and Ceph do.

