Backup strategy for customer data?



  • We have VM hosts and VMs for test and development in our colo datacenter.
    Eventually we will run real production and have customer data on our servers.

    We need to figure out how we should do backups and how much storage / hardware we are going to need.
    But to do that we first have to figure out what the customers likely would expect from us when it comes to backup.

    Most of the customers' data will be held in databases, but some will be files. Some of the data is generated automatically and some is manually entered into web applications.

    If you were an enterprise customer or large SMB, what would you expect if you paid for a service (SaaS) that is hosted somewhere else?



  • @Pete-S said in Backup strategy for customer data?:

    If you were an enterprise customer or large SMB, what would you expect if you paid for a service (SaaS) that is hosted somewhere else?

    Good timing, since there was that PerCSoft SaaS compromise this week, where the backup mechanism itself was compromised.

    At a minimum, I'd expect that backups are air gapped and no infection of the SaaS platform or customer system could flow back and also compromise the backups that have already been taken.



  • Backups are always tough to discuss because they depend so heavily on the data in question and on how customers are able to protect their own data. In some cases you'd naturally expect the ability to replay any transaction for years; in others you'd only expect to be able to recover if the system failed. And there is nearly everything in between.

    And then you have to consider archival data.



  • @scottalanmiller said in Backup strategy for customer data?:

    Backups are always tough to discuss because they tend to be so dramatically dependent on the data in question and how customers are able to protect their own data. In some cases, you'd pretty much naturally expect an ability to replay any transaction for years, in others you'd only expect to be able to recover if the system failed and nearly everything in between.

    And then you have to consider archival data.

    I was thinking that we also need to provide a way for the customers to export and back up their own data as well. I would expect them to want that.



  • @Pete-S said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    Backups are always tough to discuss because they tend to be so dramatically dependent on the data in question and how customers are able to protect their own data. In some cases, you'd pretty much naturally expect an ability to replay any transaction for years, in others you'd only expect to be able to recover if the system failed and nearly everything in between.

    And then you have to consider archival data.

    I was thinking that we also need to provide a way for the customers to export and back up their own data as well. I would expect them to want that.

    That's often the case, and definitely a great feature.



  • @scottalanmiller said in Backup strategy for customer data?:

    At a minimum, I'd expect that backups are air gapped and no infection of the SaaS platform or customer system could flow back and also compromise the backups that have already been taken.

    How is that normally accomplished?



  • @Pete-S said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    At a minimum, I'd expect that backups are air gapped and no infection of the SaaS platform or customer system could flow back and also compromise the backups that have already been taken.

    How is that normally accomplished?

    Tape



  • @scottalanmiller said in Backup strategy for customer data?:

    @Pete-S said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    At a minimum, I'd expect that backups are air gapped and no infection of the SaaS platform or customer system could flow back and also compromise the backups that have already been taken.

    How is that normally accomplished?

    Tape

    Tape removed and taken off-site or just a backup to a tape in a tape loader?



  • @Pete-S said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    @Pete-S said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    At a minimum, I'd expect that backups are air gapped and no infection of the SaaS platform or customer system could flow back and also compromise the backups that have already been taken.

    How is that normally accomplished?

    Tape

    Tape removed and taken off-site or just a backup to a tape in a tape loader?

    Tape never stays in the loader 😉 Tape implies being removed. Maybe not taken off site, but definitely not left in the loader.





  • @Alex-Jones said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    Tape

    Why Tape?

    Low cost, highly reliable, easily physically air gapped.







  • @Alex-Jones said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    @Alex-Jones said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    Tape

    Why Tape?

    Low cost, highly reliable, easily physically air gapped.

    https://blog.storagecraft.com/tape-backup-vs-hard-disk-backup-what-does-the-future-hold/

    The data there is quite old and hasn't played out as people might have expected. Tape capacities have grown much faster than those for hard drives, and the article skips really, really important factors that apply to backups, like portability, shelf stability, etc. That article talks about LTO5, but we are on LTO8 now. Speeds have improved and capacities have exploded. Hard disks have barely moved forward, though. So the tape vs hard drive comparison has moved heavily towards tape during that time.

    But way, way more important is the rise of ransomware. The idea with hard drives as a backup medium was that you no longer needed air gapped backups. There was a time period when it was common to think that always-online hard disk backups were the future. But that future never materialized because of the risk from ransomware, which has now become possibly the most important reason for needing backups.

    Towards the end of the article you'll notice that he prices "several tapes" compared to a single hard drive. The article never considered the need to air gap backups at all, so it was comparing apples to oranges. In the modern world, just using a single online hard drive would never be considered a valid backup mechanism. It doesn't meet the same need at all.

    @Steven and I both wrote for that publication in that era, BTW 🙂



  • There are virtual tape mechanisms that let you write to object storage (on disk) and have it act like tape. But disk storage remains so expensive that at scale it often doesn't make sense. Sometimes it does, but tape is really hard to beat. If you have itty bitty storage needs, then tape rarely makes sense. But once you get close to the size of a tape, it's effectively unbeatable. The transfer speeds are so fast, the capacity is so large, and tapes can easily be taken offline and stored for incredibly long periods of time without big worries about bouncing around in transit or temperature changes that would hurt hard drives.

    Tape also doesn't use power when idle, whereas hard disks need constant power (and cooling) to stay healthy. Not that they use a lot, but it adds up when you have dozens or hundreds of hard drives spinning compared to tapes sitting on shelves.



  • When comparing tape, it's important to not look at raw capacity. LTO tape has hardware compression that is real time, on the fly and incredibly powerful. The compression ratios on tape are crazy. It's part of the sequential write mechanism. Hard drives don't offer this mechanism, nor could they because of the random access model. Tapes don't actually write raw, so an LTO8 tape (12TB native) is going to get about 30TB on average, which is the rated 2.5:1 figure. Sometimes less, sometimes more. But that's a real number to work with.

    Hard drives are still struggling to get 12TB and 14TB drives out; that's less than half of the capacity. At $270 for a 12TB Seagate (which most people don't consider safe... although a lot of that is just opinion), two drives come to $540, so that's $480 more for 24TB of capacity compared to $60 total for 30TB with tape. It takes very few tapes / hard drives to pay for the tape loader.

    And as we showed in the other thread, a 12TB tape will be filled during a normal backup window, but a 12TB hard drive doesn't have enough time in a day to get written to. So it isn't even a potential option for normal daily backups, even if it writes at full speed all day, because it couldn't finish writing before the next day's backup would start (and that's for a single drive, let alone a set of them).
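
The backup-window argument can be checked with simple arithmetic. This is a rough sketch, not the calculation from the other thread: the 360 MB/s figure is the published native rate for a full-height LTO-8 drive, while the 120 MB/s disk figure is a hypothetical sustained average chosen for illustration. The outcome is sensitive to that assumption, so substitute a measured rate for your own drives.

```python
def hours_to_write(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream capacity_tb terabytes at a sustained rate in MB/s."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units, as media vendors use
    return capacity_mb / throughput_mb_s / 3600


# Assumed sustained rates, not measurements from this thread.
LTO8_NATIVE_MB_S = 360    # published native rate, full-height LTO-8 drive
HDD_SUSTAINED_MB_S = 120  # hypothetical average for a single large hard drive

tape_hours = hours_to_write(12, LTO8_NATIVE_MB_S)
disk_hours = hours_to_write(12, HDD_SUSTAINED_MB_S)

print(f"12TB to tape: {tape_hours:.1f} h")
print(f"12TB to disk: {disk_hours:.1f} h")
```

With these assumed rates the tape finishes inside a working night, while the single disk needs more than 24 hours.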



  • Steven's article assumed that SSD would overtake spinners for backups. And that's definitely going to be true at some point in the future. But for now, per TB cost is still way higher for SSDs than spinners. Spinners won't even be on the market once SSDs are cheaper, and we don't expect that for a long time yet. So as long as tapes are cheaper than spinners, and spinners are cheaper than SSDs, SSDs rarely make good backup options.

    Once they do, we'll see the market flooded with "removable" SSD options to replace tape. Right now hard drives and SSDs both lack good "removable" options, so the constant plugging and unplugging takes a terrible toll on the connectors.

    To actually compare hard drives, you have to look at something like RDX. Removable hard drive backup requires a costly drive unit, similar to what tape needs. So that cost isn't unique to tape. But the hard drive media is way more expensive for RDX than for non-removable hard drives. A 1TB RDX drive is $200 and a 4TB is $623. So removable hard drives aren't just more expensive per TB by a huge factor, but are much slower, too.
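
To put the media prices quoted in this thread side by side, here is a small sketch that works out cost per TB. The prices and capacities are the ones mentioned above (with tape counted at its 30TB effective capacity); the labels and the `cost_per_tb` helper are just illustrative names, and the drive or loader hardware is deliberately left out.

```python
# (label, price in USD, capacity in TB), using figures quoted in this thread
media = [
    ("LTO-8 tape, 30TB effective", 60, 30),
    ("12TB hard drive", 270, 12),
    ("RDX 1TB cartridge", 200, 1),
    ("RDX 4TB cartridge", 623, 4),
]


def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Media cost per terabyte, ignoring the drive or loader hardware."""
    return price_usd / capacity_tb


costs = {label: cost_per_tb(price, cap) for label, price, cap in media}

# Print cheapest first.
for label, usd in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{label:30s} ${usd:7.2f}/TB")
```

If you would rather count tape at its 12TB native capacity, change the 30 to 12 in the first entry.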



  • What are good options for "small" tape setups, kinda like a 5, 6 or xx bay Synology for tape?



  • @FATeknollogee said in Backup strategy for customer data?:

    What are good options for "small" tape setups, kinda like a 5, 6 or xx bay Synology for tape?

    Single drive, swap tapes. Anything small should fit on a single tape for a full backup.

    There are tape libraries available where you can swap out more than a single tape, but those aren't needed till you are talking very large data sets.



  • @scottalanmiller said in Backup strategy for customer data?:

    When comparing tape, it's important to not look at raw capacity. LTO tape has hardware compression that is real time, on the fly and incredibly powerful. The compression ratios on tape are crazy. It's part of the sequential write mechanism. Hard drives don't offer this mechanism, nor could they because of the random access model. Tapes don't actually write raw, so an LTO8 is actually going to get 30TB on average. Sometimes less, sometimes more. But that's a real number to work with.

    Hard drives are still struggling with getting 12TB and 14TB drives out, that's less than half of the capacity. At $270 for a Seagate (that most people don't consider safe... although a lot of that is just opinion) at 12TB that's $410 more for 24TB of capacity compared to $60 total for 30TB with tape. It takes very few tapes / hard drives to pay for the tape loader.

    And as we showed in the other thread, a 12TB tape will be filled during a normal backup window, but a 12TB hard drive doesn't have enough time in a day to get written to. So isn't even a potential option for normal daily backups, even if it gets full performance to write at full speed all day, because it couldn't finish writing before the next day's backup would start (and that's for a single drive, let alone a set of them.)

    In many cases, the backup software compresses your backups first.

    So it's important to look at how much raw data will get backed up, not just how much backed-up data will make it onto tape. If the backup is already compressed, you'll find that you only fit roughly the raw uncompressed capacity value of the tape.

    It's the difference between archiving a backup to tape and archiving raw data to tape.

    In the end, it's about the same amount of raw data anyways.



  • @FATeknollogee said in Backup strategy for customer data?:

    What are good options for "small" tape setups, kinda like a 5, 6 or xx bay Synology for tape?

    Pretty much you are stuck with LTO sizes. The primary purpose of tape is to physically disconnect it after use. So a single tape unit that someone pulls the tape from daily is the "small" setup. You can get external tape drives, or dedicated tape units. As you grow, you can move to robots and parallel write systems. Big enterprises basically all live and die by tapes and have giant robotic units that stream dozens of tapes at the same time and move them around in big libraries mechanically.



  • @Obsolesce said in Backup strategy for customer data?:

    @scottalanmiller said in Backup strategy for customer data?:

    When comparing tape, it's important to not look at raw capacity. LTO tape has hardware compression that is real time, on the fly and incredibly powerful. The compression ratios on tape are crazy. It's part of the sequential write mechanism. Hard drives don't offer this mechanism, nor could they because of the random access model. Tapes don't actually write raw, so an LTO8 is actually going to get 30TB on average. Sometimes less, sometimes more. But that's a real number to work with.

    Hard drives are still struggling with getting 12TB and 14TB drives out, that's less than half of the capacity. At $270 for a Seagate (that most people don't consider safe... although a lot of that is just opinion) at 12TB that's $410 more for 24TB of capacity compared to $60 total for 30TB with tape. It takes very few tapes / hard drives to pay for the tape loader.

    And as we showed in the other thread, a 12TB tape will be filled during a normal backup window, but a 12TB hard drive doesn't have enough time in a day to get written to. So isn't even a potential option for normal daily backups, even if it gets full performance to write at full speed all day, because it couldn't finish writing before the next day's backup would start (and that's for a single drive, let alone a set of them.)

    In many cases, the backup software compresses your backups first.

    So it's important to look at how much raw data will get backed up, not just how much backed up data will make it onto tape... Because if you look at that, you'll find that you will only fit the raw uncompressed capacity value of the tape.

    Archiving a backup to tape versus archiving raw data to tape.

    In the end, it's about the same amount of raw data anyways.

    That's true, but tape compression ratios take that into account to some degree. LTO's streaming hardware compression is a bit different from compression used elsewhere. Getting both types isn't bad. Compressing first will lower the ratio you see on tape, but by the same token, if the data has had no compression whatsoever before it reaches the drive, you'll get better than the average ratio.
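
The disagreement above comes down to which compression ratio the drive actually achieves. A minimal sketch, assuming LTO-8's 12TB native capacity and three illustrative ratios (only the 2.5:1 figure is the rated number; the other two are hypothetical):

```python
LTO8_NATIVE_TB = 12  # native (uncompressed) capacity of one LTO-8 tape


def effective_capacity_tb(ratio: float, native_tb: float = LTO8_NATIVE_TB) -> float:
    """Amount of source data that fits on one tape at a given drive compression ratio."""
    return native_tb * ratio


# Illustrative ratios; measure your own data before sizing anything.
scenarios = {
    "already compressed by backup software": 1.0,
    "rated average (2.5:1)": 2.5,
    "highly compressible raw data": 4.0,
}

for name, ratio in scenarios.items():
    print(f"{name}: {effective_capacity_tb(ratio):.0f} TB per tape")
```

Either way, the amount of original data on the tape ends up similar if the backup software already compressed it: the saving simply happened earlier in the pipeline.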



  • @scottalanmiller said in Backup strategy for customer data?:

    @FATeknollogee said in Backup strategy for customer data?:

    What are good options for "small" tape setups, kinda like a 5, 6 or xx bay Synology for tape?

    Pretty much you are stuck with LTO sizes. The primary purpose of tape is to physically disconnect it after use. So a single tape unit that someone pulls the tape from daily is the "small" setup. You can get external tape drives, or dedicated tape units. As you grow, you can move to robots and parallel write systems. Big enterprises basically all live and die by tapes and have giant robotic units that stream dozens of tapes at the same time and move them around in big libraries mechanically.

    It seems like the next step up from a single tape is something like the 1U Dell PowerVault TL1000, which has a tray with 9 tapes. So you can back up and then swap out up to 9 tapes at the same time. That's roughly 100 to 350 TB per backup with LTO-8 tapes. Around $7500 without tapes.
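
For sizing a small loader like this, a rough sketch of the arithmetic follows. It assumes 9 slots (from the post above), 12TB native per LTO-8 tape and the 30TB rated figure discussed earlier in the thread; the 100TB dataset is a made-up example. On these assumptions the tray holds 108TB native and 270TB at 2.5:1, so the upper end of the quoted range would need a compression ratio above 3:1.

```python
import math

SLOTS = 9            # tapes in the tray, per the post above
NATIVE_TB = 12       # LTO-8 native capacity
RATED_TB = 30        # LTO-8 at the rated 2.5:1 compression


def tapes_needed(dataset_tb: float, per_tape_tb: float) -> int:
    """Whole tapes required to hold dataset_tb of source data."""
    return math.ceil(dataset_tb / per_tape_tb)


native_total = SLOTS * NATIVE_TB
rated_total = SLOTS * RATED_TB
print(f"full tray: {native_total} TB native, {rated_total} TB at 2.5:1")

# Hypothetical 100TB full backup.
print("tapes for 100TB, no compression:", tapes_needed(100, NATIVE_TB))
print("tapes for 100TB, rated ratio:   ", tapes_needed(100, RATED_TB))
```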



  • We've worked with a variety of hosting solution providers. Most start with a base of one backup done per 24 hours with a fee to restore if required.

    Some have a built-in backup feature that we can then set up for the VMs we have our cloud desktop clients running in. It can be set up to run relatively often. They charge a fee for that one.

    Start with once per day.

    As far as the "how" goes, what is the underlying virtualization platform?

    Our hosting solutions are set up to use Veeam at the host level.

    StarWind's Virtual Tape Library (VTL) can be used to augment the backup in another DC with Veeam's Cloud Connect being another option to tie in to get the backup data out of the production DC.

    As far as expectations go, we're in the process of setting up a BaaS and DRaaS service based on Veeam. Backups and DR will be multi-site, with one goal being to make a two- to four-week no-delete option available.

    In our investigations of BaaS/DRaaS providers, none were able, or willing, to answer the question, "How do you back up our backup data to protect against failures in your system?"
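
The "no-delete" idea can be expressed as a simple rule: a backup may not be removed until it is older than the hold window. The sketch below is only an illustration of that rule, not how Veeam or any provider implements it; `can_delete` and the 28-day default are hypothetical, and real protection has to be enforced by the storage layer rather than by application code.

```python
from datetime import datetime, timedelta


def can_delete(created: datetime, now: datetime, hold_days: int = 28) -> bool:
    """Return True only once the backup has aged past the no-delete window."""
    return now - created >= timedelta(days=hold_days)


backup_taken = datetime(2019, 8, 1)
print(can_delete(backup_taken, datetime(2019, 8, 20)))  # still inside the window
print(can_delete(backup_taken, datetime(2019, 8, 29)))  # window has elapsed
```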



  • @PhlipElder said in Backup strategy for customer data?:

    We've worked with a variety of hosting solution providers. Most start with a base of one backup done per 24 hours with a fee to restore if required.

    Some have a built-in backup feature that we can then set up for the VMs we have our cloud desktop clients running in. It can be set up to run relatively often. They charge a fee for that one.

    Start with once per day.

    As far as the "how" what is the underlying virtualization platform?

    Our hosting solutions are set up to use Veeam at the host level.

    StarWind's Virtual Tape Library (VTL) can be used to augment the backup in another DC with Veeam's Cloud Connect being another option to tie in to get the backup data out of the production DC.

    As far as expectations go, we're in the process of setting up a BaaS and DRaaS service based on Veeam. Backups and DR will be multi-site with one goal to be a two to four week no-delete option available.

    In our investigations of BaaS/DRaaS providers none were able, or wanted, to answer the, "How do you back up our backup data to protect against failures in your system?" question.

    As we are getting into SaaS and not infrastructure, I think our primary concern is being able to restore the customers' data in case something bad happens that's our fault or responsibility: for instance software bugs, hackers, ransomware, multiple hardware failures, etc.

    We are not as concerned with being able to restore the customers' data in case they screw up as we are if we screw up. That said, if we can do it without too much investment, we might be able to add something here. Have to think about that one. In either case we will provide some way for the customer to export and back up their data.

    For now we run on Xen (XCP-ng). The goal is to be able to restore the infrastructure with automation, so I don't expect us to really need a lot of host-based backups. We have a lot more testing to do on this.

    From what I can gather right now, I think we will back up to disk storage on-prem. Then from there we will go to tape. Tape will be moved off-site once a week. We will do incremental backups to the cloud or another site so we can restore completely using off-site tape and the incremental backups.

    This will allow us to restore from on-prem disk in most cases. If we are hacked or infected, we can restore from on-site tape. In case of a fire or something, we can restore from off-site tape and incremental backups.
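
The tiered plan above maps each kind of incident to the backup copy you would restore from. A small sketch of that mapping, using hypothetical names (`Incident`, `RESTORE_PLAN`, `restore_sources`) purely to make the decision explicit:

```python
from enum import Enum


class Incident(Enum):
    ROUTINE_FAILURE = "hardware failure, software bug or similar"
    COMPROMISE = "hack or ransomware infection"
    SITE_LOSS = "fire or other loss of the site"


# Restore sources per incident, following the plan described in the post.
RESTORE_PLAN = {
    Incident.ROUTINE_FAILURE: ["on-prem disk"],
    Incident.COMPROMISE: ["on-site tape"],
    Incident.SITE_LOSS: ["off-site tape", "off-site incrementals"],
}


def restore_sources(incident: Incident) -> list[str]:
    """Backup copies to restore from, in the order they would be applied."""
    return RESTORE_PLAN[incident]


for incident in Incident:
    print(f"{incident.value}: {' + '.join(restore_sources(incident))}")
```

Writing it down this way makes it easy to see that the site-loss case is the only one that depends on two copies being usable together, which is the path most worth testing.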



  • @Pete-S said in Backup strategy for customer data?:

    @PhlipElder said in Backup strategy for customer data?:

    We've worked with a variety of hosting solution providers. Most start with a base of one backup done per 24 hours with a fee to restore if required.

    Some have a built-in backup feature that we can then set up for the VMs we have our cloud desktop clients running in. It can be set up to run relatively often. They charge a fee for that one.

    Start with once per day.

    As far as the "how" what is the underlying virtualization platform?

    Our hosting solutions are set up to use Veeam at the host level.

    StarWind's Virtual Tape Library (VTL) can be used to augment the backup in another DC with Veeam's Cloud Connect being another option to tie in to get the backup data out of the production DC.

    As far as expectations go, we're in the process of setting up a BaaS and DRaaS service based on Veeam. Backups and DR will be multi-site with one goal to be a two to four week no-delete option available.

    In our investigations of BaaS/DRaaS providers none were able, or wanted, to answer the, "How do you back up our backup data to protect against failures in your system?" question.

    As we are getting into SaaS and not infrastructure, I think our primary concern is being able to restore the customers' data in case something bad happens that's our fault or responsibility: for instance software bugs, hackers, ransomware, multiple hardware failures, etc.

    We are not as concerned with being able to restore the customers' data in case they screw up as we are if we screw up. That said, if we can do it without too much investment, we might be able to add something here. Have to think about that one. In either case we will provide some way for the customer to export and back up their data.

    For now we run on Xen (XCP-ng). The goal is to be able to restore the infrastructure with automation, so I don't expect us to really need a lot of host-based backups. We have a lot more testing to do on this.

    From what I can gather right now, I think we will back up to disk storage on-prem. Then from there we will go to tape. Tape will be moved off-site once a week. We will do incremental backups to the cloud or another site so we can restore completely using off-site tape and the incremental backups.

    This will allow us to restore from on-prem disk in most cases. If we are hacked or infected, we can restore from on-site tape. In case of a fire or something, we can restore from off-site tape and incremental backups.

    There are some keys to providing a customer facing solution:

    • Customer facing network(s) are not in any way connected to the hosting company's day to day network (DtDN)
    • Privileged Access Workstation structures are in place to keep things separate
    • Backups are air-gapped in some way to protect against catastrophic failure or encryption event
    • Customer resources are on separate equipment from DtDN

    Ultimately, the entire solution set for DtDN, Support, and Customer Facing networks should be segmented completely from each other with significant protections in place to keep them that way.

    • iNSYNQ
    • Wolters Kluwer/CCH
    • Maersk
    • PCM
    • WiPro
    • Hosting company (UK 123 something?) lost everything due to backups being wiped
    • Secure mail hosting company lost everything when perp took everything out right through the backups
    • ETC


  • @PhlipElder said in Backup strategy for customer data?:

    hosting company's day to day network

    With day to day network, do you mean the hosting company's own internal IT, for managing their own company?
    Or do you mean the hosting company's management network for managing the hosting infrastructure?



  • @Pete-S said in Backup strategy for customer data?:

    @PhlipElder said in Backup strategy for customer data?:

    hosting company's day to day network

    With day to day network, do you mean the hosting company's own internal IT, for managing their own company?
    Or do you mean the hosting company's management network for managing the hosting infrastructure?

    DtDN = Sales, HR, Finance, etc., where folks blindly click on things and get hit by drive-by web sites.

    Management would be with PAW (Privileged Access Workstation) and segmented away from the DtDN with absolutely no crossover between them.
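
One way to audit the "absolutely no crossover" rule is to list the permitted flows and flag any that has exactly one end in the DtDN. This is only a sketch: the host names, the `HOST_SEGMENT` inventory and the flow list are all hypothetical, and in practice the input would come from firewall or switch configuration.

```python
# Hypothetical inventory: host name -> network segment
HOST_SEGMENT = {
    "hr-laptop": "dtdn",
    "sales-pc": "dtdn",
    "paw-01": "management",
    "hypervisor-01": "management",
    "app-vm-01": "customer",
}


def find_dtdn_crossovers(allowed_flows, host_segment=HOST_SEGMENT):
    """Return every allowed flow with exactly one endpoint in the DtDN."""
    return [
        (src, dst)
        for src, dst in allowed_flows
        if (host_segment[src] == "dtdn") != (host_segment[dst] == "dtdn")
    ]


# Hypothetical rule set to audit.
flows = [
    ("paw-01", "hypervisor-01"),     # management to management: fine
    ("hr-laptop", "sales-pc"),       # stays inside the DtDN: fine
    ("hr-laptop", "hypervisor-01"),  # DtDN reaching management: violation
]

print(find_dtdn_crossovers(flows))
```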



  • @scottalanmiller said in Backup strategy for customer data?:

    When comparing tape, it's important to not look at raw capacity. LTO tape has hardware compression that is real time, on the fly and incredibly powerful. The compression ratios on tape are crazy. It's part of the sequential write mechanism. Hard drives don't offer this mechanism, nor could they because of the random access model. Tapes don't actually write raw, so an LTO8 is actually going to get 30TB on average. Sometimes less, sometimes more. But that's a real number to work with.

    Question: Am I correct in assuming that this compression doesn't offer any benefit where the backup content is video media? If it DOES allow compression of video files, how good is the compression ratio?



  • @NashBrydges said in Backup strategy for customer data?:

    Question: Am I correct in assuming that this compression doesn't offer any benefit where the backup content is video media? If it DOES allow compression of video files, how good is the compression ratio?

    That depends. Generally it does, but relatively little. You likely still want it on (especially on tape) because the compression mechanism normally speeds up transfers to and from the media, since the data is compressed in real time. Heavily compressed video media is going to get very little additional compression, but generally some.
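
The effect is easy to see with any lossless compressor. The sketch below uses Python's `zlib`, which is not the algorithm LTO drives use, and random bytes as a stand-in for already-encoded video. Random data is the worst case, so real video files will usually do slightly better than this, in line with the "very little, but generally some" point above.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compression ratio (original size / compressed size) using zlib."""
    return len(data) / len(zlib.compress(data))


# Repetitive, log-like data compresses very well.
text_like = b"2019-08-30 INFO backup job finished OK\n" * 20_000

# Random bytes stand in for encoded video: almost no redundancy left to remove.
video_like = os.urandom(len(text_like))

print(f"log-like data:   {ratio(text_like):.1f}:1")
print(f"video-like data: {ratio(video_like):.2f}:1")
```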

