When a Backup Becomes an Archive



  • We all know what a backup is, right? Or do we? Sometimes what is and is not a backup can become a little convoluted. A quick look in the dictionary gives us the following computing definition of a backup: "a copy of a file or other item of data made in case the original is lost or damaged." Sounds reasonable.


    In the most common usage we know that we have original data, say a file on our desktop, and if we make a copy of it to another location that copy is a backup. Pretty reasonable. No arguments here.

    But what happens should the original file become damaged or lost? Is the backup file still a backup? Well, not really. The idea of a copy in the digital world is a bit different than what we are used to in the analogue world. Because a digital copy is exact, what was our backup file has now, technically, become the source file. There is no copy, only the "original" in a different place. Remember that the concepts of copying and moving files are a bit different in computer terms than in the analogue world.

    This is a very important concept to understand: the moment that our original file is destroyed, our backup becomes our new original and is no longer a backup. Of course, in common parlance someone might say "restore the file from backup," and we know that they mean to make a new copy from what was the backup and is currently the source, and that once restored, the latest copy is to be considered the original rather than yet another backup. Gets confusing quickly, doesn't it?

    Part of the confusion arises from the fact that, in reality, neither copy is inherently the source or the backup; both digital copies are peers because they are identical. We consider one to be the copy and one the original based on intended use or on the type of media that they are stored on. This is handy for discussing our intentions, but confusing at other times.

    Where this becomes most confusing is when we introduce the complication of intentional source destruction. Take a spreadsheet that tracks how much coffee each person in the office drinks every day. Due to an odd government regulation, this spreadsheet is required to be available for auditing with reasonable notice and by law must be backed up. Seems simple.

    So we talk to Rebbecca, who is in charge of this file. She creates the original spreadsheet on her desktop, and a backup is sent to a tape that goes into a safe. We have an original on her desktop and a backup on a tape in a safe. Simple. Rebbecca asks the backup administrator, "Is the file backed up?" The backup administrator says, "Yes, yes it is. I made sure of it myself."

    Knowing that the file is safely backed up and protected in the safe, and wanting to make sure that her desktop does not become cluttered with files that she no longer uses, Rebbecca deletes the spreadsheet from her desktop. Now the question becomes: is there still a backup?

    The simple answer is no. The file sitting on the tape in the safe is now the only copy of the file. A singular copy cannot be backed up. That file on the tape is now the source file. If the government agency came to inspect the file, they would indeed find the file that they need, but if they audited to see whether that file was properly backed up as required, they would discover that no backup of the file exists - only a single copy - and that there is now a data protection violation.

    If that tape were to fail or become lost and an audit occurred, the agency would be expected to check that proper data protection requirements were met. Of course, no matter how many copies of a file may exist, there always remains some possibility that all of them become damaged or lost before additional copies can be made, so there is always some leniency for happenstance. But in this case, if the tape were lost, the failure would stem from no backup ever having been made.

    This may seem like a strange example, but it is a very real-world one, and one that I have seen have grave consequences, as government agencies often truly do have long data retention rules combined with stringent backup and data protection rules. It is a very common business practice to take regular "backups" but then, knowing that files have been sent to a backup process, clean up production systems over time to keep only those files that are currently useful. No one stops to think that the files being intentionally deleted might be part of a regulatory "backup set." Of course, taking multiple backups can solve this problem, but these things can be more difficult than they seem.

    When we remove our source copy of data from production "online" storage and retain only a copy on "near line" or "offline" storage, that singular copy is not a backup; it is an archive - a source copy on slower, less costly media. Archives are not backups and should not be thought of as such.

    For most businesses, old data that is not expected to be needed again does not actually need to be kept in multiplicity for protection. That would often be overkill and impractical. But we should not call the remaining copy a backup, because that is confusing; it is simply an archive. Any data that truly had a backup would, logically, be safe to destroy, because the primary copy would still exist.

    The simple test for "is it backed up" is this: if you can destroy the file you are considering and still retain the ability to restore it completely, it had a backup. If, at any time, the destruction of a file results in the loss of that file, then it was not backed up. Consider this test when thinking about file retention and protection. And think of the set of copies of a file as a "backup set" of equal peers, rather than as a source and backup pair where one is seemingly different from the other.
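    The destruction test above can be sketched as a toy model. Nothing here comes from any real backup tool; the function and the copy names are purely illustrative.

    ```python
    # Toy model of the "is it backed up" test: a file is backed up only if
    # destroying any single copy still leaves at least one copy to restore
    # from. All copies in the set are equal peers - a "backup set".

    def is_backed_up(copies: set[str]) -> bool:
        """True iff losing any one copy still leaves another to restore from."""
        return len(copies) >= 2

    # The spreadsheet while it lives on the desktop AND on the tape:
    copies = {"desktop", "tape-in-safe"}
    print(is_backed_up(copies))   # two peers in the backup set -> True

    # After the desktop copy is deleted, only the tape remains - an archive:
    copies.discard("desktop")
    print(is_backed_up(copies))   # a singular copy is never backed up -> False
    ```

    The point of modeling the copies as an unordered set is that no member is privileged: "source" and "backup" are labels of intent, not properties of the data.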



  • In the real world, this was brought to my attention long ago with financial data required for reporting to the SEC. The SEC required that all data be kept for seven years and that it be backed up for the full seven-year period.

    The departments creating the data would do so on a server that only had enough capacity to hold one day of data. Every night the server was "backed up" to tape and, once the backup process was completed, the data on the system was erased to make room for the next day's data to be collected.

    This process became complicated because tapes were used and, while there was a grandfather-father-son tape rotation, only the grandfather set was kept for the requisite seven-year retention period. The fathers were kept for one year and the sons for one or maybe two months. The grandfathers were monthly, the fathers weekly, and the sons daily.

    This meant that each backup run captured unique data from just that one day. The data on each son was entirely distinct from that on every other son, father, and grandfather. Because the source data was deleted daily, the same data was never captured by two different backup jobs. This meant that there was never, ever a single backup. There was only a transfer of the source file from spinning disks to tape - an archive process.

    So from the start, there was not even an attempt made at making a backup, only a transfer to an archive. At no time were the individual tapes duplicated separately. All data had only one copy of itself.

    But remember the G-F-S scheme. In any 28-day period (the schedule used lunar months, 13 cycles per year), only one of the 28 tapes would be stored for the required seven years: the singular grandfather. Three father tapes would be kept for one year, and 24 son tapes would be kept for two months. So, assuming no tape ever failed, only 1/28th of the data required to be retained had even a hope of making it through the seven-year period, and even if it survived, it never met the requirement of being backed up.
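    A quick back-of-the-envelope check of that arithmetic, using the 28-day lunar cycle described above (the constants are taken from the story, not from any real scheduler):

    ```python
    # G-F-S rotation over one 28-day lunar cycle (13 cycles per year).
    # Each tape holds one day of unique data whose source was deleted,
    # so a tape's retention period IS that day's data retention period.
    DAYS_PER_CYCLE = 28
    sons = 24          # daily tapes, kept roughly two months
    fathers = 3        # weekly tapes, kept one year
    grandfathers = 1   # monthly tape, the only one kept seven years

    # Every day of the cycle lands on exactly one tape:
    assert sons + fathers + grandfathers == DAYS_PER_CYCLE

    # Fraction of each cycle's data with any hope of surviving seven years:
    surviving_fraction = grandfathers / DAYS_PER_CYCLE
    print(f"{surviving_fraction:.1%} of each cycle's data reaches 7 years")  # 3.6%
    ```

    Even that 1/28th existed as a single unduplicated copy, so it was an archive with a seven-year shelf life, never a backup.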

    Because the term "backup" was used by the backup team sending the data to tape (and to them it was a backup - for that brief moment there were two copies; the backup team had no way to know that the application owners were intentionally deleting their source data immediately after the backup had run), the application people figured that their obligations were met, "tossed the responsibility over the wall" to the backup team, and thought no more of it. The backup team, knowing what a backup meant, assumed that the application team would never delete anything intentionally, as legally they were required to keep every file ever created on their system for seven years.

    It was not until a server filled up faster than normal, a day's set of files was deleted early before any archival to tape had been done at all, and someone then needed to recover a file - finding that there was no backup, no source, no archive, no attempt at data retention whatsoever - that a discussion happened. Suddenly yours truly realized that the application team had been intentionally destroying their "backups" to save space. They had ignored the well-documented information from the backup team that all data had to be kept on live production for a minimum of 28 days in order to be backed up with seven-year retention, and stored for a minimum of 56 days to be retained for seven years with a backup copy as well; otherwise, the data had to be kept on the origin system indefinitely to accommodate this. At this point, the application team's idea of storing data for "almost" one day at most became rather untenable.

    Quite the discovery. The conversation involved the application team saying things like, "What'll the SEC have to say when they find out you are not maintaining every backup for seven years?" followed by, "Probably not much, as they'll be busy talking with you about the intentional destruction of evidence." That was the last time they tried to blame other teams for a failure to retain backups.



  • Way to flip it back on them.



  • @Dashrender said in When a Backup Becomes an Archive:

    Way to flip it back on them.

    Well, they tried to blame system administration who had nothing to do with either side. We weren't the backup guys, nor were we the app team. We were the one party that had zero responsibility. The backup team, though, documented everything and had clear rules to follow. There was no one at fault, whatsoever, except for the application team who were the very ones upset that they lost data that they themselves had intentionally destroyed! So they had it coming.



  • @scottalanmiller said in When a Backup Becomes an Archive:


    Well, they tried to blame system administration who had nothing to do with either side. We weren't the backup guys, nor were we the app team. We were the one party that had zero responsibility. The backup team, though, documented everything and had clear rules to follow. There was no one at fault, whatsoever, except for the application team who were the very ones upset that they lost data that they themselves had intentionally destroyed! So they had it coming.

    Wow - it's kind of amazing that backup had documentation that awesome - I mean of course they should have documentation that spells out the responsibility of each side exactly, but I've rarely seen it.



  • @Dashrender said in When a Backup Becomes an Archive:


    Wow - it's kind of amazing that backup had documentation that awesome - I mean of course they should have documentation that spells out the responsibility of each side exactly, but I've rarely seen it.

    When you have a full backup department with only one job, it tends to be pretty good.



  • @Dashrender said in When a Backup Becomes an Archive:

    Way to flip it back on them.

    Did you drop an imaginary mike afterwards?



  • @BRRABill said in When a Backup Becomes an Archive:


    Did you drop an imaginary mike afterwards?

    Knowing Scott, he did. Just in his own way.



  • @Dashrender said in When a Backup Becomes an Archive:


    Knowing Scott, he did. Just in his own way.

    Knowing Scott he probably had a real one in his pocket, which he pulled out, then dropped.