Backups in the DevOps World



  • Traditionally, we've thought of disaster recovery as taking an image, or a near image, of the complete filesystem of the server we need to restore and putting it back, bit by bit, onto a new system. This could be done the very "old fashioned" way: install a stock operating system, then restore the "differences" from the backup system - often quite a lot, since every patch and change made since the system was first built is likely to be in the backup. This is generally a "file based" restore. The more recent approach is to take a full system image of a virtual machine (or, less commonly, of a physical server) and restore the entire server - a "block based" restore.

    In both cases we have two major underlying issues. One is that a restore is not part of normal operations, so it must be intentionally scheduled for testing to have any hope of ever having been tested - nothing in day-to-day operations will exercise restores by default. The second is that the amount of data to be restored is very large, so getting files back into place takes time.

    In the DevOps world, with software defined infrastructure, backups have an opportunity to be treated very differently. Building a new server for a task and restoring an existing one become two ways of describing the same operation. Building a server becomes a normal operational task that can easily happen all of the time. Suddenly the base system, including the entire operating system, applications, patches and so forth, becomes moot because it is automatically created, as needed, via automation.

    This, in fact, creates two classes of servers: stateless and stateful. Normally a database or file server would be stateful - there is data on it that needs to be protected. Application servers would be stateless - nothing stored on them is needed.

    Restoring a stateless system requires no "restore" whatsoever - just a capacity expansion. A stateless system can be recreated, from scratch, via the build automation system. This might require manual intervention, or a capacity detection system could automatically decide that capacity is too low and "restore" it in that manner. The idea of backups for stateless systems can go away completely.
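    The idea that a stateless "restore" is just a capacity expansion can be sketched in a few lines. Everything below is hypothetical - the function names and hostname scheme are made up for illustration; in a real environment each new hostname would be handed off to the build automation (Ansible, Terraform, etc.) rather than returned as a string:

```python
# Minimal sketch: "restoring" a stateless system is a capacity expansion.
# All names here are illustrative, not any real tool's API.

def servers_to_build(healthy_count: int, desired_count: int) -> int:
    """How many fresh servers the build automation should create."""
    return max(desired_count - healthy_count, 0)

def restore_capacity(healthy_count: int, desired_count: int) -> list[str]:
    """'Restore' by building new stateless nodes from scratch.

    In a real environment each hostname would trigger a build-automation
    run; here we just return the names that would be created.
    """
    needed = servers_to_build(healthy_count, desired_count)
    return [f"app-{n:02d}" for n in range(healthy_count, healthy_count + needed)]
```

    A capacity detection system would call something like `restore_capacity(3, 5)` when only three of five desired nodes are healthy, and no backup media would ever be touched.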

    Restoring a stateful system does require a backup of the data, but the majority of the system is still stateless. The OS and applications can still be built in an automated way, and then only the data stored there - the data in a database or the files on a file server, for example - needs to be pulled from a backup system.
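    As a rough sketch of this split, the restore plan for any server is the same automated build, with one extra data step only when the server is stateful. The step names below are purely illustrative:

```python
# Hypothetical sketch: a stateful restore is the normal automated build
# plus one extra, data-only restore step. Step names are illustrative.

def plan_restore(role: str, stateful: bool) -> list[str]:
    """Ordered steps to bring a server of the given role back."""
    steps = [
        f"provision base OS for {role}",    # stateless: rebuilt, not restored
        f"apply configuration for {role}",  # stateless: comes from automation
    ]
    if stateful:
        # Only the volatile data (database, file shares) comes from backup.
        steps.append(f"restore latest data backup for {role}")
    return steps
```

    For a web node the plan is pure rebuild; for a database node only the final step touches the backup system.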

    Consider how small a typical database file is, especially when compressed, compared to the size of the full OS and applications. In the SMB we'd typically see data sizes much smaller than the OS, though not always, of course. A backup that, done the traditional way, might require 40GB or more for a small system might be only a few GB for the database and zero for the application(s) that use it. An extremely busy web application made up of application servers, database servers, proxies, load balancers and more, which today might easily take several hundred GB to image, might require only 800MB for a full backup in a DevOps model.

    DevOps thinking changes backups more than most aspects of systems planning. Look around at your own environment. What if all of your systems could be restored to a running state at the press of a button, and only their volatile data needed to go through a restoration process? How small might your backups be? How much faster, how much less costly and how much more reliable might they be?



  • Definitely an interesting concept.

    Especially in the license-free world of Linux.

    Much easier to separate data and applications into multiple servers when you can just stand up a new server when you want it.



  • @BRRABill said in Backups in the DevOps World:

    Much easier to separate data and applications into multiple servers when you can just stand up a new server when you want it.

    None, really. It's good practice to separate workloads, but not separating them doesn't cause storage to sprinkle throughout the OS in a way it does not when the workloads are separate. Dividing workloads by VM has no direct impact on DevOps backups.

    So this applies absolutely equally regardless of OS or licensing.



  • @scottalanmiller said

    So this applies absolutely equally regardless of OS or licensing.

    I don't agree with that.

    If that was the case, you'd probably never see a DC with anything else on it.



  • @BRRABill said in Backups in the DevOps World:

    @scottalanmiller said

    So this applies absolutely equally regardless of OS or licensing.

    I don't agree with that.

    I must be missing something. I can think of no factor that would apply.



  • @BRRABill said in Backups in the DevOps World:

    If that was the case, you'd probably never see a DC with anything else on it.

    I'm unclear what this means.



  • @scottalanmiller said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    If that was the case, you'd probably never see a DC with anything else on it.

    I'm unclear what this means.

    My point is that we often see multiple things on a Windows DC. Best case is to have it by itself. Much easier to restore. AKA your point here. I'd venture to say a lot of this is a license issue.

    No?



  • @BRRABill said in Backups in the DevOps World:

    My point is that we often see multiple things on a Windows DC. Best case is to have it by itself. Much easier to restore. AKA your point here. I'd venture to say a lot of this is a license issue.

    No?

    But my point was that seeing multiple things on one machine isn't a factor in the point at hand, and that having them separated doesn't change the discussion. You gave the example of a DC to counter that, but I don't understand what aspect of a DC you feel changes how the storage and system files are separated.

    And given that I can't find any reason why systems being together in one VM image makes a difference, that in turn means that the licensing makes no difference at all. So unless I can understand the former, I have no idea why the latter comes into play.



  • @scottalanmiller said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    My point is that we often see multiple things on a Windows DC. Best case is to have it by itself. Much easier to restore. AKA your point here. I'd venture to say a lot of this is a license issue.

    No?

    But my point was that seeing multiple things on one machine isn't a factor in the point at hand, and that having them separated doesn't change the discussion. You gave the example of a DC to counter that, but I don't understand what aspect of a DC you feel changes how the storage and system files are separated.

    And given that I can't find any reason why systems being together in one VM image makes a difference, that in turn means that the licensing makes no difference at all. So unless I can understand the former, I have no idea why the latter comes into play.

    I am arguing that not having to worry about licensing makes it easier to follow best practice in separating data and application loads when needed.

    Want to blow away your malfunctioning Unifi controller and make a new one? Backup the config, make a new VM, reinstall, import config. Done. Not so easy with all sorts of stuff on a server.

    The DC example is that many times people back up the whole thing because there are applications on there, and DHCP, and data, and everything. How many times have we seen this on ML?

    Best case is just having a DC. If it goes haywire? Don't restore. Set up a new one. Isn't that best practice?



  • @BRRABill said in Backups in the DevOps World:

    I am arguing that not having to worry about licensing makes it easier to follow best practice in separating data and application loads when needed.

    Right, and I pointed out that this could not be the case. Does Windows licensing encourage bad practices as regards separating workloads? Yes. Does that have anything to do with the situation being discussed? No.

    Why you are mentioning it here is where I am confused.



  • @BRRABill said in Backups in the DevOps World:

    Want to blow away your malfunctioning Unifi controller and make a new one? Backup the config, make a new VM, reinstall, import config. Done. Not so easy with all sorts of stuff on a server.

    Now you are talking about the granularity of the restore, not the separation of the system and the data. That's a great point and very valid, but it's not what we are discussing and doesn't change the volume or type of backups.

    The one thing it would do, heavily, is destroy the idea of image based or "agentless" backups and make file backups and DevOps style backups far, far more important, since they can restore "by service" rather than "by system."



  • @BRRABill said in Backups in the DevOps World:

    The DC example is many times people back up the whole thing because there are applications on there, and DHCP, and data, and everything. How many times have we seen this on ML?

    I'm missing the point. Lots of people don't use DevOps style backups today, of course. But they could, and that's the point.



  • @BRRABill said in Backups in the DevOps World:

    Best case is just having a DC. If it goes haywire? Don't restore. Set up a new one. Isn't that best practice?

    No, not a best practice. It's a good practice under certain conditions - conditions under which you would not be restoring from backup, because the system is not down. You go to backups when the system is down. Your way only works when the cluster is degraded but still functional.



  • DevOps style backups matter quite a bit because, as we move to a world that wants offsite backups more and more, the difference between trying to back up - or, more importantly, restore - 1TB of data versus 10GB is huge. Not just in time, but in cost. Storing 10GB on Amazon S3 is trivial; a TB is far worse. And needing to download large traditional images means huge delays that might easily make restoring systems impractical, when pulling down a small, compressed database file might take only a few minutes.
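    A quick back-of-the-envelope calculation shows how much the backup size drives offsite restore time. The link speed here is an arbitrary assumption for illustration, and real-world throughput will vary:

```python
# Rough transfer-time math for offsite restores. 1 GB is treated as
# 8,000 megabits (decimal units); actual throughput will vary.

def transfer_minutes(size_gb: float, link_mbps: float) -> float:
    """Minutes to move size_gb of data over a link_mbps connection."""
    megabits = size_gb * 8_000
    return megabits / link_mbps / 60

# On an assumed 100 Mbps link, a 10GB DevOps-style backup moves in
# under 15 minutes, while a 1TB traditional image takes nearly a day.
```

    The time (and, with per-GB transfer pricing, the cost) scales linearly with size, which is why shrinking the backup from an image to just the volatile data changes what is practical to restore offsite.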



  • @BRRABill said in Backups in the DevOps World:

    @scottalanmiller said

    So this applies absolutely equally regardless of OS or licensing.

    I don't agree with that.

    If that was the case, you'd probably never see a DC with anything else on it.

    You should literally never see a DC with anything else on it... that goes against best practices and Microsoft recommendations. If you're running Microsoft you are already buying into the costs. You know that it is going to cost money to run it well and correctly.



  • @coliver said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    @scottalanmiller said

    So this applies absolutely equally regardless of OS or licensing.

    I don't agree with that.

    If that was the case, you'd probably never see a DC with anything else on it.

    You should literally never see a DC with anything else on it... that goes against best practices and Microsoft recommendations. If you're running Microsoft you are already buying into the costs. You know that it is going to cost money to run it well and correctly.

    That was my point.

    I spoke to @scottalanmiller offline about this yesterday. I think we were just arguing the wrong point.

    My point was that it's much easier to just stand up a VM with individual stuff on it. It makes it easier to get it back up and running in the case of issues. His point was that has nothing to do with the data backup.

    So, I think we were both right.

    I agree on the DC, but how many times (I myself am guilty of this) do we see a DC with a bunch of other stuff on it? All the time. If Windows Server was free, that probably would be less of the case.



  • @BRRABill said in Backups in the DevOps World:

    @coliver said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    @scottalanmiller said

    So this applies absolutely equally regardless of OS or licensing.

    I don't agree with that.

    If that was the case, you'd probably never see a DC with anything else on it.

    You should literally never see a DC with anything else on it... that goes against best practices and Microsoft recommendations. If you're running Microsoft you are already buying into the costs. You know that it is going to cost money to run it well and correctly.

    That was my point.

    I spoke to @scottalanmiller offline about this yesterday. I think we were just arguing the wrong point.

    My point was that it's much easier to just stand up a VM with individual stuff on it. It makes it easier to get it back up and running in the case of issues. His point was that has nothing to do with the data backup.

    So, I think we were both right.

    I agree on the DC, but how many times (I myself am guilty of this) do we see a DC with a bunch of other stuff on it? All the time. If Windows Server was free, that probably would be less of the case.

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.



  • @coliver said

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.

    If you are going in that direction, there is probably FOSS for everything Windows provides in most circumstances, no?



  • @coliver said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    @scottalanmiller said

    So this applies absolutely equally regardless of OS or licensing.

    I don't agree with that.

    If that was the case, you'd probably never see a DC with anything else on it.

    You should literally never see a DC with anything else on it... that goes against best practices and Microsoft recommendations. If you're running Microsoft you are already buying into the costs. You know that it is going to cost money to run it well and correctly.

    That's a very important thing to remember. Windows is an awesome product, but it is a cost premium and if you are considering it, then licensing should be factored into the consideration. If you can't afford to run it, you shouldn't run it, it's that easy.



  • @BRRABill said in Backups in the DevOps World:

    @coliver said

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.

    If you are going in that direction, there is probably FOSS for everything Windows provides in most circumstances, no?

    Yes, and that brings up the point of the other topic: what value does MS actually bring to the table, other than making it easier for IT Pros to mess up best practices. 🙂



  • @BRRABill said in Backups in the DevOps World:

    @coliver said

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.

    If you are going in that direction, there is probably FOSS for everything Windows provides in most circumstances, no?

    Correct. AD, DNS, DHCP, relational databases, email, you name it. The only thing generally lacking is support for specific applications that people want to use and that will only run on Windows.



  • @coliver said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    @coliver said

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.

    If you are going in that direction, there is probably FOSS for everything Windows provides in most circumstances, no?

    Yes, and that brings up the point of the other topic: what value does MS actually bring to the table, other than making it easier for IT Pros to mess up best practices. 🙂

    I mean in theory most SMBs, who are now probably heavily MS shops, could start over, and basically accomplish the same things with FOSS. Unfortunately (IMO) the knowledge is just not there.



  • @BRRABill said in Backups in the DevOps World:

    @coliver said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    @coliver said

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.

    If you are going in that direction, there is probably FOSS for everything Windows provides in most circumstances, no?

    Yes, and that brings up the point of the other topic: what value does MS actually bring to the table, other than making it easier for IT Pros to mess up best practices. 🙂

    I mean in theory most SMBs, who are now probably heavily MS shops, could start over, and basically accomplish the same things with FOSS. Unfortunately (IMO) the knowledge is just not there.

    Which is why most SMBs probably shouldn't have IT people onsite or really any onsite infrastructure. But that's another topic entirely.



  • @BRRABill said in Backups in the DevOps World:

    @coliver said in Backups in the DevOps World:

    @BRRABill said in Backups in the DevOps World:

    @coliver said

    That's true, but we already have a FOSS option for a majority of the tools Windows provides. If people were going to follow best practices then the licensing part would never come into it.

    If you are going in that direction, there is probably FOSS for everything Windows provides in most circumstances, no?

    Yes, and that brings up the point of the other topic: what value does MS actually bring to the table, other than making it easier for IT Pros to mess up best practices. 🙂

    I mean in theory most SMBs, who are now probably heavily MS shops, could start over, and basically accomplish the same things with FOSS. Unfortunately (IMO) the knowledge is just not there.

    The knowledge is there if they went to better support models 🙂 Lacking the necessary skills is another mistake that lots of SMBs make. It's layer upon layer of mistakes (some business, some tech) that lead to these situations.

    That's not to say that Windows is never right, it's an excellent product with good use cases. But it's overused to an astounding degree because 99% of the time, no one is evaluating needs at all. And that means there is no IT being done, only "buying".



  • mmm... I think DevOps is not really about backing up a stateless or stateful OS. I mean: any OS is stateless, and apps can run with a bunch of config files set up. Having a proper data backup solution, decoupled from the configs and the OS, makes things simple.

    This is what I'm used to doing:

    1- OS (okay, the entire VM): back it up now and then, like once a week, just in case something destroys the functionality
    2- app config: track changes however you want, from a txt file to a proper DevOps playbook (something, admittedly, I don't do)
    3- data: back up as fast as possible. <- THIS IS ACTUALLY THE STATEFUL PART.

    I'm really a fan of application level backup. I mean that if you have a DB, let the DB back up its own data; do not do cool differential hourly VM image backups with huge retention windows. No, never - I dislike this. Do data backups instead: they are simply more efficient.

    In case of disaster recovery, what I try to do is:
    1- restart from the latest OS point,
    2- if nothing in the config has changed since the last OS backup (track your config changes), fine; otherwise, patch the config,
    3- then restore the latest dataset - no differential stuff, simply erase the data and restore it.

    What I see in DevOps is that you should script points 1 and 2 without using the latest OS point: create a plain VM image and do all the config programmatically, so that you do not need to restore anything in the VM/app setup - simply create the VM from scratch every time, like you would with Vagrant!

    Then you still need to add the "stateful" bit if you have it; that is, restore the dataset via the app's native procedures.

    Unfortunately it seems that this solution (app level backup) is not perceived as a good practice, as everyone wants huge differential VM backups at the block level. I think that's because they are easier to manage in a hurry: just apply some automatic procedure to restore the blocks as one huge "file" and don't worry about the internals.
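    The "let the DB back up its own data" approach can be sketched as building the database's native dump and restore commands instead of imaging the VM. The `pg_dump`/`pg_restore` flags used here (`-Fc` for the compressed custom format, `--clean --if-exists` to drop objects before restoring) are real PostgreSQL options, but the paths and database name are made up for illustration:

```python
# Sketch of application-level backup for PostgreSQL: build the native
# dump/restore command lines rather than taking block-level VM images.
# Paths and database names are hypothetical.

from datetime import date

def pg_dump_command(dbname: str, out_dir: str = "/var/backups") -> list[str]:
    """pg_dump in custom format (-Fc), which is compressed by default."""
    outfile = f"{out_dir}/{dbname}-{date.today().isoformat()}.dump"
    return ["pg_dump", "-Fc", "-f", outfile, dbname]

def pg_restore_command(dump_file: str, dbname: str) -> list[str]:
    """Erase-and-restore, matching 'simply erase the data and restore it'."""
    return ["pg_restore", "--clean", "--if-exists", "-d", dbname, dump_file]
```

    These lists could be passed straight to `subprocess.run`; the point is that the artifact to keep (and ship offsite) is the small dump file, not the whole VM image.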



  • @matteo-nunziati said in Backups in the DevOps World:

    mmm... I think DevOps is not really about backing up a stateless or stateful OS. I mean: any OS is stateless, and apps can run with a bunch of config files set up. Having a proper data backup solution, decoupled from the configs and the OS, makes things simple.

    Of course, but didn't you just define DevOps? 😉



  • @matteo-nunziati said in Backups in the DevOps World:

    I'm really a fan of application level backup. I mean that if you have a DB, let the DB back up its own data; do not do cool differential hourly VM image backups with huge retention windows. No, never - I dislike this. Do data backups instead: they are simply more efficient.

    Same here. No one knows how to get data out of your database like the database 🙂



  • @matteo-nunziati said in Backups in the DevOps World:

    Unfortunately it seems that this solution (app level backup) is not perceived as a good practice....

    It's the only thing I've ever really seen in the enterprise. The idea of anything else is much more an SMB thing - at least in my experience.



  • @scottalanmiller What I wanted to say is that DevOps (as a technique; never mind the philosophy) is mostly about scripting everything from the ground up.

    Stateless/stateful is mostly a mindset in backup: restoring a VM is always stateless; the stateful bit is whether you have to restore datasets into it, which is, in my opinion, a second and decoupled step.

    Even security patching is mostly a nightmare on Windows but just an apt/yum away on Linux.

    Very often people call VMs/containers stateless when they can simply respawn in a cloud environment when they die, but provisioning (e.g. setting up the right config files) is always there, even in a Docker config file.

    Let me state this differently. My way of "old" disaster recovery:
    1- create a new VM
    2- install the OS on it
    3- set up all the required config/services

    Optionally, for stateful servers/VMs/containers:
    4- restore the latest data snapshot

    In DevOps, steps 1 to 3 are still there, but you use a deploy/provision script, be it Vagrant+Ansible or something more complex.

    Maybe step 4 can even be scripted here too. But DevOps is this: just script everything.

    The added value is that you can re-run the script whenever you want, and if a local base is available (say, a local ISO, or a local cache for apt/yum), things can be really fast. This lowers the barrier to testing disaster scenarios.

    But a lot of infrastructure must be there, and discipline/quality assurance - both of which an SMB very often lacks.

    The same can be achieved on a physical machine if you keep images with tools like Clonezilla, etc.



  • It's true that you can build stateless systems without DevOps tooling and approaches, but the nature and assumptions of those systems work against it. Just allowing arbitrary logins (even by administrators) can undermine statelessness. One of the beauties of the pure DevOps model is the lack of logins - much like functional programming.