
    Recovery Time Objectives - How can I come up with a real world number...

    IT Discussion
    32 Posts 6 Posters 5.0k Views
    • scottalanmiller @coliver

      @coliver said in Recovery Time Objectives - How can I come up with a real world number...:

      My assumption was that this is an accounting thing where you define how long you can afford to be down for and design a system that can recover by, or before, that time period.

      That's one of the most dangerous business myths around IT: that there are these "lines" to be drawn, like "we can be down one hour, but not two." It does not reflect real life at all. If you say that "you cannot be down for two hours", you imply that it is worth one penny short of the entire potential value of the business to protect against a two-hour outage. Obviously, that's absurd. But that is what that statement tells the IT department.

      All disaster prevention and recovery is based around the cost of protection. The more protection you want, the more it costs. How much it costs to be down and what the risk aversion is are business decisions. How that translates into a usable RPO/RTO is then defined by IT based on those numbers. Otherwise, totally insane things happen, like spending $100K to protect against a $5K outage.
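The $100K versus $5K comparison above can be put into a few lines of arithmetic. This is only a rough sketch: the function names, the one-outage-per-year frequency, and the assumption that protection removes the whole loss are illustrative, not figures from the thread.

```python
def expected_annual_loss(cost_per_outage: float, outages_per_year: float) -> float:
    """Expected yearly loss from an outage type: cost of one outage times how often it happens."""
    return cost_per_outage * outages_per_year


def net_benefit(protection_cost_per_year: float,
                cost_per_outage: float,
                outages_per_year: float,
                loss_reduction: float = 1.0) -> float:
    """Loss avoided per year minus what the protection costs per year.

    loss_reduction is the fraction of the loss the protection removes
    (1.0 = all of it, which is an optimistic assumption).
    """
    avoided = expected_annual_loss(cost_per_outage, outages_per_year) * loss_reduction
    return avoided - protection_cost_per_year


# Hypothetical: a $5K outage expected once a year, protected by a $100K/year solution.
print(net_benefit(100_000, 5_000, 1.0))
```

A negative result means the protection costs more than the outage it prevents, which is the "totally insane" case described above.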

      • scottalanmiller

        In the real world companies lose money by the hour. There is no viable company that cannot survive being down for hours or days; most can be down for weeks or months. Not that it wouldn't hurt, but they can be and still survive. The "we can't be down for more than X" idea makes no sense because it basically says "don't bother recovering faster than this because we aren't saying that there is any value" and then "don't bother trying to recover if you can't make this line because we will be out of business." No business loses nothing for a day, then suddenly goes out of business, taking all of its losses in one second.

        • coliver

          I guess I mis-worded my original statement. Or didn't write it appropriately. I assumed that the cost of downtime vs the cost of a solution would be taken into account when defining the RTO. Although you've cleared it up significantly.

          • Carnival Boy

            To answer the original question: How can I come up with a real world number...

            You can't. Business systems are too complex to come up with a single figure. And disasters are always too unpredictable. The exercise is a bullshit marketing job to convince someone to spend some money.

            IMHO 🙂

            • scottalanmiller @Carnival Boy

              @Carnival-Boy said in Recovery Time Objectives - How can I come up with a real world number...:

              To answer the original question: How can I come up with a real world number...

              You can't. Business systems are too complex to come up with a single figure. And disasters are always too unpredictable. The exercise is a bullshit marketing job to convince someone to spend some money.

              IMHO 🙂

              I agree, it's not something that I think IT should be doing at all. You get numbers, you make a reasonable investment. You might have some guess as to recovery times which are useful for triage (like does it take one hour or ten hours to get systems back off of the tape) but RTO/RPO are just silly. In all my years I've never had an occasion to use them.

              • DustinB3403

                See I can agree with that @Carnival-Boy except that RTO in my mind should be the expected amount of time to recover from any of the possible scenarios out there.

                I.e., restoring an individual file shouldn't take more than a few minutes.

                I'm just trying to put some real world time down for some of the realistic events that might occur, but even that seems difficult.

                • scottalanmiller @DustinB3403

                  @DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:

                  See I can agree with that @Carnival-Boy except that RTO in my mind should be the expected amount of time to recover from any of the possible scenarios out there.

                  That's never predictable. What if the network fails? What if the medium fails? What if the server is under load? What if things have changed?

                  It's not a totally useless number, but it is mostly useless.

                  • scottalanmiller @DustinB3403

                    @DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:

                    I.e., restoring an individual file shouldn't take more than a few minutes.

                    Even at a Fortune 10 bank, restores took anywhere from one minute to two days.

                    • dafyre @DustinB3403

                      @DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:

                      See I can agree with that @Carnival-Boy except that RTO in my mind should be the expected amount of time to recover from any of the possible scenarios out there.

                      I.e., restoring an individual file shouldn't take more than a few minutes.

                      I'm just trying to put some real world time down for some of the realistic events that might occur, but even that seems difficult.

                      So go cause some of these realistic events (with your own data, of course) and see how long it takes to recover from them... and then double it... If it takes you 15 minutes to recover that word document you just accidentally deleted on purpose, then I would suggest recording that it could take 30 minutes or an hour to restore a file for someone.
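A minimal sketch of that "test it, then double it" drill, assuming the restore can be wrapped in a Python callable. `timed_drill`, `padded_estimate`, and the padding factor of 2 are hypothetical names and defaults chosen for illustration; the `restore` callable stands in for whatever your backup tool actually does.

```python
import time
from typing import Callable, Tuple


def padded_estimate(measured_minutes: float, padding_factor: float = 2.0) -> float:
    """Turn a measured restore time into the figure you write down (measured time, padded)."""
    return measured_minutes * padding_factor


def timed_drill(restore: Callable[[], None],
                padding_factor: float = 2.0) -> Tuple[float, float]:
    """Run one restore drill and return (measured minutes, padded minutes)."""
    start = time.perf_counter()
    restore()  # placeholder for the real restore procedure
    measured_minutes = (time.perf_counter() - start) / 60.0
    return measured_minutes, padded_estimate(measured_minutes, padding_factor)


# The 15-minute Word document example from above: record at least 30 minutes.
print(padded_estimate(15))
```

One drill gives one data point under one set of conditions, so repeating it and keeping the worst case is probably closer to the spirit of the advice than trusting a single run.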

                      • DustinB3403

                        So if RTO/RPO is mostly useless, how do you quantify the point at which a business outage becomes unacceptable?

                        • scottalanmiller @DustinB3403

                          @DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:

                          I'm just trying to put some real world time down for some of the realistic events that might occur, but even that seems difficult.

                          Because it IS difficult. Let's ask the same thing in some other terms...

                          How long "should" a file transfer from point A to point B take? If you ask the business, they will tell you how long they want it to take. Ask IT and they will figure out how fast the wire can transfer it. Actually do it and you find out that the bottlenecks were not where you thought they were, that the system is not pristine while doing so, and that it takes an unpredictable amount of time. IT systems are complex; we can't accurately predict this stuff. We can guess, but the farther out and the less common the operation, the bigger the guess.

                          You can simulate some disasters and test some things. That's the best you can do, and it isn't very good.
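To make the file transfer example concrete, here is a small sketch of the "how fast can the wire do it" number and how far a real run strayed from it. The function names and the 100 GB / 1 Gbit/s / one hour figures are made up for illustration; the ideal figure ignores protocol overhead, disk speed, and load, which is exactly why it tends to be wrong.

```python
def ideal_transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Best-case transfer time: payload bits divided by raw link speed, nothing else considered."""
    return size_bytes * 8 / link_bits_per_second


def slowdown_factor(observed_seconds: float, ideal_seconds: float) -> float:
    """How many times slower the real transfer was than the wire-speed guess."""
    return observed_seconds / ideal_seconds


# Hypothetical: 100 GB over a 1 Gbit/s link, and a real run that took an hour.
ideal = ideal_transfer_seconds(100 * 10**9, 10**9)
print(ideal)
print(slowdown_factor(3600, ideal))
```

Tracking the slowdown factor across several drills gives a feel for how wide the guess has to be, without pretending it is a prediction.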

                          • scottalanmiller @dafyre

                            @dafyre said in Recovery Time Objectives - How can I come up with a real world number...:

                            So go cause some of these realistic events (with your own data, of course) and see how long it takes to recover from them... and then double it... If it takes you 15 minutes to recover that word document you just accidentally deleted on purpose, then I would suggest recording that it could take 30 minutes or an hour to restore a file for someone.

                            And do it while you are home, your car is out of gas, you aren't dressed, your phone battery has died, the server is down, the tape is buried under paperwork, you don't have good labels and the person asking doesn't know the name of the file.

                            • dafyre @scottalanmiller

                              @scottalanmiller said in Recovery Time Objectives - How can I come up with a real world number...:

                              And do it while you are home, your car is out of gas, you aren't dressed, your phone battery has died, the server is down, the tape is buried under paperwork, you don't have good labels and the person asking doesn't know the name of the file.

                              That's pretty darn realistic right there.

                              • scottalanmiller @DustinB3403

                                @DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:

                                So if RTO/RPO is mostly useless, how do you quantify the point at which a business outage becomes unacceptable?

                                That's the fundamental flaw. There is no such number and cannot be. That's the danger of the RTO concept, that someone might actually think that such a number exists.

                                • DustinB3403 @scottalanmiller

                                  @scottalanmiller said in Recovery Time Objectives - How can I come up with a real world number...:

                                  That's the fundamental flaw. There is no such number and cannot be. That's the danger of the RTO concept, that someone might actually think that such a number exists.

                                  Sorry my point is, how do you design a backup and recovery system if this is such a flawed goal? How do you define the recovery objective and systems to implement it?

                                  • scottalanmiller @DustinB3403

                                    @DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:

                                    Sorry my point is, how do you design a backup and recovery system if this is such a flawed goal? How do you define the recovery objective and systems to implement it?

                                    It's about curves. Think calculus. You have a cost curve that shows how much it costs you (losses) to be down over time (remember this is complex, because we might be talking about a file or a VM or the entire infrastructure). What does a file recovery cost you? $20/day? Less, probably.

                                    Then you have a curve of what it costs to recover at different time intervals. This tends to be a jagged curve because of tech leaps. For example, jumping from GigE to 10GigE jumps the price but REALLY improves performance.

                                    Then you compare the curves to see where the sweet spot is for the business based on the likelihood of the event.
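The curve comparison described above could be sketched roughly like this. Everything here is an assumption for illustration: the `Option` class, the linear loss-per-hour curve, the option names, and all the dollar and hour figures. Real loss curves are rarely linear, and the protection options form the jagged, stepwise curve because each one is a discrete technology choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    """One point on the jagged protection-cost curve (all values hypothetical)."""
    name: str
    annual_cost: float      # what the solution costs per year
    recovery_hours: float   # how long recovery is expected to take with it


def downtime_loss(hours: float, loss_per_hour: float) -> float:
    """Loss curve, assumed linear here: the business bleeds money by the hour."""
    return hours * loss_per_hour


def expected_total_cost(option: Option, loss_per_hour: float, events_per_year: float) -> float:
    """Protection cost plus the likelihood-weighted loss while recovering."""
    return option.annual_cost + events_per_year * downtime_loss(option.recovery_hours, loss_per_hour)


def sweet_spot(options: List[Option], loss_per_hour: float, events_per_year: float) -> Option:
    """Pick the option where the two curves combine to the lowest expected yearly cost."""
    return min(options, key=lambda o: expected_total_cost(o, loss_per_hour, events_per_year))


options = [
    Option("tape", 2_000, 48),
    Option("disk over GigE", 6_000, 8),
    Option("disk over 10GigE", 15_000, 2),
    Option("full HA", 100_000, 0.1),
]

# Hypothetical business numbers: $500/hour of losses, event expected once every two years.
best = sweet_spot(options, loss_per_hour=500, events_per_year=0.5)
print(best.name)
```

Note that nothing in this sketch takes an RTO as input. The business supplies the loss rate and risk appetite, and the recovery time falls out of whichever option is cheapest overall.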

                                    • scottalanmiller

                                      And yes, backup and high availability discussions are actually real world cases where understanding calculus is practical for envisioning how these factors interact.
