Recovery Time Objectives - How can I come up with a real world number...
-
In the real world, companies lose money by the hour. Any viable company can survive being down for hours or days, and most can survive being down for weeks or months. Not that it wouldn't hurt, but they can be and still survive. The "we can't be down for more than X" idea makes no sense, because it basically says "don't bother recovering faster than this, because we aren't saying that there is any value in it" and then "don't bother trying to recover if you can't make this line, because we will be out of business." No business loses nothing for a day and then suddenly goes out of business, taking all of its losses in one second.
-
I guess I misworded my original statement, or didn't write it appropriately. I assumed that the cost of downtime vs. the cost of a solution would be taken into account when defining the RTO. You've cleared it up significantly, though.
-
To answer the original question: How can I come up with a real world number...
You can't. Business systems are too complex to come up with a single figure. And disasters are always too unpredictable. The exercise is a bullshit marketing job to convince someone to spend some money.
IMHO
-
@Carnival-Boy said in Recovery Time Objectives - How can I come up with a real world number...:
To answer the original question: How can I come up with a real world number...
You can't. Business systems are too complex to come up with a single figure. And disasters are always too unpredictable. The exercise is a bullshit marketing job to convince someone to spend some money.
IMHO
I agree, it's not something that I think IT should be doing at all. You get numbers, you make a reasonable investment. You might have some guess as to recovery times which are useful for triage (like does it take one hour or ten hours to get systems back off of the tape) but RTO/RPO are just silly. In all my years I've never had an occasion to use them.
-
See I can agree with that @Carnival-Boy except that RTO in my mind should be the expected amount of time to recover from any of the possible scenarios out there.
For example, restoring an individual file shouldn't take more than a few minutes.
I'm just trying to put some real world time down for some of the realistic events that might occur, but even that seems difficult.
-
@DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:
See I can agree with that @Carnival-Boy except that RTO in my mind should be the expected amount of time to recover from any of the possible scenarios out there.
That's never predictable. What if the network fails? What if the medium fails? What if the server is under load? What if things have changed?
It's not a totally useless number, but it is mostly useless.
-
@DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:
For example, restoring an individual file shouldn't take more than a few minutes.
Even at a Fortune 10 bank, restores took anywhere from one minute to two days.
-
@DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:
See I can agree with that @Carnival-Boy except that RTO in my mind should be the expected amount of time to recover from any of the possible scenarios out there.
For example, restoring an individual file shouldn't take more than a few minutes.
I'm just trying to put some real world time down for some of the realistic events that might occur, but even that seems difficult.
So go cause some of these realistic events (with your own data, of course) and see how long it takes to recover from them... and then double it... If it takes you 15 minutes to recover that Word document you just accidentally deleted on purpose, then I would suggest recording that it could take 30 minutes or an hour to restore a file for someone.
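A minimal sketch of that "time it, then pad it" approach, in Python. The function names, the `fake_restore` stand-in, and the padding factor of 2 are all assumptions for illustration; swap in whatever actually performs your test restore.

```python
import time
from typing import Callable


def time_restore(restore: Callable[[], None]) -> float:
    """Run one test restore and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    restore()
    return time.perf_counter() - start


def padded_estimate(measured_seconds: list[float], padding: float = 2.0) -> float:
    """Take the slowest observed run and multiply it by a padding factor."""
    if not measured_seconds:
        raise ValueError("need at least one measured restore")
    return max(measured_seconds) * padding


def fake_restore() -> None:
    """Hypothetical stand-in for a real restore job (just sleeps briefly)."""
    time.sleep(0.01)


# Time a few test runs, then record the padded figure, not the raw one.
runs = [time_restore(fake_restore) for _ in range(3)]
print(f"slowest run: {max(runs):.3f}s, recorded estimate: {padded_estimate(runs):.3f}s")
```

A 15 minute measured restore (900 seconds) comes out as a 30 minute recorded estimate with the default padding. Using the slowest run rather than the average is a deliberate, pessimistic choice here.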
-
So if RTO/RPO is mostly useless, how do you quantify the point at which a business outage becomes unacceptable?
-
@DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:
I'm just trying to put some real world time down for some of the realistic events that might occur, but even that seems difficult.
Because it IS difficult. Let's ask the same thing in some other terms...
How long "should" a file transfer from point A to point B take? If you ask the business, they will tell you how long they want it to take. Ask IT and they will figure out how fast the wire can transfer it. Actually do it and you find out that the bottlenecks were not where you thought they were, that the system is not pristine while doing so, and that it takes an unpredictable amount of time. IT systems are complex and we can't accurately predict this stuff. We can guess, but the farther out and the less common the operation, the bigger the guess.
You can simulate some disasters and test some things. That's the best you can do, and it isn't very good.
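As a rough illustration of the "ask IT" number, here is the best-case wire time in Python. The function name and the 100 GB example are made up for illustration, and the result is only a floor: it ignores protocol overhead, disk speed, load, and everything else that makes the real number unpredictable.

```python
def wire_time_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Best-case transfer time if the link were the only bottleneck."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second


# Hypothetical example: a 100 GB restore over GigE and over 10GigE.
size = 100 * 10**9
gige = wire_time_seconds(size, 10**9)
ten_gige = wire_time_seconds(size, 10 * 10**9)
print(f"GigE floor: {gige:.0f}s, 10GigE floor: {ten_gige:.0f}s")
```

Whatever a real test restore takes beyond this floor is the part you can only find by actually doing it.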
-
@dafyre said in Recovery Time Objectives - How can I come up with a real world number...:
So go cause some of these realistic events (with your own data, of course) and see how long it takes to recover from them... and then double it... If it takes you 15 minutes to recover that Word document you just accidentally deleted on purpose, then I would suggest recording that it could take 30 minutes or an hour to restore a file for someone.
And do it while you are home, your car is out of gas, you aren't dressed, your phone battery has died, the server is down, the tape is buried under paperwork, you don't have good labels and the person asking doesn't know the name of the file.
-
@scottalanmiller said in Recovery Time Objectives - How can I come up with a real world number...:
And do it while you are home, your car is out of gas, you aren't dressed, your phone battery has died, the server is down, the tape is buried under paperwork, you don't have good labels and the person asking doesn't know the name of the file.
That's pretty darn realistic right there.
-
@DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:
So if RTO/RPO is mostly useless, how do you quantify the point at which a business outage becomes unacceptable?
That's the fundamental flaw. There is no such number and cannot be. That's the danger of the RTO concept, that someone might actually think that such a number exists.
-
@scottalanmiller said in Recovery Time Objectives - How can I come up with a real world number...:
That's the fundamental flaw. There is no such number and cannot be. That's the danger of the RTO concept, that someone might actually think that such a number exists.
Sorry, my point is: how do you design a backup and recovery system if this is such a flawed goal? How do you define the recovery objective and the systems to implement it?
-
@DustinB3403 said in Recovery Time Objectives - How can I come up with a real world number...:
Sorry, my point is: how do you design a backup and recovery system if this is such a flawed goal? How do you define the recovery objective and the systems to implement it?
It's about curves. Think calculus. You have a cost curve that shows how much it costs you (in losses) to be down over time. Remember that this is complex, because we might be talking about a file, a VM, or the entire infrastructure. What does waiting on a file recovery cost you? $20/day? Less, probably.
Then you have a curve of what it costs to recover at different time intervals. This tends to be a jagged curve because of tech leaps: jumping from GigE to 10GigE jumps the price but REALLY improves performance.
Then you compare the curves to see where the sweet spot is for the business, based on the likelihood of the event.
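Here is one way to sketch that comparison in Python. Every name and number below is hypothetical: the options, their prices, the recovery times, and the flat $500/hour loss rate are invented purely to show the shape of the calculation. A real loss curve is rarely linear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    annual_cost: float     # what the solution costs per year (made up)
    recovery_hours: float  # how long recovery is guessed to take (made up)


def downtime_loss(hours: float, loss_per_hour: float = 500.0) -> float:
    """Cumulative loss from being down; linear only to keep the sketch simple."""
    return hours * loss_per_hour


def expected_annual_total(option: Option, events_per_year: float) -> float:
    """Solution cost plus the loss weighted by how often the event is expected."""
    return option.annual_cost + events_per_year * downtime_loss(option.recovery_hours)


def sweet_spot(options: list[Option], events_per_year: float) -> Option:
    """Pick the option with the lowest expected total cost."""
    return min(options, key=lambda o: expected_annual_total(o, events_per_year))


# The jagged recovery-cost curve: each tech leap costs more but recovers faster.
OPTIONS = [
    Option("tape restore", 1_000.0, 48.0),
    Option("disk backup over GigE", 5_000.0, 8.0),
    Option("disk backup over 10GigE", 12_000.0, 2.0),
    Option("hot standby", 40_000.0, 0.25),
]

for rate in (0.05, 0.5, 10.0):
    print(f"{rate} events/year -> {sweet_spot(OPTIONS, rate).name}")
```

With these invented figures the sweet spot moves from tape, to disk over GigE, to disk over 10GigE as the event becomes more likely, which is the point: the answer comes from comparing the curves, not from a single "can't be down longer than X" number.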
-
And yes, backup and high availability discussions are actually real world cases where understanding calculus is practical for envisioning how these factors interact.