What % is normal availability?

scottalanmiller

@pete-s said in What % is normal availability?:

How reliable the equipment really is makes me wonder who really needs high availability. Most people aren't exactly hosting NASDAQ servers.

This is what I've been saying for years. A few things I keep pointing out to people...

HA is about shaving SECONDS. You have to justify the entire cost of that HA with the tiniest amounts of time saved.
HA carries a lot of overhead and its own risks. Many HA systems induce more downtime than they protect against.
Many HA approaches or solutions only claim uptimes that are lower than what a standard server delivers, rather than greater!

bbigford

At about an hour a year, I would say four 9's is acceptable.
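As a quick check on the "about an hour a year" figure, here is a minimal sketch that converts a number of nines into unplanned downtime per year. The function name and the 365.25-day year are assumptions made for illustration, not anything from the thread.

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime allowed by an availability of N nines.

    Two nines means 99%, three nines 99.9%, four nines 99.99%, and so on.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:.1f} minutes/year")
```

Under these assumptions, four nines comes to a little under 53 minutes a year, so "about an hour" is a fair rounding.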

scottalanmiller

@bbigford said in What % is normal availability?:

At about an hour a year, I would say four 9's is acceptable.

Yeah, that's pretty good, especially considering that the hour could land at any time. For a typical business that operates no more than 60 hours a week, the chance that the hour will fall during production hours is pretty low, and extremely low if you want it to fall completely within business hours.

For a really typical office environment, four nines of uptime from a typical single server works out to something like 12 minutes of downtime a year, on average, during production hours. It gets pretty crazy for a typical server in a typical company.
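The arithmetic behind that estimate can be sketched roughly as follows. This assumes unplanned downtime is spread evenly across all 168 hours of the week, and it assumes a 40-hour week for the "really typical office", which is a guess that happens to reproduce the roughly 12 minute figure. The function and parameter names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24                    # 168
MINUTES_PER_YEAR = 365.25 * 24 * 60        # assumption: average year length


def production_downtime_minutes(availability: float,
                                production_hours_per_week: float) -> float:
    """Expected yearly downtime that lands inside production hours.

    Assumes outages are equally likely at any hour of the week, so the
    share hitting production is simply production hours / 168.
    """
    total_downtime = MINUTES_PER_YEAR * (1.0 - availability)
    production_share = production_hours_per_week / HOURS_PER_WEEK
    return total_downtime * production_share


if __name__ == "__main__":
    for hours in (40, 60):
        minutes = production_downtime_minutes(0.9999, hours)
        print(f"{hours} h/week: {minutes:.1f} minutes/year in production")
```

With these assumptions a 40-hour week gives about 12.5 minutes a year during production, and a 60-hour week gives about 18.8 minutes.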

JaredBusch

Also, never forget this is unscheduled downtime.

Some people like to forget that.

Jimmy9008

We have had our four Dell R630s coming up to two years now. Excluding planned maintenance (for example patching for the Intel vulnerability), the servers have not been unavailable (unplanned) once in that time. Over 99.99%+ for us.

Monthly reboots are performed on the physical servers, but these are planned, and VMs are migrated between hosts for application availability during those windows.

What is the normal % availability for a website/app, rather than server?

scottalanmiller

@jimmy9008 said in What % is normal availability?:

We have had our four Dell R630s coming up to two years now. Excluding planned maintenance (for example patching for the Intel vulnerability), the servers have not been unavailable (unplanned) once in that time. Over 99.99%+ for us.

In the world of anecdotes...

We had two 1999 Compaq Proliant 800 servers that ran NT4 SP6a. Both made it a DECADE without unplanned downtime. 100% uptime, for a decade! They were finally retired because... they were ridiculous by the time they were retired.

Jimmy9008

@scottalanmiller said in What % is normal availability?:

@jimmy9008 said in What % is normal availability?:

We have had our four Dell R630s coming up to two years now. Excluding planned maintenance (for example patching for the Intel vulnerability), the servers have not been unavailable (unplanned) once in that time. Over 99.99%+ for us.

In the world of anecdotes...

We had two 1999 Compaq Proliant 800 servers that ran NT4 SP6a. Both made it a DECADE without unplanned downtime. 100% uptime, for a decade! They were finally retired because... they were ridiculous by the time they were retired.

:face_with_tears_of_joy: But it shows that hardware can achieve high uptime with well-run maintenance, no need for complexity. Our downtime comes from the developers/app builders rather than the hardware.

scottalanmiller

@jimmy9008 said in What % is normal availability?:

@scottalanmiller said in What % is normal availability?:

@jimmy9008 said in What % is normal availability?:

We have had our four Dell R630s coming up to two years now. Excluding planned maintenance (for example patching for the Intel vulnerability), the servers have not been unavailable (unplanned) once in that time. Over 99.99%+ for us.

In the world of anecdotes...

We had two 1999 Compaq Proliant 800 servers that ran NT4 SP6a. Both made it a DECADE without unplanned downtime. 100% uptime, for a decade! They were finally retired because... they were ridiculous by the time they were retired.

:face_with_tears_of_joy: But it shows that hardware can achieve high uptime with well-run maintenance, no need for complexity. Our downtime comes from the developers/app builders rather than the hardware.

Oh gosh, yeah.

When in my massive environment (80K+ servers) our outages were caused by, in this order...

Developers and their code.
System admin mistakes.
SAN failures.
Datacenter-level failures.
Failures of hardware that doesn't support redundancy in this class of server (essentially memory failures).

Things that never (during my decade tenure) caused outages...

Power Supplies (due to redundancy)
Fans (due to redundancy)
Hard Drives (due to redundancy)

1337

@scottalanmiller said in What % is normal availability?:

When in my massive environment (80K+ servers)

80k+ servers? That has to be something in size like Paypal or LinkedIn.

scottalanmiller

@pete-s said in What % is normal availability?:

@scottalanmiller said in What % is normal availability?:

When in my massive environment (80K+ servers)

80k+ servers? That has to be something in size like Paypal or LinkedIn.

Way bigger than those. Those would not come close.