    A Public Post Mortem of An Outage

    • Dashrender

      $10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't add up to $10K in losses? What was on that server?

      • JaredBusch @Dashrender

        @Dashrender said in A Public Post Mortem of An Outage:

        $10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't add up to $10K in losses? What was on that server?

        System downtime does not translate directly into a complete loss of revenue, as most businesses try to claim. More often it means a slower revenue stream, which significantly extends how long things can stay down.

        • Dashrender

          Of course I understand that. Each business is different.

          When we self-hosted our EHR, if it was down for a day, we literally had to cancel clinics until it was fixed. While many of those patients would be rescheduled, we'd still be paying staff to be on site cleaning up, etc., and those costs add up while we have no income coming in.

          • scottalanmiller @Dashrender

            @Dashrender said in A Public Post Mortem of An Outage:

            $10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't add up to $10K in losses? What was on that server?

            It was more than $10K in losses; $10K is how much cheaper it was to take the loss than to pay to mitigate it.

            Most SMBs can have their servers down for a while without major impact. AD, for example, will have near-zero impact on a typical business because of cached credentials.
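The "take the loss vs. pay to mitigate" decision above boils down to a simple subtraction. The sketch below is a minimal illustration, assuming a flat net loss per day of downtime; the class name `OutageScenario`, its fields, and all the numbers in the usage example are hypothetical, since the actual figures for this outage were not shared beyond the roughly $10K difference.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """Hypothetical inputs for a take-the-loss vs. mitigate comparison."""
    outage_days: float       # how long the system was down
    net_loss_per_day: float  # assumed flat business impact per day of downtime
    mitigation_cost: float   # assumed cost of redundancy or faster recovery

    def outage_loss(self) -> float:
        """Total loss from simply riding out the outage."""
        return self.outage_days * self.net_loss_per_day

    def savings_from_accepting_risk(self) -> float:
        """Positive when taking the loss was cheaper than mitigating it."""
        return self.mitigation_cost - self.outage_loss()


# Hypothetical numbers only, chosen so the difference comes out to $10K.
scenario = OutageScenario(outage_days=6, net_loss_per_day=5_000, mitigation_cost=40_000)
print(scenario.outage_loss())                  # 30000
print(scenario.savings_from_accepting_risk())  # 10000
```

A real pre-outage analysis would also weigh how likely the failure is, but the comparison itself stays the same: mitigation only pays off when it costs less than the loss it prevents.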

            • scottalanmiller @JaredBusch

              @JaredBusch said in A Public Post Mortem of An Outage:

              @Dashrender said in A Public Post Mortem of An Outage:

              $10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't add up to $10K in losses? What was on that server?

              System downtime does not translate directly into a complete loss of revenue, as most businesses try to claim. More often it means a slower revenue stream, which significantly extends how long things can stay down.

              Exactly. They didn't have email or phones on there, so their communications didn't go down. AD was cached, so it was not impacted except for users moving from one desktop to another, which they rarely if ever do, and all of their applications work offline. They were certainly impacted, but it definitely didn't bring the business to its knees, either.

              • scottalanmiller @Dashrender

                @Dashrender said in A Public Post Mortem of An Outage:

                When we self-hosted our EHR, if it was down for a day, we literally had to cancel clinics until it was fixed. While many of those patients would be rescheduled, we'd still be paying staff to be on site cleaning up, etc., and those costs add up while we have no income coming in.

                A major factor for a lot of businesses is the rubber band effect of work; only companies running production with their "backs against the wall" don't experience it. What happens is that staff get time to "rest" while nothing is happening. They might take time off, have a "lazy day", or catch up on other things: cleaning the office, rearranging the furniture, physical filing, whatever. It is very unlikely, almost impossible, that this time has zero value. Then, when the systems return, people are better prepared to work more intensely and can often catch up, either partially or fully. A total productivity loss is rare, and so is a total recovery; the result is normally somewhere in between, because people tend to work faster and more productively once the systems come back.

                Since a normal business isn't doing as much work as it possibly could (the exceptions are businesses that don't do sales and can't take on new customers without adding resources), it can normally catch up to some degree. This doesn't work for a business like a 911 call center, of course. But a typical business can, at least partially.
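The rubber band effect can be expressed as a rough formula: work permanently lost equals the backlog built up during the outage times the share that is *not* recovered afterwards. The function below is a minimal sketch of that idea under a deliberately simple model; `net_work_lost` and its `catch_up_fraction` parameter are hypothetical names for illustration, not a standard metric, and the single catch-up fraction stands in for the "somewhere in between" described above.

```python
def net_work_lost(days_down: float, work_per_day: float, catch_up_fraction: float) -> float:
    """Work permanently lost after an outage under a simple 'rubber band' model.

    catch_up_fraction is the hypothetical share of the backlog recovered once
    systems return: 0.0 means nothing is recovered (e.g. a 911 call center),
    1.0 means a full catch-up thanks to spare capacity.
    """
    if not 0.0 <= catch_up_fraction <= 1.0:
        raise ValueError("catch_up_fraction must be between 0 and 1")
    backlog = days_down * work_per_day          # work that piled up while systems were down
    return backlog * (1.0 - catch_up_fraction)  # the part never made up afterwards


# Hypothetical example: 3 days down, 100 units of work per day.
print(net_work_lost(3, 100, 0.0))  # 300.0  (no catch-up at all)
print(net_work_lost(3, 100, 0.5))  # 150.0  (half the backlog recovered)
print(net_work_lost(3, 100, 1.0))  # 0.0    (full catch-up)
```

In practice the catch-up fraction depends on how much spare capacity the business normally runs with, which is exactly why a typical business can absorb an outage that a 911 call center cannot.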

                • DustinB3403

                  Good topic.

                  One question, how long was the outage?

                  • scottalanmiller @DustinB3403

                    @DustinB3403 said in A Public Post Mortem of An Outage:

                    One question, how long was the outage?

                    Nearly a week.

                    • DustinB3403 @scottalanmiller

                      @scottalanmiller said in A Public Post Mortem of An Outage:

                      @DustinB3403 said in A Public Post Mortem of An Outage:

                      One question, how long was the outage?

                      Nearly a week.

                      Wow, that is a rather long time.

                      • scottalanmiller @DustinB3403

                        @DustinB3403 said in A Public Post Mortem of An Outage:

                        Wow, that is a rather long time.

                        Yup. Parts were very hard to get, and getting the server physically moved before diagnostics could begin ate up a huge amount of time. The cost of speeding things up would have been huge: replacing gear instead of repairing it. And because the vendor could not diagnose the hardware issue (the error messages were ones they had not documented), things got much more complicated.
