
    Halloween Outage 2015

    Announcements
    mangolassi
    • scottalanmiller

      We had a major outage this morning and are still investigating. We are unsure what happened, but shortly before 7 AM the entire server became inaccessible. For a short time the Rackspace system was also inaccessible, and once it was available again, console access to our server was not.

      We rebooted at 9:56 AM EDT and the system came up, but MangoLassi did not. At 9:59 AM EDT the system rebooted on its own, perhaps from a power cycle on the RS side, as they have been known to do without authorization, and at that time the site came up automatically.

      • scottalanmiller

        Here is some quick SAR info for the record:

        12:55:01 PM     all      1.98      0.00      0.29      0.04      0.04     97.64
        12:56:01 PM     all      4.92      0.00      0.41      0.05      0.04     94.57
        12:57:01 PM     all      1.12      0.00      0.18      0.03      0.04     98.62
        12:58:01 PM     all      1.57      0.00      0.19      0.02      0.04     98.18
        12:59:01 PM     all      3.96      0.00      0.31      0.03      0.04     95.67
        Average:        all      5.36      0.00      0.46      0.08      0.05     94.05
        
        01:56:46 PM       LINUX RESTART (2 CPU)
        
        01:57:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
        01:58:01 PM     all     21.08      0.00      0.90      0.60      0.08     77.33
        Average:        all     21.08      0.00      0.90      0.60      0.08     77.33
        
        01:59:47 PM       LINUX RESTART (2 CPU)
        
        02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
        02:01:01 PM     all     31.24      0.00      1.26      1.55      0.08     65.87
        02:02:01 PM     all      3.73      0.00      0.53      0.33      0.06     95.36
        02:03:01 PM     all      7.35      0.00      0.46      0.23      0.05     91.91
        02:04:01 PM     all      3.06      0.00      0.33      0.09      0.05     96.47
        02:05:01 PM     all      4.07      0.00      0.44      0.18      0.04     95.26
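
For anyone who wants to pull the restart markers out of a listing like the one above, here is a minimal sketch. It assumes the 12-hour `sar` text layout shown in this post and uses a shortened copy of that excerpt. The names `find_restarts` and `SAR_SAMPLE` are made up for illustration and are not part of sysstat.

```python
import re
from datetime import datetime

# Shortened copy of the sar excerpt from this post (12-hour clock format).
SAR_SAMPLE = """\
12:59:01 PM     all      3.96      0.00      0.31      0.03      0.04     95.67
Average:        all      5.36      0.00      0.46      0.08      0.05     94.05

01:56:46 PM       LINUX RESTART (2 CPU)

01:57:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
01:58:01 PM     all     21.08      0.00      0.90      0.60      0.08     77.33
Average:        all     21.08      0.00      0.90      0.60      0.08     77.33

01:59:47 PM       LINUX RESTART (2 CPU)

02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:01:01 PM     all     31.24      0.00      1.26      1.55      0.08     65.87
"""

# A leading 12-hour timestamp followed by the rest of the line.
TIME_RE = re.compile(r"^(\d{2}:\d{2}:\d{2} [AP]M)\s+(.*)$")


def parse_time(text):
    """Parse a timestamp such as '01:56:46 PM' (the date part is ignored)."""
    return datetime.strptime(text, "%I:%M:%S %p")


def find_restarts(sar_text):
    """Return (last_sample_time, restart_time) pairs, one per restart marker."""
    results = []
    last_sample = None
    for line in sar_text.splitlines():
        match = TIME_RE.match(line.strip())
        if not match:
            continue  # blank lines and 'Average:' rows carry no timestamp
        stamp, rest = parse_time(match.group(1)), match.group(2)
        if rest.startswith("LINUX RESTART"):
            results.append((last_sample, stamp))
        elif rest.startswith("all"):
            last_sample = stamp  # a normal per-minute sample row
    return results


for before, restart in find_restarts(SAR_SAMPLE):
    gap = int((restart - before).total_seconds())
    print(f"restart at {restart:%H:%M:%S}, {gap} s after the last sample")
```

On this excerpt the gap before the first restart marker is 3465 seconds and the gap before the second is 106 seconds. The sketch only measures the distance between sample rows and restart markers, so it says nothing about what caused either restart.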
        
        • scottalanmiller

          The restart at 1:56 PM UTC was caused by me power cycling the server from the RS console. The one at 1:59 PM UTC was not authorized, and we do not yet know the cause.
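
The EDT times in the first post and the UTC times in the SAR output line up once the four-hour offset is applied. A quick sketch of that arithmetic follows, assuming the date was October 31, 2015 (going by the thread title) and using a fixed UTC-4 offset for EDT. The helper name `edt_to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT is modeled as a fixed UTC-4 offset, not a full tz database zone.
EDT = timezone(timedelta(hours=-4), "EDT")


def edt_to_utc(hour, minute, year=2015, month=10, day=31):
    """Convert a wall-clock EDT time to UTC (the date is assumed from the thread title)."""
    local = datetime(year, month, day, hour, minute, tzinfo=EDT)
    return local.astimezone(timezone.utc)


manual = edt_to_utc(9, 56)      # the reboot done from the RS console
unplanned = edt_to_utc(9, 59)   # the reboot nobody authorized

print(manual.strftime("%H:%M UTC"))     # 13:56 UTC, i.e. 1:56 PM UTC
print(unplanned.strftime("%H:%M UTC"))  # 13:59 UTC, i.e. 1:59 PM UTC
```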

          • scottalanmiller

            outagegraph.png

            Here is what the outage looks like in views 😞 Sadly, it came on a super busy day when we are pushing for a new site record.

            • scottalanmiller

              Ah ha, we have confirmation: it was Rackspace. The underlying hardware failed and they had to move the workload. Nothing on our end, thank goodness. Hardware failures happen, so this was relatively minor. Now the question becomes: are we so busy that we should be talking about high availability options for the site? Our outages are small, but with continued growth there is a real possibility that the impact of an outage will get bigger and bigger. It could easily be time to consider a database cluster, multiple application servers, and a load balancer!

              • scottalanmiller

                Here is the RS confirmation:

                rsissue.png

                • hubtechagain

                  OMG Rackspace must be out of business

                  • Deleted74295

                    I think it was a ghost.

                    • Jason @hubtechagain

                      @hubtechagain said:

                      OMG Rackspace must be out of business

                      Definitely. Their Twitter hasn't had a post since Tuesday, and the latest reply was 6 min ago. What kind of scam is this?
