    Hot Swap vs. Blind Swap

    Announcements
    storage raid hot swap blind swap cold swap
    • BRRABill

      I ask because I had an issue yesterday on our DELL server. Which, admittedly, is very, very old. Experienced, I should say. No one likes to be called old.

      It's our main data server. One of two servers that really matter.

      We have 4 drives in a RAID5 array. (This is from the dark ages when that was considered OK.)

      I went into the server room for something else, and noticed one of the drives was blinking amber. I go from a 1 to a 5 on the 1 to 10 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the old drive. No problem. I put in the new drive, no problem. I go to log in to start rebuilding the array, and I notice that the server is rebooting. Hmm, that's odd. I look at the drive. Now TWO of the four are blinking amber. I've now gone to a 10, LOL.

      Turns out a second drive failed after I did the hot plug. I'm not sure if it was just random (which seems unlikely) or if something weird happened during the hot plug.

      I spent a long, long time getting everything back to how it was.

      • scottalanmiller

        RAID 5 induces other failures when you go to rebuild. It's extremely common and just an artifact of that RAID level. Doesn't mean that it will always do it or even normally do it, but it is very common. Once you do a drive swap it immediately increases the load on the drives and makes them more likely to fail.
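
A rough sketch of that load argument, not a model of any real drive. It assumes each surviving drive fails independently with some per-drive probability during the rebuild window; the function name and the probabilities below are made up for illustration. The point is only that on a four-drive RAID 5, any rise in per-drive risk is compounded across the three survivors.

```python
def second_failure_probability(total_drives: int, p_drive: float) -> float:
    """Chance that at least one surviving drive fails during a RAID 5 rebuild.

    Assumption (hypothetical): each surviving drive fails independently
    with probability p_drive over the rebuild window.
    """
    if total_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be a probability")
    survivors = total_drives - 1  # one drive has already failed
    # Array survives only if every survivor makes it through the rebuild.
    return 1.0 - (1.0 - p_drive) ** survivors


if __name__ == "__main__":
    # Made-up per-drive probabilities: normal load vs. rebuild load.
    for label, p in [("normal load", 0.01), ("rebuild load", 0.05)]:
        risk = second_failure_probability(4, p)
        print(f"{label}: {risk:.1%} chance of a second failure")
```

Real drives of the same age and batch are not independent, so if anything this understates the risk on an old array.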

        • BRRABill

          Interesting. The second failed drive definitely sounded like it was dead...mechanical issue.

          I think that happened to me a long time ago on a server, which is why I'm always nervous doing it.

          THOUGH thanks to ML I'll never have another RAID 5 array, so no need to worry!

          It doesn't do that for any other RAID level?

          And I am assuming RAID 5 of SSDs wouldn't do that?

          • scottalanmiller @BRRABill

            @BRRABill SSDs do suffer from mechanically induced failure like Winchester drives.

            • scottalanmiller

              RAID 6 induces even more immediate wear and tear, so it is even more likely to kill off a second drive at the time of drive replacement. It also has one extra drive that can fail. But it can withstand losing one additional drive, so it is dramatically safer overall.
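
To put rough numbers on that trade-off, here is a minimal sketch under the same kind of simplifying assumption: drives fail independently during the rebuild, and RAID 6 gets a deliberately higher per-drive probability to stand in for the extra wear. The function name and every number are hypothetical.

```python
from math import comb


def array_loss_probability(survivors: int, p_drive: float, tolerated: int) -> float:
    """P(more than `tolerated` of `survivors` drives fail during the rebuild).

    Assumption (hypothetical): failures are independent, each with
    probability p_drive over the rebuild window.
    """
    # Sum the binomial probabilities of the outcomes the array survives.
    p_survive = sum(
        comb(survivors, k) * p_drive**k * (1.0 - p_drive) ** (survivors - k)
        for k in range(tolerated + 1)
    )
    return 1.0 - p_survive


if __name__ == "__main__":
    # Same usable capacity: 4-drive RAID 5 vs. 5-drive RAID 6, one drive down.
    # RAID 5 tolerates no further failures; RAID 6 tolerates one more.
    raid5 = array_loss_probability(survivors=3, p_drive=0.05, tolerated=0)
    raid6 = array_loss_probability(survivors=4, p_drive=0.08, tolerated=1)
    print(f"RAID 5 array loss during rebuild: {raid5:.2%}")
    print(f"RAID 6 array loss during rebuild: {raid6:.2%}")
```

Even with one more drive and a higher assumed per-drive risk, the RAID 6 case comes out well ahead, because it takes two further failures instead of one to lose the array.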

              • BRRABill @scottalanmiller

                @scottalanmiller said:

                @BRRABill SSDs do suffer from mechanically induced failure like Winchester drives.

                Is the rate the same? Or is this a random (but common) thing?

                • scottalanmiller @BRRABill

                  @BRRABill said:

                  @scottalanmiller said:

                  @BRRABill SSDs do suffer from mechanically induced failure like Winchester drives.

                  Is the rate the same? Or is this a random (but common) thing?

                  Sorry that was a typo. SSDs do NOT suffer mechanically induced failure.

                  • BRRABill

                    Oh. Phew.

                    What is the point of RAID if that happens?

                    That's it. I'm quitting IT.

                    I've had enough.

                    • scottalanmiller @BRRABill

                      @BRRABill not much point to RAID 5, that's what we've been saying for years. By 2009 it was so dangerous that it was actually worse in most cases than doing nothing at all.

                      • BRRABill

                        Well this server is from well before 2009.

                        It's a miracle nothing has happened yet.

                        • scottalanmiller

                          That is indeed pretty old.

                          • BRRABill

                            I'm not going to say exactly HOW old because I'm not sure I can take any more heads shaking at me this month. LOL.

                            • drewlander

                              @BRRABill said:

                              0 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the

                              In complete honesty, I will admit that one time I was cold swapping a failed drive in a ProLiant DL360 G5 and replaced the wrong one. Fortunately the server wouldn't even boot, and I was able to power it down, sort it out and bring it back up. Since then I will never run a server without the backplane kit and hot-swappable drive caddies with the status indicator LED.

                              • drewlander @BRRABill

                                @BRRABill Sounds like a situation I had to deal with last year where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them, explaining that 9-year-old servers should not be production machines for mission-critical systems. They didn't seem to care about business continuity until the servers started failing.

                                • BRRABill @drewlander

                                  @drewlander said:

                                  @BRRABill Sounds like a situation I had to deal with last year where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them, explaining that 9-year-old servers should not be production machines for mission-critical systems. They didn't seem to care about business continuity until the servers started failing.

                                  This was a PowerEdge 2800. I've been kind of proud of the fact that I kept these things up and running for so long. And considering the low RAM and age, they still run awesome.

                                  BUT ... like I said it's a miracle that things haven't gone south quicker. The second drive that failed was a replacement drive, which of course was not new.

                                  Key point, as in anything, is to always have a good backup. 🙂

                                  • scottalanmiller

                                    We once had a set of Compaq ProLiant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.

                                    • BRRABill @scottalanmiller

                                      @scottalanmiller said:

                                      We once had a set of Compaq ProLiant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.

                                      That's about where we are. I've hung lucky mementos in there, and am hoping for the best. 🙂

                                      I actually have a construction paper good luck charm that a vendor's wife gave me a long time ago (before these servers, even), and it's hanging in there. It has done its job pretty well so far.

                                      • BRRABill

                                        True story. Right after I posted that last post, I went into the server room to take a picture of this paper good luck charm. On the way back down the hall, the building's power went out, and has been out the past 3 hours. This week is just AWESOME!

                                        Anyway, here is the picture:
                                        [image: goodluckcharm.JPG]

                                        Note the failed DELL right below it.

                                        It did its job for many years, though. No complaints.

                                        • BRRABill

                                          P.S. If anyone can read that, and it DOESN'T say good luck, please don't let me know. 🙂

                                          • Reid Cooper

                                            What the heck is that thing?
