Hot Swap vs. Blind Swap
-
@BRRABill said:
0 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the
In complete honesty, I will admit that one time I was cold swapping a failed drive in a ProLiant DL360 G5 and replaced the wrong one. Fortunately the server wouldn't even boot, and I was able to power it down, sort it out, and bring it back up. Since then I will never run a server without the backplane kit and hot-swappable drive caddies with the status indicator LED.
-
@BRRABill Sounds like a situation I had to deal with last year, where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them, explaining that 9-year-old servers should not be production machines for mission-critical systems. They didn't seem to care about business continuity until the servers started failing.
-
@drewlander said:
@BRRABill Sounds like a situation I had to deal with last year, where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them, explaining that 9-year-old servers should not be production machines for mission-critical systems. They didn't seem to care about business continuity until the servers started failing.
This was a PowerEdge 2800. I've been kind of proud of the fact that I kept these things up and running for so long. And considering the low RAM and age, they still run awesome.
BUT ... like I said it's a miracle that things haven't gone south quicker. The second drive that failed was a replacement drive, which of course was not new.
Key point, as in anything, is to always have a good backup.
-
We once had a set of Compaq ProLiant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.
-
@scottalanmiller said:
We once had a set of Compaq ProLiant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.
That's about where we are. I've hung lucky mementos in there, and am hoping for the best.
I actually have a construction paper good luck charm that a vendor's wife gave me a long time ago (before these servers, even), and it's hanging in there. It has done its job pretty well so far.
-
True story. Right after I posted that last post, I went into the server room to take a picture of this paper good luck charm. On the way back down the hall, the building's power went out, and has been out the past 3 hours. This week is just AWESOME!
Anyway, here is the picture:
Note the failed DELL right below it.
It did its job for many years, though. No complaints.
-
P.S. If anyone can read that, and it DOESN'T say good luck, please don't let me know.
-
What the heck is that thing?
-
@Reid-Cooper said:
What the heck is that thing?
Which thing?
The paper thing?
Way back in the day when I used to assemble computers, the wife of the guy whose shop I went to made that for me and said it was a good luck charm. I hung it in our server room, and it's been with the servers ever since.
-
@BRRABill said:
P.S. If anyone can read that, and it DOESN'T say good luck, please don't let me know.
@JaredBusch might know.
-
@scottalanmiller said:
RAID 5 induces other failures when you go to rebuild. It's extremely common and just an artifact of that RAID level. Doesn't mean that it will always do it or even normally do it, but it is very common. Once you do a drive swap it immediately increases the load on the drives and makes them more likely to fail.
Is it just RAID 5 that induces failures? I mean, theoretically couldn't a RAID 10 array do the same thing?
-
@BRRABill said:
Is it just RAID 5 that induces failures? I mean, theoretically couldn't a RAID 10 array do the same thing?
Parity RAID induces it on resilver; mirrored RAID really does not. It does a little, but only a little, and only to a single drive, not all drives. So the impact of parity rebuilds is always at least double that of any mirrored RAID and often many, many times more.
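To put rough numbers on that, here is a minimal sketch of how much data has to be read from the surviving drives to rebuild one failed drive. The function name and the simplified model are assumptions for illustration only: a parity rebuild is taken to read every surviving drive in full, and a mirror rebuild is taken to read only the failed drive's partner. Real controllers vary.

```python
def rebuild_read_tb(kind: str, drives: int, drive_tb: float) -> float:
    """Approximate TB read from surviving drives to rebuild ONE failed drive.

    Hypothetical helper with a simplified model:
      - "parity" (e.g. RAID 5): every surviving drive is read in full.
      - "mirror" (e.g. RAID 1 / RAID 10): only the mirror partner is read.
    """
    if kind == "parity":
        if drives < 3:
            raise ValueError("single-parity RAID needs at least 3 drives")
        return (drives - 1) * drive_tb
    if kind == "mirror":
        if drives < 2 or drives % 2 != 0:
            raise ValueError("mirrored RAID needs an even number of drives (2 or more)")
        return drive_tb
    raise ValueError(f"unknown array kind: {kind!r}")


if __name__ == "__main__":
    # Compare a few array sizes built from 2 TB drives.
    for n in (3, 4, 8):
        parity = rebuild_read_tb("parity", n, 2.0)
        print(f"{n}-drive parity array: reads {parity} TB across {n - 1} drives")
    mirror = rebuild_read_tb("mirror", 4, 2.0)
    print(f"4-drive mirrored array: reads {mirror} TB from 1 drive")
```

Under this model the smallest possible parity array (three drives) already reads twice what a mirror does, and the gap grows with every drive added.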
-
My drive failed almost immediately. I mean, whatever happened rebooted the server.
-
@BRRABill said:
My drive failed almost immediately. I mean, whatever happened rebooted the server.
With RAID 5 that can be almost anything: a secondary drive failing naturally, a resilver-induced failure, a URE (unrecoverable read error), etc. RAID 5 has abundant failure modes that could have happened there.
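For the URE case specifically, a back-of-the-envelope estimate can be sketched as below. This is only a toy model: it assumes bit errors are independent and occur at a fixed rate, and the default of one error per 1e14 bits is an assumed figure of the kind often quoted on drive spec sheets, not a measurement of any drive in this thread. The function name is hypothetical.

```python
import math


def ure_probability(read_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one URE while reading `read_tb` terabytes.

    Toy model: independent bit errors at a constant rate (an assumption).
    P(at least one) = 1 - (1 - p) ** bits, computed with log1p/expm1
    so the tiny per-bit probability does not get lost to rounding.
    """
    if read_tb < 0:
        raise ValueError("read_tb must be non-negative")
    bits = read_tb * 1e12 * 8  # decimal terabytes -> bits
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Data read during a rebuild for a few hypothetical array sizes.
    for tb in (2, 6, 12):
        print(f"{tb} TB read: P(URE) ~ {ure_probability(tb):.2f}")
```

The more surviving drives a rebuild has to read in full, the larger `read_tb` gets, which is why this failure mode hurts parity arrays far more than mirrors. Real drives often do better than their spec-sheet rate, so treat the result as pessimistic.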
-
@BRRABill It's possible that the drive had a loose connection and replacing the other knocked it offline.
-
That too. It could be as simple as physical vibration.
-
@BRRABill That Chinese character means "Spring".
-
It was firmly plugged in. I think it just gave up the ghost.
I've seen that kind of stuff happen with a surge, but that seems unlikely in a hotplug backplane.