RAID10 - Two Drive Failure
-
@travisdh1 said in RAID10 - Two Drive Failure:
@gjacobse said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
Predictive failure is not failure. Replace one at a time to give the RAID card the most power to work on the individual resilver.
In my experience, you never replace more than one drive at a time... Ask me how I know.
I solemnly swear that I've pulled the wrong drive to replace before. It made a RAID 6 rebuild take a lot longer, and a RAID 10 freak out until a reboot happened. Restoring from backup was always an option, at least.
That's the biggest risk... pulling the wrong drive, or not being certain which arrays the failed drives are in.
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@aaronstuder said in RAID10 - Two Drive Failure:
Drives 1 and 3 are in "predictive failure"; I am assuming the pairs are 0+1 and 2+3.
Why?
Why what? Assuming? Because he did not document and most hardware RAID controllers are not accessible except during the boot process.
Why does he feel that they are in the sets that they are? I hear people say that all the time, and it turns out that most people say that without basing it on anything at all.
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@gjacobse said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
Predictive failure is not failure. Replace one at a time to give the RAID card the most power to work on the individual resilver.
In my experience, you never replace more than one drive at a time... Ask me how I know.
In RAID 10, you always do if they are in different RAID 1 sets, always.
I completely disagree. Reason stated above. This is a predictive failure, not a failure. You will get a faster resilver of each mirror by doing them individually.
Of course I am assuming that the unit is in use and busy with normal system read/writes.
It should not be faster, not at all. A RAID 1 resilver should be able to do, even on the crappiest controllers, many RAID 1 rebuilds at wire speed all at once. The RAID card would not be a bottleneck, even with lots of drives going at the same time, because there is effectively zero CPU overhead; it just passes data straight from one drive to the other. So for the reason you state, faster rebuilds, you do them all at once, because you can rebuild two or more in the exact same time that you could do just one. There are no parity calculations, so the CPU on the RAID card remains effectively idle.
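To put rough numbers on that argument, here is a minimal sketch of the arithmetic. It assumes each mirror copies independently at a steady rate and that the controller is not a bottleneck, which is the claim above rather than a measured fact. The function names, the 4 TB drive size, and the 150 MB/s copy rate are all made up for illustration, and the model ignores production I/O on a busy system.

```python
def rebuild_hours(drive_tb: float, copy_mb_per_s: float) -> float:
    """Hours to copy one whole drive to its replacement at a steady rate."""
    drive_mb = drive_tb * 1_000_000  # decimal TB -> MB
    return drive_mb / copy_mb_per_s / 3600


def total_rebuild_hours(mirrors: int, drive_tb: float,
                        copy_mb_per_s: float, parallel: bool) -> float:
    """Total wall-clock time to resilver `mirrors` RAID 1 pairs.

    Assumption: mirror copies do not slow each other down, so rebuilding
    them in parallel takes as long as one rebuild; doing them one after
    another takes `mirrors` times as long.
    """
    one = rebuild_hours(drive_tb, copy_mb_per_s)
    return one if parallel else one * mirrors


if __name__ == "__main__":
    # Hypothetical example: two mirrors of 4 TB drives at 150 MB/s.
    print(round(total_rebuild_hours(2, 4, 150, parallel=True), 1))
    print(round(total_rebuild_hours(2, 4, 150, parallel=False), 1))
```

Under those assumptions the total window in which the array has degraded mirrors is shorter when both are rebuilt together. If normal read/write load throttles the copies, the real numbers will differ.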
-
@wirestyle22 said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
I completely disagree. Reason stated above. This is a predictive failure, not a failure. You will get a faster resilver of each mirror by doing them individually.
Doesn't this put twice the amount of mileage on the array, though? Or no?
No, I'm not sure what you mean exactly, but the answer is definitely no.
-
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@aaronstuder said in RAID10 - Two Drive Failure:
Some information I am reading says I need to take the drive "offline" first, is this true?
PowerEdge R720.
If killing one in predictive failure, yes, you generally offline it first. But this depends on the controller in question; the server doesn't matter.
Obviously it depends on the controller, but the point of blind swap being the standard choice for the SMB is that you simply swap the drives.
Blind swap yes, hot swap no. Even some blind swap systems struggle with healthy drives being pulled, though.
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
Are blind swap and hot swap different? I could have sworn blind swap was the one where you don't offline a drive prior to replacing it.
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
I'm not aware of any that don't have an OS interface available. Lots of people don't install one, but most have one.
-
Blind Swapping: Generally a feature unique to hardware RAID systems. This is an extension of hot swapping that includes not needing to interact with the operating system first. Hot swapping alone does not imply that no interaction is needed. Blind swapping is popular in large datacenters so that datacenter staff who do not have access to the operating system can replace failed drives without any interaction from the systems administrators.
as per @scottalanmiller
-
@coliver said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
Are blind swap and hot swap different? I could have sworn blind swap was the one where you don't offline a drive prior to replacing it.
Very.
-
That said, if you have properly set up the tools like Dell OMSA, then you can do it there.
-
@wirestyle22 said in RAID10 - Two Drive Failure:
Blind Swapping: Generally a feature unique to hardware RAID systems. This is an extension of hot swapping that includes not needing to interact with the operating system first. Hot swapping alone does not imply that no interaction is needed. Blind swapping is popular in large datacenters so that datacenter staff who do not have access to the operating system can replace failed drives without any interaction from the systems administrators.
as per @scottalanmiller
That's right.
-
Isn't there a SMART tool that will scan the drives on the RAID controller and flash the LEDs on ones with errors?
-
@dafyre said in RAID10 - Two Drive Failure:
Isn't there a SMART tool that will scan the drives on the RAID controller and flash the LEDs on ones with errors?
No, nothing can see through a RAID controller. If you want to do a SMART scan, the RAID controller itself must do it.
-
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
I'm not aware of any that don't have an OS interface available. Lots of people don't install one, but most have one.
That is installing 3rd party software, not an OS interface.
-
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@aaronstuder said in RAID10 - Two Drive Failure:
Drives 1 and 3 are in "predictive failure"; I am assuming the pairs are 0+1 and 2+3.
Why?
Why what? Assuming? Because he did not document and most hardware RAID controllers are not accessible except during the boot process.
Why does he feel that they are in the sets that they are? I hear people say that all the time, and it turns out that most people say that without basing it on anything at all.
Because by default, when using the PERC controllers from Dell, it makes the array that way. Yes, it is horrible to assume that, but it is how the PERC controller does things by default.
@aaronstuder you can see the spans in OMSA here.
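To show why the pairing assumption matters, here is a small sketch. It takes the "pairs in slot order" default described above as given (I have not verified it against any specific PERC model), and the helper names are made up for illustration.

```python
def sequential_pairs(drive_ids):
    """Group drives into mirror pairs in slot order: (0, 1), (2, 3), ...

    Assumption: the controller built its spans this way. Check the real
    span membership in the controller's management tool before trusting it.
    """
    ids = sorted(drive_ids)
    if len(ids) % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return [(ids[i], ids[i + 1]) for i in range(0, len(ids), 2)]


def share_a_mirror(pairs, a, b):
    """True if drives a and b sit in the same RAID 1 pair."""
    return any(a in pair and b in pair for pair in pairs)


if __name__ == "__main__":
    assumed = sequential_pairs([0, 1, 2, 3])
    print(assumed, share_a_mirror(assumed, 1, 3))

    # A different (hypothetical) layout puts both flagged drives together.
    other = [(0, 2), (1, 3)]
    print(other, share_a_mirror(other, 1, 3))
```

With the assumed layout, drives 1 and 3 are in different mirrors. With the alternative layout they are the two halves of one mirror, so pulling both would take the array down. That is why it is worth confirming the spans rather than assuming.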
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
I'm not aware of any that don't have an OS interface available. Lots of people don't install one, but most have one.
That is installing 3rd party software, not an OS interface.
If by third party you mean the tools from the vendor, yes. It's basically a driver extension.
-
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
I'm not aware of any that don't have an OS interface available. Lots of people don't install one, but most have one.
That is installing 3rd party software, not an OS interface.
If by third party you mean the tools from the vendor, yes. It's basically a driver extension.
It is 3rd party to the OS. I clearly stated, many times, that there are no OS tools for this.
-
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@scottalanmiller said in RAID10 - Two Drive Failure:
@JaredBusch said in RAID10 - Two Drive Failure:
@DustinB3403 said in RAID10 - Two Drive Failure:
@aaronstuder What raid controller do you have?
Exactly this. A real SMB system should be a hot plug. But we have no idea what you bought.
Even hot plug often requires you to offline a disk before you take it out for the system to be clear as to what is happening. ZFS is hot swap, but still requires that, for example.
I said it already, but hardware controllers in most systems have no interface to the OS in order to do anything. There is no way to offline a drive while running.
I'm not aware of any that don't have an OS interface available. Lots of people don't install one, but most have one.
That is installing 3rd party software, not an OS interface.
If by third party you mean the tools from the vendor, yes. It's basically a driver extension.
It is 3rd party to the OS. I clearly stated, many times, that there are no OS tools for this.
You generally get the RAID drivers in the same way; it's third-party hardware to the OS. In many cases, though, I think that there are built-in tools. Don't the Smart Array controllers get management via OS updates on Linux?
-
How about answering his question?
If you do not know what the pairs are, don't assume. Replace one, let it rebuild, then rinse and repeat. That is the general rule of thumb: err on the side of caution.
Predictive Failure is just a SMART report saying something doesn't add up on sectors. Order and replace ASAP.
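For what it's worth, the rule of thumb can be written down as a tiny planner. This is only a sketch of the decision logic discussed in this thread, not a procedure for any particular controller; `replacement_batches` and its arguments are hypothetical names. If the mirror layout is unknown, every drive gets its own batch. If it is known, drives from different mirrors may share a batch, but never two drives from the same mirror.

```python
def replacement_batches(flagged, pairs=None):
    """Return batches of drives to swap; finish each resilver before the next batch.

    flagged: drive IDs reported in predictive failure.
    pairs:   known mirror pairs, e.g. [(0, 1), (2, 3)], or None if undocumented.
    """
    if pairs is None:
        # Layout unknown: do not assume, replace one drive at a time.
        return [[d] for d in sorted(flagged)]

    batches = []
    for d in sorted(flagged):
        mirror = next((p for p in pairs if d in p), None)
        if mirror is None:
            raise ValueError(f"drive {d} is not in any known pair")
        for batch in batches:
            # Only join a batch that has no drive from this drive's mirror.
            if not any(other in mirror for other in batch):
                batch.append(d)
                break
        else:
            batches.append([d])
    return batches


if __name__ == "__main__":
    print(replacement_batches([1, 3]))                    # layout unknown
    print(replacement_batches([1, 3], [(0, 1), (2, 3)]))  # different mirrors
    print(replacement_batches([1, 3], [(0, 2), (1, 3)]))  # same mirror
```

Whether batching drives from different mirrors is a good idea on a busy system is exactly what was argued above; the sketch just makes the "never two from the same mirror" constraint explicit.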