Replacing a Failed drive in MD RAID 10


  • Banned

    So tomorrow's project (I'm building backups and heading home for the night) will be figuring out which drive has failed using mdadm, how to identify it physically, and how to eject the disk from the software array so it can be replaced.


  • Banned

    So to start let's check the array.

    0_1453936142530_chrome_2016-01-27_18-08-43.png

    Obviously sdc is in a Failed state.
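
    For anyone following along without the screenshot, here is a rough sketch of pulling the failed members out of /proc/mdstat, where md flags them with (F). The function name list_failed_members is made up for this example, and it reads from stdin so it can be tried on a saved copy of the file first.

```shell
# Print "<array> <member>" for every member flagged (F) in mdstat-style text.
# Reads stdin, so it works on /proc/mdstat or on a saved copy of it.
list_failed_members() {
    awk '
        # Array lines look like:
        #   md0 : active raid10 sdd[3] sdc[2](F) sdb[1] sda[0]
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)/) {
                    member = $i
                    sub(/\[.*/, "", member)   # strip the "[2](F)" suffix
                    print $1, member
                }
            }
        }
    '
}

# On a live system:
#   list_failed_members < /proc/mdstat
```

    Running mdadm --detail /dev/md0 should show the same per-device state in more detail.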

    So let's see what smartctl has to say...

      smartctl -i /dev/sdc
    

    Hrm... something is off....
    0_1453936287111_chrome_2016-01-27_18-10-53.png

    So it would appear I have to update the smartctl database...


  • Banned

    I'm leaving smartctl as is for now (I'll have to come back to it). An updated version would be nice for the additional information about my disks, and updating it is something I want to do. But the critical point is to get this drive swapped out as quickly as possible, on a server without hot-swap capabilities, so that I can get this server back to good running condition.

    Since I don't have hot-swap capabilities, I'm going to have to shut down the server in order to actually perform the disk exchange. Not overly complex, but adds to the risk of having to restore from backup should something go horribly wrong.


  • Banned

    Now there are a few guides that keep popping up in Google Search that give instructions on how to do this for RAID 1 MDADM Arrays.

    Even @scottalanmiller has recommended the same guide for RAID 10, along with this one on SW. But again, both cover RAID 1.

    So we'll have to work through it and ensure that they are still accurate.



  • @DustinB3403 Should be, mdadm still works the same way.


  • Banned

    @travisdh1 said:

    @DustinB3403 Should be, mdadm still works the same way.

    Thanks, just being extra cautious to ensure this works smoothly.

    To remove the disk from the array I should have to simply type

    mdadm --manage /dev/md0 --fail /dev/sdc
    

    and then

    mdadm --manage /dev/md0 --remove /dev/sdc
    

    At this point I should be able to shut down the server with the command below, then remove the disk and add its replacement.

     shutdown -h now

  • Banned

    Obviously at this point there is some manual labor involved since I have no hot-swap capabilities. If your server has hot-swap you can just pull the drive at this point and add the replacement disk.


  • Banned

    I'm at a stand-still as I wait for my replacement disk to arrive, so this project will have to get picked up in a day or so.



  • @DustinB3403 said:

    Thanks, just being extra cautious to ensure this works smoothly.

    To remove the disk from the array I should have to simply type

    mdadm --manage /dev/md0 --fail /dev/sdc
    

    and then

    mdadm --manage /dev/md0 --remove /dev/sdc
    

    At this point I should be able to shut down the server with the command below, then remove the disk and add its replacement.

     shutdown -h now
    

    Yep. After putting a replacement drive in, just add it back.

    mdadm --manage /dev/md0 --add /dev/sd?
    

    I like to keep an eye on the rebuild process with:

    watch cat /proc/mdstat
    

    Once the rebuild finishes, the array should be back to normal.
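
    If watching the whole file is too noisy, a small sketch like this pulls out just the progress figure. rebuild_progress is a hypothetical helper name, and it assumes the usual "recovery = NN.N%" progress line that md prints in /proc/mdstat during a rebuild.

```shell
# Print only the rebuild progress (e.g. "recovery = 12.6%") from
# mdstat-style text on stdin. No output means no rebuild is in progress.
rebuild_progress() {
    grep -Eo '(recovery|resync) *= *[0-9.]+%'
}

# On a live system:
#   rebuild_progress < /proc/mdstat
```

    This is only a convenience; the full /proc/mdstat output also shows the estimated finish time and speed.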



  • How did you figure out what drive it was in the array? Or did you pull them until you saw the one with that serial number?


  • Banned

    @coliver said:

    How did you figure out what drive it was in the array? Or did you pull them until you saw the one with that serial number?

    How do I know which disk it is?

    Well, the other day I noticed that the array had a failed disk. Since I was rebuilding the system anyway, I pulled each disk and ran a check disk from Windows with a scan for bad sectors.

    Only 1 disk was found with bad sectors.

    Knowing which disk this was, and with Windows saying it had fixed the problem, I re-added the disk and simply "remembered" which one had the bad sectors.

    So this disk is the disk that has to be removed.



  • @DustinB3403 said:

    How do I know which disk it is?

    Well, the other day I noticed that the array had a failed disk. Since I was rebuilding the system anyway, I pulled each disk and ran a check disk from Windows with a scan for bad sectors.

    Only 1 disk was found with bad sectors.

    Knowing which disk this was, and with Windows saying it had fixed the problem, I re-added the disk and simply "remembered" which one had the bad sectors.

    So this disk is the disk that has to be removed.

    OK, so you wouldn't be able to figure this out from the Linux CLI; you would have to have a record of the serial numbers of the drives in each bay.


  • Banned

    @coliver Pretty much.

    Since there is no hot-swap function on my server (no indicator lights either) it's simply a matter of my knowing which disk is connected to which SATA port.
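
    For what it's worth, the serial number itself can usually be read from the CLI and matched against the label printed on the drive once the case is open. A minimal sketch, assuming smartctl -i prints a "Serial Number:" line for the disk; the helper name serial_from_smartctl is made up for this example.

```shell
# Extract the serial number from `smartctl -i` output supplied on stdin.
serial_from_smartctl() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# On a live system:
#   smartctl -i /dev/sdc | serial_from_smartctl
# The names under /dev/disk/by-id/ usually embed the model and serial too:
#   ls -l /dev/disk/by-id/
```

    This still doesn't say which SATA port the disk is on, so a note of serial per bay remains handy.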


  • Banned

    So at this point I have the disk marked as failed, and removed from the array as shown below.

    0_1453994344578_XenCenterMain_2016-01-28_10-18-57.png

    As you can see, sdc is not part of the array at the moment, which means nothing will be written to the disk. Obviously I'm at a dangerous point in time.

    If I can't get my replacement disk soon, I risk losing the entire array.

    Now, because I've already had issues with this array (specifically this disk), I have nothing running on this system that I don't have several backups of. So the drive has been ordered and will be here in a day or so.

    At which point I'll shut down the server, remove the bad disk, and put the new one in.


  • Banned

    While I wait for that drive to arrive, I'm going to figure out how to configure email alerts for the mdadm array, since that would be incredibly useful to have.

    Since I can't sit here watching the cat /proc/mdstat.... 🙂
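
    A minimal sketch of one way to get alerts, assuming the stock mdadm monitor is used: mdadm reads a MAILADDR line from its config file and mails that address when an array degrades. The address, the file path, and the helper name mail_target below are placeholders for illustration, not taken from this server, and a working local mail setup is assumed.

```shell
# Print the address on the first MAILADDR line of an mdadm config file.
# Only the full keyword is matched here.
mail_target() {
    awk 'toupper($1) == "MAILADDR" { print $2; exit }' "$1"
}

# Example config line (the file is often /etc/mdadm.conf or
# /etc/mdadm/mdadm.conf, depending on the distribution):
#   MAILADDR admin@example.com
#
# Check the setting, then ask mdadm to send a test alert for every array:
#   mail_target /etc/mdadm.conf
#   mdadm --monitor --scan --oneshot --test
```

    If the test message arrives, the monitor daemon should send the same kind of mail on a real failure.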



  • @DustinB3403 said:

    While I wait for that drive to arrive, I'm going to figure out how to configure email alerts for the mdadm array, since that would be incredibly useful to have.

    Since I can't sit here watching the cat /proc/mdstat.... 🙂

    No remote ssh access?


  • Banned

    @travisdh1 I do have access, but I'm still not going to sit here and watch it.


  • Banned

    So now that I have the email alerts configured for my Xen Servers, I really want to work on updating SmartCTL so it supports the drives that I have in this server.

    Which are pretty common drives.

    Western Digital Red 1TB.

    I'm really surprised how old of a database is built into XenServer 6.5.

    So time to figure this part out.



  • @DustinB3403 said:

    So now that I have the email alerts configured for my Xen Servers, I really want to work on updating SmartCTL so it supports the drives that I have in this server.

    Which are pretty common drives.

    Western Digital Red 1TD.

    I'm really surprised how old of a database is built into XenServer 6.5.

    So time to figure this part out.

    WTF is a TD?


  • Banned

    @JaredBusch said:

    WTF is a TD?

    That would be a typo, whoops.

    1TB.



  • @JaredBusch said:

    WTF is a TD?

    TeraDactyl, duh.

    It's the amount of storage that can be carried by an unladen teradactyl.



  • @scottalanmiller said:

    TeraDactyl, duh.

    It's the amount of storage that can be carried by an unladen teradactyl.

    A Jurassic or Triassic TeraDactyl?



  • @JaredBusch said:

    A Jurassic or Triassic TeraDactyl?

    I... I don't know!



  • @scottalanmiller said:

    I... I don't know!

    img



  • This is a strange thread...

    0_1454011384339_image.jpg



  • @MattSpeller said:

    This is a strange thread...

    Right... let's not go there, 'tis a silly thread.



  • @coliver silly english k-nig-hts



  • @scottalanmiller African or European?





  • @MattSpeller said in Replacing a Failed drive in MD RAID 10:

    @mazterjedi said in Replacing a Failed drive in MD RAID 10:

    @scottalanmiller African or European?

    31718148.jpg

    Stop taking my hand! At least @mazterjedi gets it. Quick, take my hand!