Fixing Software RAID on XenServer



  • So I had XenServer 6.5 go south (kernel panic) after installing SP1. I used the 6.5 disc to "upgrade" it, so I now have a functional host again. The side effect is that dom0 no longer sees the RAID 10 array, and I'm looking for help remounting md0 without any data loss (if possible).

    Here is some of the output from various configs:

    mdadm --examine /dev/sdb (first drive in array)
    /dev/sdb:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 3e6d21f5:0e381299:a38fdae2:571cbed3
    Name : <machinename>:0 (local to host <machinename>)
    Creation Time : Fri Feb 5 13:27:48 2016
    Raid Level : raid10
    Raid Devices : 4

    Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
    Array Size : 7813774336 (7451.80 GiB 8001.30 GB)
    Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : b66b76cf:0bcf1d52:37ad3de5:911eb41a
    Update Time : Wed Mar 30 09:52:03 2016
    Checksum : b5080968 - correct
    Events : 2943
    Layout : near=2
    Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAA ('A' == active, '.' == missing)

    I don't know how to fix this. I also noticed that /etc/mdadm.conf no longer exists.
    It looks to me like the array is fine, but there isn't an md0. Any thoughts or questions?
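
    One way to sanity-check the "array is fine" reading is to compare the sizes in the --examine output. This is only a sketch: it assumes Used Dev Size is in 512-byte sectors and Array Size is in KiB (which is what the GiB figures printed beside them imply), and the helper names examine_sample and field are made up for illustration. For a RAID 10 with 4 devices and near=2, the array should hold half of the combined member capacity.

```shell
#!/bin/sh
# Sample lines copied from the --examine output above. On a live host you
# could pipe `mdadm --examine /dev/sdb` into field() instead.
examine_sample() {
cat <<'EOF'
     Raid Level : raid10
   Raid Devices : 4
     Array Size : 7813774336 (7451.80 GiB 8001.30 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
         Layout : near=2
    Array State : AAAA ('A' == active, '.' == missing)
EOF
}

# Print the first token of the value for a given "Label : value" line.
field() {
    awk -F' : ' -v key="$1" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key { split($2, v, " "); print v[1]; exit }'
}

devices=$(examine_sample | field "Raid Devices")
copies=$(examine_sample | field "Layout" | cut -d= -f2)   # near=2 -> 2
used_sectors=$(examine_sample | field "Used Dev Size")    # assumed 512-byte sectors
array_kib=$(examine_sample | field "Array Size")          # assumed KiB
state=$(examine_sample | field "Array State")

# Per-device KiB, times the number of devices, divided by the copies of each block.
expected_kib=$(( used_sectors * 512 / 1024 * devices / copies ))

echo "expected array size: $expected_kib KiB, reported: $array_kib KiB"
echo "member state as seen by this drive: $state"
```

    If the two sizes agree and the state shows no '.' entries, this member's superblock at least describes a complete array. It says nothing about the other three drives, which would need the same check.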



  • You're running a MDADM RAID 10, yes?





  • This is why you replicate your boot devices to a backup.

    Something I plan on doing this weekend (to avoid this very same issue).



  • As for trying to recover, can you not mount the array in XenServer without it wiping the array?



  • @DustinB3403 said:

    As for trying to recover, can you not mount the array in XenServer without it wiping the array?

    I can't access the array in dom0 right now.

    I tried running mdadm --assemble /dev/md0 /dev/sd[bcef], but it says that md0 already exists. When I run cat /proc/mdstat, it shows md0 as inactive with only sdb as a member.
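
    An inactive md0 holding a single member usually means a half-assembled array is keeping the device busy, which would explain the "already exists" message. A common approach, sketched here under that assumption, is to stop the inactive array and then assemble from the existing superblocks. The run wrapper and DRY_RUN variable are hypothetical helpers so the script only prints the commands by default; the device names are the ones from this thread, and the last step assumes dom0 reads /etc/mdadm.conf.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reassemble_md0() {
    # Release the inactive, half-assembled md0 so its member is no longer busy.
    run mdadm --stop /dev/md0
    # Assemble from the existing superblocks; this does not rewrite the metadata.
    run mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sde /dev/sdf
    # Check whether md0 is now active with all four members.
    run cat /proc/mdstat
    # Record the array so it can be found at boot (the file was reported missing).
    run sh -c 'mdadm --detail --scan >> /etc/mdadm.conf'
}

reassemble_md0
```

    Stopping and assembling leaves the on-disk metadata alone, so it is a lower-risk thing to try before anything that writes new superblocks.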



  • This link has lots of good info on MD RAID recovery.

    https://raid.wiki.kernel.org/index.php/RAID_Recovery



  • So the RAID is rebuilding. I ended up having to run this to get the drives reconnected to each other and dom0:

    mdadm --create /dev/md0 /dev/sd[bcef]
    

    It is still resyncing, so I'm not sure if it worked or not, but md0 appears to be happy.
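
    Worth noting for anyone landing here later: --create writes new superblocks, so it is generally treated as a last resort, and it only preserves data if the geometry and device order match the original array. The sketch below is an illustration and not the exact command that was run here. It derives the options from the --examine output earlier in the thread and prints a candidate command without executing it. The helper names are made up, the device list is left as a placeholder because only sdb's role is known, and the data offset would also need to match.

```shell
#!/bin/sh
# Sample lines copied from the --examine output earlier in the thread.
examine_sample() {
cat <<'EOF'
        Version : 1.2
     Raid Level : raid10
   Raid Devices : 4
         Layout : near=2
     Chunk Size : 512K
    Device Role : Active device 2
EOF
}

# Print the first token of the value for a given "Label : value" line.
field() {
    awk -F' : ' -v key="$1" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key { split($2, v, " "); print v[1]; exit }'
}

metadata=$(examine_sample | field "Version")
level=$(examine_sample | field "Raid Level" | sed 's/^raid//')    # raid10 -> 10
devices=$(examine_sample | field "Raid Devices")
layout=$(examine_sample | field "Layout" | sed 's/^near=/n/')     # near=2 -> n2
chunk=$(examine_sample | field "Chunk Size" | sed 's/K$//')       # 512K -> 512 (KiB)
# The role is the last token of "Active device 2".
role=$(examine_sample | awk -F' : ' '/Device Role/ { n = split($2, v, " "); print v[n] }')

# --assume-clean asks mdadm not to resync over whatever is already on the disks.
cmd="mdadm --create /dev/md0 --assume-clean --metadata=$metadata --level=$level --raid-devices=$devices --layout=$layout --chunk=$chunk <$devices devices in Device Role order>"

echo "candidate command (not executed): $cmd"
echo "/dev/sdb belongs in slot $role (slots count from 0)"
```

    Running --examine on each of the other drives would give their roles, and the devices would then be listed in role order in place of the placeholder.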



  • @Kelly said:

    So the RAID is rebuilding. I ended up having to run this to get the drives reconnected to each other and dom0:

    mdadm --create /dev/md0 /dev/sd[bcef]
    

    It is still resyncing, so I'm not sure if it worked or not, but md0 appears to be happy.

    It should be fine, assuming that none of the drives got mounted and written to. Even if one did, the RAID should recover from a single corrupt drive.



  • @Dashrender said:

    You're running a MDADM RAID 10, yes?

    MD RAID. MDADM is the administration utility for MD RAID. There is no such thing as MDADM RAID.



  • Did it rebuild successfully?



  • @scottalanmiller said:

    Did it rebuild successfully?

    I don't know. I ended up setting up the necessary VMs on other hosts and started rebuilding this one since it has been having stability issues in the last few weeks.



  • @Kelly said:

    @scottalanmiller said:

    Did it rebuild successfully?

    I don't know. I ended up setting up the necessary VMs on other hosts and started rebuilding this one since it has been having stability issues in the last few weeks.

    Ah okay, probably best, but it is nice when you can figure out what happened, just to know what it was 😞