XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!
-
Having been through this once before, and learning the hard way, I do normally have a physical DC. Despite my warnings, because I know that we do not currently have one here, I was told to bring it all down. And here we are. We do not have a physical DC.
-
I've been using the root authentication for everything.
-
Frank, can we speak on the phone for a minute so I can be sure I can intelligently talk to the guy when demanding his money?
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Having been through this once before, and learning the hard way, I do normally have a physical DC.
This is absolutely the wrong response. You should never have a physical DC, ever. There are zero issues here with virtualization. There are two problems:
- Zero AD redundancy
- An inverted pyramid of doom (single storage for all systems)
Fixing either of those anti-practices would have saved you. Physical would have zero benefit and is the polar opposite of the reaction that you should have.
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
I've been using the root authentication for everything.
So we are safe there.
-
More on the IPOD: http://www.smbitjournal.com/2013/06/the-inverted-pyramid-of-doom/
And in video form from MangoCon:
-
So, when looking for places to turn off AD integration, I see this...
-
It's not pool integration that is the issue, it's SAN integration. Check the SAN (PowerVault) interface instead.
-
@seal Just came across these two items on the SAN interface. Dental_Data and SpindleMedia are critical, and it looks like those VDs failed.
PROFILE FOR STORAGE ARRAY: MDS-Spindle01 (12/27/16 3:28:58 PM)

STANDARD VIRTUAL DISKS------------------------------

SUMMARY
Number of standard virtual disks: 3
See other Virtual Disks sub-tabs for premium feature information.

NAME          STATUS  CAPACITY  RAID LEVEL  DISK GROUP  DRIVE TYPE
Dental_Data   Failed  1.495 TB  5           0           SAS
SpindleMedia  Failed  2.862 TB  5           0           SAS
Virtual       Failed  1.367 TB  5           0           SAS

DETAILS

Virtual Disk name: Dental_Data
Virtual Disk status: Failed
Capacity: 1.495 TB
Virtual Disk world-wide identifier: 60:02:4e:80:00:7b:78:6a:00:00:04:13:4a:96:70:f3
Subsystem ID (SSID): 1
Associated disk group: 0
RAID level: 5
Physical Disk type: Serial Attached SCSI (SAS)
Enclosure loss protection: No
Preferred owner: RAID Controller Module in slot 1
Current owner: RAID Controller Module in slot 1
Segment size: 128 KB
Capacity reserved for future segment size changes: Yes
Maximum future segment size: 2,048 KB
Modification priority: High
Read cache: Enabled
Write cache: Enabled
Write cache without batteries: Disabled
Write cache with mirroring: Enabled
Flush write cache after (in seconds): 10.00
Dynamic cache read prefetch: Enabled
Enable background media scan: Enabled
Media scan with consistency check: Enabled
Pre-Read consistency check: Disabled

Virtual Disk name: SpindleMedia
Virtual Disk status: Failed
Capacity: 2.862 TB
Virtual Disk world-wide identifier: 60:02:4e:80:00:70:ed:06:00:00:07:f5:4d:ba:7b:fb
Subsystem ID (SSID): 2
Associated disk group: 0
RAID level: 5
Physical Disk type: Serial Attached SCSI (SAS)
Enclosure loss protection: No
Preferred owner: RAID Controller Module in slot 0
Current owner: RAID Controller Module in slot 1
Segment size: 128 KB
Capacity reserved for future segment size changes: Yes
Maximum future segment size: 2,048 KB
Modification priority: High
Read cache: Enabled
Write cache: Enabled
Write cache without batteries: Disabled
Write cache with mirroring: Enabled
Flush write cache after (in seconds): 10.00
Dynamic cache read prefetch: Enabled
Enable background media scan: Enabled
Media scan with consistency check: Enabled
Pre-Read consistency check: Disabled

Virtual Disk name: Virtual
Virtual Disk status: Failed
Capacity: 1.367 TB
Virtual Disk world-wide identifier: 60:02:4e:80:00:70:ed:06:00:00:04:31:4a:96:73:09
Subsystem ID (SSID): 0
Associated disk group: 0
RAID level: 5
Physical Disk type: Serial Attached SCSI (SAS)
Enclosure loss protection: No
Preferred owner: RAID Controller Module in slot 0
Current owner: RAID Controller Module in slot 1
Segment size: 128 KB
Capacity reserved for future segment size changes: Yes
Maximum future segment size: 2,048 KB
Modification priority: High
Read cache: Enabled
Write cache: Enabled
Write cache without batteries: Disabled
Write cache with mirroring: Enabled
Flush write cache after (in seconds): 10.00
Dynamic cache read prefetch: Enabled
Enable background media scan: Enabled
Media scan with consistency check: Enabled
Pre-Read consistency check: Disabled
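For anyone who has to read a dump like this again, here is a rough Python sketch that pulls the name and status of each virtual disk out of the pasted profile text. It assumes the text keeps the "Virtual Disk name: ... Virtual Disk status: ..." field order seen above. The function names and the abbreviated SAMPLE string are made up for illustration and are not part of any Dell or Citrix tool.

```python
import re

# Abbreviated excerpt of the DETAILS section pasted above (most fields trimmed).
SAMPLE = (
    "Virtual Disk name: Dental_Data Virtual Disk status: Failed Capacity: 1.495 TB "
    "Virtual Disk name: SpindleMedia Virtual Disk status: Failed Capacity: 2.862 TB "
    "Virtual Disk name: Virtual Virtual Disk status: Failed Capacity: 1.367 TB"
)


def virtual_disk_statuses(profile_text):
    """Map each virtual disk name to its reported status (hypothetical helper)."""
    # \s+ also matches newlines, so this works on flattened or line-per-field text.
    pattern = r"Virtual Disk name:\s*(\S+)\s+Virtual Disk status:\s*(\S+)"
    return dict(re.findall(pattern, profile_text))


def failed_virtual_disks(profile_text):
    """Return the names of virtual disks whose status is exactly 'Failed'."""
    statuses = virtual_disk_statuses(profile_text)
    return [name for name, status in statuses.items() if status == "Failed"]


if __name__ == "__main__":
    print(failed_virtual_disks(SAMPLE))
```

Run against the excerpt, it lists all three virtual disks as failed, which matches the SUMMARY table.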
-
Oh look, on top of everything else, they left you with RAID 5, too. Figures. Whoever set this up really set you up for failure.
-
Your predecessor definitely pulled this on you: https://mangolassi.it/topic/11852/why-it-builds-a-house-of-cards
-
Looks like, on top of other problems, the SAN has died. It's hard to tell from this, but it looks like those are the LUNs that hold all of your VMs?
-
So 2 drives failed at once? You should be able to go into the server room and see some sort of blinky light pattern that indicates what/how many drives are gone.
Did you lose a RAID Controller?
-
Dear God I pray that you have backups outside of the environment. Please tell me that you do. Another NAS, tapes, diskettes, something?
-
@momurda said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
So 2 drives failed at once? You should be able to go into the server room and see some sort of blinky light pattern that indicates what/how many drives are gone.
Did you lose a RAID Controller?
It's a dual controller device. So in theory it should fail over. But in reality, they rarely do.
-
@NerdyDad said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Dear God I pray that you have backups outside of the environment. Please tell me that you do. Another NAS, tapes, diskettes, something?
At this point, recovering from backup to a new cluster might be the best way to go. The SAN is worthless if the arrays have failed. And the local servers probably don't have the necessary storage to run without it. If the array is really lost, the old hardware has probably dropped to a zero value level. Time to get something new in and recover to that ASAP.
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@momurda said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
So 2 drives failed at once? You should be able to go into the server room and see some sort of blinky light pattern that indicates what/how many drives are gone.
Did you lose a RAID Controller?
It's a dual controller device. So in theory it should fail over. But in reality, they rarely do.
But if drives are lost, that won't help.
-
Isn't this saying the virtual drives for each failed? This should be different than a physical drive failure, right? Or am I reading something wrong?
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Having been through this once before, and learning the hard way, I do normally have a physical DC.
This is absolutely the wrong response. You should never have a physical DC, ever. There are zero issues here with virtualization. There are two problems:
- Zero AD redundancy
- An inverted pyramid of doom (single storage for all systems)
Fixing either of those anti-practices would have saved you. Physical would have zero benefit and is the polar opposite of the reaction that you should have.
Having a physical DC in this situation would probably have saved him. That said, I agree it's not the solution. If you really wanted to have a DC outside this cluster, fine, but you still virtualize that third server, then install a DC on that.
-
@seal said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Isn't this saying the virtual drives for each failed? This should be different than a physical drive failure, right? Or am I reading something wrong?
Well, yes and no. You are correct. The warning is that the LDs have failed. But the LDs fail when their underlying array fails. That underlying array is built on physical drives. So for the LDs to fail, the array that they share must have failed, which means that too many of the drives in its pool have failed. Or that both controllers have failed. In this case, since two utility LUNs are still hanging around, we are guessing that the controller(s) are intact and only the array has failed.
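To make that dependency chain concrete, here is a toy Python model of it. It assumes the standard property that RAID 5 survives exactly one failed member drive. The class names, the six-drive group, and the "Optimal" and "Unreachable" status strings are illustrative assumptions; nothing here talks to the real PowerVault.

```python
from dataclasses import dataclass

# Failed member drives each RAID level can survive (standard parity properties).
TOLERATED_FAILURES = {0: 0, 5: 1, 6: 2}


@dataclass
class DiskGroup:
    raid_level: int
    drives_total: int       # hypothetical drive count, not taken from the profile
    drives_failed: int = 0

    def is_failed(self) -> bool:
        # The group is lost once failures exceed what the RAID level tolerates.
        return self.drives_failed > TOLERATED_FAILURES[self.raid_level]


@dataclass
class VirtualDisk:
    name: str
    group: DiskGroup

    def status(self, controllers_alive: int = 2) -> str:
        # With no controller alive, nothing is presented at all.
        if controllers_alive == 0:
            return "Unreachable"
        # Every virtual disk carved from a failed group fails with it.
        return "Failed" if self.group.is_failed() else "Optimal"


if __name__ == "__main__":
    group0 = DiskGroup(raid_level=5, drives_total=6)
    vds = [VirtualDisk(n, group0) for n in ("Dental_Data", "SpindleMedia", "Virtual")]

    group0.drives_failed = 1   # RAID 5 rides through a single drive loss
    print([vd.status() for vd in vds])

    group0.drives_failed = 2   # a second loss takes out the whole group
    print([vd.status() for vd in vds])
```

Because all three virtual disks sit on disk group 0, one array failure takes them all down together, which is exactly what the profile shows.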