XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!
-
I've inherited a XenServer 6.2 environment. My background is in Hyper-V. I understand some concepts, but I have not used these, or the Dell PowerVault SAN that it's attached to via iSCSI, so bear with me if I am lacking in background.
Current Status - XenPool with two hosts, Xen1 and Xen2 (Pool Master). VMs show as running, but I can't interact with them in any way. All forms of storage are disconnected from the hosts. When I ran iscsiadm in node and session modes from the console, Xen2 showed as being connected to the SAN. However, I cannot connect either host to its local storage. Both are stuck in maintenance mode.
Background, and of course, I can flesh it out more as questions arise. I'm going on 26 hours of work, -4 hours of sleep. Forgive any holes. I walked in yesterday to do maintenance when no one was here. I noticed I couldn't RDP into the servers that were managed by Xen1. I logged in, and not being familiar enough, I didn't notice right away whether storage connectivity was an issue. At the time, it seemed like there were networking issues. I couldn't force down any VMs, and I couldn't move any VMs to Xen2. I tried to reboot the host, but couldn't reboot it from XenCenter. Eventually I had to force the server down. Over time, I saw that there was a problem with the control domain getting full, at 91%. Doing research, I cleared out some log files to get it under 3GB.
I repeatedly had issues with Xen1 going into maintenance mode after a reboot and not being able to exit. It only resolved when I changed something about its network settings. Eventually, I was commanded to restart both servers. After they came back up, Xen2 also lost all connection to its storage, and is now stuck in maintenance mode. Which means we have no servers at all, including Active Directory. I have internet by pointing DNS to my router.
Help!
-
When I started this, the network IPs for the hosts were .104 and .105 for Xen1 and Xen2, respectively. When Xen1 wouldn't get out of maintenance mode, changing that to .106 or vice-versa got it back in. I just got Xen1 out of maintenance mode by doing that. Xen2 is still in maintenance mode.
-
Can you get to the console of the SAN (PowerVault) to see what it thinks is going on? Is it healthy itself and this is just a XenServer issue... or has storage failed?
-
Given that both hosts are having a problem, and given that the PowerVault is normally the weakest link in the chain and most likely to fail, there is a good chance that that is where the problem lies. That's the shared component. If only one XenServer node was having issues we'd not think that the SAN was the issue. But both at the same time? SAN is the obvious culprit.
-
@scottalanmiller I had gone to another user's machine to log into the PowerVault interface last night. I'm installing the software on my machine now.
-
Also, we're willing to engage someone formally to get this resolved.
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@scottalanmiller I had gone to another user's machine to log into the PowerVault interface last night. I'm installing the software on my machine now.
That'll likely be the most telling.
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Also, we're willing to engage someone formally to get this resolved.
Let's see if we can't get to the bottom of it quickly, for free first. But that's definitely an option.
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Can you get to the console of the SAN (PowerVault) to see what it thinks is going on? Is it healthy itself and this is just a XenServer issue... or has storage failed?
To this question, Xen2 was working fine, and the VMs hosted by Xen2 were working fine for most of yesterday, until I was commanded to shut everything down and bring it back up. It's not just the iSCSI storage that's unplugged. Local storage is also unplugged, as are DVD Drives and Removable Storage. See attached screenshot.
-
Oh okay, that doesn't sound like a SAN problem then.
-
I am logged into the SAN. I can easily see that there's been a virtual disk failure, but that's been there for a while. I'm not hugely familiar with this interface, but nothing immediately pops out as an issue.
-
Can you tell what has changed between working conditions and now? I know you said that you're at 6.2, but have there been any updates done?
Also, would it be possible to back up the configs for the hosts and reinstall XS from scratch, then restore configs?
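A minimal sketch of the backup half of that idea, assuming the stock `xe` CLI on the pool master: `xe pool-dump-database` saves the pool metadata and `xe host-backup` saves a host's control domain. The helper names, host names, and target directory below are hypothetical, and the script only builds and prints the commands so they can be reviewed before anything is run.

```python
def pool_dump_command(backup_file):
    """Build the xe command that dumps the pool database to backup_file."""
    return ["xe", "pool-dump-database", "file-name=" + backup_file]


def host_backup_command(host_name, backup_file):
    """Build the xe command that backs up one host's control domain."""
    return ["xe", "host-backup", "host=" + host_name, "file-name=" + backup_file]


def backup_plan(host_names, target_dir):
    """Return the pool dump command followed by one host backup per host."""
    base = target_dir.rstrip("/")
    plan = [pool_dump_command(base + "/pool-database.bak")]
    for name in host_names:
        plan.append(host_backup_command(name, base + "/" + name + "-host.bak"))
    return plan


if __name__ == "__main__":
    # Hypothetical host names and mount point; the target should live
    # off the hosts themselves (for example an NFS or USB mount).
    for command in backup_plan(["Xen1", "Xen2"], "/mnt/backup"):
        print(" ".join(command))
```

Printing rather than executing is deliberate here: with storage already in an unknown state, it seems safer to read each command before running it by hand.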
-
Also, have you tried shutting both down and bringing up Xen2, since it's the pool master? Maybe Xen1 is trying to take over, thinking that Xen2 is down.
-
@NerdyDad said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Can you tell what has changed between working conditions and now? I know you said that you're at 6.2, but have there been any updates done?
Also, would it be possible to back up the configs for the hosts and reinstall XS from scratch, then restore configs?
@NerdyDad To my knowledge, from the time it was working, last Thursday/Friday, until I walked in the door yesterday morning and found issues, there had been no changes. I'm positive no updates have been done.
With regards to the backups, I'm sure I could try that. Do you have a link handy with a procedure for that, by any chance?
@NerdyDad said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Also, have you tried shutting both down and bringing up Xen2, since it's the pool master? Maybe Xen1 is trying to take over, thinking that Xen2 is down.
That's the last thing I'd tried before posting here. I couldn't bring Xen2 out of maintenance mode, nor reconnect storage.
-
The network issue concerns me the most. That's going to cause everything to fail on its own.
-
Can you ping the SAN from either XS?
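Ping only proves ICMP reachability, so a TCP check against the iSCSI port can be a useful companion. The sketch below is not ping; it tries to open a TCP connection to port 3260, the standard iSCSI target port. It assumes Python 3 on a machine on the same network (the control domain on XenServer 6.2 likely ships a much older Python), and the function name `can_connect` is made up for illustration. The addresses to test are the SAN's iSCSI portal IPs, which may differ from its management IPs.

```python
import socket
import sys

ISCSI_PORT = 3260  # standard iSCSI target port


def can_connect(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Pass the SAN's iSCSI portal addresses on the command line.
    for host in sys.argv[1:]:
        state = "reachable" if can_connect(host) else "NOT reachable"
        print("%s:%d %s" % (host, ISCSI_PORT, state))
```

If ping works but this check fails, that would point toward the iSCSI service or a firewall rather than basic networking.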
-
I think that we need to look at the logs. Do you have access to get onto the machines themselves via SSH?
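As a rough sketch of a first pass over the logs, the script below pulls out lines that mention common trouble words. On XenServer the usual places to look are `/var/log/xensource.log`, `/var/log/SMlog`, and `/var/log/messages`. The keyword list and the function name `interesting_lines` are assumptions for illustration, and it assumes Python 3, so copying the logs to a workstation first is probably the easier route.

```python
import sys

KEYWORDS = ("error", "fail", "timeout", "exception")  # assumed search terms


def interesting_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, matched case-insensitively."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(k in lowered for k in wanted):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Usage: python scan_logs.py xensource.log SMlog messages
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            # Show only the 50 most recent matches per file.
            for hit in interesting_lines(handle)[-50:]:
                print(path + ": " + hit)
```

Storage attach failures would most likely show up in SMlog, so that is a reasonable file to start with.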
-
@Dashrender said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Can you ping the SAN from either XS?
I can from Xen2, SAN IPs of .100 and .101 on the mgmt network. I have console control of Xen2 from XenCenter, but Xen1 requires me to run up to the server room. Bear with me.
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
I think that we need to look at the logs. Do you have access to get onto the machines themselves via SSH?
I haven't tried yet. Let me get a hold of PuTTY.