Xenvbd issues in Windows Event Viewer
-
Hello,
I've got 2 XenServer hosts, both running 6.2. On my first day at this new job (back at the end of Jan 2016), the CEO told me that occasionally things will just freeze and nothing works. Well, this didn't happen for a long time after I started. Until Friday, when I noticed every server on XenServer host 1 was unresponsive for a few minutes. Once I was able to access the VMs again, every single Linux VM running on the host had gone into read-only mode, and every Windows VM on the host had many warnings from source Xenvbd, event ID 129, and source disk, event ID 153. Both of these warnings in Windows make me think the disk drivers on this Xen host are out of date or incorrect. I did some more searching through the event logs, and these warnings (a misconfiguration somewhere) have been happening for years! Probably since day 1, when this company went to XenServer (probably 3-4 years ago).
I am a bit new to XenServer and was wondering if any of you may know what is going on. I hope upgrading to 6.5 will fix this, as I plan on doing that at some point in the near future.
-
Upgrading to 6.5 is good, and will probably help.
Let's look at the storage subsystem first, though. It sounds like something is wrong on the storage front. Do you have any errors in the storage subsystem? If the storage is local, is an array rebuilding? To me, at least, that sounds like what happens when a local array rebuilds.
You also probably want to check that each of the guests has the XenServer Tools installed, which is the package that has any additional drivers the guests need.
-
I checked the storage array before posting; there is nothing to indicate any errors there. I also checked the XenServer Tools install on the affected servers, and all of the servers that went unresponsive had the same warnings in Windows Event Viewer. The one server that does not have XenTools has no errors or warnings during that time. The version of the xenvbd SCSI adapter is 7.0.0.120 on more than one of the servers with the problem.
This is what I'm looking at on one server.
-
So, I have been dealing with this for a while. I've got some more info and a possible solution, though I think I am still just shooting in the dark here.
When I run:
lsmod | grep 'iscsi'
iscsi_tcp 18333 20
libiscsi_tcp 21043 1 iscsi_tcp
libiscsi 53218 4 bnx2i,ib_iser,iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 77023 6 bnx2i,ib_iser,iscsi_tcp,libiscsi
scsi_mod 209749 18 bnx2i,ib_iser,iscsi_tcp,libiscsi,scsi_transport_iscsi,sr_mod,sg,isci,libsas,scsi_transport_sas,libata,scsi_dh_rdac,scsi_dh_hp_sw,scsi_dh_emc,scsi_dh_alua,scsi_dh,megaraid_sas,sd_mo
This seems to show the use of Broadcom drivers for my iSCSI connections. This host has Intel network interfaces. However, my other XenServer host (exact same hardware as the problem host) shows the same output for lsmod | grep 'iscsi' and never has this issue.
If I use
lsmod -l | grep bnx2i
bnx2i 55493 0
cnic 77183 1 bnx2i
libiscsi 53218 4 bnx2i,ib_iser,iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 77023 6 bnx2i,ib_iser,iscsi_tcp,libiscsi
scsi_mod 209749 17 bnx2i,ib_iser,iscsi_tcp,libiscsi,scsi_transport_iscsi,sg,isci,libsas,scsi_transport_sas,libata,scsi_dh_rdac,scsi_dh_hp_sw,scsi_dh_emc,scsi_dh_alua,scsi_dh,megaraid_sas,sd_mod
Not sure why either of my XenServer hosts (I took over this environment) is loading iSCSI drivers for NICs they don't have installed. The Intel Gigabit driver is there, just not used for iSCSI as far as I can tell.
[root@XS001 log]# lsmod | grep igb
igb 180177 0
[root@XS001 log]# modprobe -l | grep igb
/lib/modules/3.10.0+2/extra/igb.ko
[root@XS001 log]# ethtool -i eth0
driver: igb
version: 5.2.9.4
firmware-version: 1.61, 0x8000090e
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
If I am totally wrong about this, please tell me...
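One way to compare the two hosts is to pull just the use-count column out of the lsmod output for the modules in question. This is a minimal sketch, not a diagnosis: module_use_counts is a hypothetical helper name, and a count of 0 does not prove a module is idle (igb shows 0 above while clearly driving eth0).

```shell
# Hypothetical helper: read lsmod-style lines on stdin and print
# "<module> <use-count>" for each module named as an argument.
module_use_counts() {
    awk -v wanted="$*" '
        BEGIN {
            # Build a lookup table from the space-separated module names.
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        # Column 1 is the module name, column 3 is the "Used by" count.
        ($1 in want) { print $1, $3 }
    '
}

# Usage on a host (dom0):
#   lsmod | module_use_counts bnx2i cnic iscsi_tcp igb
```

With the output posted above, this would list bnx2i with a count of 0 and iscsi_tcp with a count of 20. On hosts with open-iscsi, iscsiadm -m session -P 1 should also report the transport each session uses; whether that output looks the same on this XenServer release is an assumption.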
-
Oh shoot, forgot to say I have upgraded my XenServer host to 6.5 SP1 and all updates released until last week.
-
@momurda said in Xenvbd issues in Windows Event Viewer:
Oh shoot, forgot to say I have upgraded my XenServer host to 6.5 SP1 and all updates released until last week.
Are you still having the lockup issue since upgrading?
I assume yes because of the previous post, but wanted to be sure.
-
Yes, randomly about once a day, all VMs on this host become unresponsive for about 5 minutes. Linux VMs go into read-only mode, and Windows VMs spit out this warning as well as some disk I/O operation warnings. Then the Windows VMs go back to normal; I just restart the Linux VMs to get them back to normal.
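To spot which Linux guests were hit without logging in and poking around, something like the following could be run inside each guest. It is only a sketch: ro_mounts is a hypothetical helper name, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
# Hypothetical helper: read /proc/mounts-style lines on stdin and print the
# mount point of every /dev-backed filesystem currently mounted read-only.
ro_mounts() {
    awk '$1 ~ /^\/dev\// {
        # Field 4 is the comma-separated option list; look for a bare "ro".
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") print $2
    }'
}

# Usage inside a guest:
#   ro_mounts < /proc/mounts
```

An option such as errors=remount-ro is deliberately not matched, since it only describes what the filesystem will do on error, not its current state.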
-
My proposed solution comes from an old bug report on Citrix's Jira site.
https://bugs.xenserver.org/browse/XSO-241
Not specific to my hardware, but the Citrix person there says:
I also notice the Broadcom cnic and bnx2i drivers are loading due to you using iSCSI on your host.
Can you try running the following commands in dom0 (to disable those modules) and reboot the server:
mv /lib/modules/3.10.0+2/extra/bnx2i.ko /lib/modules/3.10.0+2/extra/bnx2i.bak
mv /lib/modules/3.10.0+2/extra/cnic.ko /lib/modules/3.10.0+2/extra/cnic.bak
mv /lib/modules/3.10.0+2/kernel/drivers/net/ethernet/broadcom/cnic.ko /lib/modules/3.10.0+2/kernel/drivers/net/ethernet/broadcom/cnic.bak
depmod -a
This seems to just rename the modules to .bak files so that they won't be recognized on boot. I assume then that igb would be loaded for iSCSI communication, which would hopefully solve the issue. Not sure what risks there are; if this messes things up badly, I think I could just rename the files again and be back to where I was.
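The rollback mentioned above could be sketched like this. restore_modules is a hypothetical helper name, and the paths in the usage comment are simply the ones from the quoted commands; nothing here has been tested on a XenServer host.

```shell
# Hypothetical helper: rename each given FILE.bak back to FILE.ko,
# skipping (with a message on stderr) any file that does not exist.
restore_modules() {
    for bak in "$@"; do
        if [ ! -f "$bak" ]; then
            echo "skip (missing): $bak" >&2
            continue
        fi
        mv -- "$bak" "${bak%.bak}.ko"
    done
}

# Usage in dom0, followed by rebuilding the module dependency map and a reboot:
#   restore_modules /lib/modules/3.10.0+2/extra/bnx2i.bak \
#                   /lib/modules/3.10.0+2/extra/cnic.bak \
#                   /lib/modules/3.10.0+2/kernel/drivers/net/ethernet/broadcom/cnic.bak
#   depmod -a
```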
-
Where are you booting XS from? USB or SD card?
-
Dashrender's post got me thinking, because I really didn't know the answer, having not set any of this up. I rebooted the server Friday about noon, and it refused to POST. I then remembered that this winter, about 4 days after I started here, there was a power outage at the office, and none of the servers were correctly hooked up to the UPSes to shut down gracefully.
OK, well, after some more testing Friday due to losing my pool master, I determined the BIOS on this mobo was bad. I have replaced the mobo, and now there seem to be no problems. I will keep tracking the Windows event log over the next week or so to see whether the errors are gone.
This also happened when I upgraded the XenServer host to 6.5 SP1 (a few reboots from 6.2). I realized that it kept getting stuck most of the time during POST, and I would have to hold down the power button until it turned off, then turn it back on, and normally it would boot correctly. The errors I was getting were quite strange, and some online resolutions from Citrix or 'the internet' hinted at interrupt remapping being the issue for certain Intel chipsets. But I didn't have the affected chipset.
Still mystified about XenServer using the Broadcom network drivers with Intel NICs, though. I will dig into this more later on; just glad to have 2 hosts again now instead of 1.
-
Awesome, good progress, at least.