vMotion causing glitches on moved machines
-
I'm posting in the VMware forums as well, but I'm curious to know why moving a machine live via vMotion leaves the machine with a corrupt file system. It's usually on /dev/sda1 on a Linux box, but it has also happened on Windows boxes.
-
Can you give more information about the version of vSphere you are running, the type of storage you have, and any errors you may see in your vCenter event logs area?
And this is vMotion (changing host used for compute and memory) and not storage vMotion that is producing the corruption, right?
-
vSphere ESXi Essentials Plus 5.5
3 hosts are all HP ProLiant Gen9 servers. Storage is Netgear ReadyNAS 3312.
Nothing errors out when moving from one host to another. Nothing in the logs that I can see but I could be looking in the wrong place.
-
@WLS-ITGuy said in vMotion causing glitches on moved machines:
3 hosts are all HP ProLiant Gen9 servers. Storage is Netgear ReadyNAS 3312.
WAT
-
I summon the IPOD executioner - @scottalanmiller
-
@WLS-ITGuy said in vMotion causing glitches on moved machines:
vSphere ESXi Essentials Plus 5.5
3 hosts are all HP ProLiant Gen9 servers. Storage is Netgear ReadyNAS 3312.
Nothing errors out when moving from one host to another. Nothing in the logs that I can see but I could be looking in the wrong place.
That Netgear ReadyNAS is probably a good device for backup. As main storage for a cluster, not so much. Is this a lab?
-
That setup is terrible, but it shouldn't be the cause of any issues here. vMotion should not cause corruption even with a highly risky setup like that.
-
I wonder if we should tag @John-Nicholson ?
-
The ReadyNAS is set up as iSCSI and is certified to work with vSphere. Why is it risky?
-
@WLS-ITGuy said in vMotion causing glitches on moved machines:
The ReadyNAS is set up as iSCSI and is certified to work with vSphere. Why is it risky?
But you have an inverted pyramid setup.
Three servers going to (we hope) two switches going to one SAN.
If the SAN fails the whole thing fails.
Instead of making your system safer, it's actually noticeably less safe, risk-wise.
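To put rough numbers on that, here is a minimal sketch of the availability arithmetic. The availability figures are hypothetical values picked for illustration, not measurements of any of this hardware, and the model assumes failures are independent.

```python
from math import prod


def series_availability(parts):
    """Availability of a chain in which every part must be up."""
    return prod(parts)


def parallel_availability(parts):
    """Availability of redundant parts where any single one being up is enough."""
    return 1 - prod(1 - a for a in parts)


# Hypothetical figures for illustration only (assumptions, not measured values).
HOST = 0.999
SWITCH = 0.9995
SAN = 0.999

single_server = HOST
hosts = parallel_availability([HOST] * 3)       # three redundant hosts
switches = parallel_availability([SWITCH] * 2)  # two redundant switches
inverted_pyramid = series_availability([hosts, switches, SAN])

print(f"single server:    {single_server:.6f}")
print(f"inverted pyramid: {inverted_pyramid:.6f}")
```

However many hosts and switches sit on top, the chain can never be more available than the single SAN at the bottom. In this sketch the SAN is assumed to be as reliable as one server, and the whole stack still comes out slightly below the single server.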
-
@WLS-ITGuy The issues that @Dashrender and @scottalanmiller have posted about concern the overall system design, which isn't related (as far as we can tell) to the root of your problem.
Since your VMs are using shared, non-redundant storage, the issue has to lie somewhere else.
vSphere 5.5 is quite out of date, and could be the cause of the problem.
-
Have you run an fsck on the drives/partitions and had it find anything wrong?
-
@WLS-ITGuy said in vMotion causing glitches on moved machines:
The ReadyNAS is set up as iSCSI and is certified to work with vSphere. Why is it risky?
None of the "pieces" are risky, it's the fundamental design that you have. Would you buy the biggest, baddest server and then run it without RAID on a single consumer hard drive? That's what you have here - loads of high-end parts and protection all resting on a single point of failure that should never be used in this way.
Certified means "tested as compatible" and in no way tells you that it is safe to use, much less that it is safe to use the way you have used it. Your setup is significantly more costly but less safe than just running a single server.
-
Doesn't vMotion move the memory state? Should not corrupt as the VM never stops, I thought.
-
@Reid-Cooper said in vMotion causing glitches on moved machines:
Doesn't vMotion move the memory state? Should not corrupt as the VM never stops, I thought.
Yeah, that's why I'm wondering if this is something corrupt in the inodes or the like. Moving the VM between hosts shouldn't affect that at all.
-
I agree, the memory should be protecting it from corruption. Not sure what stage would expose the storage here. Possible it is a bug, but that's unlikely in VMware.
-
I can tell you that we are on the latest build of 5.5. I run the move from the web client and select the "reserve for optimal performance" option for the migration. I am just moving the VM from one host to another.
-
@WLS-ITGuy said in vMotion causing glitches on moved machines:
I can tell you that we are on the latest build of 5.5. I run the move from the web client and select the "reserve for optimal performance" option for the migration. I am just moving the VM from one host to another.
Have you run an fsck before doing the vMotion? I'm wondering if it's something pre-existing that just isn't being caught.
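If it helps to compare the before and after results, here is a rough sketch for building a no-changes check and decoding the fsck exit status. The helper names are made up for illustration, /dev/sda1 is just the device mentioned earlier in the thread, and it assumes an ext2/3/4 file system, where `-n` opens the file system read-only and answers "no" to every prompt. The exit-status bits are the ones listed in the fsck(8) man page.

```python
# Exit-status bits as documented in the fsck(8) man page.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def readonly_fsck_command(device):
    """Build a check-only fsck command (hypothetical helper name).

    Assumes an ext2/3/4 checker, where -n means open read-only and
    answer "no" to all questions.
    """
    return ["fsck", "-n", device]


def describe_fsck_exit(code):
    """Turn an fsck exit status (a bitmask) into readable messages."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if code & bit]


print(readonly_fsck_command("/dev/sda1"))
print(describe_fsck_exit(4))
```

To actually run it, pass the command to `subprocess.run(...)` as root and feed the `returncode` to `describe_fsck_exit`. Results from a mounted file system are not reliable, so the check is best done from a rescue environment or with the partition unmounted, once before the vMotion and once after.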
-
@travisdh1 I did one afterward because the machine required it to get working again. I cannot recall if I have done a vMotion since running the fsck.