Ubuntu Systemd Bad Entry
-
It looks like xvda is having issues according to the current screen.
Might have to replace that drive...
-
At the moment the system appears to just be progressing through blk_update_request I/O errors for individual sectors on xvda.
Should I abort this operation and find a replacement drive, or is it worth letting this continue?
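For reference, they're scrolling by on the console right now; I assume something like this would pull the same messages from the kernel log once I can get to a shell:
# pull the block-layer I/O errors out of the kernel ring buffer
dmesg | grep -iE "blk_update_request|i/o error"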
-
@DustinB3403 said in Ubuntu Systemd Bad Entry:
At the moment the system appears to just be progressing through blk_update_request I/O errors for individual sectors on xvda.
Should I abort this operation and find a replacement drive, or is it worth letting this continue?
Hard to say. Is there real data on it? I would try to get a last backup first before doing any filesystem operations.
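Even a quick rsync of the important bits to another box would do; as a rough sketch (the paths and host here are made up):
# hypothetical source and destination: copy the data off before touching the filesystem
rsync -aAX --progress /srv/backups/ user@otherhost:/srv/rescue/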
-
Yeah, all comes down to the value of recovery, really.
-
I don't mind tearing down the system; it's only running one VM, which is the one I back my VMs up to, and those deltas get pushed off nightly to another disk.
-
Time to reboot
-
And the system is in recovery mode...
-
Manual fsck is no fun.
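For anyone following along, the recovery prompt basically just wants the check run by hand, something along these lines (the root partition name is my guess for this VM):
# answer yes to the repair prompts; /dev/xvda1 is assumed to be the root partition here
fsck -y /dev/xvda1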
-
At least all of the instructions are there, and this is a learning experience.
-
All disks in the array appear to be fine according to MD, so this is clearly something with the VM.
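For reference, roughly what I looked at (the array shows up as md0 on this host):
cat /proc/mdstat          # quick view of array state and any rebuilds
mdadm --detail /dev/md0   # per-member status for the array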
-
So I was able to just restore this VM to a snapshot from the other day.
Should I perform another fsck on this virtual system?
-
Not if it does not prompt you to.
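(If you want a sanity check without changing anything, a read-only pass is always an option, something like the line below, with whatever your root partition actually is:)
# -n answers no to every repair prompt, so nothing gets written
fsck -n /dev/xvda1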
-
So how can I check to see if whatever caused this issue is still present? I mean if it just happens from time to time, fine.
But wouldn't it be good to know what caused it?
-
@DustinB3403 said in Ubuntu Systemd Bad Entry:
But wouldn't it be good to know what caused it?
That's a common thought and it makes sense, kind of. But computers are ridiculously complex beasts and not all issues are replicable. Gamma radiation, insanely uncommon bugs, memory errors, CPU errors, disk errors and such can all lead to corruption. These things happen. If you want to investigate every possible error ever you can easily spend more than the system is worth and only "guess" at the problem in the end - all for something that is unlikely to ever happen again.
Think of a windshield: you get a crack in it, and you don't remember anything hitting the windshield. Do you stop driving and spend months doing forensics trying to determine whether it was a rock, a bird, a bug, bridge debris, glass fragility, a bizarre temperature change, etc. that caused it to crack? Would knowing be useful? Not if it doesn't happen again.
So yes, KNOWING would be great. But FINDING OUT is not. Make sense? The cost required to know isn't worth it unless it becomes a repeating problem.
-
This issue is still occurring and interrupting the backup schedule for my VMs.
The host appears to be fine, so either something is wrong with the guest and I have to build a new VM, or something is wrong with the host that isn't showing up.
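To narrow it down I'll start on the host side; I assume something like this will show whether the host itself is logging disk errors:
# scan the host's kernel log for disk/controller errors (the pattern is just a guess at what to look for)
dmesg -T | grep -iE "i/o error|ata[0-9]"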
-
So after running smartctl on the host, it appears that /dev/sdb does have several errors. I'll be replacing this drive today and see if the issue persists.
The other three disks have no SMART errors at all.
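For reference, roughly what I ran against each disk (nothing exotic):
smartctl -H /dev/sdb    # quick overall health verdict
smartctl -a /dev/sdb    # full attribute list and error log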
-
Essentially this one disk is in a pre-fail state due to age.
So performing
mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
and then replacing the disk should put me back in a good state.
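And once the new disk is physically in, roughly this should bring it back into the array, assuming it shows up as /dev/sdb again:
# assumption: the replacement enumerates as /dev/sdb like the old disk did
mdadm /dev/md0 --add /dev/sdb
cat /proc/mdstat    # confirm the rebuild kicks off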
-
And the array is resilvering the now replaced disk.
As an FYI for anyone on software RAID, the drives are organized in a manner that aligns to the SATA connections on the board.
i.e., the USB boot device is sda
SATA1 (or SATA0, however it is labeled) = sdb
SATA2 = sdc
and so on.
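If anyone needs to double-check which physical disk maps to which device name before pulling one, the serial numbers are the safest reference; something like this lists them:
lsblk -o NAME,MODEL,SERIAL
-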
Well at least for now, the I/O errors have stopped after I replaced the bad disk in the host array and reverted the VM.
I'll keep an eye on it and report back if the issue comes back.
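For the ongoing check I'll probably just poll SMART health on the array members now and then, roughly like this (the sdb through sde names are an assumption based on the layout above):
# quick pass/fail health check on each assumed array member
for d in /dev/sd[b-e]; do smartctl -H "$d"; done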