Ubuntu Systemd Bad Entry
-
It looks like xvda is having issues according to the current screen.
Might have to replace that drive...
-
At the moment the system appears to just be progressing through blk_update_request I/O errors for individual sectors on xvda.
Should I abort this operation and find a replacement drive, or is it worth letting this continue?
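For reference, they're scrolling by on the console right now; I assume something like this would pull the same messages from the kernel log once I can get to a shell:
# pull the block-layer I/O errors out of the kernel ring buffer
dmesg | grep -iE "blk_update_request|i/o error"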
-
@DustinB3403 said in Ubuntu Systemd Bad Entry:
At the moment the system appears to just be progressing through blk_update_request I/O errors for individual sectors on xvda.
Should I abort this operation and find a replacement drive, or is it worth letting this continue?
Hard to say. Is there real data on it? I would try to get a last backup first before doing any filesystem operations.
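Even a quick rsync of the important bits to another box would do; as a rough sketch (the paths and host here are made up):
# hypothetical source and destination: copy the data off before touching the filesystem
rsync -aAX --progress /srv/backups/ user@otherhost:/srv/rescue/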
-
Yeah, all comes down to the value of recovery, really.
-
I don't mind tearing down the system; it's only running one VM, which is the one I back my VMs up to, and those deltas get pushed off nightly to another disk.
-
Time to reboot
-
And the system is in recovery mode...
-
Manual fsck is no fun.
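For anyone following along, the recovery prompt basically just wants the check run by hand, something along these lines (the root partition name is my guess for this VM):
# answer yes to the repair prompts; /dev/xvda1 is assumed to be the root partition here
fsck -y /dev/xvda1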
-
At least all of the instructions are there, and this is a learning experience.
-
All disks in the array appear to be fine according to MD, so this is clearly something with the VM.
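For reference, roughly what I looked at (the array shows up as md0 on this host):
cat /proc/mdstat          # quick view of array state and any rebuilds
mdadm --detail /dev/md0   # per-member status for the array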
-
So I was able to just restore this VM to a snapshot from the other day.
Should I perform another fsck on this virtual system?
-
Not if it does not prompt you to.
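(If you want a sanity check without changing anything, a read-only pass is always an option, something like the line below, with whatever your root partition actually is:)
# -n answers no to every repair prompt, so nothing gets written
fsck -n /dev/xvda1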
-
So how can I check to see if whatever caused this issue is still present? I mean if it just happens from time to time, fine.
But wouldn't it be good to know what caused it?
-
@DustinB3403 said in Ubuntu Systemd Bad Entry:
But wouldn't it be good to know what caused it?
That's a common thought and it makes sense, kind of. But computers are ridiculously complex beasts and not all issues are replicable. Gamma radiation, insanely uncommon bugs, memory errors, CPU errors, disk errors and such can all lead to corruption. These things happen. If you want to investigate every possible error ever you can easily spend more than the system is worth and only "guess" at the problem in the end - all for something that is unlikely to ever happen again.
Think of a windshield: you get a crack in it, and you don't remember anything hitting the windshield. Do you stop driving and spend months doing forensics trying to determine whether it was a rock, a bird, a bug, bridge debris, glass fragility, a bizarre temperature change, etc. that caused it to crack? Would knowing be useful? Not if it doesn't happen again.
So yes, KNOWING would be great. But FINDING OUT is not. Make sense? The cost required to know isn't worth it unless it becomes a repeating problem.
-
This issue is still occurring and interrupting the backup schedule for my VMs.
The host appears to be fine, so either something is wrong with the guest and I have to build a new VM, or something is wrong with the host that isn't showing up.
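To narrow it down I'll start on the host side; I assume something like this will show whether the host itself is logging disk errors:
# scan the host's kernel log for disk/controller errors (the pattern is just a guess at what to look for)
dmesg -T | grep -iE "i/o error|ata[0-9]"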
-
So after running smartctl on the host, it appears that /dev/sdb does have several errors. I'll be replacing this drive today and see if the issue persists.
The other three disks have no SMART errors at all.
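For reference, roughly what I ran against each disk (nothing exotic):
smartctl -H /dev/sdb    # quick overall health verdict
smartctl -a /dev/sdb    # full attribute list and error log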
-
Essentially this one disk is in a pre-fail state due to age.
So performing
mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
and then replacing the disk should put me back in a good state.
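And once the new disk is physically in, roughly this should bring it back into the array, assuming it shows up as /dev/sdb again:
# assumption: the replacement enumerates as /dev/sdb like the old disk did
mdadm /dev/md0 --add /dev/sdb
cat /proc/mdstat    # confirm the rebuild kicks off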
-
And the array is resilvering the now replaced disk.
As an FYI for anyone on software RAID, the drives are organized in a manner that aligns to the SATA connections on the board.
i.e., the USB boot device is sda
SATA1 (or SATA0, however it is labeled) = sdb
SATA2 = sdc
and so on.
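If anyone needs to double-check which physical disk maps to which device name before pulling one, the serial numbers are the safest reference; something like this lists them:
lsblk -o NAME,MODEL,SERIAL
-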
Well at least for now, the I/O errors have stopped after I replaced the bad disk in the host array and reverted the VM.
I'll keep an eye on it and report back if the issue comes back.
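For the ongoing check I'll probably just poll SMART health on the array members now and then, roughly like this (the sdb through sde names are an assumption based on the layout above):
# quick pass/fail health check on each assumed array member
for d in /dev/sd[b-e]; do smartctl -H "$d"; done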