Ubuntu Systemd Bad Entry
-
All disks in the array appear to be fine according to MD... so this is clearly something with the VM.
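For reference, checking the array from the host boils down to something like this (the array here is /dev/md0):
cat /proc/mdstat                 # all members should show up, e.g. [UUUU], with no (F) flags
sudo mdadm --detail /dev/md0     # look for "State : clean" and no failed devices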
-
So I was able to just restore this VM to a snapshot from the other day.
Should I perform another fsck on this virtual system?
-
Not if it does not prompt you to.
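(For what it's worth, if you did want to force a full check of the guest's root filesystem on the next boot anyway, on a systemd-based Ubuntu you can do it from the kernel command line; this is just a sketch of that approach:)
# Edit /etc/default/grub and append the fsck options to GRUB_CMDLINE_LINUX_DEFAULT,
# e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash fsck.mode=force fsck.repair=yes"
sudo nano /etc/default/grub
sudo update-grub                 # regenerate grub.cfg with the new options
sudo reboot                      # systemd-fsck will run a full check during boot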
-
So how can I check to see if whatever caused this issue is still present? I mean if it just happens from time to time, fine.
But wouldn't it be good to know what caused it?
-
@DustinB3403 said in Ubuntu Systemd Bad Entry:
But wouldn't it be good to know what caused it?
That's a common thought and it makes sense, kind of. But computers are ridiculously complex beasts and not all issues are replicable. Gamma radiation, insanely uncommon bugs, memory errors, CPU errors, disk errors and such can all lead to corruption. These things happen. If you want to investigate every possible error ever you can easily spend more than the system is worth and only "guess" at the problem in the end - all for something that is unlikely to ever happen again.
Think of a windshield: you get a crack in it, and you don't remember anything hitting it. Do you stop driving and spend months doing forensics trying to determine whether it was a rock, a bird, a bug, bridge debris, glass fragility, a bizarre temperature change, etc. that caused it to crack? Would knowing be useful? Not if it doesn't happen again.
So yes, KNOWING would be great. But FINDING OUT is not. Make sense? The cost required to know isn't worth it unless it becomes a repeating problem.
-
This issue is still occurring and interrupting my backup schedule for my VMs.
The host appears to be fine. So either I have to build a new VM, or something is wrong with the host.
[screenshot from the guest]
-
So running smartctl on the host, it appears that /dev/sdb does have several errors. I'll be replacing this drive today and see if the issue persists.
The other 3 disks have no SMART errors at all.
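For anyone wanting to run the same check (smartctl comes from the smartmontools package):
sudo smartctl -H /dev/sdb        # overall health self-assessment (PASSED/FAILED)
sudo smartctl -A /dev/sdb        # attribute table; watch reallocated/pending sector counts
sudo smartctl -l error /dev/sdb  # the drive's logged ATA errors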
-
Essentially this one disk is in a pre-fail state due to age.
So performing
mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
and then replacing this disk, I should be in a good state.
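Roughly the full sequence, assuming the replacement comes back under the same /dev/sdb name (it may not if ports or boot order change):
sudo mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb   # mark the member failed and pull it from md0
# ...power down, swap the physical drive, boot back up...
sudo mdadm /dev/md0 --add /dev/sdb                      # add the new disk; the rebuild starts automatically
cat /proc/mdstat                                        # shows rebuild progress and an ETA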
-
And the array is now resyncing onto the replacement disk.
As an FYI for anyone on software RAID, the drive names line up with the SATA connections on the board (the commands below confirm the mapping).
I.e. the USB boot device is sda,
SATA1 (or 0, however it is labeled) = sdb,
SATA2 = sdc,
and so on.
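If you want to confirm the mapping rather than assume it (the sdX names can shift between boots), the by-path and by-id symlinks spell it out:
ls -l /dev/disk/by-path/   # ata-N / usb entries point at the sdX nodes they correspond to
ls -l /dev/disk/by-id/     # model/serial based names, handy for pulling the right physical drive
-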
Well at least for now, the I/O errors have stopped after I replaced the bad disk in the host array and reverted the VM.
I'll keep an eye on it and report back if the issue comes back.
-
And these are back.
-
Same disk?
-
On the guest OS there is only 1 disk (it's presented from the array).
I checked the smart stats on each drive and found no issues. MD was also fine.
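For anyone checking the same thing, a loop along these lines covers all the members (sda is the USB boot stick here, sdb through sde are the array disks):
for d in /dev/sd[b-e]; do sudo smartctl -H "$d"; done   # each should report PASSED
cat /proc/mdstat                                        # array should show all members up, e.g. [UUUU]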
-
Although on a separate note... this is a shit desktop that is acting as the hypervisor, so errors really shouldn't surprise me.
-
All disks in the host are marked as "Old_age" so there really isn't much that I can do besides assemble something else.
Hopefully out of newer equipment.
-
Maybe I'm reading this wrong... the SMART overall-health self-assessment is PASSED on all drives.
So maybe it's just a column saying "it'll fail here"
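Looking closer, the TYPE column in smartctl -A output just classifies each attribute as Pre-fail or Old_age; it isn't a verdict that the drive has failed. The WHEN_FAILED column and the raw counts are what actually matter:
sudo smartctl -A /dev/sdb | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'
# WHEN_FAILED should be "-" and the RAW_VALUE counts should be 0 (or at least not climbing)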
-
I'm performing long tests on each of the drives to confirm the information I have is accurate. I'll update in ~90 minutes.
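(Something like this per drive; the long test runs in the background on the drive itself:)
sudo smartctl -t long /dev/sdb        # kicks off the long self-test and prints an estimated completion time
sudo smartctl -l selftest /dev/sdb    # check afterwards; you want "Completed without error"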