RHEL 4 not seeing ext3 label

JaredBusch

Continues from this original thread
Re: PCI bus error

Getting a kernel panic no matter where I try to restore.
After lots of testing, I am seeing that the boot process is unable to see the filesystem label to mount root.

@DustinB3403 found this which got me searching in the right place.

@DustinB3403 said in PCI bus error:

@JaredBusch I'm assuming you already found this, but if not.

https://communities.vmware.com/t5/Converter-Standalone-Discussions/RHEL-4-machine-converted-fine-but-get-kernel-panic-when-I-start/m-p/2148049

But I cannot get anything working.

I am assuming that it is a driver built in to the kernel that is missing.
This seems to be the initial failure

GRUB is telling it to look for that.

The original system has that label on sda3

I can boot to a CentOS 4 ISO file, enter rescue mode, and check the label. It is listed.

So frustrated.
I have tried many combinations of the drive choices and such within Proxmox. I restored even to bare metal on an old desktop, etc, etc.

JaredBusch

and solved it. finally..

One of the reboots into the CentOS 4 disc was slow or something, and I caught it popping up this screen (took me 6 reboots to get a screenshot).
This is the Kudzu hardware detection thing.

The VM is set up using the LSI 53C895A SCSI Controller.
Booted into rescue mode with the CentOS 4 CD.

chroot /mnt/sysimage
vi /etc/modprobe.conf
# make this the only scsi_hostadapter
alias scsi_hostadapter sym53c8xx
# exit vi !!! omfg how!!!
cd /boot
mkinitrd -v -f initrd-2.6.9-55.EL.img 2.6.9-55.EL
exit
exit

System automatically reboots to come out of rescue mode.
Make sure you remove the ISO at this point.
Then boom..
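
For reference, the modprobe.conf edit above can be scripted instead of done in vi. This is only a minimal sketch, assuming the same driver (sym53c8xx) and kernel version (2.6.9-55.EL) as in the steps above; the function name set_scsi_hostadapter is made up for illustration.

```shell
# Hypothetical helper: keep exactly one scsi_hostadapter alias in a modprobe.conf.
# Usage: set_scsi_hostadapter <path to modprobe.conf> <driver module>
set_scsi_hostadapter() {
    conf=$1
    driver=$2
    tmp="${conf}.tmp.$$"
    # Drop every existing scsi_hostadapter alias (scsi_hostadapter, scsi_hostadapter1, ...)
    grep -v '^[[:space:]]*alias[[:space:]][[:space:]]*scsi_hostadapter' "$conf" > "$tmp" || true
    # Add the single alias for the emulated controller
    echo "alias scsi_hostadapter $driver" >> "$tmp"
    # Write back in place so the original file's ownership and mode are kept
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Inside the rescue chroot (paths and versions taken from the steps above):
#   set_scsi_hostadapter /etc/modprobe.conf sym53c8xx
#   mkinitrd -v -f /boot/initrd-2.6.9-55.EL.img 2.6.9-55.EL
```

The point of removing the other aliases is that mkinitrd builds the initrd from what modprobe.conf lists, so the old hardware's drivers would otherwise still be the ones it packs.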

1337

You could potentially try to install centos4, or rhel4 is even better, so you get to a bootable system.
Then just copy the files from the backup over your installation.

JaredBusch

@Pete-S said in RHEL 4 not seeing ext3 label:

You could potentially try to install centos4, or rhel4 is even better, so you get to a bootable system.
Then just copy the files from the backup over your installation.

It is an option I have thought about. I'll be on site this morning, and I will be shutting down the host and booting to a Fedora Live to run dd in an effort to get a solid disk image.

I tried their built in process on Tuesday and it failed with sector/block read errors. A little digging through the files on the recovery ISO showed that all they were doing was using dd, so I am hoping to use dd with more intelligent options to continue on and such.

DustinB3403

So this is quite old (15 years) but maybe... Link

Sounds like you lost the label on your boot partition.
Boot from CD into rescue mode and use e2fslabel to label the partition or
change the root= to use /dev/hdx in grub.
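
A minimal sketch of how that check could look from rescue mode, comparing what GRUB asks for with what the partition actually carries. The helper name grub_root_args is hypothetical, and the grub.conf path and /dev/sda3 in the usage comments are assumptions based on a standard RHEL 4 layout and the partition named earlier in the thread.

```shell
# Hypothetical helper: print the root= argument of each kernel line in a GRUB legacy config.
# Usage: grub_root_args <path to grub.conf>
grub_root_args() {
    # kernel lines look like: kernel /vmlinuz-2.6.9-55.EL ro root=LABEL=/ rhgb quiet
    sed -n 's/^[[:space:]]*kernel[[:space:]].*[[:space:]]\(root=[^[:space:]]*\).*/\1/p' "$1"
}

# From the rescue environment (paths are assumptions):
#   grub_root_args /mnt/sysimage/boot/grub/grub.conf   # what GRUB passes to the kernel
#   e2label /dev/sda3                                   # the label the partition really has
```

If the two agree and the boot still fails, the label itself is probably not the problem, and the kernel or initrd likely cannot see the disk at all.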

Dashrender

@JaredBusch said in RHEL 4 not seeing ext3 label:

@Pete-S said in RHEL 4 not seeing ext3 label:

You could potentially try to install centos4, or rhel4 is even better, so you get to a bootable system.
Then just copy the files from the backup over your installation.

It is an option I have thought about. I'll be on site this morning, and I will be shutting down the host and booting to a Fedora Live to run dd in an effort to get a solid disk image.

I tried their built in process on Tuesday and it failed with sector/block read errors. A little digging through the files on the recovery ISO showed that all they were doing was using dd, so I am hoping to use dd with more intelligent options to continue on and such.

What's the repair solution for bad blocks in a setup like this? If dd can't read because of bad blocks, I'm hoping 'nix has some tool to fix/recover/replace these bad blocks, assuming the data's recoverable on the hardware. Otherwise it's restore time, right?

JaredBusch

@DustinB3403 said in RHEL 4 not seeing ext3 label:

Boot from CD into rescue mode and use e2fslabel to label the partition or

I stated in the OP that when booting into rescue mode, the label shows correctly.

JaredBusch

@DustinB3403 said in RHEL 4 not seeing ext3 label:

change the root= to use /dev/hdx in grub.

I did that also. It still failed to mount it.

JaredBusch

This is the script that performs the backup itself. Well, the chunk that does a backup to HDD.

backup2hd()
{
	echo "Backup to HD started..."
	
	AUTOBACKUP=$1
	AUTO=0
	RES=0
	if [ "${AUTOBACKUP}" = "AUTO" ]; then
		RES=0
		AUTO=1
		echo "Auto Full Backup Starts..."
		# mt rewind
	else
		RES=2
		AUTO=0
	fi
	#TODO: Mount check - can't backup to a non-existant or read-only mount point 
	RES=0 # Assume all is well - really the mount check would reset this, but until then just "go with it"


	# Make temp directory...
	# TDR_ROOT is the base directory we are going to use on the mounted volume (e.g. /media/usbdisk)
	TMP_TDR=${TDR_ROOT}/tmp/TDR-backup
	mkdir -p $TMP_TDR
	rm -rf $TMP_DIR


	# Size sanity check - can't backup to a device too small.
	# -- Exclusion HD list
	mkdir -p $TMP_TDR/hd
	for HD in $HD_EXCLUDE 
	do 	
		mkdir -p $(dirname $TMP_TDR/hd/$HD) # Account for device names like /dev/cciss/c0d0p1
		touch $TMP_TDR/hd/$HD 
	done

	dialog --title "BackupHD" --defaultno --yesno "Skip size check?" 5 30 	
	if [ $? -eq 1 ]; then 
		# - Find total size of backup 
		for HD in $(dmesg | grep -P "^\s+\S+:\s+\S+\d+" | grep -P "(\d+|>)$" | cut -d':' -f1 | sed 's/ //g')
		do 
			if [ ! -f $TMP_TDR/hd/$HD ]; then
				mkdir -p $(dirname $TMP_TDR/hd/$HD)  # Account for device names like /dev/cciss/c0d0p1
				touch $TMP_TDR/hd/$HD 
				unset TOTALSIZE
				unset SIZE
				for PART in $(sfdisk -l /dev/$HD | grep -P "Linux$" | cut -d' ' -f1 )
				do 	
					echo "Checking $PART size..."
					SIZE=$(dump -S $PART )
					TOTALSIZE=$(($TOTALSIZE + $SIZE	))
					echo "$PART is $SIZE bytes"
				done
			fi
		done
		rm -rf $TMP_TDR/hd/
		# Find device mounted on TDR_ROOT 
		TARGETSIZE=$(df $TDR_ROOT| tail -n 1 | awk '{print $4}' )
		TARGETSIZE=$(( $TARGETSIZE * 1024 ))  # Convert to bytes
		if [ $TOTALSIZE -gt $TARGETSIZE ]; then
			dialog --title "BackupHD" --msgbox "Target volume is too small.\nTotal size required  [$TOTALSIZE]\nTotal size available [$TARGETSIZE]\n" 10 60
			RES=99
		else
			RES=0
		fi
	fi 
	
	# Check that $RES = 0 so we can continue...
	# Otherwise quit this routine.
	if [ $RES -ne 0 ]; then
		break
	fi

	if [ -z $PREFIX ]; then
		# Default prefix to "YYYY-MM-DD-HHMM_"
		PREFIX=$(date +'%F-%H%M')_
	fi

	RECOVERY=$TMP_TDR/recovery-procedure

        rm -f $RECOVERY
	if [ $RES -eq 0 ]; then
		# make restore procedure script
		touch $RECOVERY
		chmod +x $RECOVERY
		echo '#!/bin/bash' >> $RECOVERY
		echo 'unset SSH' >> $RECOVERY
		echo '# -- ' >> $RECOVERY
		echo '# ' >> $RECOVERY
		echo '# --' >> $RECOVERY
		echo 'RESTORE_DIR=$(dirname "$0")' >> $RECOVERY
		echo 'PREFIX='${PREFIX} >> $RECOVERY
		echo 'mkdir -p /tmp/TDR-recover' >> $RECOVERY
		echo 'tar xf ${RESTORE_DIR}/${PREFIX}system-data.tar -C /tmp/TDR-recover' >> $RECOVERY
		mkdir -p $TMP_TDR/hd
		# -- Exclusion list
		for HD in $HD_EXCLUDE 
		do 
			mkdir -p $(dirname $TMP_TDR/hd/$HD)  # Account for device names like /dev/cciss/c0d0p1
			touch $TMP_TDR/hd/$HD 
		done
		# - restore boot block and partition table
		for HD in $(dmesg | grep -P "^\s+\S+:\s+\S+\d+" | grep -P "(\d+|>)$" | cut -d':' -f1 | sed 's/ //g')
		do 
			if [ ! -f $TMP_TDR/hd/$HD ]; then
				mkdir -p $(dirname $TMP_TDR/hd/$HD)  # Account for device names like /dev/cciss/c0d0p1
				# restore boot block
				echo "dd if=/tmp/TDR-recover/hd/$HD.partinfo bs=512 count=63 of=/dev/$HD" >> $RECOVERY
				# restore partition table
				echo "sfdisk /dev/$HD < /tmp/TDR-recover/hd/$HD.sfdisk" >> $RECOVERY
				touch $TMP_TDR/hd/$HD
			fi
		done
		echo "echo \"#--- Sleep for a while to let slow controllers (HP/Compaq RAID's for one) catch up...\"" >> $RECOVERY
		echo "sleep 10" >> $RECOVERY
		rm -rf $TMP_TDR/hd/
		# -- Exclusion HD list
		for HD in $HD_EXCLUDE 
		do 
			mkdir -p $(dirname $TMP_TDR/hd/$HD)  # Account for device names like /dev/cciss/c0d0p1
			touch $TMP_TDR/hd/$HD 
		done
		# - recreate partitions (including swap), restore data, re-install grub
                for HD in `dmesg |grep -P "^\s+\S+:\s+\S+\d+"|grep -P "(\d+|\>)$"|cut -d':' -f1|sed 's/ //g'`
		do 
			unset FILE
			if [ ! -f $TMP_TDR/hd/$HD ]; then
				mkdir -p $(dirname $TMP_TDR/hd/$HD)  # Account for device names like /dev/cciss/c0d0p1
				touch $TMP_TDR/hd/$HD
				for PART in $(sfdisk -l /dev/$HD | grep -P "Linux$" | cut -d' ' -f1 )
				do 
					# Create partition restore procedure
					LABEL=$(e2label $PART)
					PART_BASE=$(basename $PART)
                                        echo "echo \"# === $LABEL on $PART ===\"" >> $RECOVERY
					echo "mke2fs -j -L $LABEL $PART" >> $RECOVERY
					echo "mkdir -p /mnt/$PART_BASE" >> $RECOVERY
					echo "mount $PART /mnt/$PART_BASE" >> $RECOVERY
					echo "cd /mnt/$PART_BASE" >> $RECOVERY
					echo "rm -rf *" >> $RECOVERY
					FILE="\${RESTORE_DIR}/${PREFIX}${PART_BASE}.img"
					echo "echo \"# --- Restoring $LABEL from $FILE --- \"" >> $RECOVERY
# TODO: RSH=ssh RMT=rmt restore -r ${REMOTE_TAPE}
					echo "restore -v -M -rf $FILE" >> $RECOVERY
					echo "rm -f restoresymtable" >> $RECOVERY
					echo "cd /" >> $RECOVERY
					echo "umount /mnt/$PART_BASE" >> $RECOVERY
					if [ "$LABEL" = "/boot" ]; then
						echo "echo Restoring GRUB bootloader" >> $RECOVERY
						echo "mkdir -p /mnt/$PART_BASE/boot" >> $RECOVERY
						echo "mount $PART /mnt/$PART_BASE/boot" >> $RECOVERY
						echo "grub-install --no-floppy --recheck --root-directory=/mnt/$PART_BASE /dev/$HD" >> $RECOVERY
						echo "umount /mnt/$PART_BASE/boot" >> $RECOVERY
					fi
					echo "" >> $RECOVERY
				done
				# Recreate the swap partition
				for PART in $( sfdisk -l /dev/$HD|grep -P "Linux swap$"|cut -d' ' -f1 )
				do
					echo "mkswap $PART" >> $RECOVERY
                                        echo "" >> $RECOVERY
				done
			fi
		done
		rm -rf $TMP_TDR/hd/
		
		# Now to actually do the backup
		
		# -- backup recovery-procedure script 
		rm -f $TDR_ROOT/${PREFIX}system-data.tar
		tar cf $TDR_ROOT/${PREFIX}system-data.tar -C $TMP_TDR recovery-procedure 
		cp -v $RECOVERY $TDR_ROOT/${PREFIX}recovery-procedure

		# -- Exclusion HD list
		for HD in $HD_EXCLUDE 
		do 
			mkdir -p $(dirname $TMP_TDR/hd/$HD)  # Account for device names like /dev/cciss/c0d0p1
			touch $TMP_TDR/hd/$HD 
		done
		# -- backup partition table information
		for HD in `dmesg |grep -P "^\s+\S+:\s+\S+\d+"|grep -P "(\d+|\>)$"|cut -d':' -f1|sed 's/ //g'`
		do
			if [ ! -f $TMP_TDR/hd/$HD ]; then
				mkdir -p $(dirname $TMP_TDR/hd/$HD) # Account for device names like /dev/cciss/c0d0p1
				dd if=/dev/$HD of=$TMP_TDR/hd/$HD.partinfo bs=512 count=63
				sfdisk -d /dev/$HD > $TMP_TDR/hd/$HD.sfdisk  
 				tar --append -f $TDR_ROOT/${PREFIX}system-data.tar -C $TMP_TDR hd/$HD.partinfo hd/$HD.sfdisk
				touch $TMP_TDR/hd/$HD
			fi
		done
		rm -rf $TMP_TDR/hd/
		# -- Exclusion HD list
		for HD in $HD_EXCLUDE 
		do 
			mkdir -p $(dirname $TMP_TDR/hd/$HD) # Account for device names like /dev/cciss/c0d0p1
			touch $TMP_TDR/hd/$HD 
		done
		# -- backup data for each partition 
		for HD in $(dmesg |grep -P "^\s+\S+:\s+\S+\d+"|grep -P "(\d+|\>)$"|cut -d':' -f1|sed 's/ //g')
		do
			unset FILE
			if [ ! -f $TMP_TDR/hd/$HD ]; then
				mkdir -p $(dirname $TMP_TDR/hd/$HD) # Account for device names like /dev/cciss/c0d0p1
				touch $TMP_TDR/hd/$HD
				for PART in $(sfdisk -l /dev/$HD|grep -P "Linux$"|cut -d' ' -f1)
				do
					# dump to file -- remote could be set in the $TDR_ROOT variable....
					PART_BASE=$(basename $PART)
					FILE=${REMOTE}${TDR_ROOT}/${PREFIX}${PART_BASE}.img
					echo "Dumping $PART_BASE to $FILE ..." 
					# -B 4589824 => (4589824 x 1024 = 4699979776 bytes) or DVD size chunk
					# -B 665600  => ( 665600 x 1024 =  681574400 bytes) or CD size chunks
					# dump $DUMP_OPT -M -B 4589824 -0 $PART -j9 -f $FILE
					dump $DUMP_OPT -M -B 665600 -b 10 -0 $PART -j9 -f $FILE
				done
			fi
		done
		rm -rf $TMP_TDR/hd/

#TODO: Package the resulting files into one (or more chunks) ?		
		
		rm -Rf $TMP_TDR 
		if [ ${AUTO} -eq 0 ]; then
			dialog --no-kill --msgbox "[Backup]\nBackup is done!" 6 40
		fi
		
		echo "It is safe to reboot now"
	elif [ $RES -eq 1 ]; then
		dialog --no-kill --msgbox "[Backup]\nThis computer encountered an error\n Try another method\n" 7 50
	fi

}

JaredBusch

Well dd is moving right along.

I had to use their recovery CD to boot the hardware. It would not boot to any of my USB drives.

So that is dd from RHEL 4. The USB disk it is writing to is formatted FAT. So a direct write puked at 4GB.

The version of split on there only supports a size tag of m at the largest. So I went with 650MB on the split to match what their normal process creates.
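
A minimal sketch of that kind of pipeline, for anyone in the same spot. The exact command used is not shown in this thread, so everything here is an assumption: the helper name image_in_chunks, the 4096-byte block size, and the gzip step (guessed from the later "unzipped" remark).

```shell
# Hypothetical helper: image a source into fixed-size chunks, continuing past read errors.
# Usage: image_in_chunks <source> <output prefix> [chunk size, default 650m]
image_in_chunks() {
    src=$1
    prefix=$2
    chunk=${3:-650m}
    # conv=noerror keeps dd going after a read error; sync pads the failed block
    # with zeros so everything after it stays at the right offset in the image.
    # A small bs means less data is zeroed out around each bad sector.
    dd if="$src" bs=4096 conv=noerror,sync | gzip -c | split -b "$chunk" - "$prefix"
}

# On the failing server (device and target names are assumptions):
#   image_in_chunks /dev/sda /dd_manual/dd/sda.img.gz.
```

Chunks come out as sda.img.gz.aa, sda.img.gz.ab, and so on, each under the FAT 4GB file size limit. One caveat: sync also pads a final partial block, so the image can end up slightly larger than a source whose size is not a multiple of bs.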

JaredBusch

I'm monitoring the progress in console 2 (Ctrl+Alt+F2) with
watch -n 1 "ls -lash /dd_manual/dd"

JaredBusch

Process completed with no errors yesterday.

Now to merge it all back together and try to restore it to a VM.

1337

It feels like I'm watching reality TV.

Dashrender

@JaredBusch said in RHEL 4 not seeing ext3 label:

Process completed with no errors yesterday.

Now to merge it all back together and try to restore it to a VM.

Do you need to merge it? just wondering?

JaredBusch

@Dashrender said in RHEL 4 not seeing ext3 label:

Do you need to merge it? just wondering?

How else does it become a single disk image file to import into my hypervisor?

JaredBusch

So back home, and I have the files backed up in like 4 places.

I recombined the .img files and then unzipped them.
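
Roughly, the recombine step looks like the sketch below. It assumes the chunks were produced by split from a gzip stream; the helper name reassemble_image and the file names in the usage comment are made up for illustration.

```shell
# Hypothetical helper: join split chunks in name order and decompress into one image.
# Usage: reassemble_image <chunk prefix> <output image>
reassemble_image() {
    prefix=$1
    out=$2
    # The shell expands "$prefix"* in sorted order, which matches split's
    # aa, ab, ac, ... suffixes, so the chunks are concatenated in sequence.
    cat "$prefix"* | gunzip -c > "$out"
}

# Example (names are assumptions):
#   reassemble_image sda.img.gz. sda.img
```

The output name should not start with the chunk prefix, otherwise the glob would pick up the output file as well.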

Getting ready to set up a new VM on Proxmox, but I poked around dmesg on the running system first.

SCSI subsystem initialized
Fusion MPT base driver 3.02.73rh
Copyright (c) 1999-2006 LSI Logic Corporation
Fusion MPT SPI Host driver 3.02.73rh
ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 34 (level, low) -> IRQ 201
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
scsi0 : ioc0: LSI53C1030, FwRev=01032300h, Ports=1, MaxQ=255, IRQ=201
ACPI: PCI Interrupt 0000:02:05.1[B] -> GSI 33 (level, low) -> IRQ 209
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
scsi1 : ioc1: LSI53C1030, FwRev=01032300h, Ports=1, MaxQ=255, IRQ=209
Fusion MPT SAS Host driver 3.02.73rh
megaraid cmm: 2.20.2.6rh (Release Date: Tue Jan 16 12:35:06 PST 2007)
megaraid: 2.20.4.6-rh2 (Release Date: Wed Jun 28 12:27:22 EST 2006)
megaraid: probe new device 0x1000:0x1960:0x1028:0x0518: bus 9:slot 4:func 0
ACPI: PCI Interrupt 0000:09:04.0[A] -> GSI 106 (level, low) -> IRQ 233
megaraid: fw version:[351S] bios version:[1.10]
scsi2 : LSI Logic MegaRAID driver
scsi[2]: scanning scsi channel 0 [Phy 0] for non-raid devices
  Vendor: PE/PV     Model: 1x6 SCSI BP       Rev: 1.0 
  Type:   Processor                          ANSI SCSI revision: 02
scsi[2]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[2]: scanning scsi channel 2 [virtual] for logical drives
  Vendor: MegaRAID  Model: LD 0 RAID1   69G  Rev: 351S
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 143114240 512-byte hdwr sectors (73274 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 143114240 512-byte hdwr sectors (73274 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi2, channel 2, id 0, lun 0
  Vendor: MegaRAID  Model: LD 1 RAID5  139G  Rev: 351S
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdb: 286228480 512-byte hdwr sectors (146549 MB)
sdb: asking for cache data failed
sdb: assuming drive cache: write through
SCSI device sdb: 286228480 512-byte hdwr sectors (146549 MB)
sdb: asking for cache data failed
sdb: assuming drive cache: write through
 sdb: sdb1
Attached scsi disk sdb at scsi2, channel 2, id 1, lun 0

I think this tells me that I should try the MegaRAID controller this time. I swear I already tried it. But I have slept since then. Tuesday and Wednesday were crazy stressed getting data.
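
A quick way to pull the relevant lines out of a dmesg dump like the one above is sketched below. The helper name disks_by_scsi_host is hypothetical, and the patterns only assume the 2.6-era message formats shown in this output.

```shell
# Hypothetical helper: read dmesg text on stdin and print "<disk> <host> <host banner>".
disks_by_scsi_host() {
    awk '
        # Host banner lines look like: "scsi2 : LSI Logic MegaRAID driver"
        /^scsi[0-9]+ : / {
            host = $1
            banner = $0
            sub(/^scsi[0-9]+ : /, "", banner)
            drivers[host] = banner
        }
        # Disk lines look like: "Attached scsi disk sda at scsi2, channel 2, id 0, lun 0"
        /^Attached scsi disk / {
            host = $6
            sub(/,$/, "", host)
            print $4, host, drivers[host]
        }
    '
}

# Example:
#   dmesg | disks_by_scsi_host
```

On the output above, both sda and sdb resolve to scsi2, the host announced as the LSI Logic MegaRAID driver, which is why the MegaRAID controller looks like the one to try.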

JaredBusch

Well damnit. It does not see the second disk..

Looks like an error during boot

Dashrender

can you boot from a live image and see both disks?

I did a d2vm of a Windows 2003 server and I had to run checkdisk like 10 times before it finally worked.. don't ask me why I tried it so many times... I think there is a thread around here somewhere about it.

JaredBusch

@Dashrender said in RHEL 4 not seeing ext3 label:

can you boot from a live image and see both disks?

I did a d2vm of a Windows 2003 server and I had to run checkdisk like 10 times before it finally worked.. don't ask me why I tried it so many times... I think there is a thread around here somewhere about it.

The restored drives are fine. Can be mounted as previously noted and the label reports correctly.

The issue seems to be that the kernel, as built, is not loading the drives correctly. Potentially because the VM is using a SCSI driver method the old ass kernel does not understand.

jt1001001

Didn't Dell "back in the day" use or require their own megaraid drivers on Linux?? Can't remember as it's been ages since I dealt with a 28XX series with a PERC raid card.

JaredBusch

Using VirtIO SCSI (the default selection) the drives are not even seen by the recovery boot image. The only thing shown is the USB drive holding the data to restore.
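
For anyone who prefers the CLI to the Proxmox GUI, switching the emulated controller can be sketched as below. The helper scsihw_command is hypothetical and only prints the command rather than running it. VM ID 100 is a placeholder, and the list of accepted models is written from memory of the qm documentation, so treat it as an assumption and check it against your Proxmox version.

```shell
# Hypothetical helper: print (not run) the qm command that sets a VM's SCSI controller type.
# Usage: scsihw_command <vmid> <model>
scsihw_command() {
    vmid=$1
    model=$2
    case $model in
        # "lsi" is the LSI 53C895A entry in the GUI; virtio-scsi-pci is "VirtIO SCSI"
        lsi|lsi53c810|megasas|pvscsi|virtio-scsi-pci|virtio-scsi-single)
            echo "qm set $vmid --scsihw $model" ;;
        *)
            echo "unknown scsihw model: $model" >&2
            return 1 ;;
    esac
}

# Example (VM ID 100 is a placeholder):
#   scsihw_command 100 lsi        # prints: qm set 100 --scsihw lsi
```

An old guest kernel like 2.6.9 has no VirtIO drivers, so an emulated LSI controller that the kernel already has a module for is the more likely candidate.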
