HP ILO Fails with Bad RAM
-
So, this morning 4 VMs are down and not responding. Tell them to start and ppfph. Connect to the ILO and ppfph. I had to power down the box and pull the plugs for a minute before it would start correctly. It turns out that I have a bad stick of RAM on CPU2 -- which prevented me from ILO access. Aaaaaaaahhh!!!
So on HPE G10s, a bad stick of RAM on CPU 2 interrupts the ILO. I would not have guessed. I can get into it now after a hard boot. Friggin' BS.
-
@scotth said in Proxmox in 2022:
So, this morning 4 VMs are down and not responding. Tell them to start and ppfph. Connect to the ILO and ppfph. I had to power down the box and pull the plugs for a minute before it would start correctly. It turns out that I have a bad stick of RAM on CPU2 -- which prevented me from ILO access. Aaaaaaaahhh!!!
So on HPE G10s, a bad stick of RAM on CPU 2 interrupts the ILO. I would not have guessed. I can get into it now after a hard boot. Friggin' BS.
Sorry, wrong post -- VMware, not Proxmox. Feel free to move, delete, ....
-
@scotth said in HP ILO Fails with Bad RAM:
So on HPE G10s, a bad stick of RAM on CPU 2 interrupts the ILO. I would not have guessed. I can get into it now after a hard boot. Friggin' BS.
What model HPE was this?
-
@scotth said in HP ILO Fails with Bad RAM:
Sorry, wrong post -- VMware, not Proxmox. Feel free to move, delete, ....
It's neither. It's about HPE-specific hardware.
-
@scottalanmiller said in HP ILO Fails with Bad RAM:
It's neither. It's about HPE-specific hardware.
DL360 G10
CPU2 DIMM3
-
@scotth said in HP ILO Fails with Bad RAM:
DL360 G10
CPU2 DIMM3
I'm surprised that the ILO depends on the main system RAM on that unit. The ILO used to be completely independent.
-
This is good to know about, since we deal almost exclusively with HPE. Another possible reason to drop them as a vendor if this turns out to be the case.
-
Oddly enough, after pulling off the cover, looking at the diagram, CPU2 is on the left and RAM slot 3 is the 1st slot on the left of the board. So, in order for the ILO to function, you need viable RAM in the 1st slot on the board. In this device, that's CPU2 slot 3.
The error in ILO was -- Uncorrectable Machine Check Exception (Processor 2 APIC ID .....
Then -- DIMM Failure - Uncorrectable Memory Error (Processor 2, DIMM 3)
Further in the logs -- Uncorrectable Memory Error Threshold Exceeded (Processor 2, DIMM 3). The DIMM is mapped out and is currently not available.
Finally after I pulled the plugs out of the power supplies, and a hard reset -- The installed number of DIMMs on one or more processors results in an unbalanced memory configuration across memory controllers. This may result in non-optimal memory performance.
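For anyone who wants to pull the failing slot out of log text like the entries quoted above, here is a minimal sketch. It assumes the messages have already been exported as plain strings and that the location always appears in the "(Processor N, DIMM M)" form seen in this thread. The function name `failing_dimms` is made up for illustration and is not part of any HPE tool.

```python
import re

# Matches the "(Processor N, DIMM M)" suffix seen in the log text quoted above.
# Assumption: other firmware versions may word this differently.
LOCATION = re.compile(r"\(Processor (\d+), DIMM (\d+)\)")


def failing_dimms(messages):
    """Return sorted, unique (processor, dimm) pairs from 'Uncorrectable' messages."""
    found = set()
    for msg in messages:
        if "Uncorrectable" not in msg:
            continue  # skip informational entries such as the unbalanced-config notice
        match = LOCATION.search(msg)
        if match:
            found.add((int(match.group(1)), int(match.group(2))))
    return sorted(found)


# Message text copied from the log entries in this thread.
messages = [
    "DIMM Failure - Uncorrectable Memory Error (Processor 2, DIMM 3)",
    "Uncorrectable Memory Error Threshold Exceeded (Processor 2, DIMM 3). "
    "The DIMM is mapped out and is currently not available.",
    "The installed number of DIMMs on one or more processors results in an "
    "unbalanced memory configuration across memory controllers.",
]

print(failing_dimms(messages))  # [(2, 3)]
```

The Machine Check Exception entry is left out of the sample because its text is truncated in the post.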
The server was completely unresponsive. I've never had the ILO go down like this. Fortunately, this was onsite here. It did come back up and ran normally until maintenance showed up with a new stick of RAM. Also, there are two different types of RAM for these units. You need to grab the part number to get the correct replacement.
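On grabbing the part number: once the management controller is reachable again, one possible route is the Redfish Memory resource, which in the DMTF schema carries fields such as `PartNumber`, `DeviceLocator`, `CapacityMiB` and `Status`. The sketch below only picks those fields out of an already-fetched JSON document. Whether a given iLO firmware fills in every field is an assumption, and the sample values (including the part number) are invented placeholders, not data from this server.

```python
def replacement_info(memory_resource):
    """Pick the fields needed to order a matching DIMM from a Redfish Memory dict."""
    status = memory_resource.get("Status") or {}
    # Some firmware pads string fields with spaces, so trim before use.
    part_number = (memory_resource.get("PartNumber") or "").strip() or None
    return {
        "location": memory_resource.get("DeviceLocator"),
        "part_number": part_number,
        "capacity_mib": memory_resource.get("CapacityMiB"),
        "health": status.get("Health"),
    }


# Hypothetical sample document; every value here is a placeholder.
sample = {
    "DeviceLocator": "PROC 2 DIMM 3",
    "PartNumber": "EXAMPLE-PN-0001  ",
    "CapacityMiB": 32768,
    "Status": {"Health": "Critical", "State": "Enabled"},
}

info = replacement_info(sample)
print(info["location"], info["part_number"])
```

Working on a plain dict keeps the sketch independent of how the JSON was fetched, so it can be tried against a saved export without touching the server.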
Like Scott said, ILO was separate from the system -- apparently in the past, this was true. No longer.
-
@scotth yeah, in the past (maybe the long past?) the iLO was a self-contained board with its own CPU and RAM!