Windows Server 2003 Cluster Dead

scottalanmiller

Obviously, once the quorum disk (Q disk) was missing, it could not join the cluster.

scottalanmiller

Doing a controller power cycle now. Bringing down the physical cluster now, then the DAS. Then going to power on the DAS, give it time, and bring the nodes up. Expect very little, but it is a place to start.

scottalanmiller

DAS is powering up; the lights on it do not look good.

scottalanmiller

10 drives in the array, believed to be RAID 10. 2 drives in RAID 1 as well.

scottalanmiller

One drive in the large array is flashing orange, so it looks like that drive has failed.

All other drives are green.

scottalanmiller

Bringing up Node 1 again now. With only one drive failed in the DAS unit, any RAID (other than RAID 0) should have survived.
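
For anyone following along, here is a rough sketch of that reasoning in Python. It is only an illustration under stated assumptions: the function name `array_survives` is made up, the RAID 10 pairing (drives 0+1, 2+3, and so on) is a guess because the real controller layout is unknown, and the array is only believed to be RAID 10.

```python
def array_survives(raid_level, drive_count, failed):
    """Return True if an array of this RAID level still holds its data.

    raid_level  -- one of "0", "1", "5", "6", "10"
    drive_count -- total number of drives in the array
    failed      -- iterable of zero-based indices of failed drives
    """
    failed = set(failed)
    if any(d < 0 or d >= drive_count for d in failed):
        raise ValueError("failed drive index out of range")

    if raid_level == "0":
        # Striping only: any single failure loses the array.
        return len(failed) == 0
    if raid_level == "1":
        # Mirroring: data survives while at least one copy remains.
        return len(failed) < drive_count
    if raid_level == "5":
        # Single parity: tolerates one failed drive.
        return len(failed) <= 1
    if raid_level == "6":
        # Double parity: tolerates two failed drives.
        return len(failed) <= 2
    if raid_level == "10":
        # ASSUMPTION: drives 2k and 2k+1 form mirror pair k.
        # The array dies only if both halves of some pair fail.
        pairs = drive_count // 2
        return all(not ({2 * k, 2 * k + 1} <= failed) for k in range(pairs))
    raise ValueError("unsupported RAID level: " + raid_level)


# One failed drive out of ten, as seen on the DAS (index 3 is arbitrary).
for level in ("0", "5", "6", "10"):
    print(level, array_survives(level, 10, {3}))
```

With a single failed drive, everything except RAID 0 reports as surviving, which matches what we expect to see when the node comes back up.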

scottalanmiller

Okay, that process brought things up. Not the cluster, but the disks are back. We can see the Quorum plus other disks now.

scottalanmiller

Trying to bring up Node 2 now, but I'm not hopeful on that.

scottalanmiller

Node 1 is healthy, Node 2 is gone. Cluster won't come up, but the workloads did. So they are good for now.

Danp

Do they have a plan to replace this outdated tech with something current?

scottalanmiller

@Danp said in Windows Server 2003 Cluster Dead:

Do they have a plan to replace this outdated tech with something current?

Yes, there was a six month plan in place already, but it just got moved to something like a six day plan.

FATeknollogee

@scottalanmiller said in Windows Server 2003 Cluster Dead:

@Danp said in Windows Server 2003 Cluster Dead:

Do they have a plan to replace this outdated tech with something current?

Yes, there was a six month plan in place already, but it just got moved to something like a six day plan.

Love it when that happens!!

DustinB3403

@FATeknollogee said in Windows Server 2003 Cluster Dead:

@scottalanmiller said in Windows Server 2003 Cluster Dead:

@Danp said in Windows Server 2003 Cluster Dead:

Do they have a plan to replace this outdated tech with something current?

Yes, there was a six month plan in place already, but it just got moved to something like a six day plan.

Love it when that happens!!

The system is still f***** because they have to replace it today and they have to worry about good backups today.

2003 is ancient

Obsolesce

I'm guessing the thing hasn't been maintained at all which would have brought this about sooner but in a controlled manner.

scottalanmiller

@DustinB3403 said in Windows Server 2003 Cluster Dead:

@FATeknollogee said in Windows Server 2003 Cluster Dead:

@scottalanmiller said in Windows Server 2003 Cluster Dead:

@Danp said in Windows Server 2003 Cluster Dead:

Do they have a plan to replace this outdated tech with something current?

Yes, there was a six month plan in place already, but it just got moved to something like a six day plan.

Love it when that happens!!

The system is still f***** because they have to replace it today and they have to worry about good backups today.

2003 is ancient

Backups are running today.

scottalanmiller

@Obsolesce said in Windows Server 2003 Cluster Dead:

I'm guessing the thing hasn't been maintained at all which would have brought this about sooner but in a controlled manner.

Pretty much. We weren't even told about it. Not that we needed to be; we consult for this customer, we aren't their outsourced IT.

nadnerB

well, that's a Fuster Cluck and a half

travisdh1

And that's why we call them IPOD. (Inverted Pyramid of Doom). Welcome them to the club, hopefully doing it correctly this time!

scottalanmiller

@travisdh1 said in Windows Server 2003 Cluster Dead:

And that's why we call them IPOD. (Inverted Pyramid of Doom). Welcome them to the club, hopefully doing it correctly this time!

Yeah, we mentioned that on the call. But it predated the people who are there now (it even predated their CAREERS!). It's such an old system. When a system is 16 years old, it's actually not that common to find people who were actively working in IT at that time. If you assume most people don't start in IT until the age that they would have finished college, that's 23. Add sixteen career years, that's 39. Add a year for planning of the project before it was purchased, and you are at age 40. So only people likely to be 40+, who started in IT right away and didn't move from another career, could be reasonably expected to have been in the field at the time that the system was decided on! That's nuts.

scottalanmiller

This really is a good example of why the IPOD is so bad. The "never fails" DAS failed, but at least it didn't lose the data, it just caused a large panic outage.

But there are three servers instead of one, and two of them failed: one completely (Node 2) and one partially (the DAS). Had only Node 1 been purchased, they would have had no failures and no outage, and would have made it sixteen years at about one quarter the cost.

The "just buy one server" approach here would have kicked the crap out of the reliability of the IPOD! No redundancy on this system was ever used, but because it had that redundancy, it caused things to fail that should not have.