Saving a dying server

larsen161

Let me preface this by saying that the setup was inherited, and steps are being taken now to move to a more reliable system. I need to work on a resolution rather than on what someone should have done before (backups, monitoring, etc.).

Single-drive dedicated remote host @ Fasthosts, with a failing drive and no backups. The drive is using LVM.

What is the best way to try to salvage the data and move it to another server?

I will update in a sec with more details on the setup.

scottalanmiller

First thing is always... take a backup. Worry about everything else after that.

larsen161

yes - that is happening now. the problem at the moment is the speed of the back up because of the errors we're getting on the drive. the backup is moving along at a snail pace apparently. I can't even type in a command to terminal without a huge lag.

[Screenshot: Screen Shot 2016-03-09.png]

% df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  913G  144G  724G  17% /
udev                             3.9G  4.0K  3.9G   1% /dev
tmpfs                            786M  384K  786M   1% /run
none                             5.0M     0  5.0M   0% /run/lock
none                             3.9G     0  3.9G   0% /run/shm
/dev/sda1                        236M   56M  168M  25% /boot

# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                1cohna-I14w-vCBO-yHV8-qMtP-vN6a-UTFqTG
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                927.46 GiB
  Current LE             237429
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

[Screenshot: Screen Shot 2016-03-09 at 10.23.57.png]

larsen161

they are trying to run the backup with all the services still running - it's a billing/invoicing server and statements are going out and being processed today. I've said we should turn off services to run the backup and migration

scottalanmiller

@larsen161 said:

they are trying to run the backup with all the services still running - it's a billing/invoicing server and statements are going out and being processed today. I've said we should turn off services to run the backup and migration

yeah, that backup is not likely to get the databases if they are just doing a generic backup.

scottalanmiller

@larsen161 said:

yes - that is happening now. the problem at the moment is the speed of the back up because of the errors we're getting on the drive. the backup is moving along at a snail pace apparently. I can't even type in a command to terminal without a huge lag.

The only upside or silver lining here is what we always say... since there are no backups and no RAID, whoever put this system in and everyone since then has determined that this data has zero value and falls below the home line so... there isn't any business risk according to everyone involved so far. If there is the slightest concern about the data on this machine then that's not your problem. Someone else determined that this had no value and it sounds like everyone agreed. Whoever decided to put the database on here and use it for some business purpose is the one that needs to be sweating.

For your part, just waiting for the backup is all that you can do. And if people are determined to keep using it today, I would inform them of the real situation and tell them that running reports on it right now could VERY likely be the same as deciding to set the machine on fire and let the data die with it. Put the onus on them to decide if they feel it has value now or not. Using it in this state is the end users telling you that the system's data doesn't matter. Make 100% sure that they know that this is what their actions mean.

scottalanmiller

Might be worth using a direct P2V tool to do the backup in this case. If this was my situation, this is likely what I would do:

1. Tell the users that, between their decisions and the decisions of whoever installed this, the system cannot be used now and they don't have any vote in this whatsoever. Tell them the way things have to be; don't let them throw away the company's data, because they will just blame you for this later.
2. Determine where data is stored. Likely only in MySQL. Use MySQL's own tools and do a database dump ASAP.
3. Shut down the databases and all applications.
4. Do a direct P2V to a production platform.
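
For the database dump step, a minimal sketch might look like the following. It assumes MySQL credentials are available in ~/.my.cnf, and the destination host and directory (backup@newserver.example.com, /srv/rescue) are placeholders, not anything from this setup. The idea is to stream the dump over SSH so the dump file itself never lands on the failing disk. mysqldump needs the MySQL server running, so this has to happen before the databases are shut down.

```shell
#!/usr/bin/env bash
# Sketch only: stream a dump of every MySQL database to another machine.
# ASSUMPTIONS: REMOTE and REMOTE_DIR are hypothetical placeholders, and
# mysqldump picks up its credentials from ~/.my.cnf.

REMOTE="${REMOTE:-backup@newserver.example.com}"   # hypothetical destination host
REMOTE_DIR="${REMOTE_DIR:-/srv/rescue}"            # hypothetical destination path

# Build a timestamped file name such as all-databases-20160309-102357.sql.gz
dump_name() {
    printf 'all-databases-%s.sql.gz\n' "$(date +%Y%m%d-%H%M%S)"
}

# Dump, compress, and write the result on the remote side only.
# The body runs in a subshell so pipefail does not leak into the caller.
dump_to_remote() (
    set -o pipefail
    name="$(dump_name)"
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking them; MyISAM tables are not protected by it.
    mysqldump --all-databases --single-transaction --routines --events \
        | gzip -c \
        | ssh "$REMOTE" "cat > '$REMOTE_DIR/$name'"
)
```

Running `dump_to_remote` once the functions are loaded performs the dump. It is worth checking the size of the file on the remote side, and that it decompresses, before stopping anything on the old server.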

MattSpeller

@larsen161 said:

Oh my.

stacksofplates

Whoa

larsen161

@johnhooks that's nothing - it was twice that yesterday

MattSpeller

@johnhooks said:

Whoa

Pretty sure this disk is right proper hosed. Either it has surface damage or the surface is de-laminating - something really really bad.
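
One way to put numbers on that, assuming the smartmontools package is installed and the disk is /dev/sda (going by the /boot entry in the df output above), is to read the SMART attributes. This is a rough sketch; `smart_raw` is a hypothetical helper name, and it only pulls the raw value column out of `smartctl -A` output.

```shell
# Print the raw value of one SMART attribute.
# Expects the attribute table from `smartctl -A` on stdin, where the
# attribute name is column 2 and RAW_VALUE starts at column 10.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage on the real disk (needs root):
#   smartctl -H /dev/sda                                    # overall health verdict
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw Current_Pending_Sector
```

Non-zero and growing counts for reallocated or pending sectors would fit the surface damage theory, though every read against the drive adds a little more load, so it may be better to do this once and move on.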

stacksofplates

Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

scottalanmiller

@johnhooks said:

Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.

stacksofplates

@scottalanmiller said:

@johnhooks said:

Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.

I thought you could have a colo do that type of stuff for you? If not, what's the advantage to paying for dedicated servers that you can't access over a colo?

StrongBad

@johnhooks said:

Whoa

That's, um, high.

Dashrender

@johnhooks said:

@scottalanmiller said:

@johnhooks said:

Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.

I thought you could have a colo do that type of stuff for you? If not, what's the advantage to paying for dedicated servers that you can't access over a colo?

uh - yeah I agree with JH here - hosting, to me, I think of VMs on your platform... Colo means my equipment, I'm responsible.

dafyre

If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

scottalanmiller

@dafyre said:

If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.

dafyre

@scottalanmiller said:

@dafyre said:

If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.

Take, for instance, the server that I have with KimSufi... I don't have RAID in that box. If the HD dies, then whoops!

They replace the hard drive, and I re-image through their web portal and restore my data from backups.

dafyre

@johnhooks said:

Whoa

I bet if you check using top or glances, you'll see the IO Wait % is very high.
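
If top itself is too laggy to use, the same figure can be worked out from /proc/stat. This is a rough sketch, assuming a Linux kernel that reports the usual user, nice, system, idle, iowait, irq, softirq and steal counters on the first line of /proc/stat; the function names are made up for illustration.

```shell
# Compute the iowait percentage between two aggregate "cpu ..." lines
# taken from /proc/stat. Field 6 is iowait; fields 2-9 add up to total time.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total[NR] = 0; for (i = 2; i <= 9; i++) total[NR] += $i; iow[NR] = $6 }
        END {
            dt = total[2] - total[1]
            if (dt > 0) printf "%.1f\n", 100 * (iow[2] - iow[1]) / dt
            else print "0.0"
        }'
}

# Take two samples N seconds apart (default 5) and print the iowait %.
sample_iowait() {
    first=$(head -n 1 /proc/stat)
    sleep "${1:-5}"
    second=$(head -n 1 /proc/stat)
    iowait_pct "$first" "$second"
}
```

Calling `sample_iowait 10` prints a single percentage for the last ten seconds, which is the same number top shows as `wa` in its CPU line.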