Saving a dying server
-
Let me just preface this by saying that the setup was inherited, and steps are now being taken to move to a more reliable one. I need to work on resolution rather than on what someone should have done before (backups, monitoring, etc.).
Single-drive dedicated remote host at Fasthosts, with a failing drive and no backups. The drive is using LVM.
What is the best way to try to salvage the data and move it to another server?
I will update in a sec with more details on the setup.
-
First thing is always... take a backup. Worry about everything else after that.
-
yes - that is happening now. the problem at the moment is the speed of the back up because of the errors we're getting on the drive. the backup is moving along at a snail pace apparently. I can't even type in a command to terminal without a huge lag.
% df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  913G  144G  724G  17% /
udev                             3.9G  4.0K  3.9G   1% /dev
tmpfs                            786M  384K  786M   1% /run
none                             5.0M     0  5.0M   0% /run/lock
none                             3.9G     0  3.9G   0% /run/shm
/dev/sda1                        236M   56M  168M  25% /boot
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                1cohna-I14w-vCBO-yHV8-qMtP-vN6a-UTFqTG
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                927.46 GiB
  Current LE             237429
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
-
they are trying to run the backup with all the services still running - it's a billing/invoicing server and statements are going out and being processed today. I've said we should turn off services to run the backup and migration
-
@larsen161 said:
they are trying to run the backup with all the services still running - it's a billing/invoicing server and statements are going out and being processed today. I've said we should turn off services to run the backup and migration
yeah, that backup is not likely to get the databases if they are just doing a generic backup.
-
@larsen161 said:
yes - that is happening now. the problem at the moment is the speed of the back up because of the errors we're getting on the drive. the backup is moving along at a snail pace apparently. I can't even type in a command to terminal without a huge lag.
The only upside or silver lining here is what we always say... since there are no backups and no RAID, whoever put this system in and everyone since then has determined that this data has zero value and falls below the home line so... there isn't any business risk according to everyone involved so far. If there is the slightest concern about the data on this machine then that's not your problem. Someone else determined that this had no value and it sounds like everyone agreed. Whoever decided to put the database on here and use it for some business purpose is the one that needs to be sweating.
For your part, just waiting for the backup is all that you can do. And if people are determined to keep using it today, I would inform them of the real situation and tell them that running reports on it right now could VERY likely be the same as deciding to set the machine on fire and let the data die with it. Put the onus on them to decide if they feel it has value now or not. Using it in this state is the end users telling you that the system's data doesn't matter. Make 100% sure that they know that this is what their actions mean.
-
Might be worth using a direct P2V tool to do the backup in this case. If this was my situation, this is likely what I would do:
- Tell the users that between their decisions and the decisions of whoever installed this, the system cannot be used now and they don't have any vote in this whatsoever. Tell them the way things have to be; don't let them throw away the company's data, because they will just blame you for this later.
- Determine where data is stored. Likely only in MySQL. Use MySQL's own tools and do a database dump ASAP.
- Shut down the databases and all applications.
- Do a direct P2V to a production platform.
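The database dump step could be sketched roughly as below. This is only a minimal illustration, assuming the data really does live in MySQL and that `mysqldump` is on the PATH. The helper names `build_dump_command` and `dump_to`, and any credentials file or destination path passed to them, are hypothetical. The destination should sit on storage other than the failing drive, for example a network mount.

```python
import gzip
import shutil
import subprocess


def build_dump_command(defaults_file=None):
    """Return the mysqldump argv for a full logical dump (hypothetical helper)."""
    cmd = ["mysqldump"]
    if defaults_file:
        # --defaults-extra-file must be the first option; it keeps
        # credentials off the command line.
        cmd.append(f"--defaults-extra-file={defaults_file}")
    cmd += [
        "--all-databases",
        "--single-transaction",  # consistent snapshot for InnoDB tables only
        "--routines",
        "--events",
    ]
    return cmd


def dump_to(dest_path, defaults_file=None):
    """Stream the dump, gzip-compressed, to dest_path (assumed to be off the dying disk)."""
    proc = subprocess.Popen(build_dump_command(defaults_file), stdout=subprocess.PIPE)
    with gzip.open(dest_path, "wb") as out:
        # Copy in chunks so the whole dump never has to fit in memory.
        shutil.copyfileobj(proc.stdout, out)
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError(f"mysqldump exited with status {proc.returncode}")
```

Something like `dump_to("/mnt/rescue/all-databases.sql.gz")` would then write the dump to a (hypothetical) rescue mount. Note that `--single-transaction` only gives a consistent view of InnoDB tables, so if any tables are MyISAM the applications should be stopped before dumping.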
-
Whoa
-
@johnhooks that's nothing - it was twice that yesterday
-
@johnhooks said:
Whoa
Pretty sure this disk is right proper hosed. Either it has surface damage or the surface is de-laminating - something really really bad.
-
Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?
-
@johnhooks said:
Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?
That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.
-
@scottalanmiller said:
@johnhooks said:
Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?
That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.
I thought you could have a colo do that type of stuff for you? If not, what's the advantage to paying for dedicated servers that you can't access over a colo?
-
@johnhooks said:
Whoa
That's, um, high.
-
@johnhooks said:
@scottalanmiller said:
@johnhooks said:
Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?
That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.
I thought you could have a colo do that type of stuff for you? If not, what's the advantage to paying for dedicated servers that you can't access over a colo?
uh - yeah I agree with JH here - hosting to me means VMs on their platform... Colo means my equipment, and I'm responsible.
-
If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)
-
@dafyre said:
If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)
They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.
-
@scottalanmiller said:
@dafyre said:
If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)
They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.
Take, for instance, the server that I have with KimSufi... I don't have RAID in that box. If the HD dies, then whoops!
They replace the hard drive, and I re-image through their web portal and restore my data from backups.
-
@johnhooks said:
Whoa
I bet if you check using top or glances, you'll see the IO Wait % is very high.
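If top or glances is too sluggish to use on the box, the same figure can be read from `/proc/stat` directly. The following is a rough sketch for Linux only; the function names are made up for illustration, and it assumes the aggregate `cpu` line has the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already counted inside user, so it is left out of the total.
    values = [int(v) for v in fields[1:9]]
    return values[4], sum(values)


def iowait_percent(before, after):
    """Share of CPU time spent waiting on I/O between two samples."""
    d_iowait = after[0] - before[0]
    d_total = after[1] - before[1]
    return 100.0 * d_iowait / d_total if d_total > 0 else 0.0


def sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def measure(interval=1.0):
    """Take two samples `interval` seconds apart and return the iowait percentage."""
    first = sample()
    time.sleep(interval)
    return iowait_percent(first, sample())
```

Calling `measure(5.0)` gives the average over five seconds. A value that stays high while the backup runs would fit the picture of the disk being the bottleneck.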