Saving a dying server



  • Let me preface this by saying the system was inherited, and steps are now being taken to move to a more reliable setup. I need to work on a resolution rather than on what someone should have done before (backups, monitoring, etc.).

    It is a dedicated remote host at Fasthosts with a single, failing drive and no backups. The drive uses LVM.

    What is the best way to salvage the data and move it to another server?

    I will update in a sec with more details on the setup.


  • Service Provider

    First thing is always... take a backup. Worry about everything else after that.



  • Yes, that is happening now. The problem at the moment is the speed of the backup because of the errors we're getting on the drive. The backup is apparently moving at a snail's pace. I can't even type a command into the terminal without a huge lag.

    0_1457518883301_Screen Shot 2016-03-09.png

    % df -h
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/VolGroup00-LogVol00  913G  144G  724G  17% /
    udev                             3.9G  4.0K  3.9G   1% /dev
    tmpfs                            786M  384K  786M   1% /run
    none                             5.0M     0  5.0M   0% /run/lock
    none                             3.9G     0  3.9G   0% /run/shm
    /dev/sda1                        236M   56M  168M  25% /boot
    
    # lvdisplay
      --- Logical volume ---
      LV Name                /dev/VolGroup00/LogVol00
      VG Name                VolGroup00
      LV UUID                1cohna-I14w-vCBO-yHV8-qMtP-vN6a-UTFqTG
      LV Write Access        read/write
      LV Status              available
      # open                 1
      LV Size                927.46 GiB
      Current LE             237429
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           252:0
    

    0_1457519054420_Screen Shot 2016-03-09 at 10.23.57.png



  • They are trying to run the backup with all the services still running. It's a billing/invoicing server, and statements are going out and being processed today. I've said we should turn off services to run the backup and migration.


  • Service Provider

    @larsen161 said:

    They are trying to run the backup with all the services still running. It's a billing/invoicing server, and statements are going out and being processed today. I've said we should turn off services to run the backup and migration.

    Yeah, a generic file-level backup is not likely to capture the databases in a usable state while they are still running.


  • Service Provider

    @larsen161 said:

    Yes, that is happening now. The problem at the moment is the speed of the backup because of the errors we're getting on the drive. The backup is apparently moving at a snail's pace. I can't even type a command into the terminal without a huge lag.

    The only upside or silver lining here is what we always say: since there are no backups and no RAID, whoever put this system in and everyone since then have determined that this data has zero value and falls below the home line, so according to everyone involved so far there isn't any business risk. If there is the slightest concern about the data on this machine, that's not your problem. Someone else determined that this had no value, and it sounds like everyone agreed. Whoever decided to put the database on here and use it for some business purpose is the one who needs to be sweating.

    For your part, just waiting for the backup is all that you can do. And if people are determined to keep using it today, I would inform them of the real situation and tell them that running reports on it right now could VERY likely be the same as deciding to set the machine on fire and let the data die with it. Put the onus on them to decide if they feel it has value now or not. Using it in this state is the end users telling you that the system's data doesn't matter. Make 100% sure that they know that this is what their actions mean.


  • Service Provider

    Might be worth using a direct P2V tool to do the backup in this case. If this was my situation, this is likely what I would do:

    1. Tell the users that, between their decisions and the decisions of whoever installed this, the system cannot be used now and they don't have any vote in this whatsoever. Tell them the way things have to be. Don't let them throw away the company's data, because they will just blame you for this later.
    2. Determine where data is stored. Likely only in MySQL. Use MySQL's own tools and do a database dump ASAP.
    3. Shut down the databases and all applications.
    4. Do a direct P2V to a production platform.
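
    A minimal sketch of step 2, assuming the data really is in MySQL, that `mysqldump` can authenticate on its own (for example through `~/.my.cnf`), and that a healthy second machine is reachable over SSH so the dump never lands on the failing disk. `DEST_HOST`, `DEST_FILE`, and the function name `dump_databases` are hypothetical placeholders, not details from this thread:

    ```shell
    # Hypothetical destination on a healthy machine; adjust both values.
    DEST_HOST="${DEST_HOST:-rescue.example.com}"
    DEST_FILE="${DEST_FILE:-all-databases.sql.gz}"

    dump_databases() {
      # --single-transaction takes a consistent snapshot of InnoDB tables
      # without locking them; MyISAM tables are not protected by it.
      # --routines and --events include stored procedures and scheduled events.
      # Without "pipefail", the exit status is that of the last command (ssh) only,
      # so check the dump on the destination before trusting it.
      mysqldump --single-transaction --all-databases --routines --events \
        | gzip -c \
        | ssh "$DEST_HOST" "cat > '$DEST_FILE'"
    }

    # Run it once the destination is confirmed:
    # dump_databases
    ```

    Streaming straight to another host avoids adding write load to the failing drive. It is worth running `gzip -t` on the destination file afterwards to confirm the archive is complete.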


  • @larsen161 said:

    0_1457518883301_Screen Shot 2016-03-09.png

    Oh my.



  • Whoa

    0_1457543657784_loadavg.png



  • @johnhooks that's nothing - it was twice that yesterday



  • @johnhooks said:

    Whoa

    0_1457543657784_loadavg.png

    Pretty sure this disk is right proper hosed. Either it has surface damage or the surface is de-laminating - something really really bad.



  • Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?


  • Service Provider

    @johnhooks said:

    Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

    That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.



  • @scottalanmiller said:

    @johnhooks said:

    Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

    That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.

    I thought you could have a colo do that type of stuff for you? If not, what's the advantage to paying for dedicated servers that you can't access over a colo?



  • @johnhooks said:

    Whoa

    0_1457543657784_loadavg.png

    That's, um, high.



  • @johnhooks said:

    @scottalanmiller said:

    @johnhooks said:

    Ok so this is a serious question. I've never dealt with dedicated hosting somewhere else. Do they not maintain this stuff? So you really are just paying them for electricity and internet?

    That's what a hosting facility does. They provide the electric, internet and HVAC. You are in charge of everything else.

    I thought you could have a colo do that type of stuff for you? If not, what's the advantage to paying for dedicated servers that you can't access over a colo?

    Uh, yeah, I agree with JH here. When I hear hosting, I think of VMs on your platform... Colo means it's my equipment and I'm responsible for it.



  • If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)


  • Service Provider

    @dafyre said:

    If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

    They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.



  • @scottalanmiller said:

    @dafyre said:

    If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

    They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.

    Take, for instance, the server that I have with KimSufi... I don't have RAID in that box. If the HD dies, then whoops!

    They replace the hard drive, and I re-image through their web portal and restore my data from backups.



  • @johnhooks said:

    Whoa

    0_1457543657784_loadavg.png

    I bet if you check using top or glances, you'll see the IO Wait % is very high.
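
    When the terminal is too laggy for an interactive tool, a rough alternative on Linux is to read the aggregate `cpu` line of `/proc/stat`. The sketch below is only an approximation: it reports the share of CPU time spent in iowait since boot, not the current rate that `top` shows, and the function name `iowait_pct` is made up for this example:

    ```shell
    iowait_pct() {
      # Columns 2-9 of the aggregate "cpu" line are:
      # user nice system idle iowait irq softirq steal.
      # The guest columns after them are skipped because they are
      # already counted in user and nice.
      awk '/^cpu / {
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        if (total > 0) printf "%.1f\n", ($6 / total) * 100
      }' "${1:-/proc/stat}"
    }
    ```

    Calling `iowait_pct` with no argument reads `/proc/stat`; passing a file path lets you run it against a saved copy.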



  • @scottalanmiller said:

    @dafyre said:

    If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

    They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.

    His might not, but I just looked at the Fasthosts site and they advertise RAID 1 for their smallest quad core system. It's still $70 a month just for a desktop processor and 12 GB RAM.

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).


  • Service Provider

    @johnhooks said:

    @scottalanmiller said:

    @dafyre said:

    If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

    They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.

    His might not, but I just looked at the Fasthosts site and they advertise RAID 1 for their smallest quad core system. It's still $70 a month just for a desktop processor and 12 GB RAM.

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).

    He said that it had no RAID at the beginning.


  • Service Provider

    @johnhooks said:

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).

    Not related. Not like your server moves hardware on its own. It stays on what you started on. To migrate it would need downtime.



  • @scottalanmiller said:

    @johnhooks said:

    @scottalanmiller said:

    @dafyre said:

    If you are renting a dedicated server from a facility, you should be able to call their support and tell them what is going on, so they can replace the faulty drive for you (after you have good backups, of course!)

    They don't have RAID, though. The colo should do that... but you'd be left with a dead system. I'm guessing no IPMI system either, if they didn't even bother with RAID.

    His might not, but I just looked at the Fasthosts site and they advertise RAID 1 for their smallest quad core system. It's still $70 a month just for a desktop processor and 12 GB RAM.

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).

    He said that it had no RAID at the beginning.

    I must have glossed over that.


  • Service Provider

    He said single drive. Maybe that is wrong. If it is wrong, they should swap the drive ASAP.



  • @scottalanmiller said:

    @johnhooks said:

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).

    Not related. Not like your server moves hardware on its own. It stays on what you started on. To migrate it would need downtime.

    It won't move hardware, but you would be able to move the data. I guess you could buy another server and move the data yourself, but they could give you a free window to get that done.



  • Another option would be for the provider to give you a few windows for downtime. With a single server and a single drive, downtime has to be expected (well, maybe not, since there were no backups). Just a window long enough to bring the server down and add a drive. I mean, for what you're paying, you could have bought the whole thing outright in under a year.


  • Service Provider

    @johnhooks said:

    @scottalanmiller said:

    @johnhooks said:

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).

    Not related. Not like your server moves hardware on its own. It stays on what you started on. To migrate it would need downtime.

    It won't move hardware, but you would be able to move the data. I guess you could buy another server and move the data yourself, but they could give you a free window to get that done.

    Yes, but that is something that you would have to do, not something that they can realistically do. I know of no provider that does anything like that.


  • Service Provider

    @johnhooks said:

    Another option would be for the provider to give you a few windows for downtime. With a single server and a single drive, downtime has to be expected (well, maybe not, since there were no backups). Just a window long enough to bring the server down and add a drive. I mean, for what you're paying, you could have bought the whole thing outright in under a year.

    There are a lot of assumptions here. Whose hardware is it? If this was DevOps, this wouldn't be an issue. Things like that. So assumed downtime might not apply.



  • @scottalanmiller said:

    @johnhooks said:

    @scottalanmiller said:

    @johnhooks said:

    Which sucks. If I pay that price today I get RAID1, so why doesn't he get it? (Unless he has a grandfathered price).

    Not related. Not like your server moves hardware on its own. It stays on what you started on. To migrate it would need downtime.

    It won't move hardware, but you would be able to move the data. I guess you could buy another server and move the data yourself, but they could give you a free window to get that done.

    Yes, but that is something that you would have to do, not something that they can realistically do. I know of no provider that does anything like that.

    I agree. That's what I mean, they give you a window on a new server for you to move your data.

