XenServer Export Performance Seems Poor

Dashrender

This server has a 60 GB SQL db, 500+ GB of TIF files (scanned-in paper documents) and another 100+ GB of application and other files associated with the old EHR.

At this point in time, the only thing changing on this system should be the access logs - who's logging in, who they are searching for, etc. The data in the DB, the TIF files, etc. should all remain static.

The system (other than the log growth) should not be growing. It has around 50 GB of free space currently. This should be a lifetime of space since the main data isn't growing anymore.

DustinB3403

So @Dashrender do you need the static data on the VM including everything that makes up the 700GB to function?

Or can all of the extra stuff get pushed off to something else?

If the goal is to ensure the VM boots, and the database is accessible, then you should reduce the size of the VM as much as possible.

Anything that is static and that can get moved out of it, I would imagine should be, so you could recover from a faulty OS update that much more quickly.

Dashrender

@scottalanmiller said:

@Dashrender said:

@scottalanmiller said:

Where are the logs going now?

Into the SQL DB on the server, the same place where the EHR data lives.

A developer could very quickly make a little component that takes those logs and outputs to a text file. I mean, realistically, you could do this with a one line script - just one SQL query going out to file. ELK will grab the file and boom, all done.

I'm guessing that you're assuming that all of the logs are in a single table - and assuming that's true, then I agree with you.
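For illustration, the "one SQL query going out to file" idea could look roughly like the sketch below. It assumes the logs really do live in a single table, as discussed above. The table name `access_log` and its columns are made up, and `sqlite3` stands in for the real SQL Server connection only so the example runs on its own.

```python
import sqlite3


def export_logs(conn, out_path, table="access_log"):
    """Dump every row of the (hypothetical) log table to a tab-separated text file.

    The table name is interpolated into the query, so only pass a trusted,
    hard-coded name here.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for row in cursor:
            out.write("\t".join(str(value) for value in row) + "\n")
            count += 1
    return count


def make_demo_db():
    """Build a tiny in-memory stand-in for the EHR log table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_log (ts TEXT, username TEXT, action TEXT)")
    conn.executemany(
        "INSERT INTO access_log VALUES (?, ?, ?)",
        [
            ("2016-05-01 08:00:00", "alice", "login"),
            ("2016-05-01 08:01:10", "alice", "search patient"),
        ],
    )
    return conn
```

A log shipper could then pick up the resulting text file. If the logs turn out to be spread across several tables, the single query would need to become a join or several exports.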

Dashrender

@DustinB3403 said:

So @Dashrender do you need the static data on the VM including everything that makes up the 700GB to function?

Yes - if anything on there is removed (or not mapped into it) the whole thing doesn't function as it should.

Dashrender

I should also add - 30 hours of downtime on this system would not be a huge deal.

Dashrender

@Dashrender said:

I should also add - 30 hours of downtime on this system would not be a huge deal.

If we have to go to a paper chart (yes, we still have tens of thousands of them in storage) it would take at least 24 hours to get it. This "old" system is now in that ballpark.

DustinB3403

@Dashrender said:

I should also add - 30 hours of downtime on this system would not be a huge deal.

But again, that is assuming the import and your backup is in good working condition. If it fails it could be down for multiple days.

Dashrender

@DustinB3403 said:

@Dashrender said:

I should also add - 30 hours of downtime on this system would not be a huge deal.

But again, that is assuming the import and your backup is in good working condition. If it fails it could be down for multiple days.

And it would be down for multiple days if the data VM dies and doesn't restore correctly either.

DustinB3403

But with the data you could have multiple known good copies, with the VM you have your individual backups.

Which all need to be tested on a regular basis to confirm they function. Which would at best take ~30 hours to test the import of.

Dashrender

@DustinB3403 said:

But with the data you could have multiple known good copies, with the VM you have your individual backups.

Which all need to be tested on a regular basis to confirm they function. Which would at best take ~30 hours to test the import of.

Multiple known good copies? huh? Why would I have multiple copies of that non changing data?

DustinB3403

The very same reason you keep multiple copies of anything critical..... so you have another to recover from.

Even if all 700GB are in this VM, you don't keep just 1 backup of it.

Dashrender

@DustinB3403 said:

The very same reason you keep multiple copies of anything critical..... so you have another to recover from.

Even if all 700GB are in this VM, you don't keep just 1 backup of it.

You have a point here.

Dashrender

Dustin, you still haven't told me what makes my application VM so much more vulnerable than a Samba data share or a NAS that it warrants splitting it up.

DustinB3403

So my point with reducing the size of your VM is multi-pronged.

It'll reduce backup time (unless you're doing deltas, in which case only the roll-over will take a while)
It'll speed up import time (less to transfer into XS)
It'll be less to have to keep stored as a backup.

If you put the data into a separate medium (and chime in folks if you think I'm wrong here) you'd simply update the pathing in the database to access the primary remote store.

This remote store would get backed up to (let's just say) a 4 bay Synology, which then gets pushed off to (again, let's just say) Backblaze B2.

You'd have multiple copies of the data which is needed for the VM, off host, which can then be restored from separate mediums should something go belly up.

JaredBusch

@DustinB3403
Would you please STFU and take a moment to try and understand the scenario here.

FFS man, this is a really simple thing here. @Dashrender has an old EHR system that he needs to be powered on for historical lookup purposes. It is virtualized and thus hardware agnostic.

It is a legacy system. There are no developers available for it as the original developers sold it and the new owners EoL'd it 3 years ago.

Given all of that, how would you go in and break out all the static data (the TIF scans) without breaking the entire f[moderated]ing system?

Let me clue you in.

This is a SQL Server & IIS based application as noted previously.
This means it will be very safe to assume that when things were scanned in and saved, the application wrote the file path to the document into the database records for each file.
So you will then need to go spend time writing custom SQL to pull all of these references out of the database and then verify the structure so you can then update everything outside of the application itself.
Once you know how it is all mapped, you could easily move the TIF files over and map a share or use UNC or even a symlinked folder (depending on the limits of the application).

Then you have to mass update the records to match the new path.
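To make the size of that job concrete, here is a rough sketch of what the "pull the references, then mass update" step might involve. Everything specific in it is hypothetical: the `documents` table, its `file_path` column, and the `\\nas01\scans\` share are invented for illustration, and `sqlite3` stands in for SQL Server so the example runs on its own. The real schema would have to be reverse-engineered first.

```python
import sqlite3


def remap_paths(conn, old_prefix, new_prefix, dry_run=True):
    """Return (new_path, id) pairs for rows under old_prefix; apply them unless dry_run.

    The prefix match is case-insensitive because Windows paths are.
    """
    rows = conn.execute("SELECT id, file_path FROM documents").fetchall()
    changes = [
        (new_prefix + path[len(old_prefix):], doc_id)
        for doc_id, path in rows
        if path.lower().startswith(old_prefix.lower())
    ]
    if not dry_run:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.executemany(
                "UPDATE documents SET file_path = ? WHERE id = ?", changes
            )
    return changes


def make_demo_db():
    """Tiny in-memory stand-in for the document table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, file_path TEXT)")
    conn.executemany(
        "INSERT INTO documents (file_path) VALUES (?)",
        [("D:\\Scans\\2009\\a.tif",), ("d:\\scans\\2010\\b.tif",), ("E:\\Other\\c.tif",)],
    )
    return conn
```

Running with `dry_run=True` first lets you inspect the planned changes before touching any records, which matters on a system nobody supports anymore. Even then, the application may store paths in more than one place.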

All of this on a legacy and unsupported system.

So prior to [moderated]ing an entire system up, what would you need to do? Oh yeah, make a F**** [moderated] backup. Which is exactly what @Dashrender is doing right now.

DustinB3403

@JaredBusch And what should he restore from if the backup that he takes is corrupted and he has nothing else to restore from?

Say goodbye to this system which is critical? @JaredBusch did you even see how @Dashrender is making this backup? He's pulling it from XO's webconsole and saving it to a USB 3.0 disk.

This alone would take a long time. Even if the system is used only for historical purposes, that's a huge amount of time to try and recover from the backup he's created today.

Assuming that there is nothing corrupted in the backup he's made. Or that nothing gets corrupted while he tries to restore it.

Dashrender

The backup I'm making today is a point in time (if I don't have logging because of a fire, it's not the end of the world, as long as I have the data).

This drive contains PHI from another project we recently shut down. So since there was plenty of space on it, I put a copy of this VM on there so the whole thing can go to the safe deposit box and be an offsite copy for now.

This isn't my main backup.

Also, it appears that my slow backup has more to do with gzip than USB 3.0.

I'll add another 1 TB drive to my machine and do a non-compressed, non-USB backup and we'll see what I get.

Then I should do a non-compressed backup to USB 3.0 and see what I get, to find out whether the USB has any real bearing here. I doubt it does.
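The test plan above changes one variable at a time: compression first, then the target disk. A minimal sketch of that method is below. It times the same file being written with and without gzip to a destination of your choice. Note the assumption: this uses Python's own `gzip` module on a sample file, so it only illustrates how to isolate compression from the disk. It says nothing about how fast XenServer's export path compresses.

```python
import gzip
import shutil
import time

CHUNK = 1024 * 1024  # copy in 1 MiB chunks


def timed_copy(src, dst, compress):
    """Copy src to dst, optionally through gzip, and return elapsed seconds."""
    opener = gzip.open if compress else open
    start = time.perf_counter()
    with open(src, "rb") as fin, opener(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, CHUNK)
    return time.perf_counter() - start


def mb_per_second(num_bytes, seconds):
    """Throughput in decimal megabytes per second."""
    return num_bytes / 1_000_000 / seconds
```

Pointing `dst` at the internal drive and then at the USB 3.0 disk, with `compress` set both ways, gives four numbers to compare. If the two uncompressed runs are close, the USB disk is not the bottleneck.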

JaredBusch

@DustinB3403 said:

@JaredBusch And what should he restore from if the backup that he takes is corrupted and he has nothing else to restore from?

Say goodbye to this system which is critical? @JaredBusch did you even see how @Dashrender is making this backup? He's pulling it from XO's webconsole and saving it to a USB 3.0 disk.

This alone would take a long time. Even if the system is used only for historical purposes, that's a huge amount of time to try and recover from the backup he's created today.

Assuming that there is nothing corrupted in the backup he's made. Or that nothing gets corrupted while he tries to restore it.

That it is writing to USB is not relevant here. That is not the bottleneck. A 700GB copy to a USB 3.0 disk is not that big of a deal.

The source of the problem has already been pointed out to be XS.

If @Dashrender was willing to kill the current job, he would get much better speeds on his subsequent backup.

He is backing up using the designed tools in the designed method. Those tools are what is broken.

His backup media of choice is irrelevant unless it is actually the bottleneck (it isn't).

What if I back up to a NAS and the NAS shits during a restore? It is the same scenario.

Once he resolves the gzip issue, the limitation here will be wirespeed.
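To put rough numbers on that, the arithmetic can be sketched as below. Two assumptions are baked in: that the ~30 hours mentioned earlier in the thread applies to the full ~700 GB, and that the link is gigabit Ethernet, which the thread never states. Decimal units are used throughout (1 GB = 1000 MB).

```python
def rate_mb_per_s(size_gb, hours):
    """Average throughput in MB/s for size_gb moved in the given hours."""
    return size_gb * 1000 / (hours * 3600)


def hours_at_rate(size_gb, mb_per_s):
    """Hours needed to move size_gb at a sustained mb_per_s."""
    return size_gb * 1000 / mb_per_s / 3600


# Assumed link speed: 1 Gbit/s is 125 MB/s before any protocol overhead.
GIGABIT_MB_PER_S = 1000 / 8

observed = rate_mb_per_s(700, 30)
best_case = hours_at_rate(700, GIGABIT_MB_PER_S)

print(f"observed: {observed:.1f} MB/s")          # observed: 6.5 MB/s
print(f"gigabit best case: {best_case:.1f} h")   # gigabit best case: 1.6 h
```

Real transfers will not reach the theoretical ceiling, but the gap between roughly 6.5 MB/s and 125 MB/s shows how far below wire speed the compressed export is running.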

JaredBusch

@DustinB3403 said:

Assuming that there is nothing corrupted in the backup he's made. Or that nothing gets corrupted while he tries to restore it.

I wanted to highlight this point separately. You are using the wrong hypervisor if you even have to think about this. Data does not just randomly corrupt when being accessed. If you regularly have this kind of experience, then you have been doing all kinds of shit wrong. Likely using cheap consumer hardware.

DustinB3403

@JaredBusch You've never had a large download become corrupted?
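Whichever side of that argument you take, a copy can be checked rather than trusted. A minimal sketch, assuming the export is an ordinary file and you can read both the original and the copy: hash each in chunks and compare the digests. The function names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(path_a, path_b):
    """True when both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Recording the digest next to the backup at export time also lets you re-check the offsite copy later without needing the original. A matching hash only proves the copy is intact. It does not prove the export itself will import cleanly, so a test restore is still worth doing.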