SCDPM 2016 using MBS
Edit: Some background on what I'm doing, for the sake of context:
I've got to say, so far my semi-production testing of System Center Data Protection Manager 2016 using the Modern Backup Storage (MBS) method is going well.
This post may be a little early, as I still have a ton of testing scenarios to perform... but I wanted to share my initial results, in the hope that others who are doing (or have done) the same thing will share theirs as well.
I started the test backing up two sources:
- A Hyper-V Server 2016 host used to build and hold "gold master" images for image deployment (287 GB)
- A real production "main" fileserver (WS2016), located on a different 2012 R2 Datacenter host, with a little over 3 TB of data (which already includes one or two deduped volumes) (3.1 TB)
I have the "gold master" Hyper-V host being backed up to 16x 250 GB .vhdx disks in a simple volume (RAID 0) (F:). No need for redundancy there, as the disks sit on a physical RAID 10 formatted as NTFS.
The "main" fileserver is being backed up to 24x 500 GB .vhdx disks in a simple volume (RAID 0), also located on a physical RAID 10 formatted as NTFS (E: on the backup server host).
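Creating that many .vhdx files by hand gets old fast, so it's worth scripting. A minimal sketch with the Hyper-V `New-VHD` cmdlet — the folder and file names here are my own placeholders, not anything DPM requires:

```powershell
# Create 24x 500 GB dynamically expanding .vhdx files on the E: RAID 10 volume.
# Adjust the path, count, and size for the 16x 250 GB set on F:.
1..24 | ForEach-Object {
    New-VHD -Path ("E:\DPMDisks\dpm-data-{0:D2}.vhdx" -f $_) -SizeBytes 500GB -Dynamic
}
```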
So together, that's about 3.4 TB of data. Here are the dedup results on the backup host:
Right off the bat, it looks like the gold masters are deduping pretty well (73% space savings). I'm guessing that's because there's a lot of similar data, but what I want to test is whether the smaller .vhdx disks on the back end make a difference.
The 3.1 TB fileserver backup deduped at 45% space savings. I was hoping for 50%, but that's close enough.
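For anyone wanting to pull the same numbers on their own host: assuming dedup is already enabled on the E: and F: volumes, the built-in Deduplication cmdlets report the savings directly:

```powershell
# Per-volume savings rate and space saved on the backup host
Get-DedupVolume -Volume E:, F: | Select-Object Volume, SavingsRate, SavedSpace

# More detail, including in-policy and optimized file counts
Get-DedupStatus -Volume E:, F: | Format-List
```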
What's it look like on the back end?
I added 3 additional SCSI controllers to the VM and split the 40x .vhdx disks among them.
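Attaching 40 virtual disks through the GUI is tedious, so here's a sketch of how the controllers and disks could be wired up with the Hyper-V cmdlets. The VM name matches my lab, but the folder path and round-robin loop are illustrative assumptions:

```powershell
# Add 3 extra SCSI controllers to the DPM VM (it starts with one, for 4 total)
1..3 | ForEach-Object { Add-VMScsiController -VMName "serv-DPM" }

# Spread the .vhdx files across controllers 0-3 (example for the E: set;
# the E:\DPMDisks path is a placeholder for wherever the disks live)
$disks = Get-ChildItem "E:\DPMDisks\*.vhdx"
for ($i = 0; $i -lt $disks.Count; $i++) {
    Add-VMHardDiskDrive -VMName "serv-DPM" -ControllerType SCSI `
        -ControllerNumber ($i % 4) -Path $disks[$i].FullName
}
```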
Within the backup VM (serv-DPM), I used Storage Spaces to create two volumes to present to SCDPM, which formats them as ReFS.
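Inside the VM, the Storage Spaces side boils down to pooling the attached disks and carving a simple (no-resiliency) space from them. A sketch of one pool — the friendly names are my own placeholders, and I'm showing ReFS here since that's what the volume ends up as for MBS:

```powershell
# Pool all poolable disks (the attached .vhdx files) inside the DPM VM
$pd = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "DPMPool1" `
    -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $pd

# Carve a simple (RAID 0 equivalent) virtual disk and format it ReFS
New-VirtualDisk -StoragePoolFriendlyName "DPMPool1" -FriendlyName "DPMVDisk1" `
    -ResiliencySettingName Simple -UseMaximumSize
Get-VirtualDisk -FriendlyName "DPMVDisk1" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize -DriveLetter G |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "DPMVol1"
```

Repeat per pool/volume; the second volume in my setup is built the same way from the other disk set.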
I don't have DPM backing up the fileserver and the gold master host over the weekend, because that's when our real backups run, so I don't yet have any results for subsequent DPM backups. I have them set for 7 days' retention, backing up Monday through Thursday.
I'll post more results later this week. And even more further test results in the coming weeks.
I'll be doing a big "thing" on this in the near future. Probably on my blog in the end, but I'll post statuses and updates here in the meantime.
This is the backup server HOST:
This is the DPM vm. Note that I over-provisioned storage on purpose for dedup on the back-end:
Side note: The smaller physical volume (1.9 TB (RAID10)) is the internal storage on the R420 (serv-backup). The larger physical volume (5.91 TB (RAID10)) is a bunch of random spinning rust in an MD1000 attached to the R420.
The initial backup (data sync) of the 3 TB and 300 GB was very fast. Much better than the current production backup method. That alone would be a nice improvement, allowing more time for maintenance after backups are done.
A couple days ago, I added a third production server to the backup test that is about 1.6 TB.
All protection groups are scheduled to back up daily (excluding weekends). They all complete pretty fast, even without taking advantage of the new Resilient Change Tracking (RCT) technology used when VMs are running at configuration version 8.0 (Hyper-V 2016).
I now have three test protection groups (split weirdly for testing and tracking):
- VM1 - fileserver (3.2 TB of data) (Test group 1)
- Hyper-V Host running 6 VMs (255 GB of data) (Test group 2)
- VM2 - application server (1.61 TB of data) (Test group 3)
Total data to back up: 5065 GB (5.1 TB)
I have between 3 and 6 recovery points for each VM or server, depending on which one it is.
DPM Admin Console shows the following amounts of backup storage capacity being used for each protection group:
Test Group 1: 3250 GB
Test Group 2: 258 GB
Test Group 3: 1658 GB
Total backup storage capacity being used: 5166 GB
That figure includes the 3 to 6 recovery points.
Now on the DPM Host:
You can see I'm averaging over 50% space savings.
Backup space savings of over 50% (upwards of 75% depending on the group), plus the fact that it runs quickly (with further optimization still available), show this is going rather nicely, and it sure beats the current process.
Next, (after some more data testing), I want to test backup replication, tape, and cloud using different retention ranges, backup frequency, recovery points, backup modes, etc...
I will also be testing virtual tape (via iSCSI) using Starwind. I plan on using that to replicate the existing backups to another location.
scottalanmiller last edited by
Well, I finally finally finally found HP drivers for my LTO2 tape drive test on Windows Server 2016.
Some HPE driver repository page...
Hopefully I can save someone else the trouble; here's the link: https://downloads.linux.hpe.com/SDR/repo/spp/2016.04.0/hp/swpackages/
cp023805.exe worked for me.
Now I just need to find an LTO2 tape. In the meantime, I'm working on getting an MD3000 running at another site for vTape testing with DPM 2016... using StarWind virtual tape redirector.
To make things easier:
- Backup host = HOST1 (MD1000 is plugged in to HOST1 (serv-backup))
- DPM server = VM1 (running on HOST1 (serv-DPM))
This whole thing seems pretty resilient. Due to a clerical error, the MD1000 was accidentally power-cycled instead of a different system. Somehow the physical host (HOST1) got stuck in a state where dedup optimization was in progress while the DPM VM (VM1) running on it was also syncing backups... normally that shouldn't matter; everything should just push through, slower than normal.
But it was stuck for over a week. I didn't notice for that long because it's just a test system, my attention was elsewhere, and no alerting was set up.
RAM was at 100% usage, too.
I "turned off" (not shut down) VM1, rebooted HOST1, and ran Windows updates with another reboot. I then manually performed a dedup optimization on both storage volumes (internal storage and the MD1000), along with garbage collection and scrubbing jobs on both.
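For the record, that manual cleanup amounts to kicking off the three dedup job types per volume. A sketch, assuming the two host volumes are E: and F: as described earlier:

```powershell
# Run each dedup job type on both volumes, waiting for each to finish
# before starting the next (order: optimize, collect garbage, scrub)
foreach ($vol in "E:", "F:") {
    Start-DedupJob -Volume $vol -Type Optimization -Wait
    Start-DedupJob -Volume $vol -Type GarbageCollection -Wait
    Start-DedupJob -Volume $vol -Type Scrubbing -Wait
}
```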
Now HOST1 was in good shape, so I booted up VM1 and did the same update/reboot dance there.
The existing backup jobs (the production test jobs) synchronized (backed up) automatically on their own: mainly a 4 TB file server VM and a 1.5 TB application server VM. It only took half an hour. In fact, I didn't even know it had done it until hours later.
No backup data loss.
I'm now in the process of adding more configuration version 8.0 VMs to a new protection job to back up using DPM 2016. It's pretty cool to see it working, and how it randomly distributes the data across the virtual disks that get deduped:
I also made some adjustments to RAM usage.
HOST1 only has 24 GB of RAM.
VM1 uses 12 GB, non-dynamic, as it houses the MS SQL database and also runs SCDPM. I'm sure I could cut this in half to 6 GB, as it's barely being used, but I'm not going to at this time. I'm trying to push limits and have things break in this test environment before moving to production, so I'm deliberately cutting things close.
Anyway, daily dedup optimization was set to use 50% of RAM... and because VM1 is using the other half, the server gets really slow when dedup optimization runs. Nothing else is happening on the server during that window, so it shouldn't matter, but I am doing things on it for testing purposes, and I just can't do anything during the hour or two the dedup job takes to run.
So I edited the amount of RAM dedup uses just for the daily optimization schedule:
Set-DedupSchedule -Name DailyOptimization -Memory 35
For my test environment, I found 35% to be a good value. Now I have some memory to work with while dedup optimization runs (6 AM)... which is also the best time for me to work on things.
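To confirm the change took (and to see the other built-in schedules alongside it), the schedule objects show the per-schedule memory cap:

```powershell
# Verify the memory cap on the dedup schedules
Get-DedupSchedule | Select-Object Name, Type, Memory, Enabled
```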