How to validate data migration and dedupe files and video?
-
We are in the process of consolidating all our data from multiple servers down to one. We have about 250 TB of data and have moved about 200 TB so far, and I’m seeing tons of duplicate files and folders.
Our data is a mixture of RAW video (.R3D), pictures and general business data. Some of our data has alternate data streams (ADS). I need to make sure all our data transferred correctly and that all the metadata and ADS are intact.
What’s the best way to easily verify that all the files are the same and then deduplicate? I’m looking for advice from those who’ve been there, done that, on tools and methods to make sure everything is good.
-
You did not mention which platform you are using. That will have a big effect on which tools are available for this task. Is this all on Windows? Linux? A NAS?
-
I've used hash file scanners to list all of the duplicates, but only against a single network share, not across two shares.
It took forever and chewed up a lot of system processing power.
@scottalanmiller asked the question: what platform is this on?
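
For checking one share against another rather than scanning a single share, here is a minimal sketch, assuming Python 3 is available and both trees are mounted as local paths. The function names (`file_sha256`, `build_manifest`, `compare_trees`) are made up for illustration. It compares file contents only; it does not check timestamps, permissions, other metadata, or alternate data streams, and hashing this much data will still take a long time.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_trees(source_root, dest_root):
    """Return (missing, extra, mismatched) relative paths, each sorted."""
    src = build_manifest(source_root)
    dst = build_manifest(dest_root)
    missing = sorted(set(src) - set(dst))   # in source, not in destination
    extra = sorted(set(dst) - set(src))     # in destination, not in source
    mismatched = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, mismatched
```

Calling `compare_trees("/mnt/old_share", "/mnt/new_share")` (both paths hypothetical) returns three lists to review. Running `build_manifest` separately on each server and saving the results would avoid reading the source over the network.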
-
FreeNAS
-
@HelloWill said in How to validate data migration and dedupe files and video?:
FreeNAS
LMAO, I just realized who this is
-
I'm a bit slow today, apparently.
-
Deduplication by script can work pretty well. The difficult part is figuring out whether anything points to the duplicate copies in their different locations, so that those references don't lose track of where the one remaining file lives. Symlinks can fix this; it's just annoying.
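
As a rough sketch of what such a script could look like, assuming Python 3 and a filesystem that supports symlinks: the names `find_duplicates` and `replace_with_symlinks` are hypothetical, and the choice of which copy to keep (first path in sorted order) is arbitrary. It groups files by size first so that only same-size files get hashed, and it defaults to a dry run that changes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root that share identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip existing symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            by_size[os.path.getsize(full)].append(full)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_hash[(size, file_sha256(path))].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def replace_with_symlinks(groups, dry_run=True):
    """Keep the first path in each group; symlink the others to it."""
    actions = []
    for group in groups:
        keeper, *duplicates = group
        for dup in duplicates:
            actions.append((dup, keeper))
            if not dry_run:
                os.remove(dup)
                os.symlink(os.path.abspath(keeper), dup)
    return actions
```

Reviewing the `(duplicate, keeper)` pairs from a dry run before passing `dry_run=False` is the safer order, since the removal is not reversible. Matching contents say nothing about metadata or alternate data streams, which can differ between copies.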