I was hoping there was some kind of server migration or enterprise deduplication software that could crawl all our data, store the results in a database, and then let us parse those results.
When you throw 10 million files at traditional duplicate cleaners, they tend to blow up. Then, after you clean some parts up, guess what... you have to rescan everything and wait all over again.
There has to be a better way. Block-level deduplication solves part of the storage-size equation, but it doesn't address the root cause of the problem, which is poor data governance. The challenge is getting from messy to organized in an efficient manner.
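To show the kind of crawl-once, query-many workflow I'm after, here's a minimal sketch in Python: walk the tree, hash every file into SQLite, then find duplicates with plain SQL instead of rescanning. The paths, database name, and schema are just placeholders I made up, not any vendor's product:

```python
#!/usr/bin/env python3
"""Sketch: index files into SQLite once, then query for duplicates
without rescanning the filesystem each time."""
import hashlib
import os
import sqlite3
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Stream-hash a file so huge files don't blow out memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def crawl(root, db_path="file_index.db"):
    """Walk the tree and record path, size, mtime, and hash."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS files (
                       path   TEXT PRIMARY KEY,
                       size   INTEGER,
                       mtime  REAL,
                       sha256 TEXT)""")
    con.execute("CREATE INDEX IF NOT EXISTS idx_hash ON files (sha256)")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                con.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                    (path, st.st_size, st.st_mtime, sha256_of(path)))
            except OSError as e:
                print(f"skipped {path}: {e}", file=sys.stderr)
    con.commit()
    con.close()


def duplicates(db_path="file_index.db"):
    """Yield (sha256, copy_count) for every hash stored more than once."""
    con = sqlite3.connect(db_path)
    yield from con.execute(
        """SELECT sha256, COUNT(*) AS copies FROM files
           GROUP BY sha256 HAVING copies > 1 ORDER BY copies DESC""")
    con.close()


if __name__ == "__main__":
    crawl(sys.argv[1] if len(sys.argv) > 1 else ".")
    for digest, copies in duplicates():
        print(copies, digest)
```

The point is that once the hashes live in a database, cleaning up one share doesn't force a rescan of the other ten million files: you delete the rows for what you removed and re-query. You could also compare stored mtimes against the filesystem to make rescans incremental. What I want is something that does this at enterprise scale.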
Has anybody used this, or know of something similar?
http://www.valiancepartners.com/data-migration-tools/trucompare-data-migration-testing/