HelloWill

We are consolidating all our data from multiple servers down to one. We have about 250 TB of data and have moved about 200 TB so far, and I’m seeing tons of duplicate files and folders.

Our data is a mixture of RAW video (.R3D), pictures and general business data. Some of our data has alternate data streams (ADS). I need to make sure all our data transferred correctly and that all the metadata and ADS are intact.

What’s the best way to easily verify all the files are the same and then deduplicate? I’m looking for those who’ve been there, done that for some advice on tools and methods to make sure everything is good.
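
For the "verify all the files are the same" part, here is a minimal sketch of one possible approach, not a recommendation of a specific tool: hash every file under the source and the destination, then compare the two manifests by relative path. The function names (`file_sha256`, `build_manifest`, `compare_trees`) are made up for illustration, and SHA-256 is an assumed choice of hash. Note that this only reads each file's main data stream, so it says nothing about alternate data streams or other metadata.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(source: Path, destination: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or different on the destination."""
    src = build_manifest(source)
    dst = build_manifest(destination)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling `compare_trees(Path(old_share), Path(new_share))` with two hypothetical mount points returns three lists of relative paths. All three being empty means the main data streams match.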

HelloWill

We have a large FreeNAS server that is loaded with files. I am looking for advice on the best way to get things cleaned up, and I know there's tons of duplicates.

File Types:

Images
Text
Videos

File Counts:

10,000,000+ Files
200+ TB

I've tried many duplicate scanners, but none of them have worked well: they crash when their logs get too big, it's hard to get context on the results, and a scan takes days even without checksums (checksumming the files with MD5 takes a really long time). To top it off, they only run on one PC, so I can't even enlist the rest of the team to help clean up.
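
One common way to cut the checksum time is to group files by size first and hash only the files that share a size with at least one other file, since files of different sizes cannot be identical. The sketch below assumes that approach; `hash_file` and `find_duplicates` are hypothetical names, and SHA-256 stands in for MD5 as an assumed choice.

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Pass 1: cheap metadata scan, no file contents are read.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so a link is not reported as a copy
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable entry, skip rather than abort the scan

    # Pass 2: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, hash_file(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

How much this saves depends on the data: a share full of uniquely sized video files skips almost all hashing, while many same-sized files still need to be read.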

I need a way to easily scan files, identify duplicates, and ideally save the scan results and checksums so that we don't need to keep re-scanning the same files again and again. I like Beyond Compare, but it only helps after the duplicates have been identified.
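
To illustrate the "save checksums so we don't re-scan" idea, here is a rough sketch that keeps digests in a SQLite database and re-hashes a file only when its size or modification time has changed. The table layout and the names `open_cache`, `cached_hash`, and `duplicate_groups` are assumptions made up for this example, not part of any existing tool. Trusting size plus mtime is a shortcut: a file rewritten with the same size and timestamp would not be noticed.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path):
    """Open (or create) the checksum cache database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, sha256 TEXT)"
    )
    return conn


def cached_hash(conn, path):
    """Return (digest, was_cached); hash the file only if size or mtime changed."""
    stat = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns, sha256 FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row and row[0] == stat.st_size and row[1] == stat.st_mtime_ns:
        return row[2], True

    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (path, stat.st_size, stat.st_mtime_ns, digest.hexdigest()),
    )
    conn.commit()
    return digest.hexdigest(), False


def duplicate_groups(conn):
    """Return lists of cached paths that share the same digest."""
    rows = conn.execute(
        "SELECT sha256, path FROM files WHERE sha256 IN ("
        "SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(*) > 1) "
        "ORDER BY sha256, path"
    )
    groups = {}
    for digest, path in rows:
        groups.setdefault(digest, []).append(path)
    return list(groups.values())
```

Because the results live in a database file rather than in the scanner's memory, a later run after a partial cleanup only has to hash files that are new or changed, and the duplicate list can be queried without scanning again.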

What do you guys do to scan this much data and make sense of it / organize it?

HelloWill

I was hoping there was some type of server migration software or enterprise deduplication software that would be able to crawl all our data, store the results in some type of database and then allow us to parse the results.

When you throw 10 million files at traditional duplicate cleaners, they tend to blow up. Then, after you clean some parts up, guess what... you have to rescan and wait.

There has to be a better way. Block-level deduplication solves part of the storage size equation, but it doesn't address the root cause of the problem, which is poor data governance. The challenge is going from messy to organized efficiently.

Has anybody used this, or know of something similar?
http://www.valiancepartners.com/data-migration-tools/trucompare-data-migration-testing/

HelloWill

I can run any software from either a workstation or the server; however, running things directly on FreeNAS makes me nervous because I'm not sure how it will react.

The files are shared as a NAS, although we could connect via iSCSI or similar.

HelloWill

Goal
Find the best UTM firewall for our HQ and remote offices.

Background

Less than 100 Users
Been using pfSense (not very user friendly or easy to make sense of)

What we are looking for:

Firewall
VPN
Intrusion Prevention / Intrusion Detection
Virus Protection

Here are the companies / Products that look interesting:

Sophos Cyberoam UTM
FireEye
Fortinet FortiGate
SonicWall
WatchGuard
WildFire
Untangle
Juniper
Cylance
Palo Alto Networks

Our Decision Criteria:

Simplicity / Ease of Maintenance
Safe / Secure / Reliable
Fast / Won't slow us down noticeably
Can pay for Support
Scalable
Easy Reports / Dashboards
It simply just works

Would love some feedback and help narrowing down my list from anyone with real world experience with any of these...

Cheers!

HelloWill

FreeNAS
