How do you find duplicates from Windows SMB shares using Linux
-
I'm just looking for a way to tally the number of duplicate files on any given share; it doesn't need to be anything fancy. Ideally it would check the hashes of the files and then post a summary to a log file.
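For reference, the "hash everything and write a summary" idea can be sketched with standard tools alone. This is only a rough illustration, not how fdupes works internally: it assumes GNU findutils and coreutils, the share is already mounted locally, and the function name tally_dupes and the path /mnt/share are made up for the example.

```shell
#!/usr/bin/env bash
# Tally duplicate files under a directory by SHA-256 content hash.
# Hypothetical helper; assumes GNU find, xargs, sha256sum and awk.
tally_dupes() {
    local target="$1"
    # Hash every regular file; -print0 / -0 keeps names with spaces intact.
    find "$target" -type f -print0 \
        | xargs -0 -r sha256sum \
        | awk '
            {
                sub(/^\\/, "", $1)   # sha256sum prefixes "\" for escaped names
                count[$1]++          # number of files seen per hash
            }
            END {
                for (h in count) if (count[h] > 1) {
                    sets++                  # one set per repeated hash
                    extra += count[h] - 1   # copies beyond the first
                }
                printf "duplicate sets: %d\nredundant files: %d\n", sets, extra
            }'
}
```

Something like `tally_dupes /mnt/share >> dupes.log` would append the two summary lines to a log. File names containing newlines are not handled by this sketch.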
I'm looking at fdupes (dnf install fdupes) as this might do what I want, but I'm open to suggestions.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
I'm looking at fdupes ( dnf install fdupes ) as this might do what I want, but I'm open to suggestions.
I looked it up and that's what I found as likely the best option, too.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
I'm just looking for a way to tally the number of duplicate files on any given share; it doesn't need to be anything fancy. Ideally it would check the hashes of the files and then post a summary to a log file.
I'm looking at fdupes (dnf install fdupes) as this might do what I want, but I'm open to suggestions.
I would assume you can just write the command output to a file, and that should accomplish what you want in the simplest way.
-
@IRJ Yeah the output part is really simple, fdupes seems really simple too.
fdupes -rmsHA --sameline /target > output.log is running.
I just wasn't sure if there were any better options out there.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
@IRJ Yeah the output part is really simple, fdupes seems really simple too.
fdupes -rmsHA --sameline /target > output.log is running.
I just wasn't sure if there were any better options out there.
I just realized that the --sameline option can be replaced with -1 (as in the number one). The manual isn't clear about that, and when reading the option itself it is difficult to tell the difference.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
@IRJ Yeah the output part is really simple, fdupes seems really simple too.
fdupes -rmsHA --sameline /target > output.log is running.
I just wasn't sure if there were any better options out there.
You may also want to grep for certain data if the entire output is too noisy.
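A rough sketch of that kind of filtering, assuming the log was produced without -m so that each line holds one duplicate set (which is what --sameline does). The function name sets_matching and the example pattern are hypothetical.

```shell
#!/usr/bin/env bash
# Count the duplicate sets in a one-set-per-line log that mention a fixed string.
# Hypothetical helper; the log layout is an assumption (fdupes --sameline style).
sets_matching() {
    local log="$1" pattern="$2"
    # grep -c prints 0 but exits 1 when nothing matches; keep the count, drop the status.
    grep -c -F -- "$pattern" "$log" || true
}
```

For example, `sets_matching output.log .iso` would report how many duplicate sets involve a path containing ".iso".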
-
@IRJ said in How do you find duplicates from Windows SMB shares using Linux:
you also may want to grep for certain data if the entire output is too noisy
Normally I would filter down, but since I'm just trying to get a grasp on the amount of potential duplication that there is, filtering at this point would only skew that number.
-
@DustinB3403 Some folks claim jdupes is faster. I have used both and did not notice much of a difference.
Both work well.
-
@pattonb To get an idea of how many dupes there are, use the following:
fdupes -r -m /directory (share to scan)
-
I wonder if this would run faster directly on the server in PowerShell instead? I'm assuming that doing this over SMB means you have to download all the files and then run the hash. If run locally, you get to skip the download time, I assume.
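On the download question: a file can only be a duplicate of another file with the same size, and fdupes compares sizes before it hashes anything, so not every file has to be read in full over the network. The sketch below estimates how many files share a size with at least one other file, which is the pool that would need hashing. It assumes GNU find (for -printf), and the function name size_candidates is hypothetical.

```shell
#!/usr/bin/env bash
# Count files whose size matches at least one other file under the target.
# Only these candidates would need their contents read and hashed.
size_candidates() {
    local target="$1"
    # Print one size per file, group identical sizes, sum the groups larger than one.
    find "$target" -type f -printf '%s\n' \
        | sort -n \
        | uniq -c \
        | awk '$1 > 1 { n += $1 } END { printf "%d\n", n }'
}
```

Comparing this number with the total file count gives a feel for how much data a hash pass would actually have to pull across SMB.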
-
@Dashrender said in How do you find duplicates from Windows SMB shares using Linux:
I wonder if this would run faster directly on the server in PowerShell instead? I'm assuming that doing this over SMB means you have to download all the files and then run the hash. If run locally, you get to skip the download time, I assume.
I gathered that the SMB shares are hosted on Linux, but I could be wrong.
If they are hosted on Windows like you are assuming, then I would agree that PowerShell would probably be most performant for this.
-
@IRJ said in How do you find duplicates from Windows SMB shares using Linux:
I gathered that the SMB shares are hosted on Linux, but I could be wrong.
If they are hosted on Windows like you are assuming, then I would agree that PowerShell would probably be most performant for this.
The title says - Windows SMB Shares.
My guess is that Dustin is a lone wolf running a 'nix OS as his machine - and the rest of the company is using Windows. Nothing wrong with that, just my guess.
-
@Dashrender said in How do you find duplicates from Windows SMB shares using Linux:
The title says - Windows SMB Shares.
My guess is that Dustin is a lone wolf running a 'nix OS as his machine - and the rest of the company is using Windows. Nothing wrong with that, just my guess.
His company is significantly Mac.
-
@JaredBusch said in How do you find duplicates from Windows SMB shares using Linux:
His company is significantly Mac.
Ahh, that's right - he has been asking a lot of Mac questions lately.
-
@Dashrender said in How do you find duplicates from Windows SMB shares using Linux:
Ahh, that's right - he has been asking a lot of Mac questions lately.
Unix questions to be more precise, but yeah we are a heavy Mac shop.