@DustinB3403 said in Tar gzip file compression calculation without decompressing the file:
@Pete-S So the simplest way I can think of to explain this would be like this:
You have a network share which is relatively organized
You create a compressed tarball of any folder on that share and then move that tarball to offsite storage.
How would I realistically get a hash of that folder before and after tar and compression, and have it make sense? They aren't the same thing, even if they contain the same things.
@Pete-S said in Tar gzip file compression calculation without decompressing the file:
Is it safe to assume that the gzip file is correct when it is created?
This is what I'm looking to verify 🙂
I'm assuming that files are static during backup.
If you first run md5deep on all the files in the folder, you'll create a text file that contains MD5 (or SHA-256, or whatever you prefer) signatures for every file in the folder. Place it inside the folder so it ends up inside the backup, and you'll always have the ability to verify any uncompressed individual file.
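A minimal sketch of that idea (the share path and list name are just placeholders):

```
# Hash every file under the folder; -r recurses, -l records relative paths
# so the list stays valid wherever the folder is later restored
cd /share/projectX
md5deep -r -l . > /tmp/MD5SUMS.txt

# Move the list into the folder so it gets picked up by the tarball
mv /tmp/MD5SUMS.txt .

# Later: print any file whose hash does NOT match the stored list
# (MD5SUMS.txt itself will show up here, since it can't contain its own hash)
md5deep -r -l -x MD5SUMS.txt .
```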
If you really want to verify your tar.gz file after it's created, I think you have to decompress the files to a temporary folder and run md5deep on them to compare them against the original hashes. What you are really testing is that the backup-compress-decompress-restore operation is lossless for every file. It should be by design, but if there is an unlikely bug somewhere, it's technically possible that it isn't.
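Something along these lines, assuming the hash list from above was captured inside the archive (the archive name is a placeholder):

```
# Unpack into a throwaway directory
tmp=$(mktemp -d)
tar -xzf projectX.tar.gz -C "$tmp"

# Print any restored file that does not match the original hashes
(cd "$tmp" && md5deep -r -l -x MD5SUMS.txt .)

rm -rf "$tmp"
```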
If you use gzip compression with tar, the gzip format has a built-in CRC-32 checksum that can be used to verify the integrity of the gzip file.
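That check is a one-liner; `gzip -t` decompresses the data in memory and compares it against the stored CRC-32 without writing anything out:

```
gzip -t projectX.tar.gz && echo "gzip integrity OK"
```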
Or, to be even more certain, you can create an MD5 signature of the entire gzip archive with md5sum or md5deep. Then you can always verify that the archive itself has not been corrupted.
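For instance:

```
# Record the archive's checksum right after creating it
md5sum projectX.tar.gz > projectX.tar.gz.md5

# Any time later (e.g. after the copy offsite), verify it
md5sum -c projectX.tar.gz.md5
```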
If you ever need to restore the files, you can verify the integrity of the restored files against the MD5 signatures you created on the original files before you did the backup.