
    @DustinB3403 said in Tar gzip file compression calculation without decompressing the file:

    @Pete-S So the simplest way I can think of to explain this would be like this:

    You have a network share which is relatively organized.

    You create a compressed tarball of any folder on that share and then move that tarball to offsite storage.

    How would I realistically get a hash of that folder before and after the tar-and-compress step and have it make sense? They aren't the same thing, even if they contain the same data.

    @Pete-S said in Tar gzip file compression calculation without decompressing the file:

    Is it safe to assume that the gzip file is correct when it is created?

    This is what I'm looking to verify 🙂

    I'm assuming that the files are static during the backup.

    If you first of all run md5deep on all the files in the folder, you'll create a text file that contains MD5 (or SHA-256, or whatever you want) signatures for every file in the folder. Place it into the folder so it ends up inside the backup, and you'll always have the ability to verify any individual uncompressed file.
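
    Something like this, assuming the folder is /srv/share/projects and the list is called checksums.md5 (both names are just placeholders):

        # hash every file under the folder; -r = recurse, -l = store relative paths
        cd /srv/share/projects
        md5deep -r -l . > /tmp/checksums.md5
        # move the list into the folder afterwards so it travels inside the backup
        mv /tmp/checksums.md5 .
        # sha256deep from the same package works the same way if you prefer SHA-256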

    If you really want to verify your tar.gz file after it's created, I think you have to decompress the files to a temporary folder and run md5deep on them to compare against the original files. What you're really testing is that the backup-compress-decompress-restore operation is lossless for every file. It should be by design, but if there is an unlikely bug somewhere, it's technically possible that it isn't.
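
    If you go that route, a rough sketch of the round trip, assuming the tarball is called backup.tar.gz and contains the projects folder with the checksums.md5 list from above (hash-then-path lines that md5sum -c can read):

        tmp=$(mktemp -d)                   # scratch directory for the test restore
        tar -xzf backup.tar.gz -C "$tmp"   # unpack the archive
        cd "$tmp"/projects                 # the path depends on how the tarball was built
        md5sum -c checksums.md5            # every line should report OK; any mismatch gives a non-zero exit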

    If you use gzip compression with tar, gzip has a CRC-32 checksum inside that can be used to verify the integrity of the gzip file.
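
    gzip can test its own stream against that checksum without extracting anything; against the same hypothetical backup.tar.gz:

        gzip -t backup.tar.gz && echo "gzip stream OK"   # -t decompresses in memory and checks the stored CRC-32 and length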

    Or, to be even more certain, you can create an MD5 signature of the entire gzip archive with md5sum or md5deep. Then you can always verify that the archive has not been corrupted.
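
    A quick sketch of that, again with the placeholder backup.tar.gz name:

        md5sum backup.tar.gz > backup.tar.gz.md5   # record the archive's checksum next to it
        md5sum -c backup.tar.gz.md5                # run later (e.g. offsite) to confirm the archive is byte-for-byte unchanged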

    If you ever need to restore the files, you can verify the integrity of the restored files with the MD5 signatures you created on the original files before you did the backup.

    scottalanmiller:

    @JJoyner1985 said in UNIX: What Is a Tarball:

    So, do you think the reason I am seeing a lot more gzip in use with tarballs is the familiarity of gzip and the negligible difference in compression between it and bzip2? Basically, does bzip2 not make enough of an improvement, with sufficient regularity, to entice people to move away from gzip, or is there some other benefit to gzip that my training material hasn't covered?

    That's correct. The difference between the two is generally small enough that people are not concerned. And lots of systems still don't have bzip2 installed by default, so if you want scripts or whatever to work universally, you often use gzip because you know that it is always there and predictable.
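
    For reference, the practical difference on the command line is a single flag; the file and folder names here are just examples:

        tar -czf backup.tar.gz  somefolder/   # -z compresses with gzip, which is essentially always present
        tar -cjf backup.tar.bz2 somefolder/   # -j compresses with bzip2, which may need to be installed first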
