    Unsolved Tar gzip file compression calculation without decompressing the file

    IT Discussion
    tar gzip unix apple osx
    • DustinB3403

      Now, I know I can also list the archive with tar -tf name.tar.gz and see what is actually in the tarball, but it's a pain in the rear to read through possibly millions of entries.

      • 1337 @DustinB3403

        @DustinB3403 said in Tar gzip file compression calculation without decompressing the file:

        So I have a tar.gz file that I've built; it's about 95GB compressed from what the system shows. I would like to determine the uncompressed size of the tar file without actually decompressing the file.

        Using gzip -l file.tar.gz should work, but it reports an incorrect size (the total size in bytes is just over 1GB).

        How else should I do this?

        I'm not sure exactly what you're after. As far as I know, the gzip format stores the original size only as a 32-bit value in the file trailer, so it wraps around at 4 GiB. That would explain why gzip -l reports the wrong size for a file this large. So the only reliable way to find out how big the uncompressed file is, is to read through the entire compressed file.

        The problem with decompressing to disk is that you have to write a big file, and that takes time. If you instead decompress on the fly and just count the bytes, you avoid that problem:

        gzip -c -d file.gz | wc -c
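
For anyone who prefers a script, here is a minimal Python sketch of the same on-the-fly count, plus a helper that reads the 32-bit size field from the gzip trailer to show where a wrapped-around number can come from. The function names `uncompressed_size` and `gzip_isize` are made up for this example, and the trailer helper assumes a single-member gzip file.

```python
import gzip
import os
import struct


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Count the bytes a gzip file expands to, without writing them anywhere."""
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def gzip_isize(path):
    """Read the trailer size field: the uncompressed size modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # the last 4 bytes, little-endian
        return struct.unpack("<I", f.read(4))[0]
```

Calling `uncompressed_size("file.tar.gz")` still has to read and decompress the whole archive, so expect it to take a while on 95GB; it just never writes the uncompressed data to disk. If `gzip_isize` returns a smaller number than `uncompressed_size`, the real size has passed 4 GiB.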
        
        • DustinB3403 @1337

          @Pete-S said in Tar gzip file compression calculation without decompressing the file:

          I'm not sure exactly what you're after.

          I'm attempting to create an archive and confirm by byte count that 100% of what I archived is in said archive, rather than having to review the compressed tarball and look through it for specific files or folders.

          Essentially, I want to verify my archives before I offload them to cloud storage, instead of finding out (who knows how far down the line) that something was missed, for whatever reason.
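
One way to do that kind of byte-count comparison, sketched in Python under a few assumptions: the source folder does not change while the archive is being built, only regular files are counted, and the helper names (`folder_totals`, `archive_totals`) are hypothetical. The archive is read as a stream, so nothing is extracted to disk.

```python
import os
import tarfile


def folder_totals(root):
    """Return (file_count, total_bytes) for regular files under `root`."""
    count = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are counted.
            if os.path.isfile(path) and not os.path.islink(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


def archive_totals(archive_path):
    """Return (file_count, total_bytes) for regular files inside a tar.gz."""
    count = total = 0
    # "r|gz" opens the archive as a forward-only stream: headers are read,
    # file data is skipped over, and nothing is written to disk.
    with tarfile.open(archive_path, "r|gz") as tar:
        for member in tar:
            if member.isfile():
                count += 1
                total += member.size
    return count, total
```

Matching counts and totals show that nothing was obviously left out, but they do not prove the contents are identical. Hard links are stored by tar as links rather than as regular files, so a folder that contains them can legitimately give different totals.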

          • DustinB3403

            I guess a more relevant way to have expressed my question would have been to ask:

            How do you quickly and efficiently verify what is in your tarball before you offload it?

            I want to trust, but verify (as this is a backup).

            • 1337 @DustinB3403

              That's generally done with a hash of the files: SHA-256, MD5 or similar.

              • DustinB3403 @1337

                And how would that work from my source of unpacked files?

                • 1337 @DustinB3403

                  At what point in the chain from original file -> backup -> tarball -> gzip -> offload to archive do you want to verify?

                  Is it safe to assume that the gzip file is correct when it is created?

                  • DustinB3403 @1337

                    @Pete-S The simplest way I can think to explain this is as follows.

                    You have a network share which is relatively organized.

                    You create a compressed tarball of any folder on that share and then move that tarball to offsite storage.

                    How would I realistically get a hash of that folder pre and post tar and compression and have it make sense? They aren't the same thing, even if they contain the same things.

                    @Pete-S said in Tar gzip file compression calculation without decompressing the file:

                    Is it safe to assume that the gzip file is correct when it is created?

                    This is what I'm looking to verify 🙂

                    • IRJ @DustinB3403

                      Use a FIM (file integrity monitoring) tool like Wazuh.

                      • 1337 @DustinB3403

                        I'm assuming that files are static during backup.

                        If you first run md5deep on all files in the folder and save the output, you get a text file that contains an MD5 hash (or SHA-256, or whatever you prefer) for every file in the folder. Place it in the folder so it ends up inside the backup, and you'll always be able to verify any individual uncompressed file.
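
If md5deep isn't installed, roughly the same kind of manifest can be produced with Python's standard library. This is only a sketch: the manifest file name `MANIFEST.sha256` and the function names are assumptions of this example, and it uses SHA-256 rather than MD5. Each line is a hash, two spaces, and the path relative to the folder, similar to what sha256sum prints.

```python
import hashlib
import os


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash one file in chunks so large files never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_name="MANIFEST.sha256"):
    """Write '<hash>  <relative path>' for every regular file under `root`."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subfolders in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # Leave out the manifest itself and anything that is not a regular file.
            if rel == manifest_name or os.path.islink(path) or not os.path.isfile(path):
                continue
            lines.append(f"{hash_file(path)}  {rel}")
    manifest_path = os.path.join(root, manifest_name)
    with open(manifest_path, "w", encoding="utf-8") as out:
        out.write("".join(line + "\n" for line in lines))
    return manifest_path
```

Running `write_manifest` on the folder just before creating the tarball puts the manifest inside the backup, as described above.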

                        If you really want to verify your tar.gz file after it's created, I think you have to decompress the files to a temporary folder and run md5deep on them to compare them with the originals. What you are really testing is that the backup-compress-decompress-restore operation is lossless for every file. It should be by design, but if there is an unlikely bug somewhere, it's technically possible that it isn't.
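
If disk space for a temporary folder is a concern, a variation on this (not something md5deep does, just a sketch) is to stream the archive and hash each member as it goes by. It assumes you already have the expected SHA-256 hashes in a dict keyed by relative path; `verify_archive`, `expected` and `prefix` are hypothetical names.

```python
import hashlib
import tarfile


def hash_stream(stream, chunk_size=1024 * 1024):
    """SHA-256 of a file-like object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive_path, expected, prefix=""):
    """Check a tar.gz against `expected` ({relative path: hex digest}).

    `prefix` is stripped from member names, e.g. "data/" if the archive
    stores everything under a top-level "data" folder.
    Returns (missing, mismatched) as sorted lists of paths. Files in the
    archive that are not listed in `expected` are ignored.
    """
    seen = set()
    mismatched = []
    # Stream mode: each member must be read before moving on to the next.
    with tarfile.open(archive_path, "r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            name = member.name
            if prefix and name.startswith(prefix):
                name = name[len(prefix):]
            if name not in expected:
                continue
            seen.add(name)
            if hash_stream(tar.extractfile(member)) != expected[name]:
                mismatched.append(name)
    missing = sorted(set(expected) - seen)
    return missing, sorted(mismatched)
```

Two empty lists mean every file in `expected` was found in the archive with the same hash. This still decompresses the whole archive once, so it is not faster than extracting, only lighter on disk.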

                        If you use gzip compression with tar, the gzip format includes a CRC-32 checksum of the uncompressed data, which can be used to verify the integrity of the gzip file.
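
On the command line, gzip -t performs that check. A rough Python equivalent is sketched below, assuming Python 3.8 or later (for `gzip.BadGzipFile`); the function name `gzip_is_intact` is made up for this example. It reads the whole stream and lets the gzip module compare the CRC-32 at the end.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the gzip file decompresses cleanly and its CRC-32 matches."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard everything; the checksum is verified when
            # the end of the compressed stream is reached.
            while stream.read(chunk_size):
                pass
    except (gzip.BadGzipFile, EOFError, zlib.error):
        # Bad header or CRC, truncated file, or corrupt compressed data.
        return False
    return True
```

A passing check only says the gzip stream is internally consistent. It says nothing about whether the right files went into the tarball in the first place.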

                        Or, to be even more certain, you can create an MD5 hash of the entire gzip archive with md5sum or md5deep. Then you can always verify that the archive has not been corrupted.
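
The same idea in Python, for systems without md5sum or md5deep: hash the finished archive once, keep the result next to it, and recompute after every copy or download. SHA-256 is used here instead of MD5, and the `.sha256` sidecar file name and the function names are assumptions of this sketch.

```python
import hashlib


def archive_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of the whole archive file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path):
    """Store the archive's hash in '<archive>.sha256' and return that path."""
    sidecar = path + ".sha256"
    with open(sidecar, "w", encoding="ascii") as out:
        out.write(archive_digest(path) + "\n")
    return sidecar


def matches_sidecar(path):
    """Recompute the hash and compare it with the stored one."""
    with open(path + ".sha256", encoding="ascii") as f:
        recorded = f.read().split()[0]
    return recorded == archive_digest(path)
```

Upload the sidecar file together with the archive, and run `matches_sidecar` on whatever comes back from cloud storage before trusting it.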

                        If you ever need to restore the files you can verify the integrity of the restored files with the md5 you created on the original files, before you did the backup.
