    Tar gzip file compression calculation without decompressing the file

    Tags: tar, gzip, unix, apple, osx

      DustinB3403 @Dashrender

      @Dashrender said in Tar gzip file compression calculation without decompressing the file:

      @black3dynamite said in Tar gzip file compression calculation without decompressing the file:

      I bet because your file is still compressed.

      That command should work fine if it was gzip -l file.gz instead of file.tar.gz.

      Sidetrack - wouldn't dropping the tar actually break all the files apart? I thought tar is what put all the files into a single container, and gzip is what compressed it? Yep, Linux noob here. Or tar.gz noob, or both.

      Correct, tar puts everything into a container and gzip supplies the compression.

        Emad R @DustinB3403

        @DustinB3403

        7-Zip provides that, so I assume
        7za t archive will work, or
        7z t archive

          DustinB3403 @Emad R

          I'll give this a try!

            DustinB3403

            That doesn't actually show the uncompressed size; it just pulls the compressed amount (at least in my test).

            I may be taking that back, as I now realize it's performing a calculation on the file.

              DustinB3403

              So I was able to use 7zip to run a test against the file in question. What I don't understand is the output:

              Testing archive: path/name.tar.gz
              Path = path/name.tar.gz
              Type = gzip
              Headers Size = 10
              
              Everything is Ok
              
              Size:         1608683520
              Compressed:  95962485348
              

              How does that make any sense? The compressed amount matches what I can get through finder, but the size makes literally zero sense (unless it is using a different measurement, like bytes instead of bits).
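
A likely explanation for those numbers: the gzip trailer stores the uncompressed size in a 4-byte field (ISIZE in RFC 1952), so it only holds the size modulo 2^32 (4294967296). For an archive this large, the true uncompressed size would be 1608683520 plus some multiple of 4294967296, which is also why gzip -l looks wrong. A minimal sketch that reads that field directly, assuming a single-member gzip file; gzip_isize is a made-up helper name, not a standard tool:

```shell
# Print the ISIZE field of a gzip file: its last 4 bytes, little-endian.
# Per RFC 1952 this is the uncompressed size modulo 2^32, not the full size.
gzip_isize() {
    tail -c 4 "$1" \
        | od -An -tu1 \
        | awk '{ printf "%.0f\n", $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 }'
}
```

Usage would be something like gzip_isize path/name.tar.gz. For anything under 4 GiB uncompressed the value is the real size; above that it has wrapped around and only a full read of the stream gives the true figure.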

                DustinB3403

                Now I know I can also browse the file with tar -tf name.tar.gz and see what is actually in the tarball, but it's a bit of a pain in the rear to potentially read through possibly millions of entries.
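
Rather than reading the listing, one option is to count it and compare the number against the source. A small sketch under that assumption; count_tar_entries is a hypothetical helper name, and the count includes directory entries as well as files:

```shell
# Count the entries in a gzip-compressed tarball without extracting it.
# Directories are listed too, and a file name containing a newline would
# be counted more than once.
count_tar_entries() {
    tar -tzf "$1" | wc -l | tr -d ' '
}
```

Usage would be count_tar_entries name.tar.gz, with the result compared against something like find on the source folder. The whole archive still has to be read and decompressed, so on a very large tarball this takes a while, but nothing is written to disk.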

                  1337 @DustinB3403

                  @DustinB3403 said in Tar gzip file compression calculation without decompressing the file:

                  So I have a tar.gz file that I've built, it's about 95GB compressed from what the system shows. I would like to determine the uncompressed size of the tar file without actually decompressing the file.

                  Using gzip -l file.tar.gz should work, but reports an incorrect record (total size in bytes is just over 1GB).

                  How else should I do this?

                  I'm not sure exactly what you're after. As far as I know, the gzip format only stores the original size as a 32-bit field in the file trailer (the size modulo 2^32), so it can't be trusted for anything of 4 GiB or more. The only reliable way to work out how big the uncompressed file is, is to read through the entire compressed file.

                  The problem with decompressing is that you have to write out a big file, and that takes time. However, if you decompress on the fly and just count the bytes, you avoid that problem.

                  gzip -c -d file.gz | wc -c
                  
                    DustinB3403 @1337

                    @Pete-S said in Tar gzip file compression calculation without decompressing the file:

                    I'm not sure exactly what you're after.

                    I'm attempting to create an archive and confirm, by byte count, that 100% of what I archived is in it, rather than having to review the compressed tarball and look through it for specific files or folders.

                    Essentially, I want to verify my archives before I offload them to cloud storage and find out (who knows how far down the line) that something was missed. (for whatever reason)
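
One way to get a byte-count comparison without extracting to disk is to compare the total size of the file contents on both sides. This is only a sketch: source_bytes and archive_bytes are made-up helper names, and it assumes the source is not changing and contains no hard links or sparse files, which tar may store differently from how find and cat see them.

```shell
# Total bytes of all regular files under a directory.
source_bytes() {
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Total bytes of file contents stored in a .tar.gz. The -O flag streams
# each member to stdout, so nothing is written to disk.
archive_bytes() {
    tar -xzOf "$1" | wc -c | tr -d ' '
}
```

If source_bytes /path/to/folder and archive_bytes name.tar.gz print the same number, the payload sizes match. A matching count does not prove the contents are identical; that takes a hash.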

                      DustinB3403

                      I guess a more relevant way to have expressed my question would have been to ask:

                      How do you verify what is in your tarball before you offload it in a quick and efficient manner?

                      I want to trust, but verify (as this is a backup).

                        1337 @DustinB3403

                        Generally done with a hash of the files - sha-256, md5 or similar.

                          DustinB3403 @1337

                          And how would that work from my source of unpacked files?

                            1337 @DustinB3403

                            At what point in the chain (original file -> backup -> tarball -> gzip -> offload to archive) do you want to verify?

                            Is it safe to assume that the gzip file is correct when it is created?

                              DustinB3403 @1337

                              @Pete-S So the simplest way I can think to explain this would be like this.

                              You have a network share which is relatively organized.

                              You create a compressed tarball of any folder on that share and then move that tarball to offsite storage.

                              How would I realistically get a hash of that folder pre and post tar and compression and have it make sense? They aren't the same thing, even if they contain the same things.

                              @Pete-S said in Tar gzip file compression calculation without decompressing the file:

                              Is it safe to assume that the gzip file is correct when it is created?

                              This is what I'm looking to verify 🙂

                                IRJ @DustinB3403

                                Use a FIM (file integrity monitoring) tool like Wazuh.

                                  1337 @DustinB3403

                                  I'm assuming that files are static during backup.

                                  If you first run md5deep on all the files in the folder, you'll get a text file containing the MD5 hash (or SHA-256, or whatever you prefer) of every file in the folder. Place it in the folder so it ends up inside the backup, and you'll always be able to verify any individual uncompressed file.
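
A rough sketch of the manifest idea, using sha256sum from GNU coreutils in place of md5deep (a swapped-in tool; on macOS, shasum -a 256 is the stock equivalent). The function names and the file name MANIFEST.sha256 are made up for illustration:

```shell
# Write SHA-256 hashes of every regular file under a folder into
# MANIFEST.sha256 inside that folder. Paths are stored relative to the
# folder, so the manifest still works after a restore somewhere else.
write_manifest() {
    (
        cd "$1" || exit 1
        find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256
    )
}

# Re-check every file listed in the manifest; exits non-zero on a mismatch.
check_manifest() {
    ( cd "$1" && sha256sum -c MANIFEST.sha256 )
}
```

Run write_manifest /path/to/folder before creating the tarball, and check_manifest on the restored folder later. This assumes the files are static while the hashes are taken.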

                                  If you really want to verify your tar.gz file after it's created, I think you have to decompress the files to a temporary folder and run md5deep on them to compare them with the originals. What you are really testing is that the backup-compress-decompress-restore operation is lossless for every file. It should be by design, but if there is an unlikely bug somewhere, it's technically possible that it isn't.
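
The extract-and-compare step could look roughly like this. It is a sketch that assumes the archive already contains a hash list in sha256sum format (sha256sum standing in for md5deep here); verify_archive and the manifest path passed to it are hypothetical names:

```shell
# Extract the archive ($1) into a throwaway directory and check the
# extracted files against the sha256sum manifest stored at path $2
# inside the archive. Runs in a subshell so the caller's directory and
# variables are left alone.
verify_archive() (
    workdir=$(mktemp -d) || exit 1
    status=1
    if tar -xzf "$1" -C "$workdir" && cd "$workdir/$(dirname "$2")"; then
        sha256sum -c "$(basename "$2")" && status=0
    fi
    cd / && rm -rf "$workdir"
    exit "$status"
)
```

For example, verify_archive name.tar.gz folder/MANIFEST.sha256. The temporary directory needs room for the full uncompressed contents, which is the cost of this kind of check.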

                                  If you use the gzip compression with tar, gzip has a CRC-32 checksum inside that can be used to verify the integrity of the gzip file.
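
That CRC check can be run with gzip's own test mode, which decompresses the stream in memory and compares it against the stored CRC-32 and length without writing anything out. A minimal wrapper, with gzip_ok being a made-up name:

```shell
# Return success if the gzip file decompresses cleanly and matches its
# stored CRC-32; failure if it is truncated or corrupted.
gzip_ok() {
    gzip -t "$1" 2>/dev/null
}
```

Something like gzip_ok name.tar.gz && echo intact would do. This confirms the gzip layer is undamaged, not that the tarball contains everything from the source.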

                                  Or, to be even more certain, you can create an MD5 hash of the entire gzip archive with md5sum or md5deep. Then you can always verify that the archive has not been corrupted.
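
A sketch of the whole-archive checksum, here with sha256sum (md5sum takes the same arguments if MD5 is preferred). The helper names and the .sha256 suffix are just illustrative choices:

```shell
# Record the archive's SHA-256 in a sidecar file next to it,
# for example right before uploading.
record_checksum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Later, or on a downloaded copy, confirm the archive is unchanged.
check_checksum() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" )
}
```

Keep the sidecar file somewhere other than the offsite copy as well, so a damaged upload cannot take its checksum with it.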

                                  If you ever need to restore the files, you can verify the integrity of the restored files with the MD5 hashes you created from the original files before you did the backup.
