UNIX: What Is a Tarball



  • Spend any amount of time in UNIX circles and you will probably hear the term tarball. Tarballs are the output files of the tar (tape archive) utility. This utility can take many files or directories and combine them into a single file, or tarball. In modern usage it is also generally assumed that the resulting "tar" file has been compressed as well, most often with the gzip utility.

    In practical terms, files are tarred and then gzipped, and the result is called a tarball. The usual extension for a tarball of this kind is .tgz or .tar.gz. (On their own, .tar is used for uncompressed tar files and .gz for single gzip-compressed files.)

    A tarball is a very general-purpose file. It can be used to move files from one place to another (cutting transfer times), to send backups to tape, to create backup files, and to deploy software directly or stage it for deployment by some other means.
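    One common way to move files with tar is to stream the archive through a pipe instead of writing an intermediate file. The sketch below copies a directory tree locally, using `-f -` (write the archive to standard output, or read it from standard input) and `-C` (change into a directory first). The directory names are hypothetical scratch paths. Across a network the same pattern is often used with ssh between the two tar commands, for example `tar -czf - -C src . | ssh otherhost 'tar -xzf - -C /dest'`, where `otherhost` and `/dest` are placeholders; that variant is not run here.

```shell
#!/bin/sh
# Sketch: copy a directory tree by streaming a gzipped tar archive through a pipe.
# SRC and DEST are hypothetical scratch directories created just for this example.
set -e

WORK=$(mktemp -d)
SRC="$WORK/src"
DEST="$WORK/dest"
mkdir -p "$SRC/sub" "$DEST"
echo "hello" > "$SRC/a.txt"
echo "world" > "$SRC/sub/b.txt"

# Left side: archive the contents of SRC to stdout (-f -), gzip-compressed (-z).
# Right side: read the archive from stdin and unpack it inside DEST (-C).
tar -czf - -C "$SRC" . | tar -xzf - -C "$DEST"

# Confirm the copy matches the original tree.
diff -r "$SRC" "$DEST" && echo "copy OK"
```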

    A tarball is, for all intents and purposes, the UNIX analogue of a zip file on Windows. A zip file is both an archive and compressed, produced by a single utility. Modern tar commands can invoke gzip and other compression tools themselves, so today only a single command is needed here, too. Another similar tool is 7-Zip.
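    To make the "one command instead of two" point concrete, here is a small sketch comparing the classic two-step approach (tar, then gzip) with tar's `-z` flag. The `project` directory and its contents are hypothetical; both routes should end up with archives listing the same members.

```shell
#!/bin/sh
# Sketch: the classic two-step "tar, then gzip" versus the single tar -z command.
# The directory and file names are hypothetical examples.
set -e

WORK=$(mktemp -d)
mkdir -p "$WORK/project"
echo "some data" > "$WORK/project/notes.txt"
cd "$WORK"

# Two steps: build an uncompressed archive (.tar), then compress it.
# gzip replaces project.tar with project.tar.gz.
tar -cf project.tar project
gzip project.tar

# One step: tar calls gzip itself via -z, producing the same kind of file.
tar -czf project.tgz project

# Both archives list the same members.
tar -tzf project.tar.gz
tar -tzf project.tgz
```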

    Unpacking a tarball that is compressed is as simple as:

    tar -xzvf mytarball.tgz
    

    (In this case, the x flag means extract, z filters the archive through gzip (gunzip), v is for verbose, and f names the archive file to read from.)
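    Before unpacking an archive from an unknown source, it is often worth listing its contents first by using t (list) in place of x, and choosing where the files land with `-C`. A minimal sketch with hypothetical file names; it builds a small `mytarball.tgz` first so that it is self-contained.

```shell
#!/bin/sh
# Sketch: inspect a tarball, then extract it into a chosen directory.
# The archive and its contents are hypothetical, created here for the example.
set -e
WORK=$(mktemp -d)
cd "$WORK"

# Build a small example archive to work with.
mkdir -p docs
echo "readme" > docs/README
tar -czf mytarball.tgz docs

# List the members without extracting anything (t = list).
tar -tzvf mytarball.tgz

# Extract into a separate directory instead of the current one (-C).
mkdir -p unpacked
tar -xzvf mytarball.tgz -C unpacked
```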

    To create your own tarball from the /data directory you would do this:

    tar -czvf /tmp/mydatatarball.tgz /data
    

    (We exchange x for c, c standing for create.)
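    When creating a tarball, the archive name goes directly after f, and the files or directories to archive follow it. A related habit is to use `-C` so that members are stored with relative paths, which makes it clear where they will land when unpacked (GNU tar strips a leading `/` from member names anyway and prints a notice when it does). The sketch below uses a scratch directory as a stand-in for `/data`, which we assume may not exist on your system.

```shell
#!/bin/sh
# Sketch: create a gzipped tarball with relative member names.
# DATA is a hypothetical stand-in for /data, created just for this example.
set -e
WORK=$(mktemp -d)
DATA="$WORK/data"
mkdir -p "$DATA/logs"
echo "record 1" > "$DATA/records.csv"
echo "started" > "$DATA/logs/app.log"

ARCHIVE="$WORK/mydatatarball.tgz"

# Archive name right after f, then what to archive.
# -C "$WORK" makes member names start at "data/" rather than a full path.
tar -czvf "$ARCHIVE" -C "$WORK" data

# Show the stored member names.
tar -tzf "$ARCHIVE"
```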

    Tarballs are a common file type on UNIX systems, and one that we will use often as system administrators.

    Reference:

    https://en.wikipedia.org/wiki/Tar_(computing)


    Part of a series on Linux Systems Administration by Scott Alan Miller



  • In my studies for the LPIC1, the material I am reading discusses both gzip and bzip2 in equal amounts when discussing tar. Yet, in the wild, I see mostly gzip being used. The material I have been studying mentions that the compression difference between gzip and bzip2 is unpredictable between files, that one file will compress better with gzip while another compresses better with bzip2.

    So, do you think the reason I am seeing a lot more gzip in use with tarballs is due to the familiarity of gzip and the negligible difference in the compression between it and bzip2? Basically, bzip2 doesn't make enough of an improvement with sufficient regularity to entice people to move away from gzip, or is there some other benefit to gzip that my training material hasn't covered?



  • @JJoyner1985 said in UNIX: What Is a Tarball:

    In my studies for the LPIC1, the material I am reading discusses both gzip and bzip2 in equal amounts when discussing tar. Yet, in the wild, I see mostly gzip being used. The material I have been studying mentions that the compression difference between gzip and bzip2 is unpredictable between files, that one file will compress better with gzip while another compresses better with bzip2.

    So, do you think the reason I am seeing a lot more gzip in use with tarballs is due to the familiarity of gzip and the negligible difference in the compression between it and bzip2? Basically, bzip2 doesn't make enough of an improvement with sufficient regularity to entice people to move away from gzip, or is there some other benefit to gzip that my training material hasn't covered?

    I think it may be speed. From what I've done with tar + gzip, it seems to be much faster than tar + bzip2.
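    If you want to check the size and speed trade-off on your own data, a quick sketch is to build the same tarball with `-z` (gzip) and `-j` (bzip2) and compare. The sample data here is hypothetical, results will vary by file type, and the bzip2 step is skipped if bzip2 is not installed. To compare speed as well, prefix each tar command with your shell's `time`.

```shell
#!/bin/sh
# Sketch: build the same tarball with gzip (-z) and with bzip2 (-j) and compare sizes.
# The sample data is hypothetical; real results depend heavily on the input files.
set -e
WORK=$(mktemp -d)
cd "$WORK"

# A text file that compresses reasonably well.
mkdir -p sample
seq 1 200000 > sample/numbers.txt

tar -czf sample.tgz sample
echo "gzip:  $(wc -c < sample.tgz) bytes"

# bzip2 is not always installed, so only try -j when it is available.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf sample.tar.bz2 sample
    echo "bzip2: $(wc -c < sample.tar.bz2) bytes"
else
    echo "bzip2 not installed; skipping"
fi
```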



  • @JJoyner1985 said in UNIX: What Is a Tarball:

    In my studies for the LPIC1, the material I am reading discusses both gzip and bzip2 in equal amounts when discussing tar. Yet, in the wild, I see mostly gzip being used.

    bzip2 is great and if you are on your own systems, go crazy. But for transferring to other companies, everyone uses gzip as the universal standard.



  • @JJoyner1985 said in UNIX: What Is a Tarball:

    So, do you think the reason I am seeing a lot more gzip in use with tarballs is due to the familiarity of gzip and the negligible difference in the compression between it and bzip2? Basically, bzip2 doesn't make enough of an improvement with sufficient regularity to entice people to move away from gzip, or is there some other benefit to gzip that my training material hasn't covered?

    That's correct. The difference between the two is generally small enough that people are not concerned. And lots of systems still don't have bzip2 installed by default, so if you want scripts or whatever to work universally, you often use gzip because you know that it is always there and predictable.
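    In script form, "default to gzip because it is always there" might look something like the sketch below. `make_tarball` is a hypothetical helper name, not a standard command: it produces a .tgz unless bzip2 is explicitly requested and actually installed, in which case it produces a .tar.bz2.

```shell
#!/bin/sh
# Sketch of a portable archiving helper that defaults to gzip.
# make_tarball is a hypothetical name, not a standard command.
set -e

# Usage: make_tarball ARCHIVE_BASENAME DIRECTORY [gzip|bzip2]
# Prints the name of the archive it created.
make_tarball() {
    base=$1
    dir=$2
    want=${3:-gzip}

    # Only honour a bzip2 request if the bzip2 program is actually installed.
    if [ "$want" = "bzip2" ] && command -v bzip2 >/dev/null 2>&1; then
        tar -cjf "$base.tar.bz2" "$dir"
        echo "$base.tar.bz2"
    else
        tar -czf "$base.tgz" "$dir"
        echo "$base.tgz"
    fi
}

WORK=$(mktemp -d)
cd "$WORK"
mkdir -p site
echo "<h1>hi</h1>" > site/index.html

# Default: gzip, which the recipient can almost certainly unpack.
OUT=$(make_tarball site-backup site)
echo "created $OUT"
```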