Posted October 8, 2009 by Spyros in Linux/Unix Programming

How to Use Tar, Gzip and Bzip2 Under The Unix Shell


When moving to Linux or other Unix-based systems, new users very often have trouble using tar, gzip and bzip2 to compress their files. I remember that back in the days when I first used a Linux system, I had lots of trouble with these commands.

Luckily, nowadays most Linux systems, and especially Ubuntu-like ones, come with their own graphical archive managers that handle extracting and creating such archives for you. However, as a Linux user, you really should learn to use the power of the shell, which will make your everyday life easier and let you do things faster and more effectively.

In this post, I will show you how to use those three programs to archive and compress your data. I will try to give you easy ways not only to find out what the switches are and how they are used, but also to actually remember what to type at the shell the next time you want to compress or extract something. I use a few tricks so that I don't forget the various switches of tar, gzip and bzip2, and I will present them here.

What Are Tar, Bzip2 and Gzip?

There is no real need to go deep into the history of these tools. There are three things you need to know:

1. In short, tar is a program that takes various files as input and produces a single file that contains them all. So you may have, say, 100 files and create just one tar file that holds every one of them. Remember that tar does NOT compress anything by itself, so the resulting file is about as big as all the files it contains put together (in fact slightly bigger, since tar adds a small header for each file and pads the archive to fixed-size blocks).

2. Gzip and Bzip2 are two different programs that do the same job: they compress files. Each of them works on one file at a time (gzip file1 file2 produces two separate .gz files), so for convenience we usually first create a tar file containing everything we want to compress and then pass that to one of these compressors. This is why you see files named like “archive.tar.gz” or “archive.tar.bz2”. The first is a tar file (that may contain hundreds of files) compressed with gzip, while the latter is a tar file compressed with bzip2.

3. To decide which of the two compression programs to use, remember this simple rule:

Bzip2 usually produces compressed files about 15% smaller (the exact gain depends on the data), but Gzip compresses and decompresses faster than Bzip2.
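If you want to see these three points for yourself, here is a small sketch to save as a script and run with sh. The file names and their contents are made up for the example, and it assumes gzip and bzip2 are both installed. It does the two steps by hand (tar first, then the compressor) inside a temporary directory and prints the sizes so you can compare them; the exact numbers will depend on your data.

```shell
#!/bin/sh
# Work in a throwaway directory so none of your own files are touched.
cd "$(mktemp -d)" || exit 1

# Two hypothetical input files full of repetitive, easily compressed text.
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i }' > file1
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i, "some repeated text" }' > file2

# Step 1: tar bundles both files into one uncompressed archive.
tar cf archive.tar file1 file2

# Step 2: compress the tar file. -c writes to standard output,
# so archive.tar itself stays in place for the comparison.
gzip -c archive.tar > archive.tar.gz
bzip2 -c archive.tar > archive.tar.bz2

# Sizes in bytes: the .tar is a bit larger than file1 + file2
# (headers and padding), the compressed versions are much smaller.
wc -c file1 file2 archive.tar archive.tar.gz archive.tar.bz2
```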

How to Use Tar to Create and Extract an Archive

Creating a tar file is pretty easy. First, decide which files you want to add to it. Let’s suppose that you want to include just two files named file1 and file2 that are on your Desktop. To archive them, first change to your Desktop directory (using the command cd ~/Desktop) and then execute:

tar cfv new.tar file1 file2

This creates a file named new.tar that holds the two files. To remember the switches, the best idea is to understand what each one does and tie it to a simple word:

-c means Create. It tells tar to create a new archive.

-v means Verbose. It is not strictly needed, but it makes tar list each file as it is added to the archive. It’s a good idea to use it.

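-f means File. It tells tar that the argument that follows is the name of the archive to create (or read), new.tar in this case. If you write the switches with a leading dash, keep f as the last letter of the group (tar -cvf new.tar ...); in tar -cfv new.tar the v right after f would be taken as the archive name.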

Also, notice that we specify the archive first and then the files that go into it. Unlike mv or cp, tar is one of those few commands where the destination comes before the sources.
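Here is a short sketch of creating an archive and then checking what went into it, to save as a script and run with sh. It works in a temporary directory with two made-up files standing in for file1 and file2, and it uses the t (list) switch, which this post does not otherwise cover, to print the archive’s contents without extracting anything.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'first file\n' > file1    # hypothetical sample files
printf 'second file\n' > file2

# Same command as in the text: c = create, f = archive name follows, v = verbose.
tar cfv new.tar file1 file2

# Dash form of the same switches: f comes last so it picks up "new2.tar".
tar -cvf new2.tar file1 file2

# t = list the archive's contents without extracting them.
tar tf new.tar
```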

Once you have created the archive, extracting it is just as easy; it only takes a change of switches:

tar xfv new.tar

The only real difference is the -x switch, which comes from the word eXtract and is therefore easy to remember. This command extracts the files from the new.tar archive into the current directory.
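A quick way to convince yourself that extraction gives back exactly what went in is to extract into an empty directory and compare. This is a sketch to save as a script and run with sh, again using made-up file1 and file2 in a temporary directory; cmp is the standard byte-by-byte comparison tool and stays silent when the files are identical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'first file\n' > file1    # hypothetical sample files
printf 'second file\n' > file2
tar cf new.tar file1 file2

# Extract into a fresh, empty directory to see exactly what comes out.
mkdir restored
cd restored || exit 1
tar xfv ../new.tar

# Compare each extracted file with its original.
cmp file1 ../file1 && cmp file2 ../file2 && echo "extracted files match"
```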

How to Use Gzip and Bzip2 to Create and Extract Tar Archives

Now comes the really important part: creating and extracting archives compressed with gzip or bzip2. Don’t let that intimidate you; the commands are also very easy to remember. To create a new tar archive of the two files file1 and file2 and compress it with gzip, execute the command:

tar cfvz new.tar.gz file1 file2

This creates a new tar file, compressed with gzip, that contains file1 and file2. The switches cfv are already known to you: create an archive, store it in the file named next, and list the files as they are added. The new switch is -z, which tells tar to pass the archive through gzip. To create the same tar file compressed with bzip2 instead, use this command:

tar cfvj new.tar.bz2 file1 file2

The only thing that changes is that -z (gzip) becomes -j (bzip2). So now you ask yourself, how will I remember the difference? I have a personal trick. To remember what -z stands for, I think of the .gz extension: it ends with a z. Bzip2 therefore gets the other one, -j.
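The sketch below creates both kinds of archive from two made-up files and then checks them; save it as a script and run it with sh. It assumes a tar that supports the z and j switches (GNU tar and bsdtar both do) and that gzip and bzip2 are installed. gzip -t and bzip2 -t test a compressed file’s integrity without writing anything out.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'first file\n' > file1    # hypothetical sample files
printf 'second file\n' > file2

tar cfvz new.tar.gz file1 file2    # z: filter the archive through gzip
tar cfvj new.tar.bz2 file1 file2   # j: filter the archive through bzip2

# Each compressor can verify its own output without extracting it.
gzip -t new.tar.gz && echo "new.tar.gz is a valid gzip file"
bzip2 -t new.tar.bz2 && echo "new.tar.bz2 is a valid bzip2 file"

# Listing works too, using the same compression letter as when creating.
tar tzf new.tar.gz
tar tjf new.tar.bz2
```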

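Extracting the files is just as easy, and the command is the same for both kinds of compressed archive:

tar xfv new.tar.bz2

tar xfv new.tar.gz

Recent versions of GNU tar (and bsdtar) recognize the compression on their own when extracting. With an older tar, add the same letter you used when creating the archive: tar xjfv new.tar.bz2 or tar xzfv new.tar.gz.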
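Here is a sketch of extracting both archives, first relying on automatic detection and then naming the compressor explicitly; save it as a script and run it with sh. The sample files are hypothetical, and the first form assumes a reasonably recent GNU tar or bsdtar.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'first file\n' > file1    # hypothetical sample files
printf 'second file\n' > file2
tar czf new.tar.gz file1 file2
tar cjf new.tar.bz2 file1 file2
rm file1 file2                   # remove the originals so extraction is visible

# Automatic detection: no z needed on a recent GNU tar or bsdtar.
tar xfv new.tar.gz
rm file1 file2

# Explicit form: works on older tar versions as well.
tar xvjf new.tar.bz2
```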

As we have already discussed, xfv means: extract the files from the archive named next, and show each file as it is extracted.

Before I conclude this tutorial, let me mention one more important switch: “-C directory”. It makes tar change to that directory before doing its work. When you create an archive, the files you list after -C are taken from that directory, so you don’t have to cd there first. When you extract, the files are written into that directory instead of the current one (the directory must already exist).
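A sketch of -C in both directions, to save as a script and run with sh. It assumes two hypothetical directories, project (holding the files) and restore (the extraction target), created inside a temporary directory. The dash form of the switches is used here, with f last so that it picks up the archive name.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
mkdir -p project restore         # hypothetical source and target directories
printf 'first file\n' > project/file1
printf 'second file\n' > project/file2

# Creating: tar changes into project/ first, so the archive stores
# plain file1 and file2 rather than project/file1 and project/file2.
tar -czvf new.tar.gz -C project file1 file2

# Extracting: files land in restore/, which must already exist.
tar -xzvf new.tar.gz -C restore
ls restore
```

Because the paths inside the archive are relative, the same archive can be unpacked anywhere just by changing the directory given to -C.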