Comparing gzip, bzip2, lzma
Many of you may never have heard of lzma compression. It is less popular than zip, gzip, or even bzip2. But how about 7-zip? Yes, you may have heard of that; it is actually the best-known implementation of lzma. After some googling, most sources say that lzma can compress files smaller than even bzip2 can. That made me curious enough to compare gzip, bzip2, and lzma myself.

My wish to compare lzma with gzip and bzip2 got stronger after my brother showed me that lzma can produce a smaller file than bzip2. He only tested it with one PDF file, though, so I will check for myself which one compresses smallest. I will test with text, image (PNG and SVG), binary, and movie files, and also with a mix of all of them, using the best compression option (-9) for each tool.

The test machine is my sandbox machine. This is the specification:
debian etch 4.0 (2.6.18-6-xen-vserver-686)
2 x Intel(R) Xeon(R) CPU 3040 @ 1.86GHz
1GB RAM (I had to shut down all my domU guests to get the full 1GB back T_T )

The software versions:
gzip 1.3.5-15
bzip2 1.0.3-6
lzma 4.43-5

First Test (text files)

The first test uses text files. To make it more dramatic I took two of my log files, 656 MB and 60 MB in size. Can you imagine how long the text is at those sizes? :D OK, here are the test results:

$ du -hs text/
716M text/

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv text/ | gzip -9 > text.tar.gz
real 0:25.31,user 0.00,sys 0.04,CPU 0%

$ /usr/bin/time -f "real %E, user %U, sys %S, CPU %P" tar -cv text/ | bzip2 -9 > text.tar.bz2
real 6:51.75,user 0.13,sys 0.89,CPU 0%

$ /usr/bin/time -f "real %E, user %U, sys %S, CPU %P" tar -cv text/ | lzma -kc9 > text.tar.lzma
real 24:28.16, user 0.08, sys 1.48,CPU 0%

The compression result :
-rw-r--r-- 1 yuda yuda 25M 2009-05-15 15:25 text.tar.bz2
-rw-r--r-- 1 yuda yuda 33M 2009-05-15 15:05 text.tar.gz
-rw-r--r-- 1 yuda yuda 17M 2009-05-15 15:52 text.tar.lzma
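
To make the listing easier to compare, here is a small sketch that turns the sizes into percentages of the original 716M. The helper name ratio is made up for this illustration, and the inputs are the rounded sizes shown above, so the percentages are only approximate.

```shell
# ratio COMPRESSED ORIGINAL
# Hypothetical helper: prints COMPRESSED as a percentage of ORIGINAL with
# one decimal place. Both sizes must be in the same unit.
ratio() {
    # awk does the floating-point division that plain sh arithmetic lacks
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f", 100 * c / o }'
}

# Sizes in MB, copied from the du and ls output above (already rounded).
original=716
for entry in gzip:33 bzip2:25 lzma:17; do
    name=${entry%%:*}   # part before the colon
    size=${entry##*:}   # part after the colon
    echo "$name: ${size}M is $(ratio "$size" "$original")% of ${original}M"
done
```

For these log files that works out to roughly 4.6% for gzip, 3.5% for bzip2 and 2.4% for lzma.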

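The output reports a CPU usage of 0%, but while compressing with bzip2 and lzma the CPU usage actually went up to 100%. The reason is that /usr/bin/time here only measures the tar command on the left side of the pipe; the compressor on the right side runs as a separate process and is not counted. The real (elapsed) time is still roughly comparable, because tar blocks whenever the pipe is full and so cannot finish much earlier than the compressor.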
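
One way to get user, sys and CPU figures for the whole pipeline is to wrap it in sh -c, so that /usr/bin/time measures tar and the compressor together. This is only a sketch, not how the tests above were run: the sample directory and file are made up so the snippet runs anywhere, and the -f option is specific to GNU time.

```shell
# Build a small throwaway sample (hypothetical data, not the real logs).
workdir=$(mktemp -d)
mkdir "$workdir/text"
seq 1 200000 > "$workdir/text/sample.log"
cd "$workdir" || exit 1

# The whole pipeline as one string, so it can be handed to a single shell.
# "-f -" makes tar write the archive to stdout explicitly.
pipeline='tar -cf - text/ | gzip -9 > text.tar.gz'

# time now wraps the shell that runs tar AND gzip, so the user, sys and
# CPU figures include the compressor as well.
if ! /usr/bin/time -f "real %E, user %U, sys %S, CPU %P" sh -c "$pipeline"; then
    # Fallback when GNU time is missing: run the same pipeline untimed.
    echo "GNU time not available, running untimed" >&2
    sh -c "$pipeline"
fi

ls -l text.tar.gz
```

In bash, the built-in time keyword placed in front of a pipeline also times the pipeline as a whole, which is another option if GNU time is not installed.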

Second Test (image files)

The second test uses image files (PNG and SVG). I took the PNG files from the Crystal Project and the SVG files from Snowish SVG, and deleted all the text files from both.

PNG files :
$ du -hs crystal_project/
41M crystal_project/

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv crystal_project/ | gzip -9 > crystal.tar.gz
real 0:02.63,user 0.01,sys 0.04,CPU 2%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv crystal_project/ | bzip2 -9 > crystal.tar.bz2
real 0:12.47,user 0.04,sys 0.14,CPU 1%

$ /usr/bin/time -f "real %E, user %U, sys %S,CPU %P" tar -cv crystal_project/ | lzma -9 > crystal.tar.lzma
real 0:26.56, user 0.04, sys 0.20,CPU 0%

The compression result :
-rw-r--r-- 1 yuda yuda 21M 2009-05-15 17:20 crystal.tar.bz2
-rw-r--r-- 1 yuda yuda 21M 2009-05-15 17:19 crystal.tar.gz
-rw-r--r-- 1 yuda yuda 18M 2009-05-15 17:22 crystal.tar.lzma

SVG files :
$ du -hs SnowIsh-1.0/
60M SnowIsh-1.0/

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv SnowIsh-1.0/ | gzip -9 > snowish.tar.gz
real 0:06.91,user 0.00,sys 0.00,CPU 0%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv SnowIsh-1.0/ | bzip2 -9 > snowish.tar.bz2
real 0:18.34,user 0.00,sys 0.01,CPU 0%

$ /usr/bin/time -f "real %E, user %U, sys %S,CPU %P" tar -cv SnowIsh-1.0/ | lzma -9 > snowish.tar.lzma
real 1:33.06, user 0.00, sys 0.14,CPU 0%

The compression result :
-rw-r--r-- 1 yuda yuda 3.6M 2009-05-15 17:35 snowish.tar.bz2
-rw-r--r-- 1 yuda yuda 6.3M 2009-05-15 17:33 snowish.tar.gz
-rw-r--r-- 1 yuda yuda 665K 2009-05-15 17:38 snowish.tar.lzma

Third Test (binary files)

I took the binary files for this test from the Ubuntu repository. I only took the feisty distribution and the main component, not everything, as my hard disk space is limited :D . Luckily, it still exists in my local repository.

$ du -hs ubuntu/
4.2G ubuntu/

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv ubuntu/ | gzip -9 > ubuntu.tar.gz
real 9:09.62,user 0.15,sys 3.92,CPU 0%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv ubuntu/ | bzip2 -9 > ubuntu.tar.bz2
real 27:51.96,user 0.38,sys 6.46,CPU 0%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv ubuntu/ | lzma -9 > ubuntu.tar.lzma
real 1:01:11,user 0.68,sys 11.27,CPU 0%

The compression result :
-rw-r--r-- 1 yuda yuda 4.2G 2009-05-17 15:21 ubuntu.tar.bz2
-rw-r--r-- 1 yuda yuda 4.1G 2009-05-16 20:12 ubuntu.tar.gz
-rw-r--r-- 1 yuda yuda 4.1G 2009-05-17 16:33 ubuntu.tar.lzma

Well, the binary files test shows no dramatic reduction in file size.

Fourth Test (movie files)

$ du -hs movie/
38M movie/

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv movie/ | gzip -9 > movie.tar.gz
real 0:03.72,user 0.00,sys 0.00,CPU 0%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv movie/ | bzip2 -9 > movie.tar.bz
real 0:14.18,user 0.00,sys 0.05,CPU 0%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv movie/ | lzma -9 > movie.tar.lzma
real 0:07.68,user 0.00,sys 0.09,CPU 1%

The compression result :
-rw-r--r-- 1 yuda yuda 38M 2009-05-16 19:38 movie.tar.bz
-rw-r--r-- 1 yuda yuda 38M 2009-05-16 19:34 movie.tar.gz
-rw-r--r-- 1 yuda yuda 38M 2009-05-16 19:40 movie.tar.lzma

Well, as you can see, gzip, bzip2, and lzma compression have no effect on movie files. The conclusion is that I can't compress my movie collection T_T *too bad*
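
A likely explanation, which the tests above do not check directly, is that these inputs are already compressed: .deb packages carry compressed tarballs inside, and most movie formats compress their streams too. The sketch below shows the general effect with made-up data: a second gzip pass over an already gzipped file gains almost nothing. The file names and the size helper are hypothetical.

```shell
workdir=$(mktemp -d)

# Hex dump of random bytes: plain text that compresses well exactly once.
head -c 200000 /dev/urandom | od -An -tx1 > "$workdir/sample.txt"

# First pass works on plain text, second pass on already compressed data.
gzip -9 -c "$workdir/sample.txt" > "$workdir/once.gz"
gzip -9 -c "$workdir/once.gz"    > "$workdir/twice.gz"

# Hypothetical helper: size of a file in bytes (tr strips padding that
# some wc implementations add).
size() { wc -c < "$1" | tr -d ' '; }

echo "original: $(size "$workdir/sample.txt") bytes"
echo "once:     $(size "$workdir/once.gz") bytes"
echo "twice:    $(size "$workdir/twice.gz") bytes"
```

The first pass shrinks the file a lot, while the second pass ends up at about the same size as the first, which is the same pattern as in the binary and movie results.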

Last Test (mixed files)

In this last test I use a mix of files (odt, doc, png, gif, svg, psd, iso, etc.):
$ du -hs mixed
8.8G mixed

$ /usr/bin/time -f "real %E, user %U, sys %S, CPU %P" tar -cv mixed/ | gzip -9 > mixed.tar.gz
real 22:38.13,user 2.60,sys 22.81,CPU 1%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv mixed/ | bzip2 -9 > mixed.tar.bz2
real 58:34.74,user 2.70,sys 22.25,CPU 0%

$ /usr/bin/time -f "real %E,user %U,sys %S,CPU %P" tar -cv mixed/ | lzma -9 > mixed.tar.lzma
real 2:47:29,user 3.56,sys 34.89,CPU 0%

The compression result :
-rw-r--r-- 1 yuda yuda 6.0G 2009-05-22 11:35 mixed.tar.bz2
-rw-r--r-- 1 yuda yuda 6.1G 2009-05-15 19:44 mixed.tar.gz
-rw-r--r-- 1 yuda yuda 5.3G 2009-05-22 17:14 mixed.tar.lzma

With these mixed files you can see a dramatic reduction in file size with lzma (5.3G, versus 6.0G for bzip2 and 6.1G for gzip). That suits my backup purposes, because I back up files from an old (and used) server :D

Actually, in the first test series I ran short of RAM because only 365MB was available (the rest was used by my domU guests), so I had to shut down all of them to get my full 1GB of RAM back.

I hope this little test comparing gzip, bzip2, and lzma shows which compression works effectively on which kind of files. One caveat about the results above: the CPU percentage shows 0%, but for bzip2 and lzma the CPU usage actually reached 100%, so compare the elapsed times rather than the CPU figures.
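
If you want to repeat this kind of comparison on your own files, the series can be scripted. This is a minimal sketch rather than the exact procedure used above: the bench function and the sample data are made up, the archive suffix is simply the tool name, and compressors that are not installed are skipped.

```shell
# bench DIR
# Hypothetical helper: tar DIR once per available compressor at level -9
# and report the size of each resulting archive in bytes.
bench() {
    dir=$1
    for tool in gzip bzip2 lzma; do
        # Skip compressors that are not installed on this machine.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed"
            continue
        fi
        out="${dir%/}.tar.$tool"
        # All three tools read stdin and write stdout when used in a pipe.
        tar -cf - "$dir" | "$tool" -9 > "$out" || { echo "$tool: failed"; continue; }
        echo "$tool: $(wc -c < "$out" | tr -d ' ') bytes"
    done
}

# Throwaway sample data so the sketch is runnable as-is.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sample
seq 1 100000 > sample/numbers.txt

bench sample
```

Point bench at a real directory instead of the sample to get numbers for your own data; timing can be added around each pipeline if needed.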

Photo credit goes to www.cirris.com

inspirations :

http://martin.ankerl.com/2006/06/22/lzma-compression-in-linux/

http://odzangba.wordpress.com/2009/03/25/gzip-vs-bzip2-vs-lzma/

http://tukaani.org/lzma/benchmarks
