Pigz is basically a parallel gzip that takes advantage of multiple cores. With massive files this can be a pretty big win, especially when you’ve got lots of cores sitting around.
Taking a 418 MB squid access log file, on a dual quad-core Nehalem L5520 box with HyperThreading turned on:
[jallspaw@server01 ~]$ ls -lh daemon.log.2; time gzip ./daemon.log.2 ; ls -lh ./daemon.log.2.gz
-rw-r----- 1 jallspaw jallspaw 418M Apr 2 19:18 daemon.log.2
real 0m12.398s
user 0m12.107s
sys 0m0.288s
-rw-r----- 1 jallspaw jallspaw 45M Apr 2 19:18 ./daemon.log.2.gz
…now gunzipping it:
[jallspaw@server01 ~]$ ls -lh daemon.log.2.gz; time gunzip ./daemon.log.2 ; ls -lh ./daemon.log.2
-rw-r----- 1 jallspaw jallspaw 45M Apr 2 19:18 daemon.log.2.gz
real 0m3.245s
user 0m2.693s
sys 0m0.552s
-rw-r----- 1 jallspaw jallspaw 418M Apr 2 19:18 ./daemon.log.2
htop looks like this when this is happening:
(Note the 15 freeloading, lazy cores sitting around watching their friend, core #10, sweating.)
…now pigz’ing it:
[jallspaw@server01 ~]$ ls -lh daemon.log.2; time ./pigz-2.1.6/pigz ./daemon.log.2 ; ls -lh ./daemon.log.2.gz
-rw-r----- 1 jallspaw jallspaw 418M Apr 2 19:18 daemon.log.2
real 0m1.569s
user 0m23.092s
sys 0m0.422s
-rw-r----- 1 jallspaw jallspaw 45M Apr 2 19:18 ./daemon.log.2.gz
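If you don’t want pigz grabbing every core on a shared box, the -p flag caps the number of compression threads and -k keeps the original file around. Here is a rough sketch of a wrapper, not anything from the run above: compress_parallel is a made-up helper name, the default of 4 threads is an arbitrary assumption, and it falls back to plain gzip when pigz isn’t installed.

```shell
# compress_parallel FILE [THREADS]
# Hypothetical helper: writes FILE.gz next to FILE and leaves FILE in place.
# THREADS defaults to 4 (an arbitrary choice for illustration).
compress_parallel() {
    file=$1
    threads=${2:-4}
    if command -v pigz >/dev/null 2>&1; then
        # -p limits the number of compression threads, -k keeps the input file
        pigz -p "$threads" -k "$file"
    else
        # No pigz available: single-threaded gzip, writing to FILE.gz explicitly
        gzip -c "$file" > "$file.gz"
    fi
}
```

Something like `compress_parallel ./daemon.log.2 8` would then use at most 8 threads. Either way the output is an ordinary .gz file that regular gunzip can read.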
…now unpigz’ing it:
[jallspaw@server01 ~]$ ls -lh daemon.log.2.gz; time ./pigz-2.1.6/unpigz ./daemon.log.2.gz ; ls -lh ./daemon.log.2
-rw-r----- 1 jallspaw jallspaw 45M Apr 2 19:18 daemon.log.2.gz
real 0m1.456s
user 0m1.861s
sys 0m0.867s
-rw-r----- 1 jallspaw jallspaw 418M Apr 2 19:18 ./daemon.log.2
And htop looks like this when it’s happening:
Which do you like better?
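To put rough numbers on it, here is a small sketch that divides the timings from the runs above. The ratio function is just a hypothetical helper for this post, and dividing user time by real time is only an approximation of how many cores were busy on average.

```shell
# ratio A B: print A / B to two decimal places (hypothetical helper)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Wall-clock speedup of pigz over gzip when compressing (real times)
ratio 12.398 1.569    # prints 7.90

# Wall-clock speedup of unpigz over gunzip when decompressing (real times)
ratio 3.245 1.456     # prints 2.23

# pigz user time over real time: roughly how many cores were busy on average
ratio 23.092 1.569    # prints 14.72
```

So on this box compression finished about 8 times faster in wall-clock terms, while burning nearly twice the total CPU time (23.092s of user time against 12.107s for gzip). Decompression gained far less, a bit over 2x.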