Some numbers for how well various compressors do with our /var/mail backup
Recently I discussed how gzip --best wasn't very fast when
compressing our Amanda (tar) backup of /var/mail,
and mentioned that we were trying out zstd
for this. As it happens, as part of our research on this issue I
ran one particular night's backup of our
/var/mail through all
of the various compressors to see how large they'd come out, and
I think the numbers are usefully illustrative.
The initial uncompressed tar archive is roughly 538 GB and is
probably almost completely ASCII text (since we use traditional
mbox format inboxes and most email is encoded to 7-bit ASCII). The
compression ratios are relative to the uncompressed file, while the
times are relative to the fastest compression algorithm. Byte sizes
were counted with 'wc -c', instead of writing the results to disk,
and I can be confident that the compression programs were the speed
limit on this system, not reading the initial tar archive off SSDs.
(The 'uncompressed' time is for 'cat <file> | wc -c'.)
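As a minimal sketch of this measurement approach, the following shell functions pipe a file through a compressor into 'wc -c' and work out a compression ratio from the two byte counts. The function names (measure, ratio) and the archive name in the usage comment are hypothetical, not what was actually run for these tests.

```shell
#!/bin/sh
# measure FILE CMD [ARGS...]
# Print the number of bytes CMD produces when fed FILE on stdin,
# counting with 'wc -c' so nothing is written to disk.
measure() {
    file=$1
    shift
    # tr strips the leading spaces that some wc implementations print.
    "$@" < "$file" | wc -c | tr -d ' '
}

# ratio ORIG COMPRESSED
# Print the compression ratio relative to the uncompressed size.
ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.2f\n", o / c }'
}

# Example use on a hypothetical archive name:
#   orig=$(measure backup.tar cat)
#   comp=$(measure backup.tar gzip --best)
#   ratio "$orig" "$comp"
```

Timing each run separately (for instance with the shell's 'time') and dividing by the fastest compressor's time would give relative times of the sort discussed here.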
On this very real-world test for us, zstd is clearly a winner
over gzip; it achieves better compression with far less time.
gzip --fast takes about 32% less time than
gzip --best at only
a moderate cost in compression ratio, but it's not competitive with
zstd in either time or compression. Zstd is not as fast as lz4
but it's fast enough, while providing clearly better compression.
We're currently using the default zstd compression level, which
is 'zstd -3' (we're just invoking plain 'zstd'). These
numbers suggest that we'd lose very little compression from switching
to 'zstd -1' but get a significant speed increase. At the moment
we're going to leave things as they are because our backups are now
fast enough (backing up
/var/mail is now not the limiting factor
on their overall speed) and we do get something for that extra time.
Also, it's simpler; because of how Amanda works, we'd need to add
a script to switch to 'zstd -1'.
(Amanda requires you to specify a program as your compressor, not a program plus arguments, so if you want to invoke the real compressor with some non-default options you need a cover script.)
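A cover script of this kind can be very small. The sketch below writes one out and makes it executable; the script name 'zstd-fast' and the destination path are hypothetical. It assumes the compressor program is run as a stdin-to-stdout filter and may be given an argument such as '-d' to decompress, which is worth checking against the Amanda documentation for your version.

```shell
#!/bin/sh
# Sketch: install a hypothetical cover script named 'zstd-fast'.
# DEST is an assumption; point it at wherever your Amanda client
# configuration expects the compressor program to live.
DEST=${DEST:-./zstd-fast}

cat > "$DEST" <<'EOF'
#!/bin/sh
# Cover script: run zstd at level 1, passing through any arguments
# the caller supplies (for example -d), reading stdin and writing stdout.
exec zstd -1 "$@"
EOF
chmod 755 "$DEST"
```

Using 'exec' means the cover script is replaced by zstd itself rather than leaving an extra shell process around for the duration of the backup.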
Since someone is going to ask, pigz --fast
got a compression ratio of 1.78 and a time ratio of 1.27.
This is extremely unrepresentative of what we could achieve in
production on our Amanda backup servers, since my test machine is
a multi-core Xeon Silver 4108 system. The parallelism speed
increase for pigz is not perfect, since it was only about 9.4 times
faster than gzip --fast (which is single-core).
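To make the arithmetic behind these figures concrete, here is a rough back-of-the-envelope sketch. It takes only the numbers quoted in the prose (pigz's time ratio of 1.27, its roughly 9.4 times speedup, and gzip --fast taking about 32% less time than gzip --best) and derives the implied single-core gzip time ratios. The results are approximations derived from rounded figures, not measured values.

```shell
#!/bin/sh
# Figures quoted in the text.
pigz_fast_time=1.27     # pigz time ratio relative to the fastest compressor
pigz_speedup=9.4        # pigz speed relative to single-core gzip --fast
fast_saving=0.32        # gzip --fast takes about 32% less time than --best

# Implied time ratio for gzip --fast: pigz's time times its speedup.
gzip_fast=$(awk -v p="$pigz_fast_time" -v s="$pigz_speedup" \
    'BEGIN { printf "%.1f", p * s }')

# Implied time ratio for gzip --best: undo the 32% saving.
gzip_best=$(awk -v p="$pigz_fast_time" -v s="$pigz_speedup" -v f="$fast_saving" \
    'BEGIN { printf "%.1f", p * s / (1 - f) }')

echo "gzip --fast: roughly $gzip_fast times the fastest compressor"
echo "gzip --best: roughly $gzip_best times the fastest compressor"
```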
(Since I wanted to see the absolute best case for pigz in terms of
speed, I ran it on all of the machine's cores and CPUs. I'm not interested
in doing more tests to establish how it scales when run with fewer
cores and CPUs, since we're not going to use it; zstd
is better for our case.)
PS: I'm not giving absolute speeds because these speeds vary tremendously across our systems and also depend on what's being compressed, even with just ASCII text.