Some numbers for how well various compressors do with our /var/mail backup

April 7, 2018

Recently I discussed how gzip --best wasn't very fast when compressing our Amanda (tar) backup of /var/mail, and mentioned that we were trying out zstd for this. As it happens, as part of our research on this issue I ran one particular night's backup of our /var/mail through all of the various compressors to see how large they'd come out, and I think the numbers are usefully illustrative.

The initial uncompressed tar archive is roughly 538 GB and is probably almost completely ASCII text (since we use traditional mbox format inboxes and most email is encoded to 7-bit ASCII). The compression ratios are relative to the uncompressed file, while the times are relative to the fastest compression algorithm. Compressed sizes were measured by counting output bytes with 'wc -c' instead of writing the results to disk, and I'm confident that the speed limit on this system was the compression programs, not reading the initial tar archive off the SSDs.

                Compression ratio   Time ratio
  uncompressed        1.0              0.47
  lz4                 1.4              1.0
  gzip --fast         1.77            11.9
  gzip --best         1.87            17.5
  zstd -1             1.92             1.7
  zstd -3             1.99             2.4

(The 'uncompressed' time is for 'cat <file> | wc -c'.)
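
To make the methodology concrete, here is a sketch of roughly how one of these measurements can be run; the file name is made up for illustration and this isn't the literal command line from the test:

  # time the baseline (the 'uncompressed' row) and one compressor,
  # counting output bytes with wc -c instead of writing anything to disk
  time cat /dumps/varmail.tar | wc -c
  time zstd -3 -c </dumps/varmail.tar | wc -c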

On this very real-world test for us, zstd is clearly a winner over gzip; it achieves better compression in far less time. gzip --fast takes about 32% less time than gzip --best at only a moderate cost in compression ratio, but it's not competitive with zstd in either time or compression. Zstd is not as fast as lz4, but it's fast enough while providing clearly better compression.

We're currently using the default zstd compression level, which is 'zstd -3' (we're just invoking plain '/usr/bin/zstd'). These numbers suggest that we'd lose very little compression by switching to 'zstd -1' but get a significant speed increase. For now we're going to leave things as they are, because our backups are now fast enough (backing up /var/mail is no longer the limiting factor on their overall speed) and we do get something for that extra time. Staying with the default is also simpler; because of how Amanda works, switching to 'zstd -1' would require adding a cover script.

(Amanda requires you to specify a program as your compressor, not a program plus arguments, so if you want to invoke the real compressor with some non-default options you need a cover script.)
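
As a concrete illustration, such a cover script can be tiny. This is only a sketch; the script's name and location are made up, and it simply passes along whatever arguments Amanda supplies:

  #!/bin/sh
  # hypothetical cover script, eg /usr/local/libexec/zstd-l1
  # force compression level 1 while passing through Amanda's arguments
  exec /usr/bin/zstd -1 "$@"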

Since someone is going to ask, pigz --fast got a compression ratio of 1.78 and a time ratio of 1.27. This is extremely unrepresentative of what we could achieve in production on our Amanda backup servers, since my test machine is a 16-CPU Xeon Silver 4108. The parallelism speedup for pigz is not perfect; it was only about 9.4 times faster than gzip --fast (which runs on a single CPU).

(Since I wanted to see the absolute best case for pigz in terms of speed, I ran it on all CPUs. I'm not interested in doing more tests to establish how it scales when run with fewer CPUs, since we're not going to use it; zstd is better for our case.)
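
For what it's worth, the pigz test run can be sketched the same way as the others; the explicit '-p 16' is redundant (pigz defaults to using all available CPUs) and the file name is again made up:

  # hypothetical invocation; -p 16 just makes the CPU count explicit
  time pigz --fast -p 16 -c </dumps/varmail.tar | wc -c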

PS: I'm not giving absolute speeds because these speeds vary tremendously across our systems and also depend on what's being compressed, even with just ASCII text.


Comments on this page:

By Zev Weiss at 2018-04-07 02:36:47:

For what likely-academic interest it may be worth, ark.intel.com shows the Xeon Silver 4108 as having 16 threads rather than 16 cores (of which it has 8) -- I don't know offhand whether gzip is the kind of thing that scales well or poorly with SMT (the threads-vs-cores distinction matters more to some workloads than others), but it seems possible that the sub-linear scaling may be more the hardware's "fault" than gzip's. (Unless you meant that the test hardware was a dual-socket machine, in which case nevermind.)

By cks at 2018-04-07 03:08:28:

You're right, and I've been sloppy in my terminology here. I knew this was an 8C/16T machine, but I called them 'cores' despite that. Looking at how Linux names things in this area (in eg 'lscpu -e'), I think I should call them 'CPUs' as the best generic term, and I've now altered the entry to match.

(At the same time using 'CPUs' here just sounds awkward, and it risks confusion with the socket count. Maybe there's a better consensus term that's emerged. I don't want to use 'threads'; even if it's technically accurate, it's just too easily confused with threads in the software sense.)

By Twirrim at 2018-04-09 21:40:22:

It's extremely unlikely to be suitable for the purpose (and not what it was designed for at all), but if you pass pigz -11 it'll use Zopfli to compress the content. https://github.com/google/zopfli. Odds are it'll take longer than "gzip --best", but depending on the content it can reap serious dividends. I've fiddled with it from time to time on various tasks and seen advantages where I wasn't quite so concerned about compression time.

Zopfli only has a very narrow use case. It’s very useful on the web because it’s compatible with gzip decompression, which is pervasively deployed in browsers – so by investing CPU cycles you get to save a little bit of bandwidth (which adds up in aggregate), all without having to wait for clients to upgrade. But in scenarios like Chris’s where gzip decompression compatibility is not a requirement, Zopfli is quite simply pointless: it burns an enormous amount of CPU cycles without achieving anywhere near the compression ratio of comparatively costly algorithms such as xz.
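
(If you did want to measure Zopfli mode anyway, a sketch of such a run would look like the following, with a made-up file name; as noted, expect it to be very slow:)

  # pigz -11 switches to Zopfli-based compression (output is still gzip-compatible)
  time pigz -11 -c </dumps/varmail.tar | wc -c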

By Nick at 2018-04-12 14:15:17:

Zstd also has multithreaded compression enabled with `zstd -T0`, which auto-detects the number of cores.
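
A minimal sketch of what that looks like in this sort of test (the file name is again made up):

  # -T0 lets zstd pick the number of worker threads automatically
  zstd -T0 -3 -c </dumps/varmail.tar | wc -c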

