2018-05-02
An interaction of low ZFS recordsize, compression, and advanced format disks
Suppose that you have something with a low ZFS recordsize; a
classical example is zvols, where people often use an 8 Kb
volblocksize. You have compression turned on, and you are using
a pool (or vdev) with ashift=12 because it's on 'advanced format'
drives or you're preparing for that possibility. This seems
especially likely on SSDs, some of which are already claiming to be
4K physical sector drives.
In this situation, you will probably get much lower compression ratios than you expect, even with reasonably compressible data. There are two reasons for this, the obvious one and the inobvious one. The obvious one is that ZFS compresses each logical block separately, and your logical blocks are small. Generally the larger the things you compress at once, the better most compression algorithms do, up to a reasonable size; if you compress in small blocks, you get worse compression ratios.
(The lz4 command line compression program doesn't even have an
option to compress in less than 64 Kb blocks (cf), which shows you
what people think of the idea. The lz4 algorithm can be applied to
smaller blocks, and ZFS does, but presumably the results are not as
good.)
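As a rough illustration of the block size effect, here is a minimal Python sketch that compresses the same data in independent chunks of different sizes. It uses zlib from the standard library as a stand-in, not lz4 and not ZFS's own code path, and the sample data and the function name are made up for the example, so treat the numbers it prints as indicative only.

```python
import zlib


def ratio_when_chunked(data: bytes, chunk_size: int) -> float:
    """Compress data in independent chunks and return original/compressed size."""
    compressed = 0
    for start in range(0, len(data), chunk_size):
        # Each chunk is compressed on its own, the way ZFS compresses
        # each logical block separately.
        compressed += len(zlib.compress(data[start:start + chunk_size]))
    return len(data) / compressed


# Hypothetical sample data: repetitive, log-like text (roughly 1 MB).
sample = b"".join(
    b"2018-05-02 12:00:%02d host%d request /path/%d status=200\n"
    % (i % 60, i % 7, i)
    for i in range(20000)
)

for size in (8 * 1024, 128 * 1024):
    ratio = ratio_when_chunked(sample, size)
    print(f"{size // 1024:4d} Kb chunks: ratio {ratio:.2f}")
```

How large the gap is depends heavily on the data and the algorithm, so this is something to try on your own data rather than a prediction for lz4 on ZFS.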
The inobvious problem is how a small recordsize interacts with a
large physical block size (ie, a large ashift). In order to save
any space on disk, compression has to shrink the data enough so
that it uses fewer disk blocks. With 4 Kb disk blocks (an ashift
of 12), this means you need to compress things down by at least 4
Kb; when you're starting with 8 Kb logical blocks because of your
8 Kb recordsize, this means you need at least 50% compression in
order to save any space at all. If your data is compressible but
not that compressible, you can't save any allocated space.
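The arithmetic here can be sketched in a few lines of Python. This is a simplified model that only rounds the compressed size up to whole disk blocks; it assumes a plain single-disk or mirror vdev and ignores any other rules ZFS applies when deciding whether to store a block compressed. The function name is invented for the example.

```python
def allocated_bytes(compressed_size: int, ashift: int = 12) -> int:
    """Round a compressed size up to whole disk blocks of 2**ashift bytes."""
    sector = 1 << ashift
    sectors = -(-compressed_size // sector)  # ceiling division
    return sectors * sector


RECORDSIZE = 8 * 1024  # the 8 Kb logical block from the text

for compressed in (7000, 5000, 4097, 4096):
    alloc = allocated_bytes(compressed)
    print(f"{compressed} bytes compressed -> {alloc} allocated, "
          f"{RECORDSIZE - alloc} saved")
```

In this model, anything that compresses to even one byte more than 4096 bytes still takes two 4 Kb disk blocks, the same as the uncompressed data.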
A larger recordsize gives you more room to at least save some
space. With a 128 Kb recordsize, you need only compress a bit (to
124 Kb, about 3% compression) in order to save one 4 Kb disk block.
Further increases in compression can get you more savings, bit by
bit, because you have more disk blocks to shave away.
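To see the 'bit by bit' effect, here is a small Python sketch that counts how many 4 Kb disk blocks a given percentage of compression saves at two record sizes. It uses the same simplified model of rounding up to whole disk blocks, with a hypothetical helper name, and it ignores whatever additional thresholds real ZFS applies.

```python
def sectors_saved(recordsize: int, savings_percent: int, ashift: int = 12) -> int:
    """Disk blocks saved when a record shrinks by savings_percent percent."""
    sector = 1 << ashift
    # Integer arithmetic keeps the compressed size exact.
    compressed = recordsize * (100 - savings_percent) // 100
    allocated_sectors = -(-compressed // sector)  # ceiling division
    return recordsize // sector - allocated_sectors


for percent in (10, 30, 50, 70):
    small = sectors_saved(8 * 1024, percent)
    large = sectors_saved(128 * 1024, percent)
    print(f"{percent}% compression: 8 Kb record saves {small} block(s), "
          f"128 Kb record saves {large} block(s)")
```

The 8 Kb record only ever has one block it can give up, while the 128 Kb record has 32 to work with, so modest compression shows up as real savings.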
(An ashift=9 pool similarly gives you more room to get wins from
compression because you can save space in 512 byte increments,
instead of needing to come up with 4 Kb of space savings at a time.)
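A quick Python comparison of the two ashift values for a single 8 Kb record, again under the simplified assumption that the only thing that matters is rounding up to whole disk blocks (the helper name and the 5500 byte figure are just for illustration):

```python
def saved_bytes(recordsize: int, compressed_size: int, ashift: int) -> int:
    """Bytes of allocated space saved after rounding up to 2**ashift blocks."""
    sector = 1 << ashift
    allocated = -(-compressed_size // sector) * sector  # round up to a block
    return recordsize - allocated


# An 8 Kb record that compresses to 5500 bytes (about 33% compression).
for ashift in (9, 12):
    saved = saved_bytes(8 * 1024, 5500, ashift)
    print(f"ashift={ashift}: {saved} bytes saved")
```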
(Writing this up as an entry was sparked by this ZFS lobste.rs discussion.)
PS: I believe that this implies that if your recordsize (or
volblocksize) is the same as the disk physical block size (or
ashift size), compression will never do anything for you. I'm not
sure if ZFS will even try to run the compression code or if it will
silently pretend that you have compression=off set.