Wandering Thoughts archives

2024-06-23

Some notes on ZFS's zstd compression kstats (on Linux)

Like various other filesystems, (Open)ZFS can compress file data when it gets written. As covered in the documentation for the 'compression=' filesystem property, ZFS can use a number of compression algorithms for this, including Zstd. The zstd compression system in ZFS exposes some information about its activity in the form of ZFS kstats; on Linux, these are visible in /proc/spl/kstat/zfs/zstd, but are unfortunately a little underdocumented. For reasons beyond the scope of this blog entry I recently looked into this, so here is what I know about them from reading the code.
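If you want to pull these numbers into a script, here is a minimal sketch of a parser. It assumes the usual Linux kstat text layout (a couple of header lines, then rows of 'name type value'), which is my assumption rather than something this entry covers; the function names are made up for illustration.

```python
from pathlib import Path

# Path from the text above; it only exists on Linux with the ZFS module loaded.
KSTAT_PATH = Path("/proc/spl/kstat/zfs/zstd")


def parse_kstat(text):
    """Return {name: value} for every 'name type value' row in kstat text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: data rows have exactly three fields and end in a
        # non-negative integer. Header lines don't match this shape.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def read_zstd_kstats(path=KSTAT_PATH):
    """Read and parse the zstd kstat file (hypothetical helper)."""
    return parse_kstat(path.read_text())
```

Calling read_zstd_kstats() on a machine with ZFS loaded should give you a plain dictionary keyed by the kstat names described below.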

compress_level_invalid
The zstd compression level was invalid on compression.
compress_alloc_fail
We failed to allocate a zstd compression context.
compress_failed
We failed to compress a block (after allocating a compression context).

decompress_level_invalid
A compressed block had an invalid zstd compression level.
decompress_header_invalid
A compressed block had an invalid zstd header.
decompress_alloc_fail
We failed to allocate a zstd decompression context. This should not normally happen.
decompress_failed
A compressed block with a valid header failed to decode (after we allocated a decompression context).
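Since all of these counters should normally stay at zero, a simple check is to report any that aren't. This is only a sketch; it assumes you already have the kstats as a dictionary of names to integer values, and the helper name is hypothetical.

```python
# The failure counters described above.
FAILURE_COUNTERS = (
    "compress_level_invalid",
    "compress_alloc_fail",
    "compress_failed",
    "decompress_level_invalid",
    "decompress_header_invalid",
    "decompress_alloc_fail",
    "decompress_failed",
)


def nonzero_failures(stats):
    """Return only the failure counters that are present and above zero."""
    return {
        name: stats[name]
        for name in FAILURE_COUNTERS
        if stats.get(name, 0) > 0
    }
```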

The zstd code does some memory allocation for data buffers and contexts and so on. These have more granular information in the kstats:

alloc_fail
How many times allocation failed for either compression or decompression.
alloc_fallback
How many times decompression allocation had to fall back to a special emergency reserve in order to allow blocks to be decompressed. A fallback is also considered an allocation failure.

buffers
How many buffers zstd has in its internal memory pools. I don't understand what 'buffers' are in this context, but I think they're objects (such as contexts or data buffers), not pool arenas.
size
The total size of all buffers currently in zstd's internal memory pools.

If I'm reading the code correctly, compression and decompression have separate pools, and each of them can have up to 16 buffers in them. Buffers are only freed if they're old enough, and if the zstd code needs more buffers than this for various possible reasons, it allocates them directly outside of the memory pools. No current kstat tracks how many allocations happened outside of the memory pools (or how effective the pools are), although this information could be extracted with eBPF.
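One thing you can derive from the 'buffers' and 'size' kstats is a rough average size per pooled buffer. This is a sketch that assumes 'size' is in bytes and that both values cover the same set of buffers, neither of which I've verified; the helper name is made up.

```python
def mean_buffer_size(stats):
    """Average size per pooled buffer, or None if the pools are empty."""
    buffers = stats.get("buffers", 0)
    if buffers == 0:
        # Avoid dividing by zero when nothing is pooled right now.
        return None
    return stats.get("size", 0) / buffers
```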

In the current version of OpenZFS, compressing things with zstd has a complex system of trying to figure out if things are compressible (a 'tiered early abort'). If the data is small enough (128 KiB or less), ZFS just does the zstd compression (and then a higher level will discard the result if it didn't save enough). Otherwise, if you're using a high enough zstd level, ZFS first tries a quick check with LZ4 and then one with zstd-1 to see if it can bail out quickly rather than compress the entire thing with zstd only to throw the result away. How this goes is shown in some kstats:

passignored
How many times zstd didn't try the complex approach.
passignored_size
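As far as I can tell from the code, how many times zstd skipped the complex approach specifically because the data was small enough. I believe this is a count of events, not a total of bytes.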
lz4pass_allowed
How many times the LZ4 pre-check passed things.
lz4pass_rejected
How many times the LZ4 pre-check rejected things.
zstdpass_allowed
How many times the quick zstd pre-check passed things.
zstdpass_rejected
How many times the quick zstd pre-check rejected things.
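To see how the pre-checks are going, you can turn each allowed/rejected pair into a rejection rate. This is a sketch that assumes the kstats are already in a dictionary of names to integers; the function name and the 'prefix' parameter are hypothetical.

```python
def rejection_rate(stats, prefix):
    """Fraction of pre-checks that rejected, for 'lz4pass' or 'zstdpass'.

    Returns None if that pre-check has never run.
    """
    allowed = stats.get(prefix + "_allowed", 0)
    rejected = stats.get(prefix + "_rejected", 0)
    total = allowed + rejected
    if total == 0:
        return None
    return rejected / total
```

For example, rejection_rate(stats, "lz4pass") covers the LZ4 pre-check and rejection_rate(stats, "zstdpass") covers the quick zstd one.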

A number of these things will routinely be zero. Obviously you would like to see no compression and decompression failures, no allocation failures, and so on. And if you're compressing with only zstd-1 or zstd-2, you'll never trigger the complicated pre-checks; however, I believe that zstd-3 will trigger this, and that's the normal default if you set 'compression=zstd'.
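Putting the description above together, the decision of whether the pre-checks run at all can be sketched like this. This is not the OpenZFS code, just my reading of it in Python form; the constant names are made up, the threshold is assumed to be in bytes, and the exact boundary comparisons are assumptions taken from the prose.

```python
ABORT_SIZE = 128 * 1024  # size threshold from the text (assumed bytes)
CUTOFF_LEVEL = 3         # lowest zstd level believed to trigger pre-checks


def uses_prechecks(size, level):
    """Return True if the LZ4 and zstd-1 pre-checks would be tried."""
    if size <= ABORT_SIZE:
        # Small data is just compressed directly with zstd.
        return False
    # Larger data only gets the pre-checks at a high enough level.
    return level >= CUTOFF_LEVEL
```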

This particular sausage is all made in module/zstd/zfs_zstd.c, which has some comments about things.

linux/ZFSZstdCompressionKstats written at 22:38:34;

