Using a small ZFS recordsize doesn't save you space (well, almost never)

September 25, 2024

ZFS filesystems have a famously confusing 'recordsize' property, which in the past I've summarized as the maximum logical block size of a filesystem object. Sometimes I've seen people suggest that if you want to save disk space, you should reduce your 'recordsize' from the default 128 KBytes. This is almost invariably wrong; in fact, setting a low 'recordsize' is more likely to cost you space.
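
As a concrete starting point, here is how you inspect and change this property; 'tank/fs' is a hypothetical dataset name. Note that changing 'recordsize' only affects blocks written afterward, not existing data.

    # See the current recordsize (128K is the default):
    $ zfs get recordsize tank/fs
    NAME     PROPERTY    VALUE  SOURCE
    tank/fs  recordsize  128K   default
    # Lower it; only newly written blocks are affected:
    $ zfs set recordsize=16K tank/fs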

How a low recordsize costs you space is straightforward. In ZFS, every logical block requires its own 128-byte block pointer, which holds the DVAs that say where the block is on disk along with the block's checksum. The more logical blocks you have, the more block pointers you require and the more space they take up. As you decrease the 'recordsize' of a filesystem, files (well, filesystem objects in general) that are larger than your recordsize will use more and more logical blocks for their data and have more and more block pointers, taking up more and more space.
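
To get a rough sense of the scale, we can do some back of the envelope arithmetic for a 1 GByte (2^30 byte) file, ignoring the indirect blocks needed to hold all those block pointers, which also multiply as the recordsize shrinks:

    # Block pointer overhead for a 1 GByte file at two recordsizes:
    $ echo $(( (1024 * 1024 * 1024 / (128 * 1024)) * 128 ))   # 128 KByte records
    1048576
    $ echo $(( (1024 * 1024 * 1024 / (4 * 1024)) * 128 ))     # 4 KByte records
    33554432

That's one MByte of block pointers for the file versus 32 MBytes.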

In addition, ZFS compression operates on logical blocks and must save at least one disk block's worth of space to be considered worthwhile. If you have compression turned on (and if you care about space usage, you should), the closer your 'recordsize' gets to the vdev's disk block size, the harder it is for compression to save space. The limit case is when you make 'recordsize' be the same size as the disk block size, at which point ZFS compression can't do anything.
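
One way to make the rounding concrete: with 4 KByte allocation units, a logical block is only stored compressed if compression frees up at least one whole 4 KByte unit. A sketch, again with a hypothetical 'tank/fs':

    # Turn on compression and see how well it's doing overall:
    $ zfs set compression=lz4 tank/fs
    $ zfs get compressratio tank/fs
    # A 128 KByte record that compresses to 100 KBytes is stored in
    # 100 KBytes, saving 28 KBytes (seven 4 KByte units). A 4 KByte
    # record can never shrink by a whole unit, so it's stored as-is.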

(This is the 'physical disk block size', or more exactly the vdev's 'ashift', which these days should basically always be 4 KBytes or greater, not the disk's 'logical block size', which is usually still 512 bytes.)
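
You can check a vdev's ashift with zdb; 'tank' is a hypothetical pool name, and this particular invocation relies on the pool being in the zpool cache file.

    # ashift is a power of two, so 12 means 4 KByte allocation units:
    $ zdb -C tank | grep ashift
                ashift: 12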

The one case where a large recordsize can theoretically cost you disk space is if you have large files that are mostly holes and you don't have any sort of compression turned on (which these days means specifically turning it off). If you have a (Unix) file that has 1 KByte of data every 128 KBytes and is otherwise not written to, without compression and with the default 128 KByte 'recordsize', you'll get a bunch of 128 KByte blocks that have 1 KByte of actual data and 127 KBytes of zeroes. If you reduced your 'recordsize', you would still waste some space but more of it would be actual holes, with no space allocated. However, even the most minimal compression (a setting of 'compression=zle') will entirely eliminate this waste.
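
If you want to see this in action, you can build such a file by hand and compare its logical size with its allocated size ('sparse.dat' is a made-up name and the sizes are just for illustration):

    # Write 1 KByte of real data at every 128 KByte boundary:
    $ for i in 0 1 2 3 4 5 6 7; do
          dd if=/dev/urandom of=sparse.dat bs=1K count=1 \
             seek=$((i * 128)) conv=notrunc 2>/dev/null
      done
    $ ls -l sparse.dat   # the logical size, about 897 KBytes
    $ du -k sparse.dat   # the allocated size, which depends on your
                         # recordsize and compression settings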

(The classical case of reducing 'recordsize' is helping databases out. More generally, you reduce 'recordsize' when you're rewriting data in place in small sizes (such as 4 KBytes or 16 KBytes) or appending data to a file in small sizes, because ZFS can only read and write entire logical blocks.)
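
For instance, PostgreSQL does its IO in 8 KByte pages, so a common tuning is to give its data directory a dataset with a matching recordsize ('tank/pgdata' is hypothetical):

    # Match the dataset's recordsize to the database's page size:
    $ zfs create -o recordsize=8K tank/pgdata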

PS: If you need a small 'recordsize' for performance, you shouldn't worry about the extra space usage, partly because you should also have a reasonable amount of free disk space to improve the performance of ZFS's space allocation.
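
(You can keep an eye on how full a pool is with 'zpool list'; 'tank' is a hypothetical pool name.)

    # CAP is the percentage of the pool's space that's in use:
    $ zpool list -o name,size,alloc,free,cap tank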
