Using a small ZFS recordsize doesn't save you space (well, almost never)
ZFS filesystems have a famously confusing 'recordsize' property,
which in the past I've summarized as the maximum logical block
size of a filesystem object. Sometimes I've seen people suggest
that if you want to save disk space, you should reduce your
'recordsize' from the default 128 KBytes. This is almost invariably
wrong; in fact, setting a low 'recordsize' is more likely to cost
you space.
How a low recordsize costs you space is straightforward. In ZFS,
every logical block requires its own DVA to point to it and contain
its checksum. The more logical blocks you have, the more DVAs you
require and the more space they take up. As you decrease the
'recordsize' of a filesystem, files (well, filesystem objects in
general) that are larger than your recordsize will use more and
more logical blocks for their data and so have more and more DVAs,
taking up more and more space.
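To put rough numbers on this, here's a minimal Python sketch of the
block pointer overhead for a single file at various recordsizes. A
ZFS block pointer (which holds the DVAs and the checksum) is 128
bytes, but the indirect blocks that hold these pointers are
themselves compressed and allocated in ashift-sized chunks, so
treat the output as an illustrative upper bound rather than exact
on-disk numbers.

    # Illustrative only: real indirect blocks are compressed, so the
    # actual space consumed by pointers will be somewhat smaller.
    BLOCK_POINTER_SIZE = 128  # bytes per ZFS block pointer (DVAs + checksum)

    def pointer_overhead(file_size, recordsize):
        """Approximate bytes of block pointers for one file's data blocks."""
        logical_blocks = -(-file_size // recordsize)  # ceiling division
        return logical_blocks * BLOCK_POINTER_SIZE

    one_gib = 1 << 30
    for rs in (4096, 16384, 131072):
        print(f"recordsize {rs:>6}: ~{pointer_overhead(one_gib, rs):>10,} bytes of pointers")

For a 1 GiByte file this works out to roughly 32 MiBytes of pointers
at a 4 KByte recordsize versus roughly 1 MiByte at the default 128
KBytes.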
In addition, ZFS compression operates on logical blocks and must
save at least one disk block's worth of space to be considered
worthwhile. If you have compression turned on (and if you care
about space usage, you should), the closer your 'recordsize' gets
to the vdev's disk block size, the harder it is for compression to
save space. The limit case is when you make 'recordsize' the same
size as the disk block size, at which point ZFS compression can't
do anything.
(This is the 'physical disk block size', or more exactly the vdev's 'ashift', which these days should basically always be 4 KBytes or greater, not the disk's 'logical block size', which is usually still 512 bytes.)
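Here's a small Python sketch of the condition involved, under the
assumption that allocations are rounded up to whole ashift-sized
disk blocks (simplified; among other things, ZFS also wants a
minimum fraction of savings before it will store a block compressed):

    def compression_helps(logical_size, compressed_size, ashift_bytes=4096):
        """True if compressing this block frees at least one disk block."""
        def allocated(nbytes):
            # Round up to whole ashift-sized disk blocks.
            return -(-nbytes // ashift_bytes) * ashift_bytes
        return allocated(compressed_size) < allocated(logical_size)

    # With a 4 KByte recordsize on a 4 KByte ashift vdev, even a 2:1
    # compression ratio frees nothing; the block still takes one disk block.
    print(compression_helps(4096, 2048))      # False
    print(compression_helps(131072, 65536))   # True (128K record -> 64K on disk)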
The one case where a large recordsize can theoretically cost you
disk space is if you have large files that are mostly holes and you
don't have any sort of compression turned on (which these days
means specifically turning it off). If you have a (Unix) file that
has 1 KByte of data every 128 KBytes and is otherwise not written
to, then without compression and with the default 128 KByte
'recordsize', you'll get a bunch of 128 KByte blocks that each have
1 KByte of actual data and 127 KBytes of zeroes. If you reduced
your 'recordsize', you would still waste some space, but more of
the file would be actual holes, with no space allocated. However,
even the most minimal compression (a setting of 'compression=zle')
will entirely eliminate this waste.
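As a back-of-the-envelope illustration, here's a Python sketch of
that sparse file at various recordsizes with compression off. It
assumes each 1 KByte chunk of data sits at the start of its 128
KByte stride, and it ignores pointer overhead:

    KiB = 1024

    def allocated_bytes(file_size, recordsize, data_stride=128 * KiB):
        # Each 128 KByte stride has one data-bearing record; every other
        # record in the stride stays a hole with no space allocated.
        records_with_data = -(-file_size // data_stride)
        return records_with_data * recordsize

    file_size = 128 * KiB * 1024  # a 128 MiByte sparse file
    for rs in (4 * KiB, 16 * KiB, 128 * KiB):
        kib = allocated_bytes(file_size, rs) // KiB
        print(f"recordsize {rs // KiB:>3}K: {kib:>7} KiB allocated")

At the default 128 KByte recordsize the whole 128 MiBytes winds up
allocated; at a 4 KByte recordsize, only 4 MiBytes does.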
(The classical case of reducing 'recordsize' is helping databases
out. More generally, you reduce 'recordsize' when you're rewriting
data in place in small sizes (such as 4 KBytes or 16 KBytes) or
appending data to a file in small sizes, because ZFS can only read
and write entire logical blocks.)
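To make that rewrite-in-place cost concrete, here's a small Python
sketch of the read-modify-write amplification for one small
overwrite. The 4 KByte write size is just an example (think of a
database page), and in practice caching often hides the read half:

    KiB = 1024

    def rmw_amplification(write_size, recordsize):
        """Bytes moved per byte written: ZFS reads and rewrites the
        whole logical block containing the change."""
        return recordsize / write_size

    print(rmw_amplification(4 * KiB, 128 * KiB))  # 32.0x at the default
    print(rmw_amplification(4 * KiB, 16 * KiB))   # 4.0x with recordsize=16K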
PS: If you need a small 'recordsize' for performance, you shouldn't
worry about the extra space usage, partly because you should also
have a reasonable amount of free disk space to improve the
performance of ZFS's space allocation.