== A surprise in how ZFS grows a file's record size (at least for me)

As [[I wound up experimentally verifying ZFSFileRecordsizeGrowth]], in ZFS all files are stored either as a single block of varying size up to the filesystem's _recordsize_, or as multiple _recordsize_ blocks. If a file has more than one block, all blocks are _recordsize_, no more and no less. If a file is a single block, the size of this block is based on how much data has been written to the file (or, technically, on the maximum offset that's been written to the file).

However, how the block size grows as you write data to the file turns out to be somewhat surprising (which makes me very glad that I actually did some experiments to verify what I thought I knew before I wrote this entry, because I was very wrong). Rather than involving the _ashift_ or growing in powers of two, ~~ZFS always grows the (logical) block size in 512-byte chunks~~ until it reaches the filesystem _recordsize_. The actual physical space allocated on disk is in _ashift_-sized units, as you'd expect, but this is not directly related to the (logical) block size used at the file level. For example, here is a 16896-byte file (of incompressible data) on an _ashift=12_ pool:

.pn prewrap on
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
  4780566    1   128K  16.5K    20K     512  16.5K  100.00  ZFS plain file
  [...]
       0 L0 DVA[0]=<0:444bbc000:5000> [L0 ZFS plain file] [...] size=4200L/4200P [...]

The [[DVA ZFSDVAOffsetVdevDetails]] records a 0x5000 byte allocation (20 Kb), but the logical and physical sizes are both only 0x4200 bytes (16.5 Kb).

In thinking about it, this makes a certain amount of sense, because the _ashift_ is really a vdev property, not a pool property, and can vary from vdev to vdev within a single pool. As a result, the actual allocated size of a given block may vary from vdev to vdev (and a block may be written to multiple vdevs if you have _copies_ set to more than 1, or if it's metadata). The file's current block size thus can't be based on the _ashift_, because ZFS doesn't necessarily have a single _ashift_ to base it on; instead ZFS bases it on 512-byte sectors, even if this has to be materialized differently on different vdevs.

Looking back, I've already sort of seen this with [[ZFS compression ZFSFilePartialAndHoleStorage]]. As you'd expect, a file's (logical) block size is based on its uncompressed size, or more exactly on the highest byte offset in the file. You can write something to disk that compresses extremely well, and it will still have a large logical block size. Here's an extreme case:

  ; dd if=/dev/zero of=testfile bs=128k count=1
  [...]
  # zdb -vv -bbbb -O ssddata/homes cks/tmp/testfile
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   956361    1   128K   128K      0     512   128K    0.00  ZFS plain file
  [...]

This turns out to have no data blocks allocated at all, because the 128 Kb of zeros can be recorded entirely in magic flags in the dnode. But it still has a 128 Kb logical block size. 128 Kb of the character 'a' does wind up requiring a DVA allocation, but the size difference is drastic:

   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   956029    1   128K   128K     1K     512   128K  100.00  ZFS plain file
  [...]
        0 L0 DVA[0]=<0:3bbd1c00:400> [L0 ZFS plain file] [...] size=20000L/400P [...]

We have a compressed size of 1 Kb (and a 1 Kb allocation on disk, as this is an _ashift=9_ vdev), but once again the file block size is 128 Kb.

(If we wrote 127.5 Kb of 'a' instead, we'd wind up with a file block size of 127.5 Kb. I'll let interested parties do that experiment themselves.)
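(If you do want to try that experiment, a sketch of it might look something like the following, reusing the dataset name and zdb invocation from my examples above; the file name and its path are just made up here, and the exact output will vary on your system.)

  ; dd if=/dev/zero bs=512 count=255 | tr '\0' a > testfile2   # 255 512-byte sectors is 127.5 Kb of 'a's
  # zdb -vv -bbbb -O ssddata/homes cks/tmp/testfile2

If the 512-byte growth is right, the _dblk_ and _lsize_ columns for the new file should come out as 127.5K rather than 128K.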
What this means is that ZFS has much less wasted space than I thought it did for files that are under the _recordsize_. Since such files grow their logical block size in 512-byte chunks, even with no compression they waste at most just under one physical block on disk (if you have a file that is, say, 32 Kb plus one byte, you'll wind up with a physical block on disk with only one byte used; there's a sketch of checking this at the end of this entry). This has some implications for other areas of ZFS, but those are for another entry.

(This is one of those entries that I'm really glad I decided to write. I set out to write it as a prequel to another entry, just to explicitly write down how ZFS grows the block size of files, but I wound up upending my understanding of the whole area. The other lesson for me is that verifying my understanding with experiments is a really good idea, because every so often my folk understanding is drastically wrong.)
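(For anyone who wants to poke at the wasted space themselves, here is a rough sketch of checking the '32 Kb plus one byte' case, again reusing the dataset name and zdb invocation from my examples; the file name is made up and the details will vary with your vdev's _ashift_.)

  ; dd if=/dev/urandom of=testfile3 bs=32k count=1   # 32 Kb of incompressible data
  ; printf x >> testfile3                            # plus one more byte
  # zdb -vv -bbbb -O ssddata/homes cks/tmp/testfile3

I'd expect the _dblk_ and _lsize_ columns to show 32.5K, with the DVA's allocated size rounded up to the vdev's _ashift_ (0x9000 bytes, 36 Kb, on an _ashift=12_ vdev), so the last physical block really does hold only that one byte of data.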