An interesting yet ordinary consequence of ZFS using the ZIL
On the Fediverse, Alan Coopersmith recently shared this:
@bsmaalders @cks writing a temp file and renaming it also avoids the failure-to-truncate issues found in screenshot cropping tools recently (#aCropalypse), but as some folks at work recently discovered, you need to be sure to fsync() before the rename, or a failure at the wrong moment can leave you with a zero-length file instead of the old one as the directory metadata can get written before the file contents data on ZFS.
On the one hand, this is perfectly ordinary behavior for a modern filesystem; often renames are synchronous and durable, but if you create a file, write it, and then rename it to something else, you haven't ensured that the data you wrote is on disk, just that the renaming is. On the other hand, as someone who's somewhat immersed in ZFS this initially felt surprising to me, because ZFS is one of the rare filesystems that enforces a strict temporal order on all IO operations in its core IO model of ZFS transaction groups.
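As a concrete illustration of the pattern under discussion, here is a minimal Python sketch of "write a temporary file, fsync() it, then rename it over the original". The function name `atomic_write` and the `.tmp-` prefix are made up for this example. It also assumes the rename itself is durable enough on your filesystem; some filesystems additionally want an fsync() of the containing directory, which this sketch leaves out.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via a temp file, fsync(), and rename.

    Hypothetical helper for illustration only.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (and so the same
    # filesystem) as the target, or the rename can't be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask for the file data to be made durable
        # Only rename once the data has been fsync()'d.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Leaving out the `os.fsync()` call is exactly the variant that can produce a zero-length file after a badly timed crash.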
How this works is that everything that happens in a ZFS filesystem goes into a transaction group (TXG). At any given time there's only one open TXG and TXGs commit in order, so if B is issued after A, either it's in the same TXG as A and the two happen together, or it's in a TXG after A and so A has already happened. In transaction groups, you can never have B happen but A not happen. In the TXG mental model of ZFS IO, this data loss is impossible, since the rename happened after the data write.
However, all of this strict TXG ordering goes out the window once
you introduce the ZFS Intent Log (ZIL), because
the ZIL's entire purpose is to persist selected operations to disk
before they're committed as part of a transaction group. Renames
and file creations always go in the ZIL (along with various other
metadata operations), but file data only goes in the ZIL if you
fsync() it (this is a slight simplification, and file data
isn't necessarily directly in the ZIL).
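To make the difference concrete, here is a toy Python model of the simplified description above. It is emphatically not how ZFS is implemented; the operation tuples and the function names `replay` and `crash_state` are invented for this sketch. It assumes a crash before the open TXG commits, so that only operations recorded in the ZIL get replayed, and it assumes creates and renames are always recorded while writes are recorded only if the file was fsync()'d.

```python
def replay(initial, ops):
    """Apply ops in issue order to a {name: bytes} namespace and return the result."""
    files = dict(initial)
    for op in ops:
        kind = op[0]
        if kind == "create":
            files[op[1]] = b""
        elif kind == "write":
            files[op[1]] = files[op[1]] + op[2]
        elif kind == "rename":
            files[op[2]] = files.pop(op[1])
    return files


def crash_state(initial, ops, fsynced):
    """Toy post-crash state: only the ops assumed to be in the ZIL are replayed."""
    logged = [
        op for op in ops
        if op[0] in ("create", "rename") or (op[0] == "write" and fsynced)
    ]
    return replay(initial, logged)


initial = {"target": b"old contents"}
ops = [
    ("create", "tmp"),
    ("write", "tmp", b"new contents"),
    ("rename", "tmp", "target"),
]

print(crash_state(initial, ops, fsynced=False))  # {'target': b''}
print(crash_state(initial, ops, fsynced=True))   # {'target': b'new contents'}
```

In the pure TXG model the three operations would either all survive or all vanish, leaving the old contents in place; it's only the selective logging that opens up the zero-length outcome.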
So once the ZIL was in my mental model I could understand what
had happened. In
effect the presence of the ZIL had changed ZFS from a filesystem
with very strong data ordering properties to one with more ordinary
ones, and in such a more ordinary filesystem you do need to fsync()
your newly written file data to make it durable.
(And under normal circumstances ZFS always has the ZIL, so I was engaging in a bit of skewed system programmer thinking.)