ZFS pushes file renamings and other metadata changes to disk quite promptly

One of the general open questions on Unix is when changes like
renaming or creating files are actually durably on disk. Famously,
some filesystems on some Unixes have been willing to delay this for
an unpredictable amount of time unless you did things like fsync()
the containing directory of your renamed file, not just fsync()
the file itself. As it happens, ZFS's design means that it offers
some surprisingly strong guarantees about this; specifically, ZFS
persists all metadata changes to disk no later than the next
transaction group commit. In ZFS today, a transaction group commit
generally happens every five seconds, so if you do something like
rename a file, your rename will be fully durable quite soon even if
you do nothing special.
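
If you want to check the interval on your own machine, here is a minimal sketch. It assumes OpenZFS on Linux, where the txg timeout is exposed as the `zfs_txg_timeout` module parameter under `/sys/module/zfs/parameters/`; other platforms expose it differently, and the helper name `read_txg_timeout` is made up for this illustration.

```python
from pathlib import Path

# Assumption: OpenZFS on Linux exposes the tunable at this path.
TXG_TIMEOUT_PATH = "/sys/module/zfs/parameters/zfs_txg_timeout"


def read_txg_timeout(path=TXG_TIMEOUT_PATH):
    """Return the txg timeout in seconds, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # No ZFS module loaded, a different platform, or unexpected contents.
        return None


if __name__ == "__main__":
    print(read_txg_timeout())
```

This value is the longest ZFS will normally wait between txg commits; a commit can also happen sooner.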

However, this doesn't mean that if you create a file, write data
to the file, and then rename it (with no other special operations)
that in five or ten seconds your new file is guaranteed to be present
under its new name with all the data you wrote. Although metadata
operations like creating and renaming files go to ZFS right away
and then become part of the next txg commit, the kernel generally
holds on to written file data for a while before pushing it out.
You need some sort of fsync() in there to force the kernel to
commit your data, not just your file creation and renaming. Because
of how the ZFS intent log works, you don't need to do anything more
than fsync() your file here; when you fsync() a file, all pending
metadata changes are flushed out to disk along with the file data.
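
As a rough sketch of the create, write, and rename sequence with a single fsync() at the end, something like the following should do on a POSIX system. The function name and paths are hypothetical, and relying on one fsync() of the file to also push out the rename is the ZFS behaviour described above, not something to assume of other filesystems.

```python
import os


def create_write_rename(tmp_path, final_path, data):
    """Create tmp_path, write data, rename to final_path, then fsync the file."""
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write() may write less than asked, so loop until all is written.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # The open descriptor stays valid across the rename on POSIX.
        os.rename(tmp_path, final_path)
        # On ZFS (per the text above) this commits the file data and the
        # pending metadata changes, including the rename.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Calling `create_write_rename("report.tmp", "report.txt", b"...")` returns only after the fsync() has completed.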

(In a 'create new version, write, rename to overwrite current
version' setup, I think you want to fsync() the file twice, once
after the write and then once after the rename. Otherwise you haven't
necessarily forced the rename itself to be written out. You don't
want to do the rename before a fsync(), because then I think that
a crash at just the wrong time could give you an empty new file.
But the ice is thin here in portable code, including code that wants
to be portable to different filesystem types.)
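
The double-fsync() ordering for the overwrite case can be sketched as follows. This is only an illustration for POSIX systems: the function name and the `.tmp` suffix are invented here, and the second fsync() on the file pushing out the rename is the ZFS behaviour discussed above.

```python
import os


def replace_file_durably(target_path, data):
    """Write data to a temporary file, then rename it over target_path.

    The file is fsync()'d twice: after the write and after the rename.
    """
    # Hypothetical temporary name, kept in the same directory so the
    # rename stays within one filesystem.
    tmp_path = target_path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # First fsync: the new contents are on disk before the rename,
        # so a crash cannot leave an empty file under the final name.
        os.fsync(fd)
        # Rename over the current version (atomic on POSIX).
        os.replace(tmp_path, target_path)
        # Second fsync: on ZFS this also flushes the pending rename.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Code that needs to be portable to other filesystems would probably also want to fsync() the containing directory after the rename, as mentioned at the start.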

My impression is that ZFS is one of the few filesystems with such
a regular schedule for committing metadata changes to disk. Others
may be much more unpredictable, and may even reorder the commits of
some metadata operations in the process (although by now, it would
be nice if everyone avoided that particular trick). In ZFS, not only
do metadata changes commit regularly, but they commit in strict time
order, so a later change can never reach disk ahead of an earlier one.