Wandering Thoughts archives


Why NFS writes to ZFS are sometimes (or often) slow

It's relatively well known that writing lots of small files over NFS to a ZFS filesystem is slow, but I was surprised to discover a significant slowdown even when doing large bulk streaming writes to single files. Discovering this got me curious enough to dig into things.

Like most recent filesystems, ZFS is journaled, using what the ZFS people call the ZIL (ZFS Intent Log). Also like other journaled filesystems, ZFS has the fsync problem: an fsync() forces the journal to be flushed to disk, which is expensive. So where do the syncs come from?

The first version of NFS required all writes to be synchronous, with the server not allowed to reply to them until the data was on disk, which was soon widely acknowledged as a terrible idea for performance. NFS v3 fixed this by allowing asynchronous writes and introducing a new operation, COMMIT, to force the server to flush some of your async writes to disk. If the server can't do this, for example because it has rebooted and lost some of your async writes, it will tell you and it's your obligation to resend the writes.
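To make the v3 contract concrete, here is a toy in-memory model of it. This is a sketch, not real NFS code: the class and function names are made up for illustration. The one detail borrowed from the real protocol is the write verifier, a value the server returns with WRITE and COMMIT replies and that changes when the server reboots; a changed verifier is how the client learns that it must resend.

```python
import itertools


class ToyServer:
    """Toy model of an NFS v3 server; every name here is hypothetical."""

    _boot_ids = itertools.count(1)

    def __init__(self):
        self.stable = {}    # offset -> data that has reached "disk"
        self.unstable = {}  # offset -> data acknowledged but only in memory
        self.verifier = next(self._boot_ids)

    def write_async(self, offset, data):
        # Reply immediately; the data is not yet on stable storage.
        self.unstable[offset] = data
        return self.verifier

    def commit(self):
        # Flush everything still in memory, then report the current verifier.
        self.stable.update(self.unstable)
        self.unstable.clear()
        return self.verifier

    def reboot(self):
        # Uncommitted writes are lost and the verifier changes.
        self.unstable.clear()
        self.verifier = next(self._boot_ids)


def write_file(server, chunks, crash_before_commit=False):
    """Send all chunks, COMMIT, and resend everything if the verifier changed.

    Returns the number of send-and-COMMIT rounds that were needed.
    """
    if not chunks:
        return 0
    rounds = 0
    while True:
        rounds += 1
        seen = {server.write_async(off, data) for off, data in chunks.items()}
        if crash_before_commit and rounds == 1:
            server.reboot()  # simulate a crash between the writes and the COMMIT
        if seen == {server.commit()}:
            return rounds    # the server kept every write it acknowledged
```

Calling write_file() with crash_before_commit=True takes two rounds instead of one, because the first COMMIT comes back with a verifier that does not match the one seen on the writes.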

NFS v3 COMMITs are a form of fsync(), and so they force ZFS to flush the ZIL, with the resulting performance hit. One of the times that NFS v3 clients send a COMMIT is when you close() a file, which is why writing lots of small files is slow on ZFS; there's an expensive sync after every file.
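If you want to see the per-file cost for yourself, a rough timing sketch along these lines may help. It is an illustration under assumptions, not a benchmark: the function name, file names and sizes are invented, and you would need to point it at a directory on your own NFS mount. On a local filesystem close() does not sync, so the sync flag calls os.fsync() to imitate the sync-per-file behaviour.

```python
import os
import tempfile
import time


def write_small_files(directory, count=100, size=4096, sync=True):
    """Create `count` files of `size` bytes each; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"small-{i:05d}.dat")
        with open(path, "wb") as f:
            f.write(payload)
            if sync:
                f.flush()
                os.fsync(f.fileno())  # one forced flush per file
    return time.perf_counter() - start


if __name__ == "__main__":
    # A temporary local directory is used here only so the script runs
    # anywhere; substitute a directory on the NFS mount you want to test.
    with tempfile.TemporaryDirectory() as d:
        print("with fsync:   ", write_small_files(d, sync=True))
    with tempfile.TemporaryDirectory() as d:
        print("without fsync:", write_small_files(d, sync=False))
```

The numbers will vary a lot with the storage underneath, so treat the two timings as a relative comparison only.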

What is going on with large files is a corollary of async writes and COMMIT: until you have COMMITted a range of writes, the server is free to lose them. This means that you must be able to resend those writes, and thus you have to keep the data sitting around in your writeback cache until you get a positive reply to your COMMIT. So every so often the client has to send a COMMIT to the NFS server in order to free up some of its writeback cache.

(Indeed, this is what I see when looking at NFS server stats; there are several hundred COMMITs over the course of writing a 10 GB file.)
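As a back-of-the-envelope check on that number, here is a toy model of a client that must hold every un-COMMITted write in its cache and sends a COMMIT whenever the cache hits a limit. The 32 MiB limit and 1 MiB write size are made-up values chosen for the arithmetic, not settings taken from any real client, and real clients decide when to COMMIT using more complicated heuristics.

```python
def count_commits(file_size, write_size, cache_limit):
    """Count the COMMITs a toy client sends while streaming `file_size` bytes.

    The client keeps every un-COMMITted write in its cache and sends a
    COMMIT whenever the cache reaches `cache_limit`, plus a final COMMIT
    at close() if anything is still cached.
    """
    cached = 0
    commits = 0
    written = 0
    while written < file_size:
        chunk = min(write_size, file_size - written)
        written += chunk
        cached += chunk      # must be kept until a COMMIT succeeds
        if cached >= cache_limit:
            commits += 1     # COMMIT acknowledged; cached data can be dropped
            cached = 0
    if cached:
        commits += 1         # final COMMIT at close()
    return commits


MiB = 1024 * 1024
GiB = 1024 * MiB
# Hypothetical values: 1 MiB writes and a 32 MiB cache limit.
print(count_commits(10 * GiB, 1 * MiB, 32 * MiB))  # prints 320
```

With those assumed values a 10 GiB file needs 320 COMMITs, which is the same order of magnitude as the several hundred seen in the server stats.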

All of this says nothing about whether the NFS write slowdown actually matters to you; that's something that depends on your usage patterns and what sort of performance you need. The performance I've measured in our test environment, while not stellar, is probably good enough for us.

solaris/SlowNFSWritesToZFS written at 23:22:18
