NFS writes and whether or not they're synchronous

June 16, 2015

In the original NFS v2, the situation with writes was relatively simple. The protocol specified that the server could only acknowledge write operations when it had committed them to disk, both for file data writes and for metadata operations such as creating files and directories, renaming files, and so on. Clients were free to buffer writes locally before sending them to the server and generally did, just as they buffered writes before sending them to local disks. As usual, when a client program did a sync() or a fsync(), this caused the client kernel to flush any locally buffered writes to the server, which would then commit them to disk and acknowledge them.

(You could sometimes tell clients not to do any local buffering and to immediately send all writes to the server, which theoretically resulted in no buffering anywhere.)
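As a rough illustration of the fsync() path described above, here is a minimal Python sketch of a program writing a file and then forcing it out. The function name write_durably and the temporary file path are made up for the example. On a real system the path would live on an NFS mount, and it would be the client kernel, not this code, that flushes the buffered writes to the server.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> int:
    """Write data to path and return only after fsync() has completed."""
    with open(path, "wb") as f:
        written = f.write(data)  # lands in Python's user-space buffer
        f.flush()                # hand it to the kernel, which may still buffer it
        os.fsync(f.fileno())     # ask the kernel to push it to stable storage
    return written


if __name__ == "__main__":
    # Hypothetical target; a temporary directory stands in for an NFS mount.
    with tempfile.TemporaryDirectory() as d:
        print(write_durably(os.path.join(d, "example.dat"), b"hello\n"))
```

Without the fsync() call, the write() can return long before anything has been sent to the server at all.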

This worked and was simple (a big virtue in early NFS), but it was slow in a lot of circumstances, since every write had to reach the server's disk before the server could acknowledge it. NFS server vendors did various things to speed writes up, from battery backed RAM on special cards to simply allowing the server to lie to clients about their data being on disk (which results in silent data loss if the server then loses that data, for example due to a power failure or abrupt reboot).

In NFS v3 the protocol was revised to add asynchronous writes and a new operation, COMMIT, to force the server to really flush your submitted asynchronous writes to disk. An NFS v3 server is permitted to lose submitted asynchronous writes up until you issue a successful COMMIT operation; this implies that the client must hang on to a copy of the written data so that it can resend it if needed. Of course, the server can start writing your data earlier if it wants to; it's up to the server. In addition, clients can specify that their writes are synchronous, reverting NFS v3 back to the v2 behavior.

(See RFC 1813 for the gory details. It's actually surprisingly readable.)
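To make the asynchronous write plus COMMIT dance concrete, here is a toy in-memory model in Python. It is only a sketch: ToyServer, ToyClient and all of their methods are invented for illustration and are not any real NFS API. The one protocol detail it borrows from RFC 1813 is the write verifier, a value the server returns with write and COMMIT replies and that changes when the server reboots, which is how a client can tell that uncommitted data may have been lost.

```python
import itertools


class ToyServer:
    """Toy stand-in for an NFS v3 server (hypothetical, in-memory only)."""

    _boot_ids = itertools.count(1)

    def __init__(self):
        self.disk = {}    # offset -> bytes; survives a crash
        self.cache = {}   # offset -> bytes; lost on a crash
        self.verifier = next(self._boot_ids)

    def write_unstable(self, offset, data):
        # Asynchronous write: acknowledged while only in volatile memory.
        self.cache[offset] = data
        return self.verifier

    def write_sync(self, offset, data):
        # Synchronous write: on "disk" before it is acknowledged (v2 style).
        self.cache.pop(offset, None)
        self.disk[offset] = data
        return self.verifier

    def commit(self):
        # COMMIT: flush everything still cached to "disk".
        self.disk.update(self.cache)
        self.cache.clear()
        return self.verifier

    def crash_and_reboot(self):
        # Uncommitted data is gone and the verifier changes.
        self.cache.clear()
        self.verifier = next(self._boot_ids)


class ToyClient:
    """Keeps a copy of every uncommitted write so it can resend it."""

    def __init__(self, server):
        self.server = server
        self.pending = {}  # offset -> (data, verifier seen at write time)

    def write(self, offset, data):
        verf = self.server.write_unstable(offset, data)
        self.pending[offset] = (data, verf)

    def commit(self):
        while self.pending:
            verf = self.server.commit()
            stale = {o: d for o, (d, v) in self.pending.items() if v != verf}
            if not stale:
                self.pending.clear()  # now safe to drop our copies
                return
            for off, data in stale.items():
                # The server rebooted since this write: resend it.
                self.pending[off] = (data, self.server.write_unstable(off, data))


def demo():
    server = ToyServer()
    client = ToyClient(server)
    client.write(0, b"aaaa")
    client.write(4, b"bbbb")
    server.crash_and_reboot()   # the server loses both unstable writes
    client.commit()             # the client notices and resends them
    return server.disk


if __name__ == "__main__":
    print(demo())
```

The point of the model is the client's pending dictionary: until a COMMIT succeeds with a matching verifier, the client cannot throw its copy of the data away.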

In the simple case the client kernel will send a single COMMIT at the end of writing the file (for example, when your program closes it or fsync()s it). But if your program writes a large enough file, the client kernel won't want to buffer all of it in memory and so will start sending COMMIT operations to the server every so often so it can free up some of those write buffers. This can cause unexpected slowdowns under some circumstances, depending on a lot of factors.

(Note that just as with other forms of writeback disk IO, the client kernel may do these COMMITs asynchronously from your program's activity. Or it may opt to not try to be that clever and just force a synchronous COMMIT pause on your program every so often. There are arguments either way.)
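A back-of-the-envelope way to see when the extra COMMITs show up is to simulate a client that refuses to hold more than some amount of uncommitted data. The fixed threshold here is purely an assumption for the sketch (real client kernels decide based on memory pressure and their own writeback heuristics), and simulate_writeback and its parameters are made-up names.

```python
def simulate_writeback(file_size: int, write_size: int, max_uncommitted: int) -> int:
    """Count the COMMITs a hypothetical client would send for one file.

    Assumes the client sends a COMMIT whenever it is holding at least
    max_uncommitted bytes of unstable data, plus one final COMMIT at
    close() or fsync() if anything is still uncommitted.
    """
    if write_size <= 0 or max_uncommitted <= 0:
        raise ValueError("write_size and max_uncommitted must be positive")

    sent = 0
    uncommitted = 0
    commits = 0
    while sent < file_size:
        chunk = min(write_size, file_size - sent)
        sent += chunk
        uncommitted += chunk
        if uncommitted >= max_uncommitted:
            commits += 1      # free up the write buffers
            uncommitted = 0
    if uncommitted:
        commits += 1          # the COMMIT at close()/fsync()
    return commits


if __name__ == "__main__":
    KiB, MiB = 1024, 1024 * 1024
    print(simulate_writeback(64 * KiB, 32 * KiB, 1 * MiB))   # small file
    print(simulate_writeback(10 * MiB, 32 * KiB, 1 * MiB))   # large file
```

With these made-up numbers the small file needs one COMMIT and the large one needs ten, and each of those is a point where the server has to actually get data onto disk.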

If you write NFS v3 file data synchronously on the client, either by using O_SYNC or by appropriate NFS mount options, the client will not just send it to the server immediately without local buffering (the way it did in NFS v2); it will also insist that the server write it to disk synchronously. This means that forcing synchronous client IO causes a bigger change in performance in NFS v3 than in NFS v2; basically you reduce NFS v3 down to NFS v2's end to end synchronous writes. You're not just eliminating client buffering, you're eliminating all buffering and increasing how many IOPS the server must do (well, compared to normal NFS v3 write IO).
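For reference, this is roughly what opting into O_SYNC looks like from a program, sketched in Python. The function name write_osync is made up, and os.O_SYNC is only available on Unix-like systems. Nothing in the code is NFS-specific; what the client kernel does with each write() is as described above.

```python
import os
import tempfile


def write_osync(path: str, data: bytes) -> int:
    """Write data to path through a file descriptor opened with O_SYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        total = 0
        while total < len(data):
            # With O_SYNC, each write() returns only once the data has been
            # reported as written to stable storage.
            total += os.write(fd, data[total:])
        return total
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical target; a temporary directory stands in for an NFS mount.
    with tempfile.TemporaryDirectory() as d:
        print(write_osync(os.path.join(d, "example.dat"), b"synchronous\n"))
```

A program that does many small writes this way pays the full round trip to the server's disk on every one of them.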

All of this is just for file data writes. NFS v3 metadata operations are still just as synchronous as they were in NFS v2, so things like 'rm -rf' on a big source tree are just as slow as they used to be.

(I don't know enough about NFS v4 to know how it handles synchronous and asynchronous writes.)


Last modified: Tue Jun 16 00:44:42 2015