2015-06-16
NFS writes and whether or not they're synchronous
In the original NFS v2, the situation with
writes was relatively simple. The protocol specified that the server
could only acknowledge write operations when it had committed them
to disk, both for file data writes and for metadata operations such
as creating files and directories, renaming files, and so on.
Clients were free to buffer writes locally before sending them to
the server and generally did, just as they buffered writes before
sending them to local disks. As usual, when a client program did
a sync() or a fsync(), this caused the client kernel to flush
any locally buffered writes to the server, which would then commit
them to disk and acknowledge them.
(You could sometimes tell clients not to do any local buffering and to immediately send all writes to the server, which theoretically resulted in no buffering anywhere.)
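As a small illustration of the client side of this (the NFS path here is entirely made up), flushing locally buffered writes out to the server is just the ordinary write() plus fsync() pattern that a program would use on a local filesystem:

    /* Sketch only: write some data and force it out to the NFS server.
     * The path is hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "important data\n";
        int fd = open("/nfs/data/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* This write may just go into the client's local buffers. */
        if (write(fd, msg, strlen(msg)) < 0) {
            perror("write");
            return 1;
        }
        /* fsync() makes the client kernel flush its buffered writes to
         * the server; under NFS v2 the server may only acknowledge them
         * once they're on disk. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return 1;
        }
        return close(fd) == 0 ? 0 : 1;
    }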
This worked and was simple (a big virtue in early NFS), but it didn't really go very fast in a lot of circumstances. NFS server vendors did various things to speed writes up, from battery-backed RAM on special cards to simply allowing the server to lie to clients about their data being on disk (which results in silent data loss if the server then loses that data, e.g. due to a power failure or abrupt reboot).
In NFS v3 the protocol was revised to add asynchronous writes and
a new operation, COMMIT, to force the server to really flush your
submitted asynchronous writes to disk. A NFS v3 server is permitted
to lose submitted asynchronous writes up until you issue a successful
COMMIT operation; this implies that the client must hang on to a
copy of the written data so that it can resend it if needed. Of
course, the server can start writing your data earlier if it wants
to; it's up to the server. In addition clients can specify that
their writes are synchronous, reverting NFS v3 back to the v2
behavior.
(See RFC 1813 for the gory details. It's actually surprisingly readable.)
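As a rough illustration, here is a paraphrase in C of the protocol pieces involved, not the RFC's literal XDR; the struct names are mine, but the stable_how values and the meaning of COMMIT's arguments come from RFC 1813:

    /* A paraphrase of RFC 1813's WRITE and COMMIT arguments, not the
     * literal XDR definitions; struct names are invented for clarity. */

    enum stable_how {
        UNSTABLE  = 0,  /* server may buffer the data; client keeps a copy */
        DATA_SYNC = 1,  /* data must be on stable storage before replying */
        FILE_SYNC = 2,  /* data and metadata on stable storage (v2-like) */
    };

    struct nfs3_write_args {
        /* file handle, offset, count, and the data itself omitted */
        enum stable_how stable;      /* how durable this WRITE must be */
    };

    struct nfs3_commit_args {
        /* file handle omitted */
        unsigned long long offset;   /* start of the byte range to flush */
        unsigned int count;          /* 0 means 'from offset to end of file' */
    };

    /* WRITE and COMMIT replies both carry a write verifier; if it changes
     * between an UNSTABLE write and the later COMMIT, the server may have
     * rebooted and lost the data, and the client must resend it. */

An asynchronous client write is then simply a WRITE with stable set to UNSTABLE, eventually followed by a COMMIT that covers the byte range involved.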
In the simple case the client kernel will send a single COMMIT at
the end of writing the file (for example, when your program closes
it or fsync()s it). But if your program writes a large enough file,
the client kernel won't want to buffer all of it in memory and so
will start sending COMMIT operations to the server every so often
so it can free up some of those write buffers. This can cause
unexpected slowdowns under some circumstances, depending on a lot
of factors.
(Note that just as with other forms of writeback disk IO, the client
kernel may do these COMMITs asynchronously from your program's
activity. Or it may opt to not try to be that clever and just force
a synchronous COMMIT pause on your program every so often. There
are arguments either way.)
If you write NFS v3 file data synchronously on the client, either
by using O_SYNC or by appropriate NFS mount options, the client
will not just immediately send it to the server without local
buffering (the way it did in NFS v2); it will also insist that the
server write it to disk synchronously. This means that forced
synchronous client IO causes a bigger performance hit in NFS v3
than it did in NFS v2; basically you reduce NFS v3 down to NFS v2
end-to-end synchronous writes. You're not just eliminating client
buffering, you're eliminating all buffering and increasing how many
IOPs the server must do (well, compared to normal NFS v3 write IO).
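As an illustration of the client side of this (with a made-up path again), an O_SYNC write looks like the following sketch; each write() is a full round trip through the server to stable storage:

    /* Sketch of client-side synchronous NFS writes via O_SYNC; the
     * mount point and file name are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_SYNC means each write() only returns once the data has been
         * sent to the server and the server has put it on disk; there
         * is no buffering on either side. */
        int fd = open("/nfs/data/log",
                      O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        const char *line = "one synchronous record\n";
        if (write(fd, line, strlen(line)) < 0) {
            perror("write");
            return 1;
        }
        return close(fd) == 0 ? 0 : 1;
    }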
All of this is just for file data writes. NFS v3 metadata operations
are still just as synchronous as they were in NFS v2, so things
like 'rm -rf' on a big source tree are just as slow as they used
to be.
(I don't know enough about NFS v4 to know how it handles synchronous and asynchronous writes.)