Partly getting around NFS's concurrent write problem

April 17, 2014

In a comment on my entry about NFS's problem with concurrent writes, a commentator asked this very good question:

So if A writes a file to an NFS directory and B needs to read it "immediately" as the file appears, is the only workaround to use low values of actimeo? Or should A and B be communicating directly with some simple mechanism instead of setting, say, actimeo=1?

(Let's assume that we've got 'close to open' consistency to start with, where A fully writes the file before B processes it.)

If I was faced with this problem and I had a free hand with A and B, I would make A create the file with some non-repeating name and then send an explicit message to B with 'look at file <X>' (using eg a TCP connection between the two). A should probably fsync() the file before it sends this message to make sure that the file's on the server. The goal of this approach is to avoid B's kernel having any cached information about whether or not file <X> might exist (or what the contents of the directory are). With no cached information, B's kernel must go ask the NFS fileserver and thus get accurate information back. I'd want to test this with my actual NFS server and client just to be sure (actual NFS implementations can be endlessly crazy) but I'd expect it to work reliably.

Note that it's important to not reuse filenames. If A ever reuses a filename, B's kernel may have stale information about the old version of the file cached; at the best this will get B a stale filehandle error and at the worst B will read old information from the old version of the file.

If you can't communicate between A and B directly and B operates by scanning the directory to look for new files, you have a moderate caching problem. B's kernel will normally cache information about the contents of the directory for a while and this caching can delay B noticing that there is a new file in the directory. Your only option is to force B's kernel to cache as little as possible. Note that if B is scanning it will presumably only be scanning, say, once a second and so there's always going to be at least a little processing lag (and this processing lag would happen even if A and B were on the same machine); if you really want immediately, you need A to explicitly poke B in some way no matter what.

(I don't think it matters what A's kernel caches about the directory, unless there's communication that runs the other way such as B removing files when it's done with them and A needing to know about this.)

Disclaimer: this is partly theoretical because I've never been trapped in this situation myself. The closest I've come is safely updating files that are read over NFS. See also.


Comments on this page:

By Anonymous at 2014-04-24 06:25:00:

This all makes sense, thanks for sharing your thoughts!

Written on 17 April 2014.
« Where I feel that btrfs went wrong
What modern filesystems need from volume management »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Apr 17 00:10:33 2014
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.