Another reason to safely update files that are looked at over NFS

July 16, 2009

Suppose that you are writing a script on one system but testing it on another (perhaps the first system is the one that has your full editing environment set up). You go along in your cycle of edit, save, run, edit, save, run, and then suddenly:

./testscript: Stale NFS file handle

What just happened?

You've run into the issue of safely updating files that are read over NFS, even though you weren't reading the file at the time you saved it.

In theory, every time an NFS client needs to turn a name into an NFS filehandle, it should go off and ask the server. In practice, for efficiency NFS clients generally cache this name-to-filehandle mapping for some amount of time (how long varies a lot). Usually no one notices, but here you got unlucky: when you tried to run the script, the second machine had cached the filehandle for the old version of the file, which no longer exists, so when it tried to read the file the NFS server told it 'go away, that's a stale NFS filehandle'.
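To see why saving the script created a new file at all, here is a minimal sketch (in a local temporary directory, not over NFS) of the save-via-rename pattern that many editors use; the filenames are illustrative:

```shell
# Sketch of the save-via-rename pattern (filenames illustrative).
dir=$(mktemp -d)
printf 'echo old\n' > "$dir/testscript"

# "Saving" writes the new contents to a temporary file first...
printf 'echo new\n' > "$dir/testscript.tmp"
# ...and then renames it over the old name. The rename is atomic, but
# the name now points to a different inode; a client that cached the
# old NFS filehandle is left holding a handle to a deleted file.
mv "$dir/testscript.tmp" "$dir/testscript"

result=$(cat "$dir/testscript")
echo "$result"
rm -rf "$dir"
```

The rename makes the update safe for local readers (they see either the old file or the new one, never a half-written mix), but it is exactly what leaves a remote NFS client holding a filehandle for the old, now-deleted version.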

Running scripts isn't the only thing that can get stale filehandle errors because of cached mappings; it's just one of the more obvious ones, because you actually get error messages. I believe that test is another case (although I haven't yet demonstrated this in a controlled test):

if test -f /some/nfs/file; then
  ...
fi

I believe that this will silently fail if the client's cache is out of date, as the client kernel winds up doing a GETATTR on a now-invalid NFS filehandle (because test will stat() the file to see if it's a regular file or not).
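A short sketch of why the failure would be silent: test (and '[') report false for *any* stat() failure, not just 'no such file', so ESTALE over NFS looks exactly like a missing file. Since I can't reproduce an ESTALE locally, this uses a path whose stat() fails with ENOENT instead, which test treats the same way:

```shell
# test -f reports false for any stat() failure; ENOENT here stands
# in for the ESTALE you would get from a stale NFS filehandle.
d=$(mktemp -d)
path="$d/no-such-file"
if [ -f "$path" ]; then
  status="regular file"
else
  status="silently treated as absent"   # also what happens on ESTALE
fi
echo "$status"
rmdir "$d"
```

There is no error message anywhere in this path; the script just takes the 'else' branch.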
