Cross-system NFS locking and unlocking is not necessarily fast

April 19, 2014

If you're faced with a problem of coordinating reads and writes on an NFS filesystem between several machines, you may be tempted to use NFS locking to communicate between process A (on machine 1) and process B (on machine 2). The attraction of this is that all they have to do is contend for a write lock on a particular file; you don't have to write network communication code and then configure A and B to find each other.

The good news is that this works, in that cross system NFS locking and unlocking actually works right (at least most of the time). The bad news is that this doesn't necessarily work fast. In practice, it can take a fairly significant amount of time for process B on machine 2 to find out that process A on machine 1 has unlocked the coordination file, time that can be measured in tens of seconds. In short, NFS locking works but it can require patience and this makes it not necessarily the best option in cases like this.

(The corollary of this is that when you're testing this part of NFS locking to see if it actually works you need to wait for quite a while before declaring things a failure. Based on my experiences I'd wait at least a minute before declaring an NFS lock to be 'stuck'. Implications for impatient programs with lock timeouts are left as an exercise for the reader.)

I don't know if acquiring an NFS lock on a file after a delay normally causes your machine's kernel to flush cached information about the file. In an ideal world it would, but NFS implementations are often not ideal worlds and the NFS locking protocol is a sidecar thing that's not necessarily closely integrated with the NFS client. Certainly I wouldn't count on NFS locking to flush cached information on, say, the directory that the locked file is in.

In short: you want to test this stuff if you need it.

PS: Possibly this is obvious but when I started testing NFS locking to make sure it worked in our environment I was a little bit surprised by how slow it could be in cross-client cases.

Written on 19 April 2014.
« What modern filesystems need from volume management
A heresy about memorable passwords »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Apr 19 00:37:21 2014
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.