What NFS file-based locking problems can happen

June 13, 2007

Now that we know how NFS is unreliable, we can see what can go wrong when you attempt to do file-based locking over NFS. There are two failures, corresponding to the two ways for things to go wrong when a reply is lost:

  • if you lose the server's reply to a successful attempt to acquire the lock, your replay of the operation will report a failure even though you actually own the lock. Since you think you failed, you never release the lock; the result is effectively a deadlocked system.

    (This is one situation where the ln-based style of locking is better than the mkdir style, because the lock file can contain some identifying information, so after a reported failure you can check it to see whether you actually own the lock after all.)

  • if you lose the server's reply to your unlocking of the lock and then replay it, you can actually unlock someone else's lock, if they successfully acquired the lock between when the server did your unlock the first time and when you replay your unlock.
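To make the first case concrete, here is a minimal sketch of ln-style locking that checks the lock file's contents after a reported failure. The function names and the idea of using an identity string such as "host:pid" are my own assumptions for illustration, and reading the file back is only as trustworthy as your NFS client's caching lets it be.

```python
import os
import tempfile


def owns(lock_path, identity):
    """Return True if the lock file exists and contains our identity."""
    try:
        with open(lock_path) as f:
            return f.read() == identity
    except FileNotFoundError:
        return False


def acquire(lock_path, identity):
    """Try to take the lock with link(); return True if we hold it afterwards.

    'identity' is a hypothetical identifying string (for example "host:pid")
    that must be unique to this would-be lock holder.
    """
    lock_dir = os.path.dirname(os.path.abspath(lock_path))
    # Put our identity in a uniquely named file in the same directory,
    # then try to hard-link it to the lock name.
    fd, tmp_path = tempfile.mkstemp(prefix=".lock-", dir=lock_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(identity)
        try:
            os.link(tmp_path, lock_path)
            return True
        except FileExistsError:
            # Either someone else holds the lock, or our first link()
            # succeeded and the replay failed. The contents tell us which.
            return owns(lock_path, identity)
    finally:
        # The temporary name is no longer needed either way.
        os.unlink(tmp_path)
```

With mkdir-style locking there is nothing inside the lock to read, so the equivalent of owns() has nothing to go on.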

I'm not certain if there's any way around the second problem, apart from counting on the server's request/reply cache (and TCP). It doesn't help to check the data in the lock file before you unlock it, because the fatal replay happens in your machine's kernel before you have a chance to check it again. There's no way for the NFS server to detect that you're unlinking a different version of the file than you think you are, because the NFS unlink and rmdir operations only specify a name with no generation count or the like.
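The sequence of events in the second problem can be walked through with a toy model. This is not the NFS protocol; ToyServer is a made-up in-memory stand-in, deliberately built with no duplicate request cache, that only shows why a remove operation that names nothing but a file name cannot tell the two lock files apart.

```python
class ToyServer:
    """Hypothetical in-memory 'server' with no request/reply cache."""

    def __init__(self):
        self.files = {}  # name -> identity of whoever created it

    def create_exclusive(self, name, owner):
        # Stand-in for taking the lock.
        if name in self.files:
            return "EEXIST"
        self.files[name] = owner
        return "OK"

    def remove(self, name):
        # Like the operations described above, this gets only a name:
        # no generation count, no hint about which file was meant.
        if name not in self.files:
            return "ENOENT"
        del self.files[name]
        return "OK"


def replay_scenario():
    server = ToyServer()
    server.create_exclusive("lock", "A")

    # A unlocks. The server does the removal, but the reply is lost.
    server.remove("lock")

    # B takes the lock before A's client retransmits.
    b_result = server.create_exclusive("lock", "B")

    # A's client replays the same remove, which now hits B's lock file.
    replay_result = server.remove("lock")

    return b_result, replay_result, "lock" in server.files
```

Here replay_scenario() returns ("OK", "OK", False): B was told it has the lock, A was told its unlock worked, and the lock file is gone.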

Actually, there is a tricky way to sidestep the second problem, at least in theory: do the locking in a group-writeable directory with the sticky bit on, and make every machine run the program under a different UID. That way the server won't let you remove a lock file (or directory) that you don't own. And you can use the lock file's ownership to see what machine currently owns the lock.
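A rough sketch of the setup side of this trick, assuming the lock directory is created by something with suitable permissions and that all of the per-machine UIDs share the directory's group. The helper names and the 1770 mode are my choices for illustration; the actual protection comes from the server enforcing sticky-bit semantics on removal, which this code does not demonstrate.

```python
import os
import stat

# rwxrwx--- plus the sticky bit: group members can create entries,
# but can only remove entries they own.
LOCK_DIR_MODE = 0o1770


def make_lock_dir(path):
    """Create the shared lock directory with the sticky bit set."""
    os.mkdir(path)
    # chmod explicitly, since the mode given to mkdir is filtered by umask.
    os.chmod(path, LOCK_DIR_MODE)


def is_sticky_group_writable(path):
    """Check that a directory has both the sticky and group-write bits."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_ISVTX) and bool(mode & stat.S_IWGRP)


def lock_owner_uid(lock_path):
    """Return the UID that owns the lock file or directory, or None."""
    try:
        return os.lstat(lock_path).st_uid
    except FileNotFoundError:
        return None
```

Mapping the UID from lock_owner_uid() back to a machine is left to whatever per-machine UID assignment you have set up.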
