What NFS file-based locking problems can happen

Now that we know how NFS is unreliable, we can see what can go wrong when you attempt to do file-based locking over NFS. There are two failures, corresponding to the two ways for things to go wrong when a reply is lost:
I'm not certain if there's any way around the second problem, apart from counting on the server's request/reply cache (and TCP). It doesn't help to check the data in the lock file before you unlock it, because the fatal replay happens in your machine's kernel before you have a chance to check it again. There's no way for the NFS server to detect that you're unlinking a different version of the file than you think you are, because the NFS unlink and rmdir operations only specify a name, with no generation count or the like.

Actually, there is a tricky, theoretical way to sidestep the second problem: do the locking in a group-writable directory with the sticky bit on, and make every machine run the program under a different UID. That way the server won't let you remove a lock file (or directory) that you don't own, and you can use the lock file's ownership to see what machine currently owns the lock.
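To make the sticky-bit idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the lock is a directory made with `mkdir`, each machine runs the locker under its own UID, and the function names (`dir_is_sticky_group_writable`, `try_acquire`, `lock_owner_uid`, `release`) are made up for this example. Treating an "already exists" error on a lock we own as "we hold it" is my extrapolation from using ownership to identify the holder, and it only makes sense if there is exactly one locker per UID.

```python
import os
import stat


def dir_is_sticky_group_writable(lock_dir):
    """Check that the lock directory has both the sticky bit and group write."""
    mode = os.stat(lock_dir).st_mode
    return bool(mode & stat.S_ISVTX) and bool(mode & stat.S_IWGRP)


def lock_owner_uid(lock_path):
    """Return the UID that owns the lock, or None if there is no lock."""
    try:
        return os.stat(lock_path).st_uid
    except FileNotFoundError:
        return None


def try_acquire(lock_path):
    """Try to take the lock by creating a directory.

    Returns "acquired", "already-ours", or "held-by-other".
    Assumption: one locker per UID, so a lock owned by our UID is ours
    (for example, our mkdir worked but its reply was lost and the retry
    reported that the directory already exists).
    """
    try:
        os.mkdir(lock_path, 0o755)
        return "acquired"
    except FileExistsError:
        if lock_owner_uid(lock_path) == os.getuid():
            return "already-ours"
        return "held-by-other"


def release(lock_path):
    """Remove the lock directory; return True if we removed it.

    In a sticky directory the server refuses to let us remove a lock
    owned by a different UID, which shows up here as PermissionError.
    """
    try:
        os.rmdir(lock_path)
        return True
    except (FileNotFoundError, PermissionError):
        return False
```

With this arrangement, a replayed remove that lands on some other machine's fresh lock should be rejected by the server instead of silently destroying it, and `release` simply reports that nothing was removed.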
This is part of CSpace, and is written by ChrisSiebenmann.