A peculiar change in Linux flock() and fcntl() behavior

July 22, 2009

Here is one of those fun issues that cause me to pull out my hair (although it can give me a peculiar sense of satisfaction to track it down).

Suppose that you have two filenames, such as (not entirely hypothetically) .vacation.dir and .vacation.pag. As it happens, these two names are hard links to each other, so there is only one file involved. Now, suppose you have code that is like this C-oid pseudo-code:

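struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0, .l_pid = getpid() };
fd1 = open(".vacation.dir", O_RDONLY);
flock(fd1, LOCK_EX);                  /* BSD-style lock, taken through the first name */
fd2 = open(".vacation.pag", O_RDWR);
fcntl(fd2, F_SETLK, &fl);             /* POSIX whole-file write lock, through the second name */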

If and only if you are on a sufficiently modern Linux and the files are on an NFS filesystem (possibly it depends on the NFS server), the fcntl() will fail with EAGAIN. If you don't know that the files are hardlinks, it may take you some time to realize what's going on, especially because many of the test programs you write will probably work fine (except when applied to that specific pair of files).
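
One quick way to check whether two names are hard links to the same file is to compare the device and inode numbers that stat() reports for each. This is only a minimal sketch; same_file is a name invented here for illustration, not something from the code in question.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same underlying
 * file (same device and inode, as with hard links), 0 if they do not,
 * and -1 if either stat() call fails. */
int same_file(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) == -1 || stat(path_b, &b) == -1)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

Calling same_file(".vacation.dir", ".vacation.pag") returns 1 when the two names are hard links to one file. Because stat() follows symbolic links, it also returns 1 for a symlink and its target.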

(But wait, it gets weirder. Replace the fcntl() with flock() and it will fail even on local filesystems and on older kernels. This behavior disagrees with the manpage, which is explicit in that separate file descriptors to the same file are treated independently. Updated: I was badly misreading the manpage and this is correct flock() behavior; treating the file descriptors independently means that separate locks on them will conflict, not that they won't. See the comments.)

In this case, Ubuntu 6.06 is not sufficiently modern but Ubuntu 8.04 is, and guess what we just did today. (If you guessed 'upgraded our mail server', you win.)

Now, you might sensibly ask why we have code that does such a crazy thing in the first place. The answer is that the fcntl() is actually done inside the gdbm library's dbm_open function, which errors out if the lock attempt fails. We don't want to error out; we just want to serialize things, so we add our own locking around it. That locking needs a file, and what better file to use than the other one of the DBM database files, since we know that it has to exist?

(I am not sure what file to use as a replacement for the serialization, although clearly we need to find one.)
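
One candidate is a dedicated lock file that is not a hard link to either DBM file. The following is only a sketch under that assumption: with_serialized and bump_counter are names made up here, and the callback stands in for whatever work (such as the dbm_open() and update) needs to be serialized. The lock file is opened read-write because, as far as I understand the flock(2) manpage, an exclusive flock() over NFS needs a descriptor that is open for writing.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical wrapper: runs fn(arg) while holding an exclusive flock()
 * on lockpath, creating the lock file if needed. Returns fn's result,
 * or -1 if the lock file cannot be opened or locked. */
int with_serialized(const char *lockpath, int (*fn)(void *), void *arg)
{
    int fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    int result;

    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX) == -1) {   /* blocks until other holders finish */
        close(fd);
        return -1;
    }
    result = fn(arg);                 /* the serialized work happens here */
    close(fd);                        /* closing the descriptor drops the lock */
    return result;
}

/* Hypothetical stand-in for the real work: counts how often it was run. */
int bump_counter(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

A caller would use something like with_serialized(".vacation.lock", do_update, &state), where both the lock file name and do_update are placeholders. Note that a return of -1 is ambiguous if the callback itself can return -1.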


Comments on this page:

From 195.20.195.134 at 2009-07-22 09:19:01:

http://nfs.sourceforge.net/#faq_d10 says that flock() from Linux clients works on NFS since 2.6.12 by emulating it with a POSIX lock for the whole file; subsequent fcntl(fd, F_SETLK, ...) therefore conflicts with that lock and results in EAGAIN.

As for double flock() on descriptors obtained with separate open() calls - this is also the proper behavior: for flock() such opens are completely separate, unlike fcntl() locks; only file descriptors referring to the same file structure (obtained from a dup()-like syscall or inherited by a forked child process) are considered the same. Therefore flock() locks for separate opens of the same file (even through different names) conflict with each other - even if performed by the same process.
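
The difference between separate opens and dup()'d descriptors can be seen with a small experiment. This is a sketch only, with function names invented for illustration, and it assumes a local Linux filesystem; over NFS, where flock() is emulated with POSIX locks, the first function may well report no conflict.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Takes LOCK_EX on one descriptor, then tries a non-blocking LOCK_EX on a
 * second descriptor from a separate open() of the same path.
 * Returns 1 if the second attempt is refused with EWOULDBLOCK, 0 if it is
 * granted, and -1 on any other error. */
int second_open_conflicts(const char *path)
{
    int result = -1;
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);

    if (fd1 != -1 && fd2 != -1 && flock(fd1, LOCK_EX) == 0) {
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            result = 0;
        else if (errno == EWOULDBLOCK)
            result = 1;
    }
    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    return result;
}

/* Same experiment, but the second descriptor comes from dup(), so both
 * refer to one open file description. Returns 1 if the second flock() is
 * granted, 0 if it is refused, and -1 on setup failure. */
int dup_shares_lock(const char *path)
{
    int result = -1;
    int fd1 = open(path, O_RDONLY);
    int fd2 = (fd1 != -1) ? dup(fd1) : -1;

    if (fd2 != -1 && flock(fd1, LOCK_EX) == 0)
        result = (flock(fd2, LOCK_EX | LOCK_NB) == 0) ? 1 : 0;
    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    return result;
}
```

On a local filesystem both functions should return 1 for an existing, readable file: the separate open is refused, while the dup()'d descriptor is treated as already holding the lock.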

A trick sometimes used when you don't have a regular file for locking is to open() a directory and use its file descriptor - however, I'm not sure how portable it is.

By cks at 2009-07-22 11:20:22:

I suddenly feel stupid, because I was reading the manpage exactly backwards (and in a way that made no sense if I actually thought about it). You're entirely right about this being the proper behavior for flock(), and now the fcntl() behavior also makes more sense.

Thank you!

Written on 22 July 2009.

Last modified: Wed Jul 22 01:05:00 2009