Some kernel lockd NFS error messages explained

September 16, 2009

As before, suppose that your machine is an NFS client. Periodically, it logs kernel messages that look like this:

do_vfs_lock: VFS is out of sync with lock manager!

The kernel has a generic system to handle local file locking for POSIX and flock() locks, implemented in the VFS. Roughly speaking, the NFS client code handles locking by first asking the NFS server for a (remote) lock, then registering the lock locally by calling the kernel VFS locking routines. If the attempt to register the lock locally fails, the kernel prints this message.

(Registering the locks locally is a good thing, if only because it makes them appear in /proc/locks and thus makes lslk see them.)
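If you want to look at those locally registered locks yourself, here is a minimal sketch of a /proc/locks reader. It assumes the usual Linux layout of one lock per line (id, lock type, ADVISORY or MANDATORY, READ or WRITE, pid, major:minor:inode, start, end), with '->' marking a blocked waiter. The names LockEntry and parse_proc_locks are my own invention, not anything the kernel or lslk provides.

```python
from typing import List, NamedTuple, Optional


class LockEntry(NamedTuple):
    kind: str            # POSIX, FLOCK, ...
    mode: str            # ADVISORY or MANDATORY
    access: str          # READ or WRITE
    pid: int
    device: str          # "major:minor" as printed by the kernel
    inode: int
    start: int
    end: Optional[int]   # None stands for EOF
    blocked: bool        # True for "->" lines (waiters)


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse the contents of /proc/locks (assumed layout, see above)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Waiters have an extra "->" field right after the id.
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            del fields[1]
        if len(fields) < 8:
            continue  # skip anything that does not look like a lock line
        kind, mode, access, pid, where, start, end = fields[1:8]
        device, inode = where.rsplit(":", 1)
        entries.append(LockEntry(
            kind, mode, access, int(pid), device, int(inode),
            int(start), None if end == "EOF" else int(end), blocked,
        ))
    return entries


if __name__ == "__main__":
    with open("/proc/locks") as f:
        for entry in parse_proc_locks(f.read()):
            print(entry)
```

Matching the inode numbers against files on your NFS mounts is left as an exercise; the point is only that locks taken over NFS should show up here like any others.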

This is not supposed to happen. Every attempt to lock a file on an NFS-mounted filesystem goes through the NFS code, and the NFS code will only ask for a local lock if the server has given it a remote lock, so there should never be a conflicting local lock that causes the local lock attempt to fail. Yet it happens anyway. (Our systems log such messages every so often.)

There are two plausible causes for this that I can think of:

  • the server has lost track of a lock it's given to the client.
  • the server and the client disagree about when locks conflict with each other; the server thinks that they do not, so it grants permission, while the client's kernel disagrees.
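To make the first cause concrete, here is a toy model of the 'ask the server, then register locally' sequence. It is only a sketch under loud assumptions: LockTable and nfs_client_lock are made-up names, all locks are exclusive byte ranges, and 'the server loses track of a lock' is simulated by simply emptying the server's table. It is not how lockd or the VFS are actually implemented.

```python
class LockTable:
    """Toy table of exclusive byte-range locks: (owner, start, end), end exclusive."""

    def __init__(self):
        self.locks = []

    def try_lock(self, owner, start, end):
        # Two ranges conflict if they overlap and have different owners.
        for other, s, e in self.locks:
            if other != owner and start < e and s < end:
                return False
        self.locks.append((owner, start, end))
        return True


def nfs_client_lock(server, local_vfs, owner, start, end, log):
    """Ask the (toy) server first, then register the lock locally."""
    if not server.try_lock(owner, start, end):
        return False  # the server refused; nothing is registered locally
    if not local_vfs.try_lock(owner, start, end):
        # The server said yes but the local table sees a conflict.
        log.append("do_vfs_lock: VFS is out of sync with lock manager!")
    # Assumption of this sketch: the server's answer is what the caller sees.
    return True


server, local_vfs, log = LockTable(), LockTable(), []

# Process 100 locks bytes 0-9; server and client agree.
nfs_client_lock(server, local_vfs, "pid-100", 0, 10, log)

# Simulate the server losing track of the lock it handed out.
server.locks.clear()

# Process 200 on the same client asks for the same range. The server
# no longer knows about pid-100's lock and grants it; the client's
# local table still has that lock, so local registration fails.
nfs_client_lock(server, local_vfs, "pid-200", 0, 10, log)

print(log)
```

The second cause looks the same from the client's side; the only difference is that the server's notion of 'conflict' differs from the client's instead of its table being empty.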

Unfortunately, the message doesn't print the status code returned by the VFS locking routines, so you can't see any hint as to why they think the lock attempt should fail.

Another error message that we see a fair amount is:

lockd: unexpected unlock status: 7

and perhaps the closely related error:

lockd: failed to reclaim lock for pid 11265 (errno 0, status 7)

I believe that what this means is 'the server says that the NFS filehandle is stale (and is rejecting it entirely)'; in version 4 of the NLM protocol, status 7 is NLM4_STALE_FH. I suspect that these errors are nothing to worry about (at one level), because nothing else that the client is trying to do with that file is going to work either.
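If you see these messages often, a small decoder can save a trip to the protocol definition. The sketch below maps the numeric status to the NLM version 4 names as I understand them from the NFSv3 specification's appendix. The function name explain_lockd_status and the regular expression are mine, and the pattern is only known to fit the two message forms quoted above.

```python
import re
from typing import Optional

# NLM version 4 status codes (nlm4_stats).
NLM4_STATUS = {
    0: "NLM4_GRANTED",
    1: "NLM4_DENIED",
    2: "NLM4_DENIED_NOLOCKS",
    3: "NLM4_BLOCKED",
    4: "NLM4_DENIED_GRACE_PERIOD",
    5: "NLM4_DEADLCK",
    6: "NLM4_ROFS",
    7: "NLM4_STALE_FH",
    8: "NLM4_FBIG",
    9: "NLM4_FAILED",
}

# Matches both "... status: 7" and "... (errno 0, status 7)".
_STATUS_RE = re.compile(r"lockd: .*\bstatus:? (\d+)")


def explain_lockd_status(line: str) -> Optional[str]:
    """Return the NLM4 name for the status in a lockd log line, if any."""
    m = _STATUS_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return NLM4_STATUS.get(code, f"unknown status {code}")


if __name__ == "__main__":
    for msg in (
        "lockd: unexpected unlock status: 7",
        "lockd: failed to reclaim lock for pid 11265 (errno 0, status 7)",
    ):
        print(msg, "=>", explain_lockd_status(msg))
```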

(At another level you might want to worry; the file has presumably gone stale because a program on another NFS client has done something relatively drastic to it. Quite possibly this other program needs to properly lock the file before doing so.)

Last modified: Wed Sep 16 01:24:47 2009