2009-09-16
Some kernel lockd NFS error messages explained
As before, suppose that your machine is an NFS client. Periodically, it logs kernel messages that look like this:
do_vfs_lock: VFS is out of sync with lock manager!
The kernel has a generic system to handle local file locking for POSIX and flock() locks, implemented in the VFS. Roughly speaking, the NFS client code handles locking by first asking the NFS server for a (remote) lock, then registering the lock locally by calling the kernel VFS locking routines. If the attempt to register the lock locally fails, the kernel prints this message.
(Registering the locks locally is a good thing, if only because it makes them appear in /proc/locks and thus makes lslk see them.)
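To see what 'appearing in /proc/locks' looks like in practice, here is a minimal sketch that takes a POSIX lock on a scratch file and then looks for that file's inode in /proc/locks. It assumes a Linux machine and uses a local temporary file rather than an NFS one; the helper names (locks_for_inode, demo) are made up for illustration.

```python
import fcntl
import os
import tempfile


def locks_for_inode(inode, proc_locks="/proc/locks"):
    """Return the lines of /proc/locks that refer to the given inode number."""
    try:
        with open(proc_locks) as f:
            lines = f.read().splitlines()
    except OSError:
        return []  # not Linux, or /proc is not available here
    wanted = str(inode)
    # Each entry has a MAJOR:MINOR:INODE field; the inode part is decimal.
    return [
        ln for ln in lines
        if any(fld.count(":") == 2 and fld.rsplit(":", 1)[1] == wanted
               for fld in ln.split())
    ]


def demo():
    """Lock a scratch file and report what /proc/locks says about it."""
    with tempfile.NamedTemporaryFile() as tmp:
        # POSIX (fcntl-style) exclusive lock on the whole file, non-blocking.
        fcntl.lockf(tmp, fcntl.LOCK_EX | fcntl.LOCK_NB)
        inode = os.fstat(tmp.fileno()).st_ino
        found = locks_for_inode(inode)
        fcntl.lockf(tmp, fcntl.LOCK_UN)
    return inode, found


if __name__ == "__main__":
    inode, found = demo()
    print("inode", inode)
    for line in found:
        print(line)
```

Running the same thing against a file on an NFS mount should show whether the client-side registration described above is happening on your kernel.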
This is not supposed to happen. Every attempt to lock a file on an NFS-mounted filesystem goes through the NFS code, and the NFS code will only ask for a local lock if the server has given it a remote lock, so there should never be a conflicting local lock that will cause a lock attempt to fail. Yet it happens anyway. (Our systems log such messages every so often.)
There are two plausible causes for this that I can think of:
- the server has lost track of a lock it's given to the client.
- the server and the client disagree about when locks conflict with each other; the server thinks that they do not, so it grants permission, while the client's kernel disagrees.
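The first of these causes can be illustrated with a toy model. This is purely a sketch of the 'ask the server, then register locally' sequence described earlier, not the kernel's actual logic; ToyServer, ToyClient and forget() are hypothetical names, and whole-file locks keyed by file name stand in for real byte-range locks.

```python
MESSAGE = "do_vfs_lock: VFS is out of sync with lock manager!"


class ToyServer:
    """Hypothetical stand-in for the NFS server's lock table."""

    def __init__(self):
        self.held = set()  # names of files the server believes are locked

    def lock(self, name):
        if name in self.held:
            return False  # the server sees a conflict and refuses
        self.held.add(name)
        return True

    def forget(self, name):
        """Simulate the server losing track of a lock it has granted."""
        self.held.discard(name)


class ToyClient:
    """Hypothetical stand-in for the client: remote lock first, then local."""

    def __init__(self, server):
        self.server = server
        self.local = {}  # file name -> pid holding the local lock
        self.log = []    # messages the 'kernel' would have printed

    def lock(self, name, pid):
        """Return whether the server granted the lock (toy semantics only)."""
        if not self.server.lock(name):
            return False  # refused remotely, so no local attempt is made
        holder = self.local.get(name)
        if holder is not None and holder != pid:
            # The server said yes, but a conflicting local lock exists.
            self.log.append(MESSAGE)
        else:
            self.local[name] = pid
        return True


if __name__ == "__main__":
    server = ToyServer()
    client = ToyClient(server)
    client.lock("mailbox", pid=100)  # granted remotely, registered locally
    server.forget("mailbox")         # the server loses track of that lock
    client.lock("mailbox", pid=200)  # granted remotely, conflicts locally
    print(client.log)
```

As long as the server remembers the first lock, the second request is refused before any local registration is attempted, which is why the message should never appear when both sides agree.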
Unfortunately, the message doesn't print the status code returned by the VFS locking routines, so you can't see any hint as to why they think the lock attempt should fail.
Another error message that we see a fair amount is:
lockd: unexpected unlock status: 7
and perhaps the closely related error:
lockd: failed to reclaim lock for pid 11265 (errno 0, status 7)
I believe that what this means is 'the server says that the NFS filehandle is stale (and is rejecting it entirely)'. I suspect that these errors are nothing to worry about (at one level), because nothing else that the client is trying to do with that file is going to work either.
(At another level you might want to worry; the file has presumably gone stale because a program on another NFS client has done something relatively drastic to it. Quite possibly this other program needs to properly lock the file before doing so.)
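For what it's worth, the 'stale filehandle' reading matches the NLM version 4 protocol, where status 7 is NLM4_STALE_FH. The sketch below decodes the status number in messages like the two above; it assumes that the number printed is the raw NLM status, that NLM v4 is in use, and that the number is in decimal. The decode_lockd_status name is made up for illustration.

```python
import re

# Status values from the NLM version 4 protocol definition (nlm4_stats).
NLM4_STATUS = {
    0: "NLM4_GRANTED",
    1: "NLM4_DENIED",
    2: "NLM4_DENIED_NOLOCKS",
    3: "NLM4_BLOCKED",
    4: "NLM4_DENIED_GRACE_PERIOD",
    5: "NLM4_DEADLCK",
    6: "NLM4_ROFS",
    7: "NLM4_STALE_FH",
    8: "NLM4_FBIG",
    9: "NLM4_FAILED",
}

# Matches both "status: 7" and "status 7" (assumption: decimal digits).
_STATUS_RE = re.compile(r"status:?\s+(\d+)")


def decode_lockd_status(line):
    """Return the NLM4 status name for a lockd log line, or None if unknown."""
    m = _STATUS_RE.search(line)
    if m is None:
        return None
    return NLM4_STATUS.get(int(m.group(1)))


if __name__ == "__main__":
    for msg in (
        "lockd: unexpected unlock status: 7",
        "lockd: failed to reclaim lock for pid 11265 (errno 0, status 7)",
    ):
        print(msg, "->", decode_lockd_status(msg))
```

Feeding it the relevant lines from your kernel log is a quick way to check whether anything other than stale filehandles is turning up.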