Wandering Thoughts archives

2009-09-20

Why kernel packaging is so bad in Debian and Ubuntu

I've written before about Ubuntu's substandard kernel update management; what I didn't say then (partly because I didn't fully understand it) is that this is just a manifestation of a general problem with kernel packaging in Ubuntu (and in Debian, which Ubuntu inherits it from).

The fundamental problem for packaging kernels on Debian and Ubuntu is that there are two unfortunate design choices in dpkg, the Debian package system: you can only ever have one version of a package installed at once, and there is no support for 'multi-arch' packages.

That you can only install a single version of a package means that if you want to have multiple kernels installed at once, they must have different package names, not just be different versions of the same package; hence the Debian and Ubuntu necessity of having the kernel version embedded in the package name. This creates the temptation to do 'minor' kernel updates without changing the package name, which in turn creates the situation where such a 'minor' change can introduce a bug but leave you with no easy way to reboot into the old kernel.
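
To make this concrete, here is roughly what the result looks like on an Ubuntu machine; the package names and version numbers below are illustrative, not taken from any real system:

# illustrative transcript; names and versions are made up
$ dpkg -l 'linux-image-2.6.*' | grep '^ii'
ii  linux-image-2.6.28-15-generic   2.6.28-15.52   Linux kernel image ...
ii  linux-image-2.6.28-16-generic   2.6.28-16.55   Linux kernel image ...

The two kernels can coexist only because '2.6.28-15' and '2.6.28-16' are baked into the package names; a 'minor' update that merely bumps 2.6.28-15.52 to 2.6.28-15.53 replaces the old kernel in place.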

That packages cannot be multi-arch precludes having kernel packages with the same name but that are for different sub-architectures, so Debian and Ubuntu have to put the sub-architecture into the package name (creating 'linux-image-<blah>-i386' package names and so on). This means that it is impossible to have a generic package name that winds up installing the right kernel-related bit under all circumstances; you cannot have, say, a linux-headers package that installs the correct set of headers for whatever kernel variant you are actually running. Instead you need a different generic package for each kernel sub-architecture variant, and then users have to keep it all straight.

(With multi-arch, a generic linux-headers package would depend on linux-headers-<kver>, there would be various sub-architecture versions of the latter package, and the package system would sort out which sub-architecture was the right one. Without multi-arch (or conditional dependencies), this can't work.)
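
You can see the shape of the current workaround by asking about one of the per-variant generic packages; again, the output below is illustrative and the exact versioned dependency will vary with your release:

# illustrative transcript; the versioned dependency is made up
$ apt-cache depends linux-headers-generic
linux-headers-generic
  Depends: linux-headers-2.6.28-16-generic

There is a linux-headers-generic, a linux-headers-server, and so on, one generic package per sub-architecture variant, each pulling in the current versioned package for that variant; what you cannot have is a single plain linux-headers package that does the right thing no matter which variant you happen to be running.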

DebianKernelPackagingProblem written at 00:47:19

2009-09-16

Some kernel lockd NFS error messages explained

As before, suppose that your machine is an NFS client. Periodically, it logs kernel messages that look like this:

do_vfs_lock: VFS is out of sync with lock manager!

The kernel has a generic system to handle local file locking for POSIX and flock() locks, implemented in the VFS. Roughly speaking, the NFS client code handles locking by first asking the NFS server for a (remote) lock, then registering the lock locally by calling the kernel VFS locking routines. If the attempt to register the lock locally fails, the kernel prints this message.

(Registering the locks locally is a good thing, if only because it makes them appear in /proc/locks and thus makes lslk see them.)
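
If you're curious, locally-registered locks show up in /proc/locks as lines like the following; the PID, device, and inode numbers here are made up:

$ cat /proc/locks
1: POSIX  ADVISORY  WRITE 3528 08:01:442411 0 EOF
2: FLOCK  ADVISORY  WRITE 3640 08:01:7394 0 EOF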

This is not supposed to happen. Every attempt to lock a file on an NFS-mounted filesystem goes through the NFS code, and the NFS code will only ask for a local lock if the server has given it a remote lock, so there should never be a conflicting local lock that will cause a lock attempt to fail. Yet, it happens anyways. (Our systems log such messages every so often.)

There are two plausible causes for this that I can think of:

  • the server has lost track of a lock it's given to the client.
  • the server and the client disagree about when locks conflict with each other; the server thinks that they do not, so it grants permission, while the client's kernel disagrees.

Unfortunately, the message doesn't print the status code returned by the VFS locking routines, so you can't see any hint as to why they think the lock attempt should fail.
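
If you want more context about what the kernel's NFS locking code was doing at the time, one option (assuming your system has the rpcdebug program, and at the cost of quite chatty kernel logs) is to temporarily turn on lockd and NFS client debugging; this still won't show you the missing status code, but it does log the surrounding lock traffic:

# turn on NLM (lockd) and NFS client debug messages; they go to the kernel log
rpcdebug -m nlm -s all
rpcdebug -m nfs -s all
# and turn them off again once you've captured an occurrence
rpcdebug -m nlm -c all
rpcdebug -m nfs -c all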

Another error message that we see a fair amount is:

lockd: unexpected unlock status: 7

and perhaps the closely related error:

lockd: failed to reclaim lock for pid 11265 (errno 0, status 7)

I believe that what this means is 'the server says that the NFS filehandle is stale (and is rejecting it entirely)'. I suspect that these errors are nothing to worry about (at one level), because nothing else that the client is trying to do with that file is going to work either.

(At another level you might want to worry; the file has presumably gone stale because a program on another NFS client has done something relatively drastic to it. Quite possibly this other program needs to properly lock the file before doing so.)

LockdKernelErrorExplained written at 01:24:47

2009-09-03

An interesting issue with doing NFS over TCP (apparently)

We have a lot of NFS filesystems and, like most people today, we use NFS over TCP instead of over UDP. But this leads to a problem: sometimes when our systems reboot, they can't mount all of the NFS filesystems on the first attempt. It generally takes several minutes to get to a state where all of them are mounted.

(We don't use the automounter, so we mount everything at boot; we have our own solution for the problems the automounter is trying to solve.)

The cause turns out to be interesting; we're running out of reserved ports, apparently ultimately because of all of the NFS mount requests we make in close succession. Like the NFS server, the NFS mount daemon usually requires you to talk to it from a reserved port, and although each conversation between mount and mountd is short-lived and we only make one mount request at a time, Linux can wind up not letting you reuse the same source port to talk to the same mount daemon for a timeout interval. It turns out that we have enough NFS mounts from few enough fileservers that we can temporarily run out of reserved ports that mount can use to talk to a particular fileserver's mountd.

(This is the TIME_WAIT timeout for a given combination of source IP address, source port, destination IP address, and destination port. The destination port on a given fileserver is always fixed, so effectively the only variable is the source port, and there's a limited supply of reserved ports that mount is willing to use.)
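
One way to watch this happening is to count the connections to a given fileserver that are sitting in TIME_WAIT during the boot-time burst of mounts; 192.0.2.10 below is a stand-in for the fileserver's IP address:

# count TIME_WAIT TCP sockets to the fileserver; during a mount storm, most of
# their local ports will be reserved ports that mount grabbed and then abandoned
netstat -tn | grep 192.0.2.10 | grep -c TIME_WAIT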

Our experience is that this doesn't happen when we use NFS over UDP (we have one system that does this, for reasons that may not be applicable any more). Having written this entry, I'm now not sure why this is so, since although the actual NFS traffic is UDP-based, mount is presumably still talking to mountd with TCP and so is still using up reserved ports there.

(This is somewhat related to an earlier problem with NFS mounts that we've had.)

TCPNFSMountProblem written at 22:49:01

