2008-02-22
Why I am not fond of Ubuntu's management of kernel updates
It's really simple: we installed the latest Ubuntu 6.06 kernel update last night, since it's a security fix. Today, our machines started panicking with 'kernel BUG at fs/nfs/inode.c:174' messages (three machines so far, one of them three times), so we wanted to revert to the old kernel.
Guess what: it wasn't there any more. Apparently Ubuntu feels free to have (some) kernel updates overwrite your currently installed kernel, instead of supplementing it with a new version.
(For extra bonus points, this update carried with it a strong warning that the kernel ABI had changed and you would need to recompile any third-party modules. Gosh, I hope you already had any modules you'd need loaded before you overwrote all your old kernel's modules as part of this update, since the new kernel's modules are really unlikely to load in your running kernel.)
I don't really have words to explain how stupid this is. It is trivial to completely version kernels so that you can have even multiple package builds installed next to each other, so trivial that everyone does it (even Ubuntu). And when kernel updates can introduce explosive bugs, it is vital to do this so that people can revert to the previous, working version. Ubuntu does this with sufficiently major updates within a single kernel version; they just don't do it all the time.
Wrong. Broken. Worse, it shows that Ubuntu fundamentally does not get it.
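To make the overwrite problem concrete, here is a minimal Python sketch of why two point releases can collide. It assumes the package version has the shape 'upstream-ABI.upload' (as in '2.6.15-51.66') and that the kernel's files are installed under a name built only from the upstream version, the ABI number and a flavour. The 'server' flavour, the function names and every version other than '2.6.15-51.66' are hypothetical, chosen purely for illustration.

```python
import re


def install_name(package_version: str, flavour: str = "server") -> str:
    """Return the name a kernel's files are installed under.

    Assumes the 'upstream-ABI.upload' layout, e.g. '2.6.15-51.66'.
    The upload (point release) number is deliberately dropped, which
    mirrors it being absent from 'uname -r'.
    """
    m = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)\.(\d+)", package_version)
    if m is None:
        raise ValueError(f"unexpected version format: {package_version!r}")
    upstream, abi, _upload = m.groups()
    return f"{upstream}-{abi}-{flavour}"


def can_coexist(old: str, new: str) -> bool:
    """Two kernel builds can sit side by side only if their install names differ."""
    return install_name(old) != install_name(new)


# Hypothetical earlier point release with the same ABI number:
# both map to the same install name, so the update overwrites the old kernel.
print(can_coexist("2.6.15-51.65", "2.6.15-51.66"))  # False

# Hypothetical earlier release with a different ABI number:
# the names differ, so the old kernel stays available to boot.
print(can_coexist("2.6.15-50.61", "2.6.15-51.66"))  # True
```

Putting the upload number into the install name as well would make every build distinct, which is all that 'completely versioning' kernels really takes.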
(For a very special bonus, there is no simple way to find out what kernel package point release version you're currently running; the point release number is not part of 'uname -r' or present in any kernel boot messages. The best you can do is to use the kernel's compilation date and cross-check it against the release date of packages and the 'Debian' changelog that Ubuntu supplies.)
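The cross-check described above can be roughed out in Python. This is only a sketch under stated assumptions: it assumes the compile timestamp shows up as the last six fields of a 'uname -v' style string, and that you have copied the changelog dates out by hand. The sample string, the changelog dates and the '2.6.15-51.65' version are all made up for illustration; only '2.6.15-51.66' comes from our machines.

```python
from datetime import datetime


def build_time(uname_v: str) -> datetime:
    """Extract the compile timestamp from a 'uname -v' style string.

    Assumes the string ends with fields like
    'Tue Feb 12 02:10:24 UTC 2008' (weekday, month, day, time, zone, year).
    """
    fields = uname_v.split()[-6:]
    # Drop the timezone name (fifth field) and parse the rest.
    stamp = " ".join(fields[:4] + fields[5:])
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def closest_release(built: datetime, changelog: dict) -> str:
    """Return the package version whose changelog date is nearest the build time."""
    return min(changelog, key=lambda version: abs(changelog[version] - built))


# Hypothetical data: neither the string nor the dates are real.
sample_uname_v = "#1 SMP Tue Feb 12 02:10:24 UTC 2008"
sample_changelog = {
    "2.6.15-51.65": datetime(2008, 1, 20),
    "2.6.15-51.66": datetime(2008, 2, 11),
}

built = build_time(sample_uname_v)
print(built, closest_release(built, sample_changelog))
```

Picking the nearest changelog date is a guess, not a guarantee; if two point releases were built close together, you would still have to compare against the package release dates by eye.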
Sidebar: our kernel panics
Our panics are with the Ubuntu kernel version '2.6.15-51.66'; we've seen them on both x86 and x86_64 machines. The reported panic is 'kernel BUG at fs/nfs/inode.c:174', in nfs_clear_inode, with traces that run back to sys_umount and sys_close; the problem may be related to forced unmounts, especially forced unmounts that fail. We are doing NFS v3 mounts from Solaris 8 (SPARC) NFS servers.