Why I am not fond of Ubuntu's management of kernel updates

February 22, 2008

It's really simple: we installed the latest Ubuntu 6.06 kernel update last night, since it's a security fix. Today, our machines started panicking with 'kernel BUG at fs/nfs/inode.c:174' messages (three machines so far, one of them three times), so we wanted to revert to the old kernel.

Guess what: it wasn't there any more. Apparently Ubuntu feels free to have (some) kernel updates overwrite your currently installed kernel, instead of supplementing it with a new version.

(For extra bonus points, this update carried with it a strong warning that the kernel ABI had changed and you would need to recompile any third-party modules. Gosh, I hope you already had any modules you'd need loaded before you overwrote all your old kernel's modules as part of this update, since the new kernel's modules are really unlikely to load in your running kernel.)

I don't really have words to explain how stupid this is. It is trivial to completely version kernels so that you can have even multiple package builds installed next to each other, so trivial that everyone does it (even Ubuntu). And when kernel updates can introduce explosive bugs, it is vital to do this so that people can revert to the previous, working version. Ubuntu does this with sufficiently major updates within a single kernel version; they just don't do it all the time.
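
To make the overwrite problem concrete, here is a rough Python sketch. It assumes (and this is my reading of the behavior, not something Ubuntu documents here) that the installed kernel and module files are named after only the part of the package version before the final '.', so '2.6.15-51.66' lands in the same place as any other '2.6.15-51.N' build. The function names and the '2.6.15-51.65' and '2.6.15-50.1' versions are made up for illustration.

```python
def split_package_version(version):
    """Split a package version such as '2.6.15-51.66' into the part
    assumed to name the installed files and the point release."""
    base, _, point = version.rpartition(".")
    return base, point


def install_key(version):
    # Assumption: only the part before the final '.' shows up in file
    # and directory names, so the point release is invisible on disk.
    return split_package_version(version)[0]


def overwrites(old_version, new_version):
    """True if installing new_version would land on top of old_version."""
    return install_key(old_version) == install_key(new_version)


if __name__ == "__main__":
    # '2.6.15-51.65' is a hypothetical earlier point release.
    print(overwrites("2.6.15-51.65", "2.6.15-51.66"))
    # '2.6.15-50.1' is a hypothetical build with a different key.
    print(overwrites("2.6.15-50.1", "2.6.15-51.66"))
```

If the point release were part of the key, the two builds would sit next to each other and reverting would just be a matter of booting the old one.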

Wrong. Broken. Worse, it shows that Ubuntu fundamentally does not get it.

(For a very special bonus, there is no simple way to find out what kernel package point release version you're currently running; the point release number is not part of uname -r or present in any kernel boot messages. The best you can do is to use the kernel's compilation date and cross-check it against the release date of packages and the 'Debian' changelog that Ubuntu supplies.)
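
The date cross-check can be roughed out in Python. This is only a sketch under assumptions: it assumes the compilation date appears in a 'uname -v' style string in the usual 'Thu Feb 14 20:31:27 UTC 2008' layout, it ignores the time zone, and it guesses that the running kernel is the newest changelog entry dated at or before the build. The changelog list has to be typed in by hand from the package changelog; the helper names are mine.

```python
import re
from datetime import datetime

# Matches the date portion of a 'uname -v' style string, for example
# '#1 SMP Thu Feb 14 20:31:27 UTC 2008' (a made-up sample value).
_BUILD_RE = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) \S+ (?P<year>\d{4})"
)


def build_date(uname_v):
    """Extract the kernel compilation date, ignoring the time zone."""
    m = _BUILD_RE.search(uname_v)
    if m is None:
        raise ValueError("no build date found in %r" % (uname_v,))
    stamp = "%s %s %s %s" % (
        m.group("mon"), m.group("day"), m.group("time"), m.group("year"))
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y")


def guess_package_version(uname_v, changelog):
    """Guess the running package version.

    changelog is a list of (version, datetime) pairs copied by hand
    from the package changelog. Heuristic: pick the newest entry dated
    at or before the kernel's build date; return None if there is none.
    """
    built = build_date(uname_v)
    candidates = [(when, ver) for ver, when in changelog if when <= built]
    if not candidates:
        return None
    return max(candidates)[1]
```

This is a heuristic, not an answer; two point releases built close together would still be hard to tell apart.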

Sidebar: our kernel panics

Our panics are with the Ubuntu kernel version '2.6.15-51.66'; we've seen them on both x86 and x86_64 machines. The reported panic is kernel BUG at fs/nfs/inode.c:174, in nfs_clear_inode, with traces that run back to sys_umount and sys_close; the problem may be related to forced unmounts, especially forced unmounts that fail. We are doing NFS v3 mounts from Solaris 8 (SPARC) NFS servers.

Written on 22 February 2008.

Last modified: Fri Feb 22 23:57:21 2008