2008-02-22
Why I am not fond of Ubuntu's management of kernel updates
It's really simple: we installed the latest Ubuntu 6.06 kernel update last night, since it's a security fix. Today, our machines started panicking with 'kernel BUG at fs/nfs/inode.c:174' messages (three machines so far, one of them three times), so we wanted to revert to the old kernel.
Guess what: it wasn't there any more. Apparently Ubuntu feels free to have (some) kernel updates overwrite your currently installed kernel, instead of supplementing it with a new version.
(For extra bonus points, this update carried with it a strong warning that the kernel ABI had changed and you would need to recompile any third-party modules. Gosh, I hope you already had any modules you'd need loaded before you overwrote all your old kernel's modules as part of this update, since the new kernel's modules are really unlikely to load in your running kernel.)
I don't really have words to explain how stupid this is. It is trivial to completely version kernels so that you can have even multiple package builds installed next to each other, so trivial that everyone does it (even Ubuntu). And when kernel updates can introduce explosive bugs, it is vital to do this so that people can revert to the previous, working version. Ubuntu does this with sufficiently major updates within a single kernel version; they just don't do it all the time.
Wrong. Broken. Worse, it shows that Ubuntu fundamentally does not get it.
(For a very special bonus, there is no simple way to find out what
kernel package point release version you're currently running; the point
release number is not part of uname -r or present in any kernel boot
messages. The best you can do is to use the kernel's compilation date
and cross-check it against the release date of packages and the 'Debian'
changelog that Ubuntu supplies.)
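As a sketch of that cross-check, this is about the best you can do; the linux-image package name here is just an example and will vary with your kernel version and flavour:

uname -v
zless /usr/share/doc/linux-image-2.6.15-51-server/changelog.Debian.gz

'uname -v' includes the running kernel's build date, which you can then match up against the dates in the package changelog.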
Sidebar: our kernel panics
Our panics are with the Ubuntu kernel version '2.6.15-51.66'; we've seen
them on both x86 and x86_64 machines. The reported panic is kernel
BUG at fs/nfs/inode.c:174, in nfs_clear_inode, with traces that run
back to sys_umount and sys_close; the problem may be related to
forced unmounts, especially forced unmounts that fail. We are doing NFS
v3 mounts from Solaris 8 (SPARC) NFS servers.
2008-02-10
A basic introduction to prelinking on Linux
At least on the x86 architecture, shared libraries are not entirely made up of position independent code. This means that there is a certain amount of relocation that you have to do when you load a shared library into memory at run time. The basic idea behind prelinking is to try to do this relocation ahead of time; for each shared library, you pick a default location in memory and 'prelink' it so that if it is loaded at that location it doesn't need any run-time relocation. Then the dynamic loader tries to load prelinked libraries at their prelinked locations if at all possible.
(The exact details are explained in the prelink manpage.)
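If you're curious about what's been done on a prelinked system, I believe you can dump prelink's cache of assigned addresses with something like this (the library name is just an illustration):

prelink -p | grep libc.so.6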
Prelinking has two advantages: because they need to do less relocation at runtime, programs both start faster and use less memory (they dirty fewer pages of shared libraries with per-process relocations). It has the downside that it changes shared libraries and binaries on disk for each system (and changes them again any time you upgrade a shared library), which makes various sorts of security verification harder.
Red Hat enables prelinking by default (in both Fedora and Red Hat Enterprise). Ubuntu and Debian do not seem to do so, although you can turn it on by installing the prelink package and configuring it appropriately.
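As a sketch of what turning it on looks like there, assuming the Debian package still works the way I believe it does (its daily cron job checks the PRELINKING variable in /etc/default/prelink):

apt-get install prelink
# set PRELINKING=yes in /etc/default/prelink, then
# optionally do an immediate full pass by hand:
prelink -a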
Prelinking is not a new idea. The first implementation I remember seeing was in SGI's Irix, but in a sense its ancestry goes back to some of the first shared library implementations, which had no dynamic relocation and just statically assigned addresses to shared libraries.
Sidebar: prelink and DT_GNU_HASH
The first time dynamically linked code wants to do something like call
an external function, it has to look through all of the symbol tables in
all of the various bits of code to find the function. DT_GNU_HASH
is the name of a GNU extension that uses efficient, fast-to-search
hash tables for these symbol tables; it and related optimizations
can significantly speed up practical program startup time.
Unlike prelinking, DT_GNU_HASH is done once when a shared library is
built. Because these lookups have to be done whether or not the shared
libraries involved have been prelinked, prelinking and DT_GNU_HASH
are complementary and systems can do both.
Modern versions of Red Hat (both Fedora and Enterprise) use
DT_GNU_HASH; Debian stable does not. Ubuntu 6.06 (their long term
support release) does not, but I believe that current versions do.
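You can check what any given library or binary was built with by looking for a GNU_HASH entry in its dynamic section; the libc path here is the usual 32-bit x86 location:

readelf -d /lib/libc.so.6 | grep GNU_HASH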
2008-02-01
Isolating network interfaces on Linux
Consider a not entirely hypothetical situation: you have an office machine that serves as one end of a GRE tunnel, and, in addition to its official network interface, has a fluctuating number of secondary interfaces on various internal VLANs for testing, debugging, and so on. The simple approach for such a machine is to just turn on global IP forwarding and cross your fingers that no one will decide to make the machine their gateway (apart from the GRE link). But this is not ideal; if nothing else, it may alarm coworkers that you have an unofficial router on the network.
What we really want to do is to isolate the secondary interfaces, making
it so that we won't forward their packets and we won't forward packets
to them for other people. The first part is selective IP forwarding:
turn forwarding on for only eth0 and the GRE tunnel. The easiest way to
do the second part is to use some policy-based routing.
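As a concrete sketch of the first part (gre1 here is a stand-in for whatever your GRE tunnel device is actually called): forwarding is controlled per incoming interface, so we turn it off everywhere and then back on for just the interfaces we want:

sysctl -w net.ipv4.conf.all.forwarding=0
sysctl -w net.ipv4.conf.default.forwarding=0
sysctl -w net.ipv4.conf.eth0.forwarding=1
sysctl -w net.ipv4.conf.gre1.forwarding=1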
For my office machine, I decided to simplify things by declaring that
the GRE tunnel was allowed to reach everything and thus only traffic
from eth0 needed to be restricted. First we need to add a routing
table for the non-local routes that eth0 is allowed to use, ie the
target of the GRE link:
ip route add R dev GRE table 10
(Here R is the remote IP and GRE is the GRE tunnel device. You may
want to add a 'src LOCAL-IP' as well.)
Next we need some rules to restrict eth0 traffic:
ip rule add iif eth0 priority 5000 table 10
ip rule add type blackhole iif eth0 priority 5001
Translated, this drops any traffic from eth0 that isn't going to the
remote end of the GRE tunnel, exactly as if that interface didn't do
IP forwarding. (Packets to the machine itself are dealt with by an
earlier, default ip rule.)
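You can sanity-check the result by asking the kernel what it would do with a packet that arrived on eth0 (the addresses here are placeholders); asking about a blackholed destination should get you an error instead of a route:

ip route get 192.0.2.1 from 10.0.0.5 iif eth0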
This is not complete isolation, because we have not given the machine a dual identity for its own traffic. In my situation this is basically harmless, so I haven't gone to the extra effort.