2008-03-20
Why you should rate-limit messages that outside things can cause
Modern versions of NFS have a variety of authentication methods, and so one of the errors that an NFS server can give a client is 'your authentication method is too weak'; for example, the client could be sending plain old Unix UIDs and GIDs to a server that requires Kerberos to get strong distributed filesystem authentication. When this happens, the Linux kernel helpfully prints an error message about it:
call_verify: server somehost.cs requires stronger authentication.
In fact, it prints this message every time it gets an RPC reply with this error. (Some of you are wincing already.)
Our current NFS servers are creaky old Solaris 8 machines. One part of that creakiness is that every so often the kernel loses its mind and decides that some or all clients aren't using a strong enough authentication method to talk to some or all filesystems. When this happens, all NFS IO from the affected clients to the affected filesystems suddenly gets 'authentication too weak' errors.
If we are unlucky, this IO is being done by something active that doesn't notice IO errors. When this happens, the machine is basically dead, almost entirely consumed by dumping this message to the console over and over again as fast as the console can print, and it is time for a magic SysRq reboot because nothing else works.
(We've lost more than one major server to this. It's not fun.)
I expect that with a properly behaving NFS server, you'd get this error once at mount time and the mount would fail. But as my example illustrates, you can't count on the outside world to work properly all the time, and that is exactly why you should rate-limit error messages that can be produced by the outside world.
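The kernel has a standard tool for exactly this, printk_ratelimit(). As a minimal sketch of the idea (this is a simplified illustration, not the actual fs/nfs code, and servername is an invented variable), the message could be wrapped like so:

	/*
	 * printk_ratelimit() returns nonzero only when we haven't been
	 * printing too much recently, so a sustained storm of RPC errors
	 * turns into an occasional console note instead of a flood.
	 */
	if (printk_ratelimit())
		printk(KERN_NOTICE "call_verify: server %s requires "
		       "stronger authentication.\n", servername);

The suppressed messages are simply dropped, which is just what you want when the alternative is a machine that can do nothing except print.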
Note that this doesn't just apply to the kernel, and it applies
even if you are dumping messages to syslog. While syslogd will
do rate-limiting of a sort (compressing repeats into 'last message
repeated N times'), you and it will still burn a bunch of CPU
generating and processing the flood in the first place.
(Yes, I'm going to try to report this to the Linux NFS people; if I can, I'll even try to create a patch. Unfortunately it probably won't help us, because we're running Ubuntu 6.06 and the Ubuntu people will probably not accept or backport such a specialized fix.)
2008-03-18
Some things I dislike about the ASUS Eee
I should note that I actually like my Eee (for what it is). But every time I try to write some sort of review of it, these things bubble up to the top of my mind, so I am going to write them down to get rid of them.
In no particular order:
- my one major issue is that I really want more vertical space.
480 pixels is pretty cramped and limiting; while I can fit in
two overlapping, readable 80x24 terminal windows, I do feel
like I'm peering in through a porthole, and web browsing works
less well.
(And every so often some of the program dialogs just plain run off the bottom of the screen.)
- mine keeps terrible time when powered off, routinely drifting by
multiple seconds in a day.
(And inexplicably, no NTP client support seems to be included in the default software load.)
- despite having no hard drive and a slow processor, the Eee is not
a cool or silent machine. As a first-time laptop user I was
somewhat surprised at how hot it gets (apparently it is on the
hot side of modern laptops), and it has an audible fan if you let
it sit powered on.
- if I let it sit powered off and unplugged from power, it slowly drains
the battery. Well, I guess I now know why it draws two watts from the
wall connector even when powered off (cf EeePowerConsumption), although
I still have no idea what it's doing with the power.
- the Eee doesn't even pretend to have security. A basic level of Unix security would be nice, and an option for encrypted disk space would be better, especially since Linux has both.
(And oh yeah, it would have been nice not to be rootable out of the box.)
2008-03-08
What controls Red Hat Enterprise's ethN device names
Since I just went digging for this the other day, here's what I know about what controls what Ethernet devices get named on Red Hat Enterprise (and probably also on Fedora, but I haven't looked at my Fedora systems in this level of detail).
- if kudzu is enabled, it uses /etc/sysconfig/hwconf to name
  everything. If there is no such file or the data in it doesn't match
  current reality, various bad things happen. (You can probably
  hand-edit the file if necessary.)
- otherwise, interface naming is controlled by the HWADDR setting in
  the ifcfg-* files in /etc/sysconfig/network-scripts. If there is no
  Ethernet address specified, you get no renaming.
The ifcfg files are used by both udev and the ifup scripts
that actually bring interfaces up and so on. When udev detects a
new network device (including at boot, I believe), it runs
/lib/udev/rename_device, which searches the ifcfg-* files for a
HWADDR that matches the new device and uses the DEVICE setting from
that file to give a name to the new interface.
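As an illustration, an ifcfg-eth0 that pins the eth0 name to a particular port might look something like this (the Ethernet address here is invented):

	DEVICE=eth0
	HWADDR=00:16:3e:01:02:03
	ONBOOT=yes
	BOOTPROTO=dhcp

Whatever interface turns out to have that Ethernet address gets renamed to eth0, regardless of what the kernel called it to start with.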
(A network device that is hotplugged after system boot also winds
up running /etc/sysconfig/network-scripts/net.hotplug.)
During boot, the order of operation is udev first, then kudzu,
and finally the network init script winds up ifup'ing all of the
interfaces that are supposed to be running, potentially undoing any
damage kudzu did (if kudzu left the ifcfg configuration files
alone, which is unlikely).
(You may gather that I have a pretty low opinion of kudzu; in fact, I
have been turning it off on most of my systems for years. It was left
enabled on this RHEL system mostly because I hadn't taken the time to
audit what init scripts were getting run.)
My problem with Ethernet naming on Red Hat Enterprise 5
Here's my problem: I have a bunch of identical 1U servers (SunFire
X2100 M2s) with four onboard Ethernet ports, driven by two different
chipsets (two nVidia ones, two Broadcom ones). I want to configure our
RHEL installs so that no matter which physical unit I stuff the system
disks into, the Ethernet ports come up with consistent names that match
the ports on the back of the server; eth0 should always be the port
labeled 'port 0' and so on.
(Since they have hotswap drive bays, we want to be able to easily swap drives between units in case of hardware failure or the like. It also simplifies general administration a bunch if the Ethernet naming matches the hardware naming.)
In the good old days, this was simple; just set up /etc/modprobe.conf
to alias eth0 and eth1 to the tg3 driver and eth2 and
eth3 to the forcedeth driver, and everything usually worked.
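(For concreteness, the old-style modprobe.conf stanza for this four-port layout would have been something like:

	alias eth0 tg3
	alias eth1 tg3
	alias eth2 forcedeth
	alias eth3 forcedeth

with each driver getting loaded, and its ports named, as the ethN interfaces were brought up.)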
In the new world of udev, not so much; much like with Ubuntu, everything really wants to name things based
on known Ethernet addresses, and there seems to be no way to control
what order modules are loaded in. The furthest I've gotten is a
configuration that does nothing with any 'new' Ethernet ports, so you
have to log in on the console and change all of the HWADDR values in
the ifcfg files to have the correct Ethernet addresses.
(To do this, you have to turn off kudzu with 'chkconfig --del
kudzu'. If you leave it enabled, it will helpfully configure any 'new'
Ethernet ports to do DHCP on boot, and in the process it will replace
your working ifcfg files with new ones. Yes, it leaves the old files
around with .bak extensions, but I am pretty sure that if you swap
hardware twice you will lose them entirely.)
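For what it's worth, here is a sketch of that console fixup. The big assumption is that you've worked out which kernel ethN name currently corresponds to which physical port; if the kernel didn't happen to number them in the order you want, you get to map names to ports by hand first (watching link lights as you unplug cables is one way):

	cd /etc/sysconfig/network-scripts
	for i in 0 1 2 3; do
	    # read the port's real Ethernet address out of sysfs and
	    # rewrite the HWADDR= line in the matching ifcfg file
	    mac=$(cat /sys/class/net/eth$i/address)
	    sed -i "s/^HWADDR=.*/HWADDR=$mac/" ifcfg-eth$i
	done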
2008-03-06
Software RAID, udev, and failed disks
Suppose that you have a software RAID array. Suppose further that you have a disk or two fail spectacularly; they don't just have errors, they go offline completely.
Naturally, software RAID fails the disks out; you wind up with something
in /proc/mdstat that looks like this:
md10 : active raid6 sdbd1[12] sdbc1[11] sdbb1[10] sdba1[9] sdaz1[13](F) sday1[7] sdax1[6] sdaw1[5] sdav1[14](F) sdau1[3] sdat1[2] sdas1[1] sdar1[0]
(Yes, this system does have a lot of disks. Part of it is that multipathed FibreChannel makes disks multiply like rabbits.)
So we want to remove the failed disks from the array (perhaps because we have pulled out their hot-swap drive sleds in order to swap new disks in):
# mdadm /dev/md10 -r /dev/sdav1
mdadm: cannot find /dev/sdav1: No such file or directory
This would be because udev removed the /dev nodes for the disks
when they went offline, which is perfectly sensible behavior except
it presents us with a bit of a chicken and egg problem.
(If this was a Fedora system with mdadm 2.6.2 I might be able to use the
'-r failed' option, but this is a Red Hat Enterprise 5 system with
mdadm 2.5.4, and I am out of luck. And if I wanted to remove just one
of the two failed drives, I would still be out of luck even on Fedora.)
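The workaround I know of is to recreate the device node by hand, so that mdadm has something it can turn back into a major and minor device number. You need the numbers the device had while it was alive, dug out of old logs or a recorded ls -l; the ones here are invented for illustration:

# mknod /dev/sdav1 b 66 241
# mdadm /dev/md10 -r /dev/sdav1
# rm /dev/sdav1

Since mdadm only uses the node to get the device numbers for its 'remove this disk' request to the kernel, it doesn't matter that the actual disk is long gone.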
Reinserting the drives doesn't help, at least in this case, as the system sees them as entirely new drives and assigns them a different sd-something name. (It does this even if they are literally the same disks, as they are when you artificially induce the failure by pulling the drive sleds in the first place.)