2012-11-30
All kernel messages should be usefully ratelimited. No exceptions.
I've written about this before, but here's today's version. On some of our new Dell servers, the Ubuntu 12.04 kernel will produce messages like these two:
[Firmware Warn]: GHES: Failed to read error status block address for hardware error source: 49378. ghes_read_estatus: 2 callbacks suppressed
(If you're coming here from a web search on this message text and want to know how to fix it, see the sidebar at the bottom of this entry.)
See that number at the end of the first message? It changes. That makes the total message different, which makes all of syslog's duplicate suppression stuff fail. The net result is that this flooded our logs to the tune of two or three messages every two seconds. Yesterday this was good for 151,000 messages. Good luck seeing any important kernel messages in that flood.
(Today we noticed and disabled this entire subsystem, which took a system reboot. Fortunately this machine is not in production.)
There are three failures here. The first is the customary epic failure to consider scale. The people responsible for the code that dumped out this message presumably thought that it would trigger only very occasionally, but instead on some systems it sticks on and boom, so much for your logs. When you write kernel messages (or any code that prints messages), you need to consider what happens if the cause is more common than you expect. Do people actually need to see that many messages? The answer is generally no.
(What makes this worse is that the code responsible for this actually tries to ratelimit it a bit but doesn't do it anywhere near well enough.)
The second failure is that this message is, in the jargon, not actionable. There is nothing most people running a Linux machine can really do about this message except ignore it. The kernel code has some problem interacting with my hardware? I can't fix either part of this and if the machine is not malfunctioning as a result of this I don't care about the situation. This is essentially a useless kernel message.
The third failure is a failure of the Linux kernel infrastructure.
These cases keep happening over and over again because the default
message reporting interface does not try to make people think about
these issues. The more messages the Linux kernel dumps in my lap,
the more I think that its printk() interfaces should be (usefully)
ratelimited by default and you should have to go out of your way to
print something at all frequently. When good ratelimiting is hard,
people don't do it. When it's the default, people will.
(Another thing that would help is a separate message reporting infrastructure for bulk warnings like this, one where the messages do not appear in the kernel message stream but instead show up in, say, per-module message logs in sysfs or something. Then code that felt compelled to report every instance of this sort of thing could do so while not contaminating the useful kernel messages.)
Sidebar: what this means and how to fix it
The simple way to fix this: add 'ghes.disable=1' to the kernel
command line in whatever way is appropriate
for your distribution (these days, usually changing /etc/default/grub
and running update-grub) and reboot your machine. This will turn
off the entire subsystem responsible for this message, which is
unfortunately the only good way to do it.
This is apparently a standing bug in some kernels (allegedly only 32-bit ones, which matches our experiences) on at least (some) recent Dell servers; see Ubuntu bug #881164 and Fedora #746755. However, it's erratic; it doesn't happen on all of our recent Dells, even Dells of the same model that are all running 32-bit kernels. Since this has happened on two of our servers already, I suspect that we're going to wind up just automatically disable GHES in our standard Ubuntu 12.04 install. The very vague potential gains of GHES reports are not worth the clear downsides.
The message itself comes from drivers/acpi/apei/ghes.c. To quote the comments in that file:
APEI Generic Hardware Error Source support
Generic Hardware Error Source provides a way to report platform hardware errors (such as that from chipset).
'APEI' is the ACPI Platform Error Interface (cf). If you are cringing at the mention of ACPI here, well, yeah, that was my reaction too.
2012-11-18
Linux is less divergent than commercial Unixes used to be
Today, Trever Miller tweeted:
Tell me again how the Linux fragmentation is any different than the UNIX fragmentation of yore?
To boil my reply and the resulting conversation down, my view is that Linux is far less fragmented and divergent than commercial Unixes were back in the days of commercial Unixes. Part but not all of this is due to everyone having settled on a single hardware platform (yes, I know, ARM may upset this applecart, but not yet).
The reality is that a lot of Linux is common across most distributions (you can always do bizarre things with Linux but most distributions don't). Some of the credit for this goes to the FHS and some goes to open source competition, which insures that when people come up with good ideas they spread rapidly. But beyond those reasons, most Linux distributions are simply built on top of an essentially common base that differs only in minor details.
The commonality is impressive. Almost every Linux distribution has
system calls, low level C and C++ ABIs, and most higher level libraries
in common with everyone else, to the point where you can often copy
relatively sophisticated binaries from distribution to distribution.
In addition pretty much everyone has tacitly agreed on a common
set of tools and ways that the system works, things like PAM and
/etc/cron.d. To a large extent you can use and even administer a Linux
machine without noticing very much about what distribution it's running.
I was around for the era of commercial Unixes and I can assure you that this was not the case back then. Solaris, HP/UX, Irix, Ultrix, and AIX were far more divergent from each other than any mainline Linux distribution I've ever run into. Replacing an Ultrix machine with an Irix machine (or an Irix machine with a Linux machine) was a significant project; many things had to change and how you dealt with them was quite different. Even today Solaris on x86 looks nothing like Linux on x86 (I'm talking at the level of system administration or programming).
(Also, the commercial Unix world was full of interesting technologies that never spread because of vendor licensing issues. Sometimes this basically dictated your choice of which Unix you were going to use.)
The user experience is much more unified, too. Today you can run KDE or probably Gnome 3 on probably any main Linux distribution and you can certainly compile and run programs written for either on basically anything. This was very much not the case back in the days of commercial Unixes, not unless you wanted to write to Motif (and look very unconventional on non-Motif machines) and then probably pay a licensing fee to distribute your compiled program.
(Okay, you could write to basic X, do a lot of work yourself, and still look bad. This was a surprisingly popular choice, not because it was a good one but because people didn't have any other option.)
I feel strongly about this issue because I've dealt with both commercial Unixes and multiple Linux distributions. I would much rather do the latter than the former; it is much easier and more transparent.
(Today we use Ubuntu, Red Hat Enterprise, Solaris, and OpenBSD machines. The Solaris and OpenBSD machines are far more different from the Linux machines than the Ubuntu and RHEL machines are from each other.)
2012-11-13
The problem with SELinux (still)
Here's a list of almost all of the bugs that got fixed in recent Fedora 17 SELinux policy update:
867107 - SELinux is preventing /usr/sbin/in.tftpd from using the 'dac_override' capabilities.
868656 - SELinux is preventing /usr/bin/python2.7 from using the 'sys_nice' capabilities.
868866 - SELinux is preventing /usr/sbin/fping from 'create' accesses on the rawip_socket .
869468 - SELinux is preventing tuned from using the 'setsched' accesses on a process.
871097 - SELinux is preventing haveged from 'read' accesses on the file meminfo.
872768 - SELinux is preventing /usr/sbin/rpc.statd from 'write' accesses on the sock_file rpcbind.sock.
872894 - SELinux is preventing /usr/libexec/nm-openvpn-service from 'open' accesses on the file /home/karl/.cert/klatiss.key.
873030 - SELinux is preventing /usr/bin/dbus-daemon from 'write' accesses on the blk_file /dev/sdb.
873393 - SELinux is preventing /usr/bin/qemu-system-x86_64 from using the 'execmem' accesses on a process.
846001 - type=AVC msg=audit(1344196154.847:445): avc: denied { create } for pid=6975 comm="rhnsd" scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=system_u:system_r:rhsmcertd_t:s0 tclass=unix_dgram_socket
868456 - SELinux is preventing freshclam from search access on the directory amavisd.
870659 - SELinux is preventing agetty from access on ttyUSB0.
821189 - SELinux is preventing polkit-agent-he from using the 'setsched' accesses on a process.
860226 - SELinux is preventing /usr/sbin/nslcd from 'name_connect' accesses on the tcp_socket .
865328 - SELinux is preventing /usr/lib64/nspluginwrapper/npconfig from 'getattr' accesses on the filesystem /.
I want to emphasize that this is a completely typical bug list for a completely typical SELinux policy update. I have not cherry-picked an unusually long one; they are pretty much all like this.
Fedora has been using SELinux for years, probably at least half a decade by now. It is still fixing many, many instances where SELinux gets in the way of programs doing legitimate things that people want them to do. The progress that it has made in literally years is at best that these are somewhat more obscure programs doing somewhat more unusual things than they used to be. Using SELinux in non-mainstream environments still means a steady rain of these denials, all of them getting in your way.
(Distributions have spent years desperately trying to make it easier to deal with this rain, but it is still a pain in the rear. I run into it periodically on my laptop, which I still masochistically have SELinux turned on on.)
Note that this is not using SELinux in non-default configurations. These bugs are all (I assume) from stock configuration systems that SELinux just didn't quite cope with correctly.
At a meta-level all of these bugs happen because doing a complete inventory of legitimate program behavior is very hard, especially because it's not something that can be done automatically (at least currently). A fallible human has to stare a lot at a program to write its SELinux policy, and if they miss something you get another one of these bugs. People miss obscure corner cases a lot, especially when they are not intimately familiar with all of the aspects of the software.
Also, these are only the bug fixes for the bugs that people actually reported to Fedora. Since reporting Fedora bugs involves more than a little bit of work and pain I assume that any number of people don't bother to report SELinux problems that they run across; they either just tell the system to make things work or they turn SELinux off entirely.
(See also.)
2012-11-11
Explaining an RPM oddity
Recently, Jordan Sissel tweeted:
Hey guys I beat the final boss of RPM: gist.github.com/4053631
Because I believe that gists may expire, I'm going to quote it here:
Two packages. Same name. Same file. Different epoch.Both installed simultaneously.rpm.pork(~/projects/fpm) % rpm -ql fizz /tmp/test /tmp/test pork(~/projects/fpm) % rpm -qa | grep fizz fizz-1.0-1.x86_64 fizz-1.0-1.x86_64 pork(~/projects/fpm) % rpm -qa --qf '%{NAME} %{EPOCH}-%{VERSION}-%{RELEASE}\n' | grep fizz fizz 0-1.0-1 fizz 1-1.0-1
There are several things going on at once here, but once you understand them all I think you'll see how this very peculiar situation comes about 'naturally' through general RPM processes.
First, RPM has always allowed you to install more than one version of an
RPM at once provided that they didn't conflict; this was how RPM-based
systems handled having several versions of the kernel installed at
once. Back in the days of using rpm directly, you had to remember to
install normal upgraded packages with 'rpm -U' instead of 'rpm -i';
doing the former would replace the old version while doing the latter
would try to install the new version along side the old one, which
generally didn't work too well.
In the beginning (I think), two RPMs conflicted if they both tried to supply the same file. At some point, this changed so that two RPMs only conflicted if they tried to supply the same file with different contents; if they supplied the same file with the same contents, you could install both at once. This was a core part in enabling multi-architecture support in RPM, which is what lets you install 32-bit and 64-bit x86 RPMs alongside each other. Of course these overlapping files normally happen because the 32-bit and 64-bit packages both have some common architecture-independent files like manpages or the like, but it's valid for any two RPMs (even two RPMs for the same architecture) to have common files.
(It's possible that RPM allowed some overlapping files very early in
order to deal with badly done RPM packages that claimed to own common
directories like /usr/bin or the like.)
Finally we get to RPM epoch numbers. The quick version is that epoch numbers are a hack to deal with upstream packages that either change their version numbering scheme or that don't have sensible ones in general. Because it's generally uninteresting (almost all packages have what is effectively an epoch of '0') and because it's not something that users should care about, RPM doesn't show it by default. However, it is formally part of the version number of an RPM and so two RPMs that differ only in the epoch have different version numbers and are different RPMs.
(This should normally not happen. If it does it means that the upstream reused version numbers, ie they did two separate and different '1.0' (or whatever) releases of the same package.)
So now we see what's going on here. We have two different version numbers for the same RPM package and the packages don't have any conflicting files (they both provide the same version of one file). So RPM will let you install both at once. Meanwhile its default output hides the epoch, making everything look mysterious.