2014-09-18
Ubuntu's packaging failure with mcelog in 14.04
For vague historical reasons we've had the mcelog package in our
standard package set. When we went to build our new 14.04 install
setup, this blew up on us; on installation, some of our machines
would report more or less the following:
    Setting up mcelog (100-1fakesync1) ...
    Starting Machine Check Exceptions decoder: CPU is unsupported
    invoke-rc.d: initscript mcelog, action "start" failed.
    dpkg: error processing package mcelog (--configure):
     subprocess installed post-installation script returned error exit status 1
    Errors were encountered while processing:
     mcelog
    E: Sub-process /usr/bin/dpkg returned an error code (1)
Here we see a case where a collection of noble intentions has had terrible results.
The first noble intention is a desire to warn people that mcelog
doesn't work on all systems. Rather than silently run uselessly or
silently exit successfully, mcelog instead reports an error and
exits with a failure status.
The second noble intention is the standard Debian noble intention
(inherited by Ubuntu) of automatically starting most daemons on
installation. You can argue that this is a bad idea for things like
database servers, but for basic system monitoring tools like mcelog
and SMART monitoring I think most people actually want this; certainly
I'd be a bit put out if installing smartd didn't actually enable
it for me.
(A small noble intention is that the init script passes mcelog's
failure status up, exiting with a failure itself.)
The third noble intention is that it is standard Debian behavior
for an init script that fails when it is started in the package's
postinstall script to cause the postinstall script itself to exit
out with errors (it's in a standard dh_installinit stanza).
When the package postinstall script errors out, dpkg itself flags
this as a problem (as well it should) and boom, your entire package
install step is reporting an error and your auto-install scripts fall
down. Or at least ours do.
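The chain of failures above can be mimicked in a few lines of shell. This is only a sketch: fake_mcelog, fake_initscript and fake_postinst are hypothetical stand-ins rather than the real package scripts, and the '|| exit $?' line is modelled on what I understand the dh_installinit stanza to look like, so treat that detail as an assumption.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real pieces; nothing here touches the system.

# Stand-in for mcelog on an unsupported machine: complain and fail.
fake_mcelog() {
    echo "CPU is unsupported" >&2
    return 1
}

# Stand-in for the init script: pass the daemon's failure status up.
fake_initscript() {
    fake_mcelog || return $?
    return 0
}

# Stand-in for the postinst, assuming a 'start || exit $?' style stanza.
# It runs in a subshell so that 'exit' only ends the fake postinst.
fake_postinst() {
    (
        fake_initscript || exit $?
        echo "postinst finished"
    )
}

status=0
fake_postinst 2>/dev/null || status=$?
echo "postinst exit status: $status"
```

Each layer is doing something locally sensible, and the failure status still travels all the way up to whatever plays the role of dpkg.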
The really bad thing about this is that server images can change
hardware. You can transplant disks from one machine to another for
various reasons; you can upgrade the hardware of a machine but
preserve the system disks; you can move virtual images around; you
can (as we do) have standard machine building procedures that want
to install a constant set of packages without having to worry about
the exact hardware you're installing on. This mcelog package
behavior damages this hardware portability in that you can't safely
install mcelog in anything that may change hardware. Even if the
initial install succeeds or is forced, any future update to mcelog
will likely cause you problems on some of your machines (since a
package update will likely fail just like a package install).
(This is a packaging failure, not an mcelog failure; given that
mcelog can fail to work on some machines it's installed on, the init
script failure should not cause a fatal postinstall script failure.
Of course the people who packaged mcelog may well not have known
that it had this failure mode on some machines.)
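As an illustration of what a non-fatal version could look like, here is a minimal sketch of a postinstall that warns when the daemon fails to start instead of exiting with an error. It is an assumption about one possible approach, not what the Debian or Ubuntu package does; fake_initscript and tolerant_postinst are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch: a postinst that treats a failed daemon start as a
# warning instead of a fatal error. Not taken from the real mcelog package.

# Stand-in for an init script whose daemon refuses to run on this hardware.
fake_initscript() {
    echo "CPU is unsupported" >&2
    return 1
}

# Warn about the failed start but let the package finish configuring.
# Runs in a subshell so it behaves like a separate script.
tolerant_postinst() {
    (
        if ! fake_initscript; then
            echo "warning: mcelog did not start; continuing anyway" >&2
        fi
        echo "postinst finished"
    )
}

status=0
tolerant_postinst 2>/dev/null || status=$?
echo "postinst exit status: $status"
```

In a real package the equivalent would probably be done through dh_installinit itself; as far as I know it has options such as --error-handler and --no-start for this sort of thing, but check the debhelper documentation before relying on that.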
I'm sort of gratified to report that Debian has a bug for this, although the progress of the bug does not fill me with great optimism and of course it's probably not important enough to ever make it into Ubuntu 14.04 (although there's also an Ubuntu bug).
PS: since mcelog has never done anything particularly useful for
us, we have not been particularly upset over dropping it from our
list of standard packages. Running into the issue was a bit irritating,
but then mcelog seems to be historically good at irritation.
PPS: the actual problem mcelog has is even more stupid than 'I
don't support this CPU'; in our case it turns out to be 'I need a
special kernel module loaded for this machine but I won't do it for
you'. It also syslogs (but does not usefully print) a message that
says:
mcelog: AMD Processor family 16: Please load edac_mce_amd module.#012: Success
See eg this Fedora bug and this Debian bug. Note that the message really means 'family 16 and above', not 'family 16 only'.
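For what it's worth, the 'family 16 and above' condition is easy to check from /proc/cpuinfo-style text. The following is a rough sketch under assumptions: needs_edac_mce_amd is a hypothetical helper, it only looks at the first CPU listed, and it merely prints the modprobe command rather than running it. Whether loading the module is enough to make mcelog happy on any given machine is not something I can promise.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/cpuinfo-style text on stdin and succeeds
# if the first CPU listed is an AMD processor of family 16 or above.
needs_edac_mce_amd() {
    awk -F': *' '
        /^vendor_id/  && vendor == "" { vendor = $2 }
        /^cpu family/ && family == "" { family = $2 }
        END { exit !(vendor == "AuthenticAMD" && family + 0 >= 16) }
    '
}

# Demonstrate with sample text instead of the real /proc/cpuinfo.
if printf 'vendor_id\t: AuthenticAMD\ncpu family\t: 21\n' | needs_edac_mce_amd; then
    echo "would try: modprobe edac_mce_amd"
fi
```

On a real machine you would feed it /proc/cpuinfo (needs_edac_mce_amd < /proc/cpuinfo) and run the modprobe as root before mcelog is started.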