2010-05-19
Why RPM's .rpmnew files don't work in practice
In theory, RPM has a solution to yesterday's bad package upgrade
problem by way of what are generally called
.rpmnew files. If you've modified a configuration file and then update
the RPM it comes from, and the new version of the RPM has also changed
the configuration file, RPM does not overwrite your version but instead
creates <config-file>.rpmnew with the new version of it.
This does not work in practice. The problem is that there are entirely too many false positives; it is entirely routine to install packages, change nothing about them, upgrade the packages, and have .rpmnew files sprayed across your system. In this situation, requiring sysadmins to carefully diff each .rpmnew file with its original version and decipher what needs to be done is a great way to cause sysadmins to never install updates, especially because doing this reconciliation often requires us to actually understand the configuration files.
(Note that there's no universal answer for what to do with the .rpmnew files; sometimes you use them and throw away the old version, and sometimes you throw them away because your system has auto-edited important things into the old version. If you're sufficiently unlucky, the new RPM has important changes and the file has been auto-edited and you get to merge them yourself while cursing everyone concerned.)
In practice, I expect that almost everyone ignores .rpmnew files; in fact, if you use the graphical interfaces to package updating I don't think you even get told about them. The immediate consequence of this is that no package update should depend on having its .rpmnew files reconciled by hand, because it's not going to happen.
(If you build such a package update anyways, congratulations, you are an asshole. And I mean that in the technical sense.)
Just to add the icing on this particular cake, I believe that you can get .rpmnew files when upgrading multi-arch packages, for reasons that are similar to how you can get multi-arch file conflicts. This situation is at least easily recognizable, because the .rpmnew file is identical to the normal config file.
(Perhaps modern versions of RPM recognize this situation by now; I am somewhat behind on Fedora releases.)
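The identical-file case, at least, can be checked mechanically. What follows is only a minimal sketch in Python, assuming you want to walk a directory tree such as /etc and sort each .rpmnew file into 'identical to the live file', 'differs from it', or 'has no live file next to it'. The function name find_rpmnew is made up for illustration and is not part of RPM or any RPM tooling.

```python
import filecmp
import os


def find_rpmnew(root):
    """Yield (live_path, rpmnew_path, status) for each .rpmnew file under root.

    status is 'identical', 'differs', or 'orphan' (no live file beside it).
    The name and the three status labels are illustrative, not RPM's own.
    """
    suffix = ".rpmnew"
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            rpmnew = os.path.join(dirpath, name)
            live = rpmnew[: -len(suffix)]  # path of the config file in use
            if not os.path.isfile(live):
                status = "orphan"
            elif filecmp.cmp(live, rpmnew, shallow=False):
                # Byte-for-byte comparison, not just size and mtime.
                status = "identical"
            else:
                status = "differs"
            yield live, rpmnew, status
```

Looping over find_rpmnew("/etc") and printing the status next to each path gives a quick inventory. Anything marked 'identical' can presumably be deleted without thought; everything marked 'differs' still needs a human.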
Sidebar: sysadmins and understanding configuration files
In case it is not obvious: it is extremely common for sysadmins to install packages that have 'configuration files' that we have not read up on and have no understanding of. This is not a bad thing, it is a thing to be aspired to, because it means that your package just works without needing to be customized.
For a direct example, how many of you understand the OpenSSL
configuration file, openssl.cnf? I suspect very few people do, yet
pretty much everyone has OpenSSL installed and so has an OpenSSL
configuration file. If understanding openssl.cnf were a requirement
for installing and using OpenSSL, well, I suspect you can guess how
well that would work out.
(And yes, on some of my machines I have openssl.cnf.rpmnew files.)
2010-05-18
An illustrated example of how not to do package updates
I spent the greater part of today discovering how and why smartd was
not monitoring our disks on several of our Red Hat Enterprise Linux 5 based
iSCSI backends. The quick summary is that if your
RHEL 5 machine was installed some time ago (or installed recently from
an old installer disk) and upgraded since then, smartd may not be
monitoring all of your disks; this is all but certain if you've recently
added new disks.
What happened to cause this is a badly considered package update. In
the beginning of RHEL 5, Red Hat's version of the smartmontools RPM
shipped with an /etc/smartd.conf and a system that defaulted to
auto-generating the list of disks to monitor every time you started
smartd. Later, RHEL updated smartmontools and stopped doing this;
instead the new version of /etc/smartd.conf told smartd to do the
scan for disks itself. However, applying the update does not re-do
existing /etc/smartd.conf files, which now have a static list of disks
to monitor that never gets updated. If your list of disks changes after
the package update is applied, you lose.
We normally update our iSCSI backends immediately when they're installed, before we connect them to the enclosure with their data disks. The net result is that several backends were left silently monitoring only the system disks, which is what you could call not desirable.
(The problem does not occur with recent RHEL 5 install images, which
have the updated smartmontools RPM rolled into the base OS.)
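To see which camp a given machine is in, you can look at whether its /etc/smartd.conf has a static device list or asks smartd to scan. Here is a rough sketch in Python, assuming the new-style file uses smartd's DEVICESCAN directive and that any other non-comment entry names a device. The function name classify_smartd_conf and its three return values are invented for this example.

```python
def classify_smartd_conf(text):
    """Classify smartd.conf contents as 'devicescan', 'static', or 'empty'.

    Assumptions: '#' starts a comment, a trailing backslash continues an
    entry onto the next line, and the first word of an entry is either
    DEVICESCAN or a device name.
    """
    entries = []
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if line.endswith("\\"):
            pending += line[:-1] + " "        # entry continues on next line
            continue
        words = (pending + line).split()
        pending = ""
        if words:
            entries.append(words[0])
    if pending.split():                        # file ended mid-continuation
        entries.append(pending.split()[0])
    if "DEVICESCAN" in entries:
        return "devicescan"
    return "static" if entries else "empty"
```

Feeding it the contents of /etc/smartd.conf and getting back 'static' on a machine where you never edited the file is a sign that the list of disks is frozen as of whenever it was last generated.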
To be blunt, this is a badly done package update, especially for a theoretically 'enterprise' operating system. You should never create a situation where a sysadmin installs a package without changing its configuration files, installs an upgrade, and the package stops working, partly because creating such situations is a great way to persuade sysadmins to never install package updates.
(What RHEL should have done is keep the auto-generation system but
change the default /etc/smartd.conf to not use it. Then everyone would
have been happy; people with the old configuration would have had it
keep updating their list of disks, and people with the new configuration
would have their disks auto-detected by smartd itself.)
2010-05-06
The right way and the wrong way to disable init.d services
First, a quote from the (Ubuntu) manpage for update-rc.d (pointed
out in the comments on a recent entry):
The correct way to disable services is to configure the service as stopped in all runlevels in which it is started by default. In the System V init system this means renaming the service's symbolic links from S to K.
Here is one difference between a developer and a sysadmin: to a developer, something is disabled if it only runs harmless code or only runs code in harmless situations. To a sysadmin, something is only disabled if it doesn't run any code at all.
The wrong way to disable init.d scripts is to leave K* symlinks around
for them; it is the developer answer, not the sysadmin answer. The
init.d scripts are still running, they are just theoretically only doing
harmless things or only running in harmless situations (when the system
will soon reboot anyways). In practice, no; there are too many init.d
scripts that feel free to have their stop action do things that range
from undesirable to dangerous, and any number that blithely assume that
any instance of the daemon that they start must have been started by
them.
(This is especially the case if stop or restart actions are going to
run during package upgrades, instead of just when the system is shutting
down. And if you actually use multiple runlevels, your life hurts.)
The right way to disable init.d scripts is to remove all rcN.d symlinks, both start and stop, and keep them removed (one way or another). That way I do not have to trust that the authors of the daemons that we aren't actually running all got it right, because I'm pretty certain that they didn't.
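Checking that the symlinks have actually stayed removed is easy to script. This is a minimal sketch in Python, assuming the usual layout where /etc/rcN.d directories hold links named S or K, then a two-digit priority, then the service name. The function name rc_links and its etc parameter are hypothetical, introduced only for this example.

```python
import glob
import os
import re


def rc_links(service, etc="/etc"):
    """Return every rcN.d start (S) or stop (K) link for the named service.

    Assumes link names look like S20mysql or K80mysql; an empty list means
    no init.d script for this service will be run at any runlevel change.
    """
    pattern = re.compile(r"^[SK]\d\d" + re.escape(service) + r"$")
    found = []
    for rcdir in sorted(glob.glob(os.path.join(etc, "rc[0-9S].d"))):
        for name in sorted(os.listdir(rcdir)):
            if pattern.match(name):
                found.append(os.path.join(rcdir, name))
    return found
```

Running something like rc_links("mysql") after every round of package updates and complaining if the result is non-empty is one way of keeping the links removed without having to remember to look.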
Systems that insist on doing things the developer way instead of the sysadmin way are broken, whether their developers realize this or not.
(I'm aware that I'm not going to persuade anyone important of this, and I'm sure that the Debian people will be happy to tell me that I'm totally wrong and they always get init.d scripts perfect and thus their way is the correct one. Sorry, I'm a sysadmin, I don't believe in systems where everyone has to get everything right all the time.)
(The comments on the recent entry both corrected my original mistaken ideas and caused me to think about all of this.)
2010-05-04
Dear software packagers, startup scripts edition
This is a grump.
Dear packagers of software for Linux distributions, please note this: just because I have your software installed does not mean that I want to run your daemon. Database systems, I am especially looking at you. Please package software accordingly, and please make sure that daemons stay turned off and not running even when harried sysadmins apply package upgrades.
Since the natural way for a harried sysadmin to make a daemon not run
is to turn off its init.d script (with chkconfig on Red Hat derived
systems and update-rc.d on Debian derived ones), your packaging should
ideally support this. This means that you should not automatically
recreate the /etc/rcN.d symlinks on updates; instead you should leave
things alone, with whatever symlinks in place (or not in place) that are
already there.
(Red Hat derived systems seem to be good about this. Debian derived
ones, not so much. But perhaps I misunderstand the zen of Debian systems
and the natural way to turn daemons off is not with update-rc.d,
although I will point out that there are reasons to juggle the boot time
priorities too.)
If you have another way to disable your daemon, especially if manipulating
/etc/rcN.d symlinks is not reliable, you should absolutely mention it in
a comment at the start of your init.d script. Really. If it is not there,
or at least in a very prominent place in your distribution specific README,
it might as well not exist until some sysadmin gets very, very aggravated
with you some day.
In case you are curious, we have database systems installed but don't run their daemons because we want to let users run them if they need MySQL or PostgreSQL but we have absolutely zero interest in running a system database instance due to the hassles of backing it up, creating and managing database users and databases, dealing with space issues, and so on. If one of our users wants PostgreSQL or MySQL, they can deal with all of that themselves (and the space that their database uses comes out of their home directory or some other filesystem that they have access to).
Sidebar: how to disable the PostgreSQL daemon on Ubuntu (8.04)
Change /etc/postgresql/<VER>/main/start.conf to another option besides
auto. The comments in the file will tell you what your other options
are.
On a casual inspection, I can't see any way to do this for MySQL. Get
used to running 'update-rc.d -f mysql remove' after every MySQL
update. Oh, and you need to stop the daemon too, since the Ubuntu
package's postinst script appears to default to auto-starting the daemon
after an upgrade.
(I would love to be wrong about this.)