2010-07-28
My Fedora 8 problem: upgrading
My Fedora 8 problem is that I still have a machine running Fedora 8, which means that I need to upgrade it. Worse, this is not some disused machine sitting in the corner but my home workstation; it doesn't have much bandwidth, and I kind of want it to be up and usable as much as possible when I'm home. So I've been gloomily contemplating my upgrade options for some time.
The officially supported or semi-supported way to do this is to do the upgrade from a Fedora 13 install DVD. This will likely take many hours during which my system is unusable and, assuming it works, will then require me to download a gigabyte or two of updates and third-party packages over a relatively slow DSL link before the system is really usable again.
(I am assuming here that the Fedora 13 installer will upgrade a Fedora 8 system; it's possible that it won't touch machines that are that old.)
Now, this machine uses my full workstation partitioning scheme, with duplicate partitions for /, /usr,
and /var. In theory the best way to upgrade is to make a copy of the
system in these partitions, chroot into it, and do a yum upgrade. There are two problems
with this, though. First, I don't know if a yum upgrade works in a
chroot'd environment or if it tries to kill and restart various daemons
at inopportune times and so on; I would not be surprised if this was
neither tested nor recommended. Second, you can't upgrade directly
from Fedora 8 to Fedora 13 this way; you have to upgrade to Fedora 10
and then again to Fedora 11 as intermediate steps. This is a lot of
downloads over my slow DSL link, even if I figure out how to make yum
get as many packages as possible from a local DVD or directory.
(The bandwidth of a DVD or two transported from work vastly exceeds my DSL link.)
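The chroot approach above could be sketched roughly as follows. This is only an illustration; the device names and mount point are placeholders, not my actual partitioning, and as I said I don't know whether yum upgrade actually behaves itself inside a chroot.

```shell
# Hypothetical sketch of the spare-partition chroot upgrade.
# /dev/sdaN and /mnt/alt are placeholder names, not my real layout.

# Copy the running root filesystem into the spare / partition
# (-x keeps rsync on one filesystem; /usr and /var would get the
# same treatment into their own spare partitions).
mount /dev/sda2 /mnt/alt
rsync -aHAXx / /mnt/alt/

# Bind-mount the virtual filesystems that package scriptlets
# often expect to find.
mount --bind /proc /mnt/alt/proc
mount --bind /dev  /mnt/alt/dev
mount --bind /sys  /mnt/alt/sys

# Upgrade the copy in place; this step would have to be repeated
# for each intermediate Fedora release on the way to 13.
chroot /mnt/alt yum upgrade
```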
I'm pretty sure that I can't put together a version of PreUpgrade that will go from Fedora 8 to Fedora 13 in one operation; certainly, the Fedora 8 version only offers up to Fedora 10 as an option. Using PreUpgrade might cut one step off the yum upgrade process (but might not) and would let me download all of the necessary packages and updates in advance, but it would also have my machine down for many hours again. Twice (at least).
The crazy option is to not upgrade to Fedora 13 but to use those spare partitions to install Fedora 13 from scratch. This would probably require the machine to be down (I expect that Anaconda's live DVD installs still take over the entire machine), but Fedora generally installs much faster than it upgrades. And I would get to start over without four years of accumulated random bits and pieces. The downside of this is that I would really, really want to have good backups of all of my data.
(One of the things I'm taking away from this exercise and a similar although less drastic exercise at work is that next time around, I really want all of my user data on different physical disks than the system disk(s). This would let me completely disconnect them during upgrades and reinstalls so that I don't have to count on the install or upgrade process leaving my user filesystems alone and untouched.)
Finally, at this point it's getting increasingly tempting to 'upgrade' the machine by buying a new one and installing Fedora 13 (and all of my local changes) onto the machine from scratch and copying my data over. But getting a new machine still feels kind of wasteful at this point; while my home machine will be four years old this fall, it's still perfectly good for most of what I do (although I would like more RAM and CPU power for processing digital photos, especially since I have one of the cores turned off due to reliability problems).
2010-07-25
Why sysadmins almost never replace distribution packages
I mentioned this in passing recently; today I feel like elaborating on why replacing a distribution package with your own locally built version of something is a big pain in the rear and thus why sysadmins almost never do it.
(I'm going to assume here that you're familiar with building from source in general.)
First, you have two options for how to do this; you can build from
source and just do a 'make install', or you can actually (re)build a
new package and install it. Doing a raw build from source is horrible
for all of the reasons that we have modern package management, plus
you're going behind the back of your package management system and this
generally doesn't end well.
Rebuilding a package is superficially more attractive, but it causes a number of issues. First, you need to know how to build and rebuild packages for the particular OS that you need the new version of the program on, and to have a build environment suitable for doing this. But let's assume that you can do all of this because you've already invested the time to become a competent package builder for this distribution.
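For concreteness, the usual rebuild cycle on an RPM-based distribution looks something like the following sketch; the package name is just an example and the exact paths depend on your rpmbuild setup.

```shell
# Hedged sketch of rebuilding a distribution package locally;
# 'somepkg' is a placeholder package name.

# Fetch the distribution's source RPM (yumdownloader is in yum-utils).
yumdownloader --source somepkg

# Unpack it into ~/rpmbuild/SPECS and ~/rpmbuild/SOURCES.
rpm -i somepkg-*.src.rpm

# ...edit the spec file, swap in a newer source tarball, etc...

# Build binary (and source) packages from the modified spec.
rpmbuild -ba ~/rpmbuild/SPECS/somepkg.spec

# Install your local build over the distribution's version.
rpm -Uvh ~/rpmbuild/RPMS/x86_64/somepkg-*.rpm
```

Every step here is something you now own: when the distribution or the upstream changes, you get to do it all again.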
Once you have a package, what you're doing is adding unofficial local packages to the distribution. When you do this, making everything work nicely together becomes your responsibility, and when you override a distribution package instead of adding a new one you also get the headaches of dealing with the distribution's own updates to the package. The distribution may update their version of the package in ways that clash with your version or simply cause your version to be removed, and they may change how the package works in ways that require you to immediately update your own version in order to keep working with the rest of the system.
(In short, you're essentially maintaining a fork of their package and you get to do all of the tracking and updating that that implies.)
In either case you now have to keep track of the upstream version yourself, in order to pick up security issues and (important) bugfixes. If you do not want to lock yourself to using the latest and greatest version, this may include backporting those changes to the older version that you're using. You will probably also want to keep track of the changes that your distribution thinks are important enough to include in their packaged version of the program.
All of this requires more than just work and time; it requires attention (to upstream changes, to distribution changes, to security alerts, etc). Attention is an especially scarce resource for sysadmins, much scarcer than time.
(The one time it starts being worth doing this is when a distribution has hit end of life. In that case, there will be no distribution package changes and the distribution has stopped tracking the upstream for security updates and so on anyways, so either you worry about it or no one will.)
2010-07-24
Thinking about the implications of your program being successful
Recently, some of our Red Hat Enterprise Linux systems started mailing us a message from cron once an hour:
/etc/cron.hourly/mcelog.cron:
mcelog: warning: record length longer than expected. Consider update.
There are two things wrong with this message. The first is a RHEL
bug; Red Hat should have made sure that this cron job didn't bombard
sysadmins with unimportant messages by, for example, redirecting
standard error to /dev/null.
(This is sadly a general problem that RHEL has; there seem to be any number of cron entries that will spray you with email if you let them, and some of them are installed by default.)
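What I mean by redirecting standard error is something like this minimal sketch of the cron job; the mcelog invocation here is only an approximation of what RHEL's stock script does.

```shell
#!/bin/sh
# Sketch of /etc/cron.hourly/mcelog.cron with the chatter suppressed;
# the exact mcelog options are an assumption, not RHEL's verbatim script.
# Warnings on stderr go to /dev/null instead of into cron email.
/usr/sbin/mcelog >> /var/log/mcelog 2>/dev/null
```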
The second, though, is that the author of mcelog hasn't thought
through the implications of it being a successful Linux program, which
it clearly is. In Linux, your program is a success if it's important
enough to be packaged by distributions and even more of a success if
they start installing it by default, as has happened with mcelog
on at least RHEL.
When your program is packaged and installed by Linux distributions, ie when it is a success, it is essentially pointless to suggest to people that they install an updated version by hand; it just doesn't happen. Sysadmins don't do that unless we absolutely have to, and most other users are just going to look at you blankly at the very suggestion. Once a program comes from a distribution package, that's pretty much it; either the distribution updates it or no one does.
Before mcelog became a success, this message was a perfectly sensible
suggestion; people were building their own copy anyways, so telling
them to update it was fine. But the more successful mcelog became,
the more that people got it from their distribution instead of by
building it themselves, the more the suggestion turned into something
that almost no one was going to follow and thus became useless noise.
(In practice the net effect of this message was to get me to run 'rpm
-e mcelog' on the affected machines.)
2010-07-14
Making the Linux kernel shut up about segfaulting user programs
Relatively modern versions of the Linux kernel on some architectures
default to logging a kernel message about any process that receives
various unhandled signals. The exact details depend on the architecture,
but I think it is common for the architectures to log messages about
unhandled SIGSEGVs, ie segmentation faults.
(Most architectures are capable of logging these messages if the feature is turned on, but not all architectures turn it on by default. It appears that x86 (both 32-bit and 64-bit) and SPARC default to having it on, and PowerPC and S390 default to having it off. SPARC support for this is recent, having only been added around March of this year.)
I find these messages rather irritating, given that we run multiuser systems that are full of users running various programs, some of them locally written and some of them just buggy. They clutter up our kernel message logs, making it harder to notice other things, and there is no useful information for us in them.
(The kernel message is ratelimited so that it can't flood the logs, but given the relatively low volume of kernel messages in general it can easily be the dominating message.)
On all architectures that support this, whether it is on or
off is controlled by the debug.exception-trace sysctl (aka
/proc/sys/debug/exception-trace); a value of 1 means that the kernel
logs messages, a 0 means that it doesn't. On the S390, there is a second
sysctl that also controls it, kernel.userprocess_debug, which is
presumably still there for historical reasons.
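So to shut the messages off, something like this will do it (as root):

```shell
# Turn off logging of unhandled user-process signals right now.
sysctl -w debug.exception-trace=0
# Or equivalently, via the /proc interface:
echo 0 > /proc/sys/debug/exception-trace

# To make it stick across reboots, add this line to /etc/sysctl.conf:
#   debug.exception-trace = 0
```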
(This is the kind of entry that I write so that I can find it later.)
Sidebar: the kernel code itself
The kernel source code variable that controls this behavior is
show_unhandled_signals. It's almost entirely used (and defined) in
architecture dependent code, which is why it has different defaults and
behaviors on different architectures. There is one conditional bit in
general kernel code, in kernel/sysctl.c, to define the sysctl itself.