2015-04-24
A DKMS problem I had with lingering old versions
I use the DKMS-based approach for my ZFS on Linux install, fundamentally because using DKMS makes
upgrading kernels painless and convenient. It's worked well for a
long time, but recently some DKMS commands, particularly 'dkms
status', started erroring out with the odd message:
Error! Could not locate dkms.conf file. File: does not exist.
Since everything seemed to still work I shrugged my shoulders and
basically ignored it. I don't know DKMS myself; as far as I've been
concerned, it's just as much magic as, oh, /bin/kernel-install
(which, if you're not familiar with it, is what Fedora runs to set
up new kernels). I did a little bit of Internet searching for the
error message but turned up nothing that seemed particularly relevant.
Then today I updated to a new Fedora kernel, got this message, and
in an excess of caution decided to make sure that I actually had
the ZoL binary modules built and installed for the new kernel.
Well, guess what? I didn't. Nor could I force them to be built for
the new kernel; things like 'dkms install ...' kept failing with
this error message or things like it.
(I felt very happy about checking before I rebooted the system into the new kernel and had it come up without my ZFS pools.)
I will cut to the chase. ZFS on Linux recently released version 0.6.4, when I had previously been running development versions that still called themselves 0.6.3 for DKMS purposes. When I upgraded to 0.6.4, something in the whole process left behind some 0.6.3 directory hierarchies in a DKMS area, specifically /var/lib/dkms/spl/0.6.3 and /var/lib/dkms/zfs/0.6.3. Removing these lingering directory trees made DKMS happy with life and allowed me to eventually build and install the 0.6.4 SPL and ZFS modules for the new kernel.
(The dkms.conf file(s) that DKMS was looking for are normally found in /usr/src/<pkg>-<ver>. My theory is that the lingering directories in /var/lib/dkms were fooling DKMS into thinking that spl and zfs 0.6.3 were installed, and then it couldn't find their dkms.conf files under /usr/src and errored out.)
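For the record, the cleanup and rebuild looked roughly like this on my machine (the kernel version is a placeholder; I'm also not sure whether 'dkms remove' would have coped with the confused state, which is why I just deleted the leftover directories by hand):

dkms status                # errors out with the dkms.conf complaint
ls /var/lib/dkms/spl /var/lib/dkms/zfs
                           # shows lingering 0.6.3 trees alongside 0.6.4
rm -rf /var/lib/dkms/spl/0.6.3 /var/lib/dkms/zfs/0.6.3
dkms status                # now reports spl/0.6.4 and zfs/0.6.4 sensibly
dkms install spl/0.6.4 -k <new-kernel-version>
dkms install zfs/0.6.4 -k <new-kernel-version>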
I have no idea if this is a general DKMS issue, something that I
only ran into because of various somewhat eccentric things I wound
up doing on my machine, or some DKMS-related thing that the ZoL packages are doing slightly wrong (which has happened before). At least I've solved it and 'dkms status' is now happy with life.
(I can't say I've deeply increased my DKMS knowledge in the process. DKMS badly needs a 'so you're a sysadmin and something has gone wrong with a DKMS-based package, here's what you do next' document. Also, this is obviously either a bad or a buggy error message.)
2015-04-10
I wish systemd would get over its thing about syslog
Anyone who works with systemd soon comes to realize that systemd just doesn't like syslog very much. In fact systemd is so unhappy with syslog that it invented its own logging mechanism (in the form of journald). This is not news. What people who don't have to look deeply into the situation often don't realize is that systemd's dislike is sufficiently deep that systemd just doesn't interact very well with syslog.
I won't say that bugs and glitches 'abound', because I've only run into two issues so far (although both issues are relatively severe). One was that systemd mis-filed kernel messages under the syslog 'user' facility instead of the 'kernel' one; this bug made it past testing and into RHEL 7 / CentOS 7. The other is that sometimes on boot, randomly, systemd will barf up a significant chunk of old journal messages (sometimes very old) and re-send them to syslog. If you don't scroll back far enough while watching syslog logs, this can lead you to believe that something really bad and weird has happened.
(This has actually happened to me several times.)
This is stupid and wrongheaded on systemd's part. Yes, systemd doesn't like syslog. But syslog is extremely well established and extremely useful, especially in the server space. Part of that is historical practice, part of it is that syslog is basically the only cross-platform logging technology we have, and part of it is that you can do things like forward syslog to other machines, aggregate logs from multiple machines on one, and so on (and do so in a cross-platform way). And a good part of it is that syslog is simple text, and it's always been easy to do a lot of powerful ad-hoc stuff with text. That systemd continually allows itself to ignore and interact badly with syslog makes everyone's life worse (except perhaps the systemd authors). Syslog is not going away just because the systemd authors would like it to, and it is high time that systemd actually accepted that and started not just sort of working with syslog but working well with it.
One of systemd's strengths until now has been that it played relatively well (sometimes extremely well) with existing systems, warts and all. It saddens me to see systemd increasingly throw that away here.
(And I'll be frank, it genuinely angers me that systemd may feel that it can get away with this, that systemd is now so powerful that it doesn't have to play well with other systems and with existing practices. This sort of arrogance steps on real people; it's the same arrogance that leads people to break ABIs and APIs and then tell others 'well, that's your problem, keep up'.)
PS: If systemd people feel that systemd really does care about syslog and does its best to work well with it, well, you have two problems. The first is that your development process isn't managing to actually achieve this, and the second is that you have a perception problem among systemd users.
2015-04-09
Probably why Fedora puts their release version in package release numbers
Packaging schemes like RPM and Debian debs split full package names
up into three components: the name, the (upstream) version, and the
(distribution) release of the package. Back when people started
making RPM packages, the release component tended to be just a
number, giving you full names like liferea-1.0.9-1 (this is
release 1 of Liferea 1.0.9). As I mentioned recently, modern Fedora practice has changed so that release numbers include the distribution version. Today we
have liferea-1.10.13-1.fc21 instead (on Fedora 21, as you can
see). Looking at my Fedora systems, this appears to be basically
universal.
Before I started writing this entry and really thinking about the
problem, I thought there was a really good deep reason for this.
However, now I think it's so that if you're maintaining the same
version of a package on both Fedora 20 and Fedora 21, you can use
the exact same .spec file. As an additional reason, it makes automated rebuilds of packages for (and in) new Fedora versions easier and makes upgrades work better (in that someone upgrading Fedora versions will wind up with the new version's packages).
The simple magic is in the .spec file:
Release: 1%{?dist}
The RPM build process will substitute this in at build time with the
Fedora version you're building on (or for), giving you release numbers
like 1.fc20 and 1.fc21. Due to this substitution, any RPM .spec file
that does releases this way can be automatically rebuilt on a new Fedora
version without needing any .spec file changes (and you'll still get a
new RPM version that will upgrade right, since RPM sees 1.fc21 as being
more recent than 1.fc20).
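If you're curious, you can check what the dist macro expands to on a particular machine; on a Fedora 21 system, for example:

rpm --eval '%{?dist}'

prints '.fc21' (note the leading dot, which is where the '.' between the '1' and the 'fc21' in the release number comes from).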
The problem that this doesn't really deal with (and I initially thought it did) is wanting to build an update to the Fedora 20 version of a RPM without updating the Fedora 21 version. If you just increment the release number of the Fedora 20 version, you get 2.fc20 and the old 1.fc21 and then upgrades won't work right (you'll keep the 2.fc20 version of the RPM). You'd have to change the F20 version to a release number of, say, '1.fc20.1'; RPM will consider this bigger than 1.fc20 but smaller than 1.fc21, so everything works out.
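You can check how RPM orders these release numbers for yourself with rpmdev-vercmp from the rpmdevtools package, if you have it installed; for example:

rpmdev-vercmp 1.10.13-1.fc20.1 1.10.13-1.fc21

should report that the second version-release is the newer of the two.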
(I suspect that the current Fedora answer here is 'don't try to do just a F20 rebuild; do a pointless F21 rebuild too, just don't push it as an update'. Really there aren't many situations where you'd need to do a rebuild without any changes in the source package, and if you change the source package, eg to add a new patch, you probably want to do a F21 update too. I wave my hands.)
PS: I also originally thought that Ubuntu does this too, but no; while Ubuntu embeds 'ubuntu' in a lot of their package release numbers, it's not specific to the Ubuntu version involved and any number of packages don't have it. I assume it marks packages where Ubuntu deviates from the upstream Debian package in some way, eg included patches and so on.
2015-04-07
How Ubuntu and Fedora each do kernel packages
I feel the need to say things about the Ubuntu (and I believe Debian)
kernel update process, but before I do that I want to write down
how kernel packages look on Ubuntu and Fedora from a sysadmin's
perspective because I think a number of people have only been exposed
to one or the other. The Fedora approach to kernel packages is also
used by Red Hat Enterprise Linux (and CentOS) and probably other
Linux distributions that use yum and RPMs. I believe that the
Ubuntu approach is also used by Debian, but maybe Debian does it a
bit differently; I haven't run a real Debian system.
Both debs and RPMs have the core concepts of a package having a
name, an upstream version number, and a distribution release number.
For instance, Firefox on my Fedora 21 machine is currently firefox,
upstream version 37.0, and release 2.fc21 (increasingly people
embed the distribution version in the release number for reasons
beyond the scope of this entry).
On Fedora you have some number of kernel-... RPMs installed at
once. These are generally all instances of the kernel package (the
package name); they differ only in their upstream version number and
their release number. Yum normally keeps the most recent five of
them for you, deleting the oldest when you add a new one via a 'yum
upgrade' when a new version of the kernel package is available.
This gives you a list of main kernel packages that looks like this:
kernel-3.18.8-201.fc21.x86_64
kernel-3.18.9-200.fc21.x86_64
kernel-3.19.1-201.fc21.x86_64
kernel-3.19.2-201.fc21.x86_64
kernel-3.19.3-200.fc21.x86_64
Here the kernel RPM with upstream version 3.19.3 and Fedora
release version 200.fc21 is the most recent kernel I have installed
(and this is a 64-bit machine as shown by the x86_64 architecture).
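The 'most recent five' behavior is not magic; it comes from the installonly_limit setting in /etc/yum.conf. A minimal sketch of the relevant bit (the value your distribution ships may differ from what I have here):

[main]
installonly_limit=5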
(This is a slight simplification. On Fedora 21, the kernel is actually
split into three kernel packages: kernel, kernel-core, and
kernel-modules. The kernel package for a specific version is just
a meta-package that depends (through a bit of magic) on its associated
kernel-core and kernel-modules packages. Yum knows how to manage all
of this so you keep five copies not only of the kernel meta-package
but also of the kernel-core and kernel-modules packages and so on.
Mostly you can ignore the sub-packages in Fedora; I often forget about
them. In RHEL up through RHEL 7, they don't exist and their contents are
just part of the kernel package; the same was true of older Fedora
versions.)
Ubuntu is more complicated. There is a single linux-image-generic
(meta-)package installed on your system and then some number of
packages with the package name of
linux-image-<version>-<release>-generic for various <version> and
<release> values. Each of these packages has a deb upstream version
of <version> and a release version of <release>.<number>, where the
number varies depending on how Ubuntu built things. Each specific
linux-image-generic package version depends on a particular
linux-image-<v>-<r>-generic package, so when you update to it, it
pulls in that specific kernel (at whatever the latest package release
of it is).
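You can see this dependency chain for yourself on an Ubuntu machine with, for example:

apt-cache depends linux-image-generic

which will list (among other things) the specific linux-image-<v>-<r>-generic package that the current version of the meta-package depends on.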
Because of all of this, Ubuntu systems wind up with multiple kernels
installed at once by the side effects of updating linux-image-generic.
A new package version of l-i-g will depend on and pull in an
entirely new linux-image-<v>-<r>-generic package, leaving the old
linux-image-*-generic packages just sitting there. Unlike with yum,
nothing in plain apt-get limits how many old kernels you have sitting
around; if you leave your server alone, you'll wind up with copies of
all kernel packages you've ever used. As far as the Ubuntu package
system sees it, these are not multiple versions of the same thing but
entirely separate packages, each of which you have only one version
of.
This gives you a list of packages that looks like this (splitting
apart the package name and the version plus Ubuntu release, what
'dpkg -l' calls Name and Version):
linux-image-3.13.0-24-generic    3.13.0-24.47
linux-image-3.13.0-45-generic    3.13.0-45.74
linux-image-3.13.0-46-generic    3.13.0-46.79
linux-image-3.13.0-48-generic    3.13.0-48.80
linux-image-generic              3.13.0.48.55
(I'm simplifying again; on Ubuntu 14.04 there are also
linux-image-extra-<v>-<r>-generic packages.)
On this system, the current 3.13.0.48.55 version of linux-image-generic
depends on and thus requires the linux-image-3.13.0-48-generic
package, which is currently 'at' the nominal upstream version
3.13.0 and Ubuntu release 48.80. Past Ubuntu versions of
linux-image-generic depended on the other linux-image-*-generic
packages and caused them to be installed at the time.
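If you want to manually prune old kernels, it goes roughly like this (a sketch using the package names from the listing above; be careful not to remove the kernel you're currently running):

dpkg -l 'linux-image-*' | grep '^ii'    # what's actually installed
uname -r                                # the kernel you're running right now
apt-get purge linux-image-3.13.0-24-generic linux-image-extra-3.13.0-24-generic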
I find the Fedora/RHEL approach to be much more straightforward than the Ubuntu approach. With Fedora, you just have N versions of the kernel package installed at once; done. With Ubuntu, you don't really have multiple versions of any given package installed; you just have a lot of confusingly named packages, each of which has one version installed, and these packages get installed on your system as a side effect of upgrading another package (linux-image-generic). As far as I know the Ubuntu package system doesn't know that all of these differently named packages are variants of the same thing.
(A discussion of some unfortunate consequences of this Ubuntu decision is beyond the scope of this entry. See also.)
Sidebar: kernel variants
Both Ubuntu and Fedora have some variants of the kernel; for instance,
Fedora has a PAE variant of their 32-bit x86 kernel. On Fedora, these
get a different package name, kernel-pae, and everything else works in
the same way as for normal kernels (and you have both PAE and regular
kernels installed at the same time; yum will keep the most recent five
of each).
On Ubuntu I believe these get a different meta-package that replaces
linux-image-generic, for example linux-image-lowlatency, and
versions of this package depend on specific kernel packages with
different names, like linux-image-<v>-<r>-lowlatency. You can see
the collection with 'apt-cache search linux-image'.
Both Fedora and Ubuntu have changed how they handle kernel variants over time; my memory is that Ubuntu had to change more in order to become more sane. Today the two distributions' handling of variants strikes me as reasonably close to each other.
2015-04-04
An important note if you want to totally stop an IKE IPSec connection
Suppose, hypothetically, that you think your IPSec GRE tunnel may be contributing to some weird connection
problem you're having. In order to get it out
of the picture, you want to shut it down (which will still leave
you able to reach things). There
are three ways you can do this: you can use 'ipsec whack --terminate'
to ask your local pluto to shut down this specific IKE connection
(which you've engineered to stop the GRE tunnel), you can shut your
local pluto down entirely with 'systemctl stop pluto' (or
equivalent), or you can stop pluto on both ends.
I will skip to the punchline: if you have no *protoport set (so
that you're doing IPSec on all traffic just because you might as
well), you need to shut pluto down on both ends. Merely shutting
down the IKE IPSec stuff for your GRE tunnel (and taking down the
tunnel itself) will leave the overall IPSec security policy intact
and this policy specifically instructs the kernel to drop any
non-IPSec packets between your left and right IPs. Only shutting down pluto itself will get rid of the security policy, and since you need to get rid of it on both ends, you have to shut down pluto on both.
(If pluto is handling more than one connection for you on one of
the ends, you're going to need to do something more complicated.
My situation is usefully simple here.)
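Concretely, in my simple setup this amounts to running something like the following on each end (using the connection name from my configuration; yours will differ, and setups with more connections need more care):

ipsec whack --terminate --name cksgre   # not sufficient on its own
systemctl stop pluto                    # this is what actually removes the security policy
setkey -DP                              # check that the drop policies are gone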
If you shut down pluto on only one end and then keep trying to
test things, you can get into very puzzling and head-scratching
problems. For instance, if you try to make a connection from the
shut-down side to the side with pluto still running, tcpdump
on both ends will tell you that SYN packets are being sent and
arriving at their destination but are getting totally ignored despite
there being no firewall rules and so on that would do this.
(If you have a selective *protoport set, any traffic that would
normally be protected by IPSec will be affected by this because the
security policy says 'drop any of this traffic that is not protected
with IPSec'.)
PS: your current IPSec security policies can be examined with
'setkey -DP'. There's probably some way to get a counter of how
many packets have been dropped for violating IPSec security policies,
but I don't know what it is (maybe it's hiding somewhere in 'ip
xfrm', which has low-level details of this stuff, although
/proc/net/xfrm_stat doesn't seem to be it).
A weird new IKE IPSec problem that I just had on Fedora 21's latest kernel
Back when I first wrote up my IKE configuration for my point to point GRE tunnel, I restricted the IKE IPSec configuration so that it would only apply IPSec to the GRE traffic with:
conn cksgre
        [...]
        leftprotoport=gre
        rightprotoport=gre
        [...]
I only added this restriction out of caution and to match my old manual configuration. A while later I decided that it was a little silly; although I basically didn't send any unencrypted traffic to the special GRE touchdown IP address I use at the work end, I might
as well fully protect the traffic since it was basically free. So
I took the *protoport restrictions out, slightly increasing my
security, and things worked fine for quite some time.
Today this change quietly blew up in my face. The symptoms were that often (although not always) a TCP connection specifically between my home machine and the GRE touchdown IP would stall after it transferred some number of bytes (it's possible that the transfer direction mattered but I haven't tested extensively). Once I narrowed down what was going on from the initial problems I saw, reproduction was pretty consistent: if I did 'ssh -v touchdown-IP' from home I could see it stall during key exchange.
I don't know what's going on here, but it seems specific to running the latest Fedora 21 kernel on both ends; I updated my work machine to kernel 3.19.3-200.fc21 a couple of days ago and did not have this problem, but I updated my home machine to 3.19.3-200.fc21 a few hours ago and started seeing this almost immediately (although it took some time and frustration to diagnose just what the problem was).
(I thought I had some evidence from tcpdump output but in retrospect I'm not sure it meant what I think it meant.)
(I had problems years ago with MTU collapse in the face of recursive GRE tunnel routing, but that was apparently fixed back in 2012 and anyways this is kind of the inverse of that problem, since this is TCP connections flowing outside my GRE tunnel. Still, it feels like a related issue. I did not try various ways of looking at connection MTUs and so on; by the time I realized this was related to IPSec instead of other potential problems it was late enough that I just wanted the whole thing fixed.)
2015-03-31
Btrfs's mistake in limiting itself to two-way mirroring
Recently, I tweeted:
That btrfs still will not do more than two-way mirroring immediately disqualifies it for many serious uses as far as I'm concerned.
On the surface this may sound like a silly limitation to be annoyed at btrfs over, something that only a small number of people playing in the enterprisy, (over-)cautious, cost-is-no-object world will ever use. Two-way mirrors are pretty reliable, after all, and almost no one actually uses more than two-way mirroring (and the people who do may not be entirely sensible).
This is too small a view of the situation. The problem with having a maximum of two-way mirroring is not steady state operation, it's when you're migrating storage from one disk to another (or from one set of disks to another). Supporting three (or more) way mirroring makes it simple to do this while preserving full redundancy; you attach the new disk as a third mirror, wait for things to resynchronize, and then detach the old disk. If things go wrong with the new disk during this process, no sweat, your old disks are still there and working away as normal.
At this point some people may suggest 'rebalancing' operations, where you attach the third disk and then tell your sophisticated filesystem to change the system by moving all the data from the old disk to the new disk; I believe that btrfs supports this by adding the new disk then deleting the old disk. The problem is that this is not good enough because if things go wrong it will generally leave part of your data non-redundant (whatever data has been migrated to the new disk). It's strictly better to run the new disk in parallel with the old disks and then decide that you trust it enough to drop the old disk out, and that requires real multi-way mirroring.
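For concreteness, the add-then-delete dance that I believe btrfs supports looks something like the following (device names and mount point are made up, and I haven't actually done this on a real btrfs filesystem):

btrfs device add /dev/sdc /data
btrfs device delete /dev/sdb /data   # the data migration off the old disk happens here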
What btrfs does if you give it more than two disks in a raid-1 setup is actually potentially useful behavior (it mirrors each piece of data on two out of three drives, giving you more disk space). But the right solution here would be to support both this and a way to tell btrfs that you want N-way mirroring instead of just 2-way mirroring. As it is, only having two-way mirroring is yet another reason why I may never use btrfs on my own machines.
(I think that this is an important feature for home machines, which are both the machines most likely to see drive replacements over time and the place where overall drive systems may be the flakiest. You just know that someday someone is going to attach a dubious USB 3.0 external drive to their home system temporarily in order to swap internal drives, with predictable results partway through.)
(Of course, this sort of artificial limitation in btrfs's RAID support is partly fallout from what I feel is btrfs's core mistake.)
2015-03-26
Why systemd should have ignored SysV init script LSB dependencies
In his (first) comment on my recent entry on program behavior and bugs, Ben Cotton asked:
Is it better [for systemd] to ignore the additional [LSB dependency] information for SysV init scripts even if that means scripts that have complete information can't take advantage of it?
My answer is that yes, systemd should have ignored the LSB dependency information for System V init scripts. By doing so it would have had (or maintained) the full System V init compatibility that it doesn't currently have.
Systemd has System V init compatibility at all because it was (and is) absolutely necessary for systemd to be adopted. Systemd very much wants you to do everything with native systemd unit files, but the systemd authors understood that if systemd only supported its own files, there would be a massive problem; any distribution and any person that wanted to switch to systemd would have to rewrite every SysV init script they had all at once. To take over from System V init at all, it was necessary for systemd to give people a gradual transition instead of a massive flag day exercise. However, the important thing is that this was always intended as a transition; the long run goal of systemd is to see all System V init scripts replaced by unit files. This is the expected path for distributions and systems that move to systemd (and has generally come to pass).
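As an illustration of that end state, here is a minimal sketch of the sort of native unit file an init script is expected to turn into (the service and dependency names here are made up, and real units usually need more than this):

[Unit]
Description=Something daemon
After=network.target otherthing.service
Requires=otherthing.service

[Service]
Type=forking
ExecStart=/usr/sbin/somethingd

[Install]
WantedBy=multi-user.target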
It was entirely foreseeable that some System V init scripts would have inaccurate LSB dependency information, especially in distributions that had previously made no use of it. Supporting LSB dependencies in existing SysV init scripts is not particularly important to systemd's long term goals because all of those scripts are supposed to turn into unit files (with real and always-used dependency information). In the short term, this support allows systemd to boot a system that uses a lot of correctly written LSB init scripts somewhat faster than it otherwise would, at the cost of adding a certain amount of extra code to systemd (to parse the LSB comments et al) and foreseeably causing a certain number of existing init scripts (and services) with inaccurate LSB comments to malfunction in various ways.
(Worse, the init scripts that are likely to stick around the longest are exactly the least well maintained, least attended, most crufty, and least likely to be correct init scripts. Well maintained packages will migrate to native systemd units relatively rapidly; it's the neglected ones or third-party ones that won't get updated.)
So, in short: by using LSB dependencies in SysV init script comments, systemd got no long term benefit, only slightly faster booting in the short term on some systems, at the cost of extra code and breaking some systems. It's my view that this was (and is) a bad tradeoff. Had systemd ignored LSB dependencies, it would have less code and fewer broken setups at what I strongly believe is a small or trivial cost.
2015-03-23
Systemd is not fully backwards compatible with System V init scripts
One of systemd's selling points is that it's backwards compatible
with your existing System V init scripts, so that you can do a
gradual transition instead of having to immediately convert all of
your existing SysV init scripts to systemd .service files. For
the most part this works as advertised and things just work.
However, there are areas where systemd has chosen to be deliberately
incompatible with SysV init scripts.
If you look at some System V init scripts, you will find comment blocks at the start that look something like this:
### BEGIN INIT INFO
# Provides: something
# Required-Start: $syslog otherthing
# Required-Stop: $syslog
[....]
### END INIT INFO
These are an LSB standard for declaring various things about your init scripts, including start and stop dependencies; you can read about them here or here, no doubt among other places.
Real System V init ignores all of these because all it does is run init scripts in strictly sequential ordering based on their numbering (and names, if you have two scripts at the same numerical ordering). By contrast, systemd explicitly uses this declared dependency information to run some SysV init scripts in parallel instead of in sequential order. If your init script has this LSB comment block and declares dependencies at all, at least some versions of systemd will start it immediately once those dependencies are met even if it has not yet come up in numerical order.
(CentOS 7 has such a version of systemd, which it labels as 'systemd 208' (undoubtedly plus patches).)
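On such a system you can see what systemd has actually derived from an init script's LSB header, since systemd exposes each SysV init script as a generated .service unit; for a hypothetical /etc/init.d/something:

systemctl show -p After -p Before something.service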
Based on one of my sysadmin aphorisms,
you can probably guess what happened next: some System V init scripts
have this LSB comment block but declare incomplete dependencies.
Under real System V init this does nothing and thus is easily
missed; in fact these scripts may have worked perfectly for a decade
or more. On a systemd system such as CentOS 7, systemd will start
these init scripts out of order and they will start failing, even
if what they depend on is other System V init scripts instead of
things now provided directly by systemd .service files.
This is a deliberate and annoying choice on systemd's part, and I maintain that it is the wrong choice. Yes, sure, in an ideal world the LSB dependencies would be completely correct and could be used to parallelize System V init scripts. But this is not an ideal world, it is the real world, and given that there's been something like a decade of the LSB dependencies being essentially irrelevant, it was completely guaranteed that there would be init scripts out there that mis-declared things and thus would malfunction under systemd's dependency-based reordering.
(I'd say that the systemd people should have known better, but I
rather suspect that they considered the issue and decided that it
was perfectly okay with them if such 'incorrect' scripts broke.
'We don't support that' is a time-honored systemd tradition, per, say, separate /var filesystems.)
2015-03-22
I now feel that Red Hat Enterprise 6 is okay (although not great)
Somewhat over a year ago I wrote about why I wasn't enthused about RHEL 6. Well, it's a year later and I've now installed and run a CentOS 6 machine for an important service that requires it, and as a result of that I have to take back some of my bad opinions from that entry. My new view is that overall RHEL 6 makes an okay Linux.
I haven't changed the details of my views from the first entry. The installer is still somewhat awkward and it remains an old-fashioned transitional system (although that has its benefits). But the whole thing is perfectly usable; both installing the machine and running it haven't run into any particular roadblocks and there's a decent amount to like.
I think that part of my shift is that all of our work on our CentOS 7 machines has left me a lot more familiar with both NetworkManager and how to get rid of it (and why you want to do that). These days I know to do things like tick the 'connect automatically' button when configuring the system's network connections during install, for example (even though it should be the default).
Apart from that, well, I don't have much to say. I do think that we made the right decision for our new fileserver backends when we delayed them in order to use CentOS 7, even if this was part of a substantial delay. CentOS 6 is merely okay; CentOS 7 is decently nice. And yes, I prefer systemd to upstart.
(I could write a medium sized rant about all of the annoyances in the installer, but there's no point given that CentOS 7 is out and the CentOS 7 one is much better. The state of the art in Linux installers is moving forward, even if it's moving slowly. And anyways I'm spoiled by our customized Ubuntu install images, which preseed all of the unimportant or constant answers. Probably there is some way to do this with CentOS 6/7, but we don't install enough CentOS machines for me to spend the time to work out the answers and build customized install images and so on.)