2014-09-30
NetworkManager and network device races
When we set up CentOS 7 on our new iSCSI backends we (and by that I mean 'I') left them using NetworkManager because it worked and I generally believe in leaving systems in their standard state when possible. Things have changed since then and we're now in the process of dropping NetworkManager.
The direct reason we're doing this is that we discovered that our
iSCSI backends will not boot reliably with NetworkManager; some of
the time the machines would have one of their iSCSI networks be
unconfigured after the boot finished (we never saw it happen to
both, but I believe it theoretically could). This is obviously
really bad, and on top of it NetworkManager (in the form of nmcli)
would claim that the interface was unavailable and so would refuse
to bring it up (this happened even if you went through ifup instead
of directly invoking NM).
As far as we can tell, what was happening is that during boot an interface would sometimes bounce; the network link would come up, go down, and then come back up in very rapid succession. While NetworkManager got told about all of these events in order, it would sometimes not handle the final 'link up' event; it would configure the interface due to the initial link up event, deconfigure it due to the link down event, and then be completely convinced that the link was still down. Do not pass go, do not collect any network traffic.
(NM's logging clearly demonstrated that it saw the final link up event, and in fact saw it before it claimed to have started the deconfiguration process.)
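If you want a concrete feel for this class of bug, here is a deliberately simplified Python sketch. It is entirely hypothetical (the names like LinkHandler are made up and this is not NetworkManager's actual code); it just shows how a daemon that drops 'duplicate' link events by comparing them against state it only updates later, when deferred work runs, can lose the final up in a fast up/down/up bounce:

import queue

class LinkHandler:
    # Hypothetical illustration only; not NetworkManager code.
    def __init__(self):
        self.link_state = "down"   # updated only when deferred work runs
        self.configured = False    # does the interface have its IP etc?
        self.work = queue.Queue()  # deferred (de)configuration jobs

    # Stage 1: called as each link event arrives from the kernel.
    def on_event(self, new_state):
        if new_state == self.link_state:
            return                 # 'no change', so the event is dropped
        self.work.put(new_state)   # real work is deferred to the main loop

    # Stage 2: the main loop drains the queue some time later.
    def run_work(self):
        while not self.work.empty():
            state = self.work.get()
            self.link_state = state
            self.configured = (state == "up")

h = LinkHandler()
h.on_event("up"); h.run_work()   # boot: link comes up, gets configured

# The link then bounces faster than the main loop runs:
h.on_event("down")
h.on_event("up")                 # dropped: recorded state is still "up"
h.run_work()                     # only the "down" ever gets acted on

print(h.link_state, h.configured)   # -> down False: deconfigured for good

In this sketch the final 'up' is seen but dismissed because the recorded state is still 'up' from before the bounce; only the 'down' ever gets acted on, which matches the symptoms we saw.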
Having this actually happen to us during boot was bad enough. Worse was the idea that this might happen during operation if the link bounced for one reason or another. If NetworkManager could decide to blow up our interfaces at any time things went badly, it had to go.
I'm not going to question NetworkManager's decision to fully deconfigure a statically configured interface (eg removing its IP address) if the network link goes down, because no doubt it has good reason to do so. But this behavior is what made this bug a fatal one; if NM had left the IP address and so on on the 'down' interface, the only actual consequence would have been that NM was wrong about the state of the interface.
(We haven't reported this bug anywhere, but that's a topic for another entry.)
2014-09-24
One thing I've come to dislike about systemd
One of the standard knocks against systemd is that it keeps growing and expanding, swallowing more and more jobs and so on. I've come around to feeling that this is probably a problem, but not for the reasons you might expect. The short version is that the growing bits are not facing real competition and thus are not being proven and improved by it.
Say what you like about it, but the core design and implementation of systemd went through a relatively ruthless Darwinian selection process on the way to its current success. Or to put it more directly, it won a hard-fought fight against both Upstart and the status quo in multiple Linux distributions. This competition undoubtedly improved systemd and probably forced systemd to get a bunch of things right, and it's given us a reasonable assurance that systemd is probably the best choice today among the init systems that were viable options.
(Among other things, note that a number of well regarded Debian developers spent a bunch of time and effort carefully evaluating the various options and systemd came out on top.)
Unfortunately you cannot say the same thing about the components that systemd has added since then, for example the journal. These have simply not had to face real competition and probably never will (unless the uselessd fork catches on); instead they have been able to tacitly ride along with systemd because when you take systemd you more or less have to take the components. Is the systemd journal the best design possible? Is it the option that would win out if there was a free choice? It might be, but we fundamentally don't know. And to the extent that real competition pushes all parties to improve, well, the systemd journal has probably not had this opportunity. Realistically the systemd journal will probably never have any competition, which means that it's enough for it to merely work well enough and be good enough and it probably won't be pushed to be the best.
(Sure, in theory other people are free to write a better replacement for the journal and drop it in. In practice such a replacement project has exactly one real potential user, systemd itself, and the odds are that said user is not going to be particularly interested in replacing their existing solution with your new one, especially if your new one has a significantly different design.)
I would feel happier about systemd's ongoing habit of growing more and more tentacles if those tentacles faced real standalone competition. As it is, people are effectively being asked to take a slowly increasing amount of systemd's functionality on faith that it is good engineering and a good implementation. It may or may not be good engineering (I have no opinion on that), but I can't help but think that real competition would improve it. If nothing else, real competition would settle doubts.
(For instance, I have a reflexive dubiousness about the systemd journal (and I'm not alone in this). If I knew that outside third parties had evaluated it against other options and had determined that it was the best choice (and that the whole idea was worth doing), I would feel better about things.)
By the way, this possibility for separate and genuine competition is one good reason to stick to a system of loosely coupled pieces as much as possible. I say pieces instead of components deliberately, because people are much more likely to mix and match things that are separate projects than they are to replace a component of a project with something else.
(This too is one area where I've become not entirely happy with systemd. With systemd you have components, not pieces, and so in practice most people are going to take everything together. The systemd developers could change this if they wanted to, because in large part this is a cultural thing with technical manifestations, such as whether you distribute everything in one source tarball and keep it all in one VCS repository.)
Sidebar: this doesn't apply to all systemd components
Note that some of what are now systemd components started out life as separate pieces that proved themselves independently and then got moved into systemd (udev is one example); they don't have this problem. Some components are probably either so simple or so easy to ignore that they don't really matter for this. The systemd journal is not an example of either.
2014-09-18
Ubuntu's packaging failure with mcelog in 14.04
For vague historical reasons we've had the mcelog package in our
standard package set. When we went to build our new 14.04 install
setup, this blew up on us; on installation, some of our machines
would report more or less the following:
Setting up mcelog (100-1fakesync1) ...
Starting Machine Check Exceptions decoder: CPU is unsupported
invoke-rc.d: initscript mcelog, action "start" failed.
dpkg: error processing package mcelog (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 mcelog
E: Sub-process /usr/bin/dpkg returned an error code (1)
Here we see a case where a collection of noble intentions have had terrible results.
The first noble intention is a desire to warn people that mcelog
doesn't work on all systems. Rather than silently run uselessly or
silently exit successfully, mcelog instead reports an error and
exits with a failure status.
The second noble intention is the standard Debian noble intention
(inherited by Ubuntu) of automatically starting most daemons on
installation. You can argue that this is a bad idea for things like
database servers, but for basic system monitoring tools like mcelog
and SMART monitoring I think most people actually want this; certainly
I'd be a bit put out if installing smartd didn't actually enable
it for me.
(A small noble intention is that the init script passes mcelog's
failure status up, exiting with a failure itself.)
The third noble intention is that it is standard Debian behavior
for an init script that fails when it is started in the package's
postinstall script to cause the postinstall script itself to exit
out with errors (it's in a standard dh_installinit stanza).
When the package postinstall script errors out, dpkg itself flags
this as a problem (as well it should) and boom, your entire package
install step is reporting an error and your auto-install scripts fall
down. Or at least ours do.
The really bad thing about this is that server images can change
hardware. You can transplant disks from one machine to another for
various reasons; you can upgrade the hardware of a machine but
preserve the system disks; you can move virtual images around; you
can (as we do) have standard machine building procedures that want
to install a constant set of packages without having to worry about
the exact hardware you're installing on. This mcelog package
behavior damages this hardware portability in that you can't safely
install mcelog in anything that may change hardware. Even if the
initial install succeeds or is forced, any future update to mcelog
will likely cause you problems on some of your machines (since a
package update will likely fail just like a package install).
(This is a packaging failure, not an mcelog failure; given that
mcelog cannot work on some machines it's installed on, the init
script failure should not cause a fatal postinstall script failure.
Of course the people who packaged mcelog may well not have known
that it had this failure mode on some machines.)
I'm sort of gratified to report that Debian has a bug for this, although the progress of the bug does not fill me with great optimism and of course it's probably not important enough to ever make it into Ubuntu 14.04 (although there's also an Ubuntu bug).
PS: since mcelog has never done anything particularly useful for
us, we have not been particularly upset over dropping it from our
list of standard packages. Running into the issue was a bit irritating,
though; mcelog seems to be historically good at irritation.
PPS: the actual problem mcelog has is even more stupid than 'I
don't support this CPU'; in our case it turns out to be 'I need a
special kernel module loaded for this machine but I won't do it for
you'. It also syslogs (but does not usefully print) a message that
says:
mcelog: AMD Processor family 16: Please load edac_mce_amd module.#012: Success
See eg this Fedora bug and this Debian bug. Note that the message really means 'family 16 and above', not 'family 16 only'.
2014-09-07
Systemd's fate will be decided by whether or not it works
I have recently been hearing a bunch of renewed grumbling about systemd, probably provoked by the release of RHEL 7 (with a contributing assist from the Debian decision for it and Ubuntu's decision to go along with Debian). There are calls for a boycott or for moving away from systemd-using Linuxes, for example to FreeBSD. My personal view is that such things misread the factors that will drive both sides of the decision about systemd, the factors that will sway many people either passively for or actively against it.
What it all comes down to is that operating systems are commodities and this commodification extends to the init system. For most people, the purpose of an OS, a Linux distribution, a method of configuring the network, and an init system is to run your applications and keep your system going without causing you heartburn (ideally all of them will actually help you). For (some) management and organizations, an additional purpose is making things not their fault. Technical specifics are almost always weak influences at best.
(It is worth saying explicitly that this is as it should be. The job of computer systems is to serve the needs of the organization; they can and must be judged on how well they do that. Other factors are secondary. Note that this doesn't mean that other factors are irrelevant; in a commodity market, there may be many solutions that meet the needs and so you can choose among them based on secondary factors.)
This cuts both ways. On the one hand, it means that generally no one is really going to care if you run FreeBSD instead of Linux (or Linux X instead of Linux Y) because you want to, provided that everything keeps working or at most that things are only slightly worse from their perspective. On the other hand, it also means that most sysadmins don't care deeply about the technical specifics of what they're running provided that it works.
You can probably see where this is going. If (and only if) systemd works, most people won't care about it. Most sysadmins are not going to chuck away perfectly good RHEL 7, Debian, or Ubuntu systems on the principle that systemd is icky, especially if this requires them to step down to systems that are less attractive apart from not having systemd. In fact most sysadmins are probably only vaguely aware of systemd, especially if things just work on their systems.
On the other hand, if systemd turns out to make RHEL 7, Debian, or Ubuntu machines unstable we will see a massive pushback and revolt. No amount of blandishment from any of the powers that be can make sysadmins swallow things that give them heartburn; a glowing example of this is SELinux, which Red Hat has been trying to push on people for ages with notable lack of success. If Red Hat et al cannot make systemd work reliably and will not replace it posthaste, people will abandon them for other options that work (be those other Linuxes or FreeBSD). And if systemd works well only in some environments, only people in those environments will have the luxury of ignoring it.
That is why I say that systemd's fate will be decided by whether or not it works. If it does work, inertia means that most sysadmins will accept it because it is part of a commodity that they've already picked for other reasons and they likely don't care about the exact details of said commodity. If it doesn't work, that's just it; people will move to systems that do work in one way or another, because that's the organizational imperative (systems that don't work are expensive space heaters or paperweights).
Sidebar: The devil's advocate position
What I've written is only true in the moderate term. In the long term, systemd's fate is in the hands of both Linux distribution developers in general and the people who can write better versions of what it does. If those people are and remain dissatisfied with systemd, it's very likely to eventually get supplanted and replaced. Call this the oyster pearl story of Linux evolution, where people not so much scratch an itch (in the sense of a need) as scratch an irritation.
The kernel should not generate messages at the behest of the Internet
Here is a kernel message that one of my machines logged recently:
sit: Src spoofed 98.247.192.43/2002:4d4d:4d07::4d4d:4d07 -> 128.100.3.51/2002:8064:333::1
Did I say 'a message'? Actually, no, I meant 493 messages in a few days (and it would be more if I had not used iptables to block these packets). Perhaps you begin to see the problem here. This message shows two failures. The first is that it's not usefully ratelimited. This exact message was repeated periodically, often in relatively close succession and with no intervening messages, yet it was not effectively ratelimited and suppressed.
(The kernel code uses net_warn_ratelimited() but this is
clearly not ratelimited enough.)
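For concreteness, the kind of suppression I'd consider sufficient here is roughly what the following Python sketch does. It's purely illustrative (the RepeatSuppressor name and the details are made up, and this is not how the kernel's ratelimiting actually works): swallow repeats of an identical message within a window and report a count once something different finally gets logged.

import time

class RepeatSuppressor:
    # Illustrative sketch only; not the kernel's ratelimiting.
    def __init__(self, window=60.0):
        self.window = window      # seconds within which repeats are swallowed
        self.last_msg = None
        self.last_time = 0.0
        self.suppressed = 0

    def log(self, msg, emit=print):
        now = time.monotonic()
        if msg == self.last_msg and now - self.last_time < self.window:
            self.suppressed += 1  # identical and recent: swallow it
            return
        if self.suppressed:
            emit("last message repeated %d more times" % self.suppressed)
        self.last_msg, self.last_time, self.suppressed = msg, now, 0
        emit(msg)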
The second and more severe failure is that the kernel should not
default to logging messages at the behest of the Internet. If you have a
sit tunnel up for 6to4, anyone on the
Internet can flood your kernel logs with their own version of this
message; all they have to do is hand-craft a 6to4 packet with the
wrong IPv6 address. As we've seen here, such packets can probably
even be generated by accident or misconfiguration or perhaps funny
routing. Allow me to be blunt: the kernel should not be handing
this power to people on the Internet. Doing so is a terrible idea
for all of the usual reasons that giving Internet strangers any
power over your machine is a bad idea.
These messages should not be generated by default (at any logging level, because there is no logging level that means 'only log messages that are terrible ideas'). If the kernel wants to generate them, it can and should be controlled by a sysctl or a sysfs option or the like that defaults to off. People who really really want to know can then turn it on; the rest of us will leave it off in our usual great indifference to yet another form of Internet badness.
(Since I haven't been quite this harsh on kernel messages earlier, I'll admit it: my attitude on kernel messages has probably steadily gotten stricter and more irritated over time. I should probably write down my thoughts on good kernel messages sometime.)
Sidebar: what this message means
A 6to4 encapsulated packet has two addresses; the outside IPv4 address and the inner IPv6 address. The kernel insists that the inner IPv6 address is the IPv4 address's 6to4 address. Here the outside source is 98.247.192.43 but the inner 6to4 address in 2002::/16 is for the IPv4 address 77.77.77.7. You can get a similar message if the destination address has a mismatch between the IPv4 address and the 6to4 IPv6 address.
(To decode the 6to4 IPv6 address, take off the leading 2002: bit and then convert each of the next four hex bytes to decimal; each byte is one octet of the embedded IPv4 address. So the source is claimed to be 4d.4d.4d.07 aka 77.77.77.7. We can follow the same procedure for the destination address, getting (hex) 80.64.03.33 aka decimal 128.100.3.51, which matches the outer IPv4 address.)
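For the mechanically inclined, here is a small Python sketch of the same decoding using the standard ipaddress module (the embedded_ipv4 function name is just something I made up for illustration):

import ipaddress

def embedded_ipv4(addr6):
    # Extract the IPv4 address embedded in a 2002::/16 (6to4) IPv6 address.
    a = ipaddress.IPv6Address(addr6)
    if a not in ipaddress.ip_network("2002::/16"):
        raise ValueError("not a 6to4 address")
    # Bytes 2 through 5 of the packed address are the embedded IPv4 address.
    return ipaddress.IPv4Address(a.packed[2:6])

print(embedded_ipv4("2002:4d4d:4d07::4d4d:4d07"))  # 77.77.77.7
print(embedded_ipv4("2002:8064:333::1"))           # 128.100.3.51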