2015-02-25
My current issues with systemd's networkd in Fedora 21
On the whole I'm happy with my switch to systemd-networkd, which
I made for reasons covered here; my networking
works and my workstation boots faster. But right now there are some
downsides and limitations to networkd, and in the interests of equal
time for the not-so-great bits I feel like running through them today.
I covered some initial issues in my detailed setup entry; the largest one is that there is no syntax
checker for the networkd configuration files and networkd itself
doesn't report anything to the console if there are problems.
Beyond that we get into a collection of operational issues.
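As a rough illustration of what even a minimal syntax check could catch, here is a Python sketch. It is not a real networkd tool and it knows nothing about which sections or keys networkd actually accepts. It only assumes the general systemd configuration file shape of [Section] headers followed by Key=Value lines, with blank lines and lines starting with '#' or ';' ignored. The function name check_networkd_syntax and the sample file contents are made up for this example.

```python
def check_networkd_syntax(text):
    """Return a list of (line_number, message) for lines that don't look
    like '[Section]' headers or 'Key=Value' settings.

    This only checks the general shape of the file; it does not know
    which sections or keys networkd actually understands.
    """
    problems = []
    in_section = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comment lines are always fine.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("["):
            if not line.endswith("]") or len(line) < 3:
                problems.append((lineno, "malformed section header"))
            else:
                in_section = True
            continue
        key, sep, _value = line.partition("=")
        if not sep:
            problems.append((lineno, "expected Key=Value"))
        elif not key.strip():
            problems.append((lineno, "missing key before '='"))
        elif not in_section:
            problems.append((lineno, "setting appears before any [Section]"))
    return problems


if __name__ == "__main__":
    # Hypothetical file contents with a deliberate mistake on line 5.
    sample = """\
[Match]
Name=em0

[Network]
Address 192.168.1.10/24
Gateway=192.168.1.1
"""
    for lineno, message in check_networkd_syntax(sample):
        print(f"line {lineno}: {message}")
```

Run on the sample, this flags line 5 (the Address line that is missing its '='), which is exactly the sort of mistake that networkd currently leaves you to discover on your own.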
What I consider the largest issue with networkd right now is that
it's a daemon (as opposed to something that runs once and stops)
but there is no documented way of interacting with it while it's
running. There are three sides to this: information, temporary
manipulation, and large changes. On the information front, networkd
exposes no good way to introspect its full running state, including
what network devices it's doing what to, or to wait for it to
complete certain operations. On the temporary manipulation front,
there's no way I know of to tell networkd to temporarily take down
something and then later bring it back (the equivalent of ifdown
and ifup). Perhaps you're supposed to do those with manual commands
outside of networkd. Finally, on more permanent changes, if you add
or remove or modify a configuration file in /etc/systemd/network
and want networkd to notice, well, I don't know how you do that.
Perhaps you restart networkd; perhaps you shut networkd down, modify
things, and restart it; perhaps you reboot your machine. Perhaps
networkd notices some changes on its own.
(Okay, it turns out that there's a networkctl command
that queries some information from networkd, although it's not
actually documented in the Fedora 21 version of systemd. This still
doesn't allow you to poke networkd to do various operations.)
This points to a broader issue: there's a lot about networkd that's
awfully underdocumented. I should not have to wonder about how to
get networkd to notice configuration file updates; the documentation
should tell me one way or another. As I write this the current
systemd 219 systemd-networkd manpage is
a marvel of saying very little, and there are also omissions and a lack
of clarity in the manpages for the actual configuration files. All
told networkd's documentation is not up to the generally good systemd
standards.
The next issue is that networkd has forgotten everything that systemd
learned about the difference between present configuration files
and active configuration files. To networkd those are one and the
same; if you have a file in /etc/systemd/network, it is live.
Want it not to be live? Better move it out of the directory (or
edit it, although there is no explicit 'this is disabled' option
you can set). Want to override something in /usr/lib/systemd/network?
I'm honestly not sure how you'd do that short of removing it or
editing it. This is an unfortunate step backwards.
(It's also a problem in some situations where you have multiple
configurations for a particular port that you want to swap between.
In Fedora's static configuration world you can have multiple ifcfg-*
files, all with ONBOOT=no, and then ifup and ifdown them as
you need them; there is no networkd equivalent.)
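Since the only real switch is whether a file is present in the directory, the closest thing to a workaround is to shuffle files in and out yourself. The following Python sketch shows one way to do that, assuming a made-up 'parked' directory that sits next to the live one; the function name set_active and the file names are hypothetical. It deliberately does nothing about getting networkd to notice the change, since how to do that is exactly the open question here.

```python
import shutil
import tempfile
from pathlib import Path


def set_active(filename, active, live_dir, parked_dir):
    """Move 'filename' into live_dir (active=True) or out to parked_dir
    (active=False). Returns the file's new path.

    Both directories are assumptions for this sketch; parked_dir is a
    made-up holding area, not something networkd knows about.
    """
    live_dir, parked_dir = Path(live_dir), Path(parked_dir)
    src_dir, dst_dir = (parked_dir, live_dir) if active else (live_dir, parked_dir)
    src = src_dir / filename
    dst = dst_dir / filename
    # Refuse to clobber an existing file on either side.
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    if not src.exists():
        raise FileNotFoundError(f"{src} not found")
    dst_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst


if __name__ == "__main__":
    # Demonstrate with throwaway directories rather than the real
    # /etc/systemd/network.
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "network"
        parked = Path(tmp) / "network.parked"
        live.mkdir()
        (live / "em0-static.network").write_text("[Match]\nName=em0\n")
        print(set_active("em0-static.network", False, live, parked))
        print(set_active("em0-static.network", True, live, parked))
```

This is obviously clumsier than a set of ifcfg-* files with ONBOOT=no, which is rather the point.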
I'm not going to count networkd's lack of general support for 'wait
for specific thing <X> to happen' as an issue. But it certainly
would be nice if systemd-networkd-wait-online was more generic
and so could be more easily reused for various things.
I do think (as mentioned) that some of
networkd's device and link configuration is unnecessarily tedious
and repetitive. I see why it happened, but it's the easy way instead
of the best way. I hope that it can be improved and I think that
it can be. In theory I think you could go as far as optionally
merging .link files with .network files to cover many cases much more
simply, as the sections in each file today basically don't clash
with each other.
In general I certainly hope that all of these issues will get better over time, although some of them will inevitably make networkd more complicated. Systemd's network configuration support is relatively young and I'm willing to accept some rough edges under the circumstances. I even sort of accept that networkd's priority right now probably needs to be supporting more types of networking instead of improving the administration experience, even if it doesn't make me entirely happy (but I'm biased, as my needs are already met there).
(To emphasize, my networkd issues are as of the state of networkd in Fedora 21, which has systemd 216, with a little bit of peeking at the latest systemd 219 documentation. In a year the situation may look a lot different, and I sure hope it does.)
My Linux container temptation: running other Linuxes
We use a very important piece of (commercial) software that is only supported on Ubuntu 10.04 and RHEL/CentOS 6, not anything later (and it definitely doesn't work on Ubuntu 12.04, we've tried that). It's currently on a 10.04 machine but 10.04 is going to go out of support quite soon. The obvious alternative is to build a RHEL 6 machine, except I don't really like RHEL 6 and it would be our sole RHEL 6 host (well, CentOS 6 host, same thing). All of this has led me to a temptation, namely Linux containers. Specifically, using Linux containers to run one Linux as the host operating system (such as Ubuntu 14.04) while providing a different Linux to this software.
(In theory Linux containers are sort of overkill and you could do most or all of what we need in a chroot install of CentOS 6. In practice it's probably easier and surer to set up an actual container.)
Note that I specifically don't want something like Docker, because the Docker model of application containers doesn't fit how the software natively works; it expects an environment with cron and multiple processes and persistent log files it writes locally and so on and so forth. I just want to provide the program with the CentOS 6 environment it needs to not crash without having to install or actually administer a CentOS 6 machine more than a tiny bit.
Ubuntu 14.04 has explicit support for LXC with documentation and appears to support CentOS containers, so that's the obvious way to go for this. It's certainly a tempting idea; I could play with some interesting new technology while getting out of dealing with a Linux that I don't like.
On the other hand, is it a good idea? This is certainly a lot of work to go to in order to avoid most of running a CentOS 6 machine (I think we'd still need to watch for eg CentOS glibc security updates and apply them). Unless we make more use of containers later, it would also leave us with a unique and peculiar one-off system that'll require special steps to administer. And virtualization has failed here before.
(I'd feel more enthused about this if I thought we had additional good uses for containers, but I don't see any other ones right now.)
2015-02-21
It turns out that I routinely use some really old Linux binaries
With all of the changes that Linux has gone through over the past ten or fifteen years and with them all of the things that have stopped working, it's easy to wind up feeling that Linux doesn't have a really good story about backwards compatibility. Certainly this is true to some degree and I have various programs that have broken over this time, sometimes to my significant irritation. But at the same time it turns out that some parts of the Linux binary world have been remarkably stable.
How stable? Well, let me give you two stories.
The oldest Linux binary that I'm sure I use on a routine basis was
compiled on January 29th 1998. This was almost certainly very shortly
after I installed my very first Linux machine, as the source code
itself is older than that (it's a standard helper for my dotfiles
that dates back to at least 1995). I've faithfully carried my $HOME
forward from then onwards and with it this program, which has just
kept on working. It's a 32-bit program of course, dynamically linked
against glibc, and strings suggests it was compiled with GCC
2.7.2.3.
(I know it was compiled on Red Hat Linux, and given the date it would have been Red Hat 5.0.)
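For the curious, the two facts above (that it's a 32-bit binary, and the compiler version that strings turns up) can be approximated in a few lines of Python. This is only a sketch: it reads the class byte from the ELF identification header and looks for printable strings that start with 'GCC: ', on the assumption that the compiler left its usual identification string in the binary. The function names are made up for this example.

```python
import re
import sys


def elf_class(data):
    """Return 32 or 64 for an ELF image, or None if the data isn't ELF
    (or has an unrecognized class byte)."""
    if len(data) < 5 or data[:4] != b"\x7fELF":
        return None
    # Byte 4 of the ELF identification is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(data[4])


def gcc_ident_strings(data):
    """Return printable strings in 'data' that start with 'GCC: ',
    roughly what you would spot in the output of strings."""
    found = re.findall(rb"GCC: [\x20-\x7e]+", data)
    return [s.decode("ascii") for s in found]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        contents = f.read()
    print("ELF class:", elf_class(contents))
    for ident in gcc_ident_strings(contents):
        print(ident)
```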
The second impressively old binary that I still use regularly is an X-based program that was compiled on June 12th 2000. As an X program it's dynamically linked against not just glibc but a whole series of additional X libraries. All of them have kept ABI compatibility and have not changed their .so versions. In fact now that I look I see that I routinely use an even older X-based program, compiled May 5th 1999, which is actually a core part of my automation to do things with the current X selection.
(My personal binaries directory is overgrown and many of the contents are utility programs used by scripts, so I don't always remember off the top of my head which programs are still in use by scripts that I still use. I really should weed it out a lot, but that would take time and energy.)
There are plenty of Linux binaries that would not and did not survive that long. Offhand, anything written in C++ (due to repeated C++ ABI shifts), anything using termcap and (n)curses (due to .so version changes), and anything using Berkeley DB (ditto) would have been lost at some point. And of course many high level GUI toolkits are hopeless; ABI compatibility is nil over time and distributions just don't carry old versions forward in compatibility packages. But apparently basic X is just low level enough that it hasn't been impacted.
(It turns out I have binaries from 2000 that use libXaw, the old Athena widget set. Once I actually fetched the 32-bit libXaw for Fedora 21, they still ran. I guess no one's been fiddling around with Athena widgets any more than they've been meddling with the core X libraries.)
2015-02-07
Trying to move towards Ed25519 OpenSSH host keys: a stumbling block
Now that I've upgraded to Fedora 21 on my main machines and actually
have it available, I've decided to start shifting my OpenSSH
usage towards considering Ed25519 my primary and preferred public
key authentication system (out of all of the ones that OpenSSH
offers). Moving towards Ed25519 for my
own keypairs was and is simple; I generated some new keypairs
(one for each master machine), loaded them into ssh-agent first, and started adding them to
authorized_keys on machines that have a modern enough version of
OpenSSH. I expect to be using RSA keys for a long time given that
eg CentOS 7 is not Ed25519 enabled, but at least this transition
is basically automatic from now onwards.
(Well, I can add my Ed25519 keys to authorized_keys pretty much
anywhere but they won't do me any good except on modern machines.)
But I also want to use Ed25519 host keys where machines have them
available, and this turns out to be more tricky than I was expecting
(at least on the version of OpenSSH on Fedora 21, which is OpenSSH
6.6.1p1). If you read the ssh_config manpage you'll soon run
across a description of the HostKeyAlgorithms option, which to
quote the manual 'specifies the protocol version 2 host key algorithms
that the client wants to use in order of preference'. This sounds like
just the thing; I could specify it explicitly in .ssh/config with
the Ed25519 options first and everything would work right.
Well, sadly, no. The manual also claims:
If hostkeys are known for the destination host then [the
HostKeyAlgorithms] default is modified to prefer their algorithms.
In other words: if you say that you prefer Ed25519 keys, the host
has both an Ed25519 key and an RSA key available, and you already
have the host's RSA key in .ssh/known_hosts, ssh should
authenticate the host with the existing RSA key.
This appears to be what they call 'inoperative' if you specify an
explicit HostKeyAlgorithms setting, even if this setting just
shuffles the priority order. If you put the Ed25519 options first
and you know an RSA key for a host that also offers Ed25519, ssh
complains about a host key mismatch between the Ed25519 key being
offered and the RSA key you know and does various things in reaction
(including turning off X forwarding, which is fatal in my environment
at work).
As far as I can tell, the only way to get this to really work is
to not set HostKeyAlgorithms. Instead, you have to manually gather
Ed25519 keys for anything that has them (perhaps using 'ssh-keyscan
-t ed25519'), add them to your known_hosts, and purge any other
host key entries for those hosts. This works right in that anything
with a known Ed25519 key will be verified against the key, but it
won't remember Ed25519 keys for new hosts; instead you'll probably
wind up with ECDSA keys for them. You'll want to periodically look
for signs that new hosts support Ed25519 and upgrade your known
host keys for them.
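The purging step is tedious to do by hand, so here is a Python sketch of it. It assumes plain (unhashed) known_hosts entries of the form 'names keytype key', takes the file as a list of lines without trailing newlines, and takes a set of host names that you have already collected Ed25519 keys for. The function name prefer_ed25519 and the host names in any usage are made up; hashed entries and '@' marker lines are simply passed through untouched.

```python
def prefer_ed25519(known_hosts_lines, ed25519_hosts):
    """Return known_hosts lines with non-Ed25519 entries removed for
    every host named in ed25519_hosts.

    Assumes unhashed host names. Hashed entries never match a name in
    ed25519_hosts, so they are kept as they are.
    """
    kept = []
    for line in known_hosts_lines:
        stripped = line.strip()
        fields = stripped.split()
        # Keep blanks, comments, marker lines (@cert-authority, @revoked),
        # and anything too short to be a normal entry.
        if not stripped or stripped.startswith(("#", "@")) or len(fields) < 3:
            kept.append(line)
            continue
        names, keytype = fields[0].split(","), fields[1]
        if keytype == "ssh-ed25519" or not any(n in ed25519_hosts for n in names):
            kept.append(line)
            continue
        # A non-Ed25519 key that mentions at least one Ed25519-capable host.
        remaining = [n for n in names if n not in ed25519_hosts]
        if remaining:
            # Other names shared this line; keep the entry for them only.
            kept.append(" ".join([",".join(remaining)] + fields[1:]))
    return kept


if __name__ == "__main__":
    # Hypothetical entries with shortened stand-in keys.
    lines = [
        "apps.example.org ssh-rsa AAAAexample1",
        "apps.example.org ssh-ed25519 AAAAexample2",
        "old.example.org ssh-rsa AAAAexample3",
    ]
    for entry in prefer_ed25519(lines, {"apps.example.org"}):
        print(entry)
```

Note that if an entry lists both a host name and its IP address, you need to put both in the set or the old key will be kept for whichever one you left out.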
(Future versions of OpenSSH will apparently support recording multiple host keys for a remote host at once as part of host key rotation. That won't be on all of our Linux machines any time soon, much less other things in the corner.)
I'd say that hopefully this issue is or will be fixed in future versions of OpenSSH, but I'm honestly a little worried that people will say it's actually working as intended (and at most fix the manual page). Even if they do fix it I don't expect to see the fix appearing in Fedora any time soon, given the usual release process and delays.
(It's possible that setting HostKeyAlgorithms in /etc/ssh/ssh_config
will work and that only .ssh/config is special here, but I
haven't tested this, it's not always feasible, and I'm not holding
my breath.)