Even systemd services and dependencies are not self-documenting

October 10, 2018

I tweeted:

I'm sure that past-me had a good reason for configuring my Wireguard tunnel to only start during boot after the VMWare modules had been loaded. I just wish he'd written it down for present-me.

Systemd units are really easy to write, straightforward to read, and quite easy to hack on and modify. But, just like everything else in system administration, they aren't really self-documenting. Systemd units will generally tell you clearly what they're doing, but they won't (and can't) tell you why you set them up that way, and one of the places where this is most acute is their dependencies. Sometimes those dependencies are entirely obvious, and sometimes they are sort of obvious and also sort of obviously superstitious. But sometimes, as in this case, they are outright mysterious, and then your future self (if no one else) is going to have a problem.

(Systemd dependencies are often superstitious because systemd still generally lacks clear documentation for standard dependencies and 'depend on this if you want to be started only when <X> is ready'. Admittedly, some of this is because the systemd people disagree with everyone else about how to handle certain sorts of issues, like services that want to activate only when networking is nicely set up and the machine has all its configured static IP addresses or has acquired its IP address via DHCP.)
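To make 'nicely set up' a bit more concrete, here is a minimal Python sketch of the readiness rule I have in mind: every configured static address is present and, if the machine uses DHCP, some other usable address has actually been acquired. The function name and its inputs are hypothetical, gathering the real address lists from the system is left out, and this is not how any particular systemd target or network manager decides readiness.

```python
import ipaddress


def network_ready(configured_static, present, uses_dhcp=False):
    """Hypothetical readiness rule: every configured static address is up,
    and if DHCP is used, at least one other usable address has been acquired."""
    configured = {ipaddress.ip_address(a) for a in configured_static}
    # Loopback and link-local addresses appear without any real configuration,
    # so they say nothing about whether the network is ready.
    usable = {
        ip for ip in map(ipaddress.ip_address, present)
        if not (ip.is_loopback or ip.is_link_local)
    }
    if not configured <= usable:
        return False  # some static address is still missing
    if uses_dhcp and not (usable - configured):
        return False  # no DHCP-acquired address yet
    return True


# The static address is present, but no DHCP lease has arrived yet.
print(network_ready(["192.0.2.10"], ["127.0.0.1", "fe80::1", "192.0.2.10"], uses_dhcp=True))  # prints False
```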

Dependencies are also dangerous for this because it is so easy to add another one. If you're in a hurry and you're slapping dependencies on in an attempt to get something to work right, this means that adding a comment to explain yourself adds proportionally much more work than it would if you already had to do a fair bit of work to add the dependency itself. Since it's so much extra work, it's that much more tempting to not write a comment explaining it, especially if you're in a hurry or can talk yourself into believing that it's obvious (or both). I'm going to have to be on the lookout for this, and in general I should take more care to document my systemd dependency additions and other modifications in the future.
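One cheap way to keep myself honest would be a check that flags dependency directives with no comment explaining them. The following is a minimal Python sketch of such a check, assuming a rule of my own invention (a comment covers the lines below it until the next blank line); it is not an existing systemd tool, and the unit contents in the example, including vmware.service, are hypothetical.

```python
# Hypothetical lint: flag systemd dependency directives with no explanatory comment.
DEPENDENCY_KEYS = {"After", "Before", "Wants", "Requires", "Requisite", "BindsTo", "PartOf"}


def undocumented_dependencies(unit_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each dependency directive in the [Unit]
    section that is not covered by a comment. Assumed rule: a comment covers
    the lines below it until the next blank line or section header."""
    problems = []
    section = None
    documented = False
    for lineno, raw in enumerate(unit_text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            documented = False  # a blank line ends a comment's coverage
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            documented = False
            continue
        if line.startswith(("#", ";")):  # systemd comment characters
            documented = True
            continue
        key = line.split("=", 1)[0].strip()
        if section == "Unit" and key in DEPENDENCY_KEYS and not documented:
            problems.append((lineno, line))
    return problems


example = """\
[Unit]
Description=WireGuard tunnel
# Bring the tunnel up only once the network is configured.
Wants=network-online.target
After=network-online.target

After=vmware.service
"""

for lineno, line in undocumented_dependencies(example):
    print(f"line {lineno}: {line}")  # prints "line 7: After=vmware.service"
```

The unexplained VMware ordering is exactly the kind of line this would have pointed at years ago, when I still remembered why it was there.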

(This is one of the things that version controlled configuration files are good for. Sooner or later you'll have to write a commit message for your change, and when you do, you'll hopefully be pushed to explain it.)

As for this particular case, I believe that what happened is that I added the VMWare dependency back when I was having mysterious Wireguard issues on boot because, it eventually turned out, I had forgotten to set some important .service options. When I was working on the issue, one of my theories was that Wireguard was setting up its networking, then VMWare's own networking stuff was starting up and taking Wireguard's interface down because the VMWare code didn't recognize this 'wireguard' type interface. So I set a dependency so that Wireguard would start after VMWare, then when I found the real problem I never went back to remove the spurious dependency.

(I uncovered this issue today as part of trying to make my machine boot faster, which is partially achieved now.)


Comments on this page:

By Anon at 2018-10-14 03:39:29:

Admittedly, some of this is because the systemd people disagree with everyone else about how to handle certain sorts of issues, like services that want to activate only when networking is nicely set up and the machine has all its configured static IP addresses or has acquired its IP address via DHCP.

Can you elaborate on this? I thought systemd was nuanced around this - https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ (i.e. if you care then you have to explicitly ask for the network-online target, but if possible it is better to make your program cope with the network appearing at some undefined later stage by being dynamic to networks coming/going/being reconfigured)...

By cks at 2018-10-14 16:17:39:

The systemd people make it very plain on that page that they don't really want you to use network-online.target and they refuse to guarantee anything about what it does or should mean (they specifically say that it completely depends on whatever the network management software thinks it should, which means that you can't generically make a unit depend on it and know for sure what you're getting). They also don't have a target defined for 'DNS resolution is up', which is another thing that real things actively care about.

The systemd view is that everyone should modify all of their server software to cope with starting when you have no IP addresses, network devices, or DNS resolution, and then notice when they start appearing and you can actually do stuff. I have opinions on this theoretical approach and you can probably guess what they are.
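For a sense of what this asks of software, here is a minimal Python sketch of the very simplest piece of it: instead of assuming DNS works at startup, keep retrying name resolution with capped exponential backoff. The function and its parameters are hypothetical; real server software would also have to handle addresses and interfaces appearing, disappearing, and changing after startup, which is much harder.

```python
import socket
import time


def wait_for_resolution(host, attempts=10, delay=1.0,
                        resolve=socket.getaddrinfo, sleep=time.sleep):
    """Retry name resolution until it succeeds or attempts run out.
    `resolve` and `sleep` are injectable so the retry logic can be tested."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return resolve(host, None)
        except socket.gaierror:
            if attempt == attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, 30.0)  # exponential backoff, capped at 30s


# e.g. addrs = wait_for_resolution("example.org")
```

Even this toy version has to make policy decisions (how long to wait, what to do on failure) that a start-after-DNS-is-up ordering would have made unnecessary.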

(In a System V init world, things are simple. You establish a rule that the network is not up before priority A, up by priority B, and DNS resolution is up by priority C. You then slot your own init services in at appropriate spots, and DNS server software knows to put itself in just before C to preserve that guarantee.)
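The System V scheme can be sketched in a few lines of Python. The priority numbers standing in for B and C, the service names, and both helper functions are hypothetical, and the sketch only checks B and C, not A.

```python
# Hypothetical priorities: network is up once priority B has run,
# DNS resolution works once priority C has run. (A is not modelled.)
NETWORK_UP_BY = 20  # stands in for B
DNS_UP_BY = 30      # stands in for C


def boot_order(services):
    """Init scripts run in ascending priority order (ties broken by name)."""
    return sorted(services, key=lambda name: (services[name][0], name))


def misplaced(services):
    """Return the names of services slotted where the guarantees they rely on
    do not yet hold. Each value is (priority, needs, provides_dns)."""
    bad = []
    for name, (prio, needs, provides_dns) in services.items():
        if "network" in needs and prio <= NETWORK_UP_BY:
            bad.append(name)
        elif "dns" in needs and prio <= DNS_UP_BY:
            bad.append(name)
        elif provides_dns and not NETWORK_UP_BY < prio < DNS_UP_BY:
            bad.append(name)  # a DNS server must start after B but before C
    return sorted(bad)


services = {
    "network": (15, set(), False),
    "named": (25, {"network"}, True),
    "mailer": (40, {"network", "dns"}, False),
    "early-ntp": (18, {"network"}, False),
}
print(boot_order(services))  # prints ['network', 'early-ntp', 'named', 'mailer']
print(misplaced(services))   # prints ['early-ntp']
```

The whole contract is three numbers that everyone agrees on, which is the point: a service's position alone tells you what it can rely on.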

Written on 10 October 2018.

Last modified: Wed Oct 10 01:37:27 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.