A systemd mistake with a script-based service unit I recently made
That sure was a bunch of debugging, all because I forgot that my systemd .service file that runs scripts needed to tell systemd that the service stays active after its startup script exits (... or systemd would apparently run the ExecStop script right after the ExecStart script, which doesn't work too well).
Let's be specific here. This was the systemd .service unit to bring up my WireGuard tunnel on my work machine, which I set up to run a 'startup' script (via ExecStart=). Because I had a 'stop' script sitting around, I also set the unit's ExecStop= to point to that; the 'stop' script takes the device down and so on.
The startup script worked when I ran it by hand, but when I set up the .service unit to start WireGuard on boot, it didn't. Specifically, journalctl reported no errors, but the WireGuard tunnel network device and its associated routes just weren't there when the system finished booting. At first I thought the script was failing in a way that the systemd journal wasn't capturing, so I stuck a bunch of debugging in (capturing all output from the script in a file, then running it with 'set -x', and finally dumping out various pieces of network state after the script had finished).
All of this debugging convinced me that the WireGuard tunnel was being created during boot but then getting destroyed by the time booting finished. I flailed around for a while theorizing that this service or that service was destroying the WireGuard device when it started (and altering my .service unit to start after a steadily increasing number of other things), but nothing fixed the issue.
Then, while I was staring at my .service file, the penny dropped and I actually read what was in front of my eyes:
    [Service]
    WorkingDirectory=/var/local/wireguard
    ExecStart=/var/local/wireguard/startup
    ExecStop=/var/local/wireguard/stop
    Environment=LANG=C
This .service file had started out life as a copy of another .service file of mine. However, that one was for a daemon, where the ExecStart= process sticks around. I was running a script, and the script was exiting, which meant that as far as systemd was concerned the service was going down and it should immediately run the ExecStop script. My 'stop' script deleted the WireGuard tunnel network device, which explained why I found the device missing after booting had finished.
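As a rough guard against repeating this, here is a small Python sketch that flags the pattern in a unit file. It assumes the fix is the standard systemd idiom for script-based units: RemainAfterExit=yes, so the unit stays active after the startup script exits and ExecStop= waits for an explicit stop, plus Type=oneshot, so systemd waits for the script to finish before it considers the unit started. The parser is a simplification built on Python's configparser, and service_section and exec_stop_hazard are hypothetical helper names, not systemd tools.

```python
import configparser
from typing import Dict, Optional

# Values systemd's boolean parser treats as true (compared case-insensitively).
TRUE_WORDS = {"1", "yes", "y", "true", "t", "on"}


def service_section(unit_text: str) -> Dict[str, str]:
    """Return the [Service] settings of a unit file as a dict.

    Simplified on purpose: keys keep their case (systemd's are
    case-sensitive), a repeated key keeps its last value, and backslash
    line continuations are not supported.
    """
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep 'ExecStop' instead of 'execstop'
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return {}
    return dict(parser["Service"])


def exec_stop_hazard(unit_text: str) -> Optional[str]:
    """Warn if ExecStop= would run as soon as an ExecStart= script exits."""
    svc = service_section(unit_text)
    if "ExecStop" not in svc:
        return None
    remain = svc.get("RemainAfterExit", "no").strip().lower()
    if remain in TRUE_WORDS:
        return None
    return ("ExecStop= is set but RemainAfterExit= is off: if ExecStart= "
            "exits, systemd considers the service stopped and runs ExecStop=")


# The unit as written, and a version with the usual script-unit settings.
ORIGINAL = """\
[Service]
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
"""

FIXED = """\
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
"""

print(exec_stop_hazard(ORIGINAL))  # prints the warning
print(exec_stop_hazard(FIXED))     # prints None
```

The check only warns: it can't tell whether ExecStart= is a script that exits or a daemon that keeps running, and for a real daemon the default behaviour is what you want.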
The journalctl output won't tell you this; it reports only that the service started, and doesn't mention that it stopped again and that the ExecStop script was run. If I'd looked at 'systemctl status ...' and paid attention, I'd at least have had a clue, because systemd would have told me that it thought the service was 'inactive (dead)' instead of running. If I'd had both scripts explicitly log that they were running, I would have seen in the logs that my 'stop' script was being executed for some reason; I probably should add this.
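To check a unit's state from a script rather than by eye, you can ask 'systemctl show' for the unit's ActiveState and SubState properties, which it prints as KEY=VALUE lines. This is a minimal Python sketch; parse_show, describe_state and unit_state are hypothetical helpers, and unit_state needs a machine running systemd. A unit set up with RemainAfterExit=yes should report 'active (exited)' after its script finishes, while a unit like my broken one reports 'inactive (dead)'.

```python
import subprocess
from typing import Dict


def parse_show(output: str) -> Dict[str, str]:
    """Turn 'systemctl show' KEY=VALUE output lines into a dict."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def describe_state(props: Dict[str, str]) -> str:
    """Summarise ActiveState/SubState the way 'systemctl status' shows them."""
    active = props.get("ActiveState", "unknown")
    sub = props.get("SubState", "unknown")
    summary = f"{active} ({sub})"
    if active == "inactive":
        summary += ": systemd thinks this service has stopped"
    return summary


def unit_state(unit: str) -> str:
    """Query a unit's state; requires systemctl (a systemd machine)."""
    result = subprocess.run(
        ["systemctl", "show", "--property=ActiveState",
         "--property=SubState", unit],
        capture_output=True, text=True, check=True,
    )
    return describe_state(parse_show(result.stdout))


# Demo on canned output of the shape 'systemctl show' prints:
print(describe_state(parse_show("ActiveState=inactive\nSubState=dead\n")))
# prints: inactive (dead): systemd thinks this service has stopped
```

On the real machine you would call unit_state with the unit's name, for example unit_state("wg-work.service"), where wg-work.service is a made-up name.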
This has been a pretty useful learning experience. I know, that probably sounds weird, but my view is that I'd rather make these mistakes and learn these lessons in a non-urgent, non-production situation instead of stubbing my toes on them in production and possibly under stressful conditions.