A systemd mistake with a script-based service unit I recently made
That sure was a bunch of debugging because I forgot that my systemd .service file that runs scripts needed
Type=oneshot
RemainAfterExit=True
(... or it'd apparently run the ExecStop script right after the ExecStart script, which doesn't work too well.)
Let's be specific here. This was the systemd .service unit to bring up my WireGuard tunnel on my work machine, which I set up to run a 'startup' script (via ExecStart=). Because I had a 'stop' script sitting around, I also set the unit's ExecStop= to point to that; the 'stop' script takes the device down and so on.
The startup script worked when I ran it by hand, but when I set up the .service unit to start WireGuard on boot, it didn't. Specifically, although journalctl reported no errors, the WireGuard tunnel network device and its associated routes just weren't there when the system finished booting. At first I thought the script was failing in a way that the systemd journal wasn't capturing, so I stuck a bunch of debugging in (capturing all output from the script in a file, then running with 'set -x', and finally dumping out various pieces of network state after the script had finished).
All of this debugging convinced me that the WireGuard tunnel was being created during boot but then getting destroyed by the time booting finished. I flailed around for a while, theorizing that this service or that service was destroying the WireGuard device when it was starting (and altering my .service to start after a steadily increasing number of other things), but nothing fixed the issue.
Then, while I was staring at my .service file, the penny dropped and I actually read what was in front of my eyes:
    [Service]
    WorkingDirectory=/var/local/wireguard
    ExecStart=/var/local/wireguard/startup
    ExecStop=/var/local/wireguard/stop
    Environment=LANG=C
This .service file had started out life as one that I'd copied from another .service file of mine. However, that .service file was for a daemon, where the ExecStart= was a process that was sticking around. I was running a script, and the script was exiting, which meant that as far as systemd was concerned the service was going down and it should immediately run the ExecStop script. My 'stop' script deleted the WireGuard tunnel network device, which explained why I found the device missing after booting had finished.
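For reference, here is a sketch of what the corrected [Service] section would look like, assuming nothing else in the unit needs to change. The only additions are the two settings mentioned at the top of this entry; everything else is the original. With Type=oneshot, systemd treats the unit as started once the startup script has exited, and RemainAfterExit keeps the unit 'active' after that, so the ExecStop script should only run when the unit is actually stopped.

```ini
[Service]
# Added: the startup script runs to completion and exits.
Type=oneshot
# Added: stay 'active' after the script exits, so ExecStop is held
# back until the unit is really stopped.
RemainAfterExit=True
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
```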
The journalctl output won't tell you this; it reports only that the service started and doesn't mention that it has stopped again or that the ExecStop script was run. If I'd looked at 'systemctl status ...' and paid attention, I'd at least have had a clue, because systemd would have told me that it thought the service was 'inactive (dead)' instead of running. If I'd had both scripts explicitly log that they were running, I would have seen in the logs that my 'stop' script was being executed for some reason; I probably should add this.
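A minimal sketch of what that logging could look like, assuming the scripts are plain shell and that the unit leaves systemd's default output handling alone (in which case a service's standard output ends up in the journal). The helper name log_run and the TAG value are made up for illustration; they are not from my real scripts.

```shell
#!/bin/sh
# Hypothetical helper; the name log_run and the TAG value are illustrative
# and not taken from the real startup/stop scripts.
TAG="wireguard-scripts"

log_run() {
    # $1 names the script that is running, e.g. "startup" or "stop".
    # Under systemd's default settings, a service's stdout goes to the journal.
    printf '%s: %s script running\n' "$TAG" "$1"
}

# At the top of the 'stop' script, this would be:
log_run stop
```

With a line like this at the top of each script, a 'stop' message showing up right after the 'startup' one would have pointed straight at the problem.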
This has been a pretty useful learning experience. I know, that probably sounds weird, but my view is that I'd rather make these mistakes and learn these lessons in a non-urgent, non-production situation instead of stubbing my toes on them in production and possibly under stressful conditions.