Sorting out my systemd mistake with a script-based service unit
Back in November I wrote about a systemd mistake I made with a script-based service unit, where I left out some service options and got a surprise when my service didn't work. A commentator recently made me realize that I didn't really understand what was going on and what had happened; instead I was working by superstition. So I've now done some experiments and read the systemd.service manpage again, and here's what I know.
The basic situation was that I wrote a .service file that had just
this, where ExecStart and ExecStop are scripts that just run briefly
and then exit:
[Service]
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
(In this situation, systemd's defaults are that there is an implicit
Type=simple and the default RemainAfterExit=no.)
If you don't set RemainAfterExit=yes and your ExecStart exits with
status 0, your service becomes inactive (as opposed to failing to
start). If you have an ExecStop, systemd will then run it, even
though you haven't explicitly asked for a 'stop' operation; in my
situation this mysteriously reversed the effects of my start script.
That this happens is unfortunately not clearly documented anywhere
that I could see, although it makes a certain amount of sense if
you consider ExecStop to be for cleanup actions, where you often
want the cleanup actions to happen if the service started successfully
and then stops, regardless of just why the service stopped.
(Looking through the stock Fedora 27 systemd .service units, quite
a lot of the ExecStop actions appear to be this sort of cleanup,
not 'signal the service to shut down' actions.)
It's easy to see this with a test service that just runs some
scripts. You'll get output from 'systemctl status yourtest.service'
that looks like this:
  Active: inactive (dead) since Mon 2018-04-02 15:37:19 EDT; 41s ago
 Process: 973 ExecStop=/root/stop-script (code=exited, status=0/SUCCESS)
 Process: 964 ExecStart=/root/start-script (code=exited, status=0/SUCCESS)
Main PID: 964 (code=exited, status=0/SUCCESS)
The ExecStart script ran, was considered the main PID, exited with
status 0, and then shortly afterward the ExecStop script was run
(since it has a PID only a bit higher, and the start script ran a
couple of commands).
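A minimal test unit for reproducing this might look like the following sketch. The script paths are the ones that appear in the status output above; the unit's location under /etc/systemd/system, and the assumption that both scripts exist, are executable, and exit quickly with status 0, are mine.

```ini
# /etc/systemd/system/yourtest.service (assumed location for a local test unit)
[Service]
# Type= and RemainAfterExit= are deliberately left unset, so the
# defaults (Type=simple, RemainAfterExit=no) apply.
ExecStart=/root/start-script
ExecStop=/root/stop-script
```

After a 'systemctl daemon-reload' and 'systemctl start yourtest.service', running 'systemctl status yourtest.service' should list both an ExecStart and an ExecStop process, even though nothing asked for the service to be stopped.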
Contrary to what I thought in my first entry, Type=oneshot doesn't
affect this as such. As my commentator noted, what using Type=oneshot
instead of Type=simple really affects is when other units will get
started.
If you have test.service with the implicit Type=simple, and another
service says 'After=test.service', your other service will get
started the moment that systemd has started running test.service's
ExecStart. This is often not what you want; instead you want things
that depend on test.service to only start when its ExecStart has
finished preparing things and exited.
That's what Type=oneshot enforces, by making it so that test.service
is only considered 'started' when your ExecStart program or script
exits. Systemd does more or less document this, at the end of the
description of Type=:
If set to simple [...], as systemd will immediately proceed starting follow-up units. [...]
Behavior of oneshot is similar to simple; however, it is expected that the process has to exit before systemd starts follow-up units. [...]
(This is not particularly clearly written, unfortunately. Energetic people can propose a documentation patch in the master repo.)
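To make the ordering difference concrete, here is a sketch of two unit files, shown together in one block. The name other.service and the path /usr/local/bin/other-daemon are hypothetical, and reusing /root/start-script as the setup script is an assumption for illustration.

```ini
# --- test.service: does some setup and then exits ---
[Service]
# With oneshot, the unit only counts as started once ExecStart exits.
# With the implicit Type=simple it would count as started as soon as
# systemd had started running the script.
Type=oneshot
ExecStart=/root/start-script

# --- other.service (hypothetical): should wait for that setup ---
[Unit]
# Ordering only: when both units are being started, this one waits
# until test.service is considered started.
After=test.service

[Service]
ExecStart=/usr/local/bin/other-daemon
```

If the Type=oneshot line is removed, other.service can be started while the setup script is still running.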
As the documentation notes, Type=oneshot probably mostly requires
that you use RemainAfterExit=yes, because otherwise the service
won't be considered to be active. Certainly things will be much
less confusing if you use it, because then all of the units involved
will stay 'active' and you won't ever have the experience of wondering
why something is up and running despite a dependency having failed.
(After= doesn't actually create a dependency, of course, just an
ordering. But that's another entry.)
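Putting the pieces together, a sketch of the original unit with both settings added might look like this. It assumes that nothing else in the unit needs to change and that the two scripts behave as described at the start of this entry.

```ini
[Service]
# Wait for the start script to exit before the unit counts as started.
Type=oneshot
# Keep the unit 'active' after the start script exits with status 0,
# so ExecStop only runs when the unit is actually stopped.
RemainAfterExit=yes
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
```

With this, 'systemctl status' should report the unit as active (exited) after a successful start, and the stop script should only run on an explicit 'systemctl stop' or at shutdown.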