One way of capturing debugging state information in a systemd-based system
Suppose, not entirely hypothetically, that you have a systemd
.service unit running something where the something (whatever it
is) is mysteriously failing to start or run properly. In the most
frustrating version of this, you can run the operation just fine
after the system finishes booting and you can log in, but it fails
during boot and you can't see why. In this situation you often want
to gather information about the boot-time state of the system just
before your daemon or program is started and fails; you might need
to know things like what devices are available, the state of network
interfaces and routes, what filesystems have been mounted, what
other things are already running, and so on.
All of this information can be gathered by a shell script, but the
slightly tricky bit is figuring out how to get it to run. I've taken
two approaches here. The first one is to simply write a new
.service unit that runs the script:

    [Unit]
    Description=Debug stuff
    After=<whatever>
    Before=<whatever else>

    [Service]
    Type=oneshot
    RemainAfterExit=True
    ExecStart=/root/gather-info

    [Install]
    WantedBy=multi-user.target

Here the actual information gathering script is /root/gather-info.
I typically have it write its data into a file in /root as well.
I use /root as a handy dumping ground that's on the root filesystem
but not conceptually owned by the package manager in the way that
/bin and so on are; I can throw things in there without
worrying that I'm causing (much) future problems.
(If you use an ExecStop= instead of ExecStart=, you can gather
the same sort of information at shutdown.)
However, if you're interested in the state basically right before
a particular .service runs, the better approach is to modify that
.service to add an extra ExecStartPre= line. In order to make
sure I know what's going on, my approach is to copy the entire
.service file to /etc/systemd/system (if necessary) and then
edit it. As an example, suppose that your ZFS on Linux setup is
failing to import pools on boot because the unit that imports
them is failing.
Here I'd modify the .service like this:

    ExecStartPre=/root/gather-info
    ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN

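The copy-and-edit step can also be sketched in shell. This is only an illustration under assumptions: the helper name copy_with_pre_hook is made up, the destination directory is passed in rather than hard-coded, and the unit is assumed to have an ExecStart= line, in front of which the new line is placed. If there is no such line, the file is copied unchanged.

```shell
#!/bin/sh
# Sketch: copy a unit file and add an ExecStartPre= line before its first
# ExecStart= line. copy_with_pre_hook is a hypothetical helper name.

copy_with_pre_hook() {
    src=$1       # the packaged unit file to copy
    dstdir=$2    # normally /etc/systemd/system
    script=$3    # e.g. /root/gather-info
    dst="$dstdir/$(basename "$src")"

    # Print the new line once, immediately before the first ExecStart= line,
    # and pass every original line through unchanged.
    awk -v pre="ExecStartPre=$script" '
        /^ExecStart=/ && !done { print pre; done = 1 }
        { print }
    ' "$src" >"$dst"
}
```

Called with the packaged unit file, /etc/systemd/system and /root/gather-info as arguments, and followed by a 'systemctl daemon-reload', this would be roughly equivalent to the manual edit.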
Unfortunately I don't think you can do this without copying the
.service file, or at least I wouldn't want to trust it
any other way.
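One candidate that may be worth trying is systemd's documented drop-in mechanism, where files in a '<unit>.d' directory under /etc/systemd/system are merged into the unit, and ExecStartPre= is a setting that can be given more than once. What follows is a minimal sketch, not a tested recipe for any particular unit: the helper name add_pre_hook and the drop-in file name debug.conf are made up for illustration.

```shell
#!/bin/sh
# Sketch: add an ExecStartPre= to an existing unit through a drop-in file
# instead of copying the whole unit. add_pre_hook and debug.conf are
# hypothetical names chosen for this example.

add_pre_hook() {
    unit=$1                          # full unit name, including .service
    script=$2                        # e.g. /root/gather-info
    base=${3:-/etc/systemd/system}   # where the <unit>.d directory goes
    dir="$base/$unit.d"

    mkdir -p "$dir" || return 1
    cat >"$dir/debug.conf" <<EOF
[Service]
ExecStartPre=$script
EOF
}
```

After a 'systemctl daemon-reload', 'systemctl cat <unit>' should show the original unit file followed by the drop-in, which is one way to check what systemd actually sees.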
Possibly there's a better way to do this in the systemd world, but
I've been sort of frustrated by how difficult it is to do various
things here. For example, it would be nice if systemd would easily
give you the names of systemd units that ran or failed, instead of
Description= texts. More than once I've had to resort to
'grep -rl <whatever> /usr/lib/systemd/system' in an attempt to
find a unit file so I could see what it actually did.
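That grep can be wrapped up in a small helper. This is a sketch under assumptions: find_unit_files is a made-up name, and the default directories are the usual unit locations, which may differ between distributions.

```shell
#!/bin/sh
# Sketch: list unit files that contain some text, such as a Description=
# string seen in boot messages. find_unit_files is a hypothetical helper.

find_unit_files() {
    pattern=$1
    shift
    # With no directories given, fall back to the common unit locations.
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib/systemd/system /etc/systemd/system
    fi
    # -F treats the pattern as a fixed string, so dots and brackets in a
    # Description= text are matched literally.
    grep -rlF -- "$pattern" "$@" 2>/dev/null
}
```

For the failed-unit case specifically, 'systemctl list-units --state=failed' prints unit names in its first column, which can sometimes save the search.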
Sidebar: My usual general format for information-gathering scripts
I tend to write them like this:

    #!/bin/sh
    ( date;
      [... various commands ...]
      echo
    ) >>/root/somefile.txt

The things I've found important are the date stamp at the start, that I'm appending to the file instead of overwriting it, and the blank line at the end for some more visual separation. Appending instead of overwriting can really save things if for some reason I have to reboot twice instead of once, because it means information from the first reboot is still there.
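A slightly fuller sketch of this format, with the placeholder filled in for the kinds of state mentioned at the start (network interfaces and routes, mounted filesystems, devices, running processes). The specific commands are illustrative assumptions and not a fixed list, and unlike the version above this one takes the output file as an argument.

```shell
#!/bin/sh
# Sketch of an information-gathering script in the format described above.
# The command list is an illustrative assumption; swap in whatever state
# matters for the problem being debugged.

gather_info() {
    out=$1
    (
        date
        echo "== network interfaces and routes =="
        ip addr
        ip route
        echo "== mounted filesystems =="
        cat /proc/mounts
        echo "== disk devices =="
        ls -l /dev/disk/by-id
        echo "== running processes =="
        ps -ef
        echo
    ) >>"$out" 2>&1    # append, and keep error messages with the output
}

# Only gather when given an output file on the command line.
if [ "$#" -gt 0 ]; then
    gather_info "$1"
fi
```

With this variant the unit would use something like 'ExecStart=/root/gather-info /root/somefile.txt'. Sending stderr to the same file means a command that is missing or fails this early in boot leaves a visible trace instead of silently producing nothing.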