Putting cron jobs into systemd user slices doesn't always work (on Ubuntu 16.04)
As part of dealing with our Ubuntu 16.04 shutdown problem, we now have our systems set up to put all user cron jobs into systemd user slices so that systemd will terminate them before it starts unmounting NFS filesystems. Since we made this change, we've rebooted all of our systems and thus had an opportunity to see how it works in practice in our environment.
Unfortunately, what we've discovered is that pam_systemd
apparently doesn't always work right. Specifically, we've seen some
user cron @reboot entries create processes that wound up still
under cron.service, although other @reboot entries for the same
user on the same machine wound up with their processes in user
slices. When things fail, pam_systemd doesn't log any sort of
errors that I can see in the systemd journal.
(Since no failures are logged, this doesn't seem related to the famous systemd issue where pam_systemd can't talk to systemd, eg systemd issue 2863 or this Ubuntu issue.)
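One way to check where a given user's processes actually landed is to read each process's systemd cgroup from /proc/<pid>/cgroup. What follows is a minimal Python sketch of that, not something we actually run. It assumes a Linux /proc and the cgroup v1 'name=systemd' hierarchy that Ubuntu 16.04 uses, falling back to the unified '0::' line on a cgroup v2 system. The helper names (systemd_cgroup, classify, scan) are hypothetical. It only separates 'still under cron.service' from 'somewhere in user.slice', and it avoids f-strings so that it should run on the Python 3.5 that Ubuntu 16.04 ships.

```python
import os
import sys


def systemd_cgroup(cgroup_text):
    """Return the systemd cgroup path from the contents of /proc/<pid>/cgroup.

    Prefers the cgroup v1 'name=systemd' hierarchy; falls back to the
    cgroup v2 '0::' line if that is all there is.
    """
    fallback = None
    for line in cgroup_text.splitlines():
        # Each line is 'hierarchy-ID:controller-list:path'; the path may contain ':'.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hier, controllers, path = parts
        if controllers == "name=systemd":
            return path
        if hier == "0" and controllers == "":
            fallback = path
    return fallback


def classify(path):
    """Roughly say where a systemd cgroup path puts a process."""
    if path is None:
        return "unknown"
    if path.startswith("/user.slice/"):
        return "user slice"
    if path == "/system.slice/cron.service" or path.startswith("/system.slice/cron.service/"):
        return "cron.service"
    return "other"


def scan(uid):
    """Yield (pid, command name, classification) for each process owned by uid."""
    try:
        entries = os.listdir("/proc")
    except OSError:
        return  # no /proc, so nothing to scan
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            if os.stat("/proc/%s" % entry).st_uid != uid:
                continue
            with open("/proc/%s/cgroup" % entry) as f:
                path = systemd_cgroup(f.read())
            with open("/proc/%s/comm" % entry) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were looking at it
        yield int(entry), comm, classify(path)


if __name__ == "__main__":
    target_uid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getuid()
    for pid, comm, where in scan(target_uid):
        print(pid, comm, where)
```

Run with a UID as its argument, this lists that user's processes and whether each one is under cron.service or in a user slice. 'systemd-cgls' gives a similar picture by hand, as a tree.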
The pam_systemd source code
isn't very long and doesn't do very much itself. The most important
function here appears to be pam_sm_open_session, and reading
the code, I can't spot a failure path that doesn't cause pam_systemd
to log an error. The good news is that turning on debugging for
pam_systemd doesn't appear to result in an overwhelming volume
of extra messages, so we can probably do this on the machines where
we've seen the problem in the hopes that something useful shows up.
(It will probably take a while, since we don't reboot these machines very often. I have not seen or reproduced this on test machines, at least so far.)
Looking through what 'systemctl list-dependencies' with various
options says for cron.service, it's possible that we need an explicit
dependency on systemd-logind.service (although systemd-analyze
on one system says that systemd-logind started well before crond).
In theory it looks like pam_systemd should be reporting errors
if systemd-logind hasn't started, but in practice, who knows. We
might as well adopt a cargo cult 'better safe than sorry' approach
to unit dependencies, even if it feels like a very long shot.
(Life would be simpler if systemd had a simple way of discovering the relationship, if any, between two units.)
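A partial stand-in for that missing tool is 'systemctl show', which prints a unit's properties as Key=value lines. These include direct dependency lists such as After= and Wants=, plus ActiveEnterTimestampMonotonic, the monotonic time in microseconds at which the unit became active. What follows is a minimal Python sketch built on that, under the assumption that direct relationships are all we care about. The helper names are hypothetical, and RELATIONS is a hand-picked subset of systemd's dependency properties, not the full set. It parses Key=value output itself rather than using 'systemctl show --value', since that option may be newer than the systemd in Ubuntu 16.04.

```python
import subprocess

# A hand-picked subset of the dependency properties 'systemctl show' reports.
RELATIONS = ("Requires", "Wants", "BindsTo", "PartOf", "Conflicts",
             "Before", "After")
STAMP = "ActiveEnterTimestampMonotonic"


def parse_show(output):
    """Parse the 'Key=value' lines printed by 'systemctl show'."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key] = value
    return props


def show(unit, properties):
    """Run 'systemctl show' for one unit, asking only for the given properties."""
    cmd = ["systemctl", "show", unit]
    for prop in properties:
        cmd += ["-p", prop]
    return parse_show(subprocess.check_output(cmd, universal_newlines=True))


def direct_relations(props, other):
    """Return which relationship properties in props list the unit 'other'."""
    return [rel for rel in RELATIONS if other in props.get(rel, "").split()]


def report(first, second):
    """Print direct relationships in both directions and when each unit became active."""
    a = show(first, RELATIONS + (STAMP,))
    b = show(second, RELATIONS + (STAMP,))
    print("%s lists %s in: %s"
          % (first, second, ", ".join(direct_relations(a, second)) or "nothing"))
    print("%s lists %s in: %s"
          % (second, first, ", ".join(direct_relations(b, first)) or "nothing"))
    # Monotonic microseconds since boot; 0 means the unit never became active.
    print("%s active at %s us; %s active at %s us"
          % (first, a.get(STAMP, "?"), second, b.get(STAMP, "?")))
```

Calling report("cron.service", "systemd-logind.service") would show whether either unit directly names the other and which one became active first. It does not follow transitive chains, such as cron.service being ordered after a target that is itself ordered after systemd-logind.service. 'systemctl list-dependencies --after cron.service' shows that fuller ordering tree.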