2017-09-29
Shell builtin versions of standard commands have drawbacks
I'll start with a specific illustration of the general problem:
bash# kill -SIGRTMIN+22 1
bash: kill: SIGRTMIN+22: invalid signal specification
bash# /bin/kill -SIGRTMIN+22 1
bash#
The first thing to note is that yes, this is Linux being a bit unusual.
Linux has significantly extended the usual range of Unix signal
numbers to include POSIX.1-2001 realtime signals, and the value of
SIGRTMIN can then vary depending on how a system is set up. Once
Linux had these extra signals (and defined in the way they are),
people sensibly added support for them to versions of kill. All of
this is perfectly in accord with the broad Unix philosophy; of course
if you add a new facility to the system you want to expose it to
shell scripts when that's possible.
Then along came Bash. Bash is cross-Unix, and it has a builtin kill
command, and for whatever reason the Bash people didn't modify Bash
so that on Linux it would support the SIGRTMIN+<n> syntax (some
possible reasons for that are contained in this sentence). The result
is a divergence between the behavior of Bash's kill builtin and the
real kill program, one that has become increasingly relevant now that
programs like systemd are taking advantage of the extra signals to
let you control more of their operations by sending them more signals.
Of course, this is a generic problem with shell builtins that shadow
real programs in any (and all) shells; it's not particularly specific
to Bash (zsh also has this issue on Linux, for example). There are
advantages to having builtins, including builtins of things like
kill, but there are also drawbacks, and how best to fix or work
around them isn't clear.
(kill is often a builtin in shells with job control, Bash included,
so that you can do 'kill %<n>' and the like. Things like test are
often made builtins for shell script speed, although Unixes can take
that too far.)
PS: certainly one answer is 'have Bash implement the union of all
special kill, test, and so on features from all the Unixes it runs
on', but I'm not sure that's going to work in practice. And Bash is
just one of several popular shells, all of which would need to keep
up with things (or at least people would probably want them to do so).
More on systemd on Ubuntu 16.04 failing to reliably reboot some of our servers
I wrote about how Ubuntu 16.04 can't reliably reboot some of our servers, then discovered that systemd can shut down the network with NFS mounts still present and speculated this was (and is) one of our problems. I've now been able to reliably produce such a reboot failure on a test VM and narrow down the specific component involved.
Systemd shuts down your system in two stages: the main stage, which
stops systemd units, and the final stage, done with systemd-shutdown,
which kills the remaining processes, fiddles around with the remaining
mounts, and theoretically eventually reboots the system. In the
Ubuntu 16.04 version of systemd-shutdown, part of what it tries to
do with NFS filesystems is to remount them read-only, and for us
this sometimes hangs. With suitable logging enabled in systemd so
that systemd-shutdown is run with it, we see:
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
Sending SIGKILL to PID <nnn> (<command>)
Unmounting file systems.
Remounting '/var/mail' read-only with options '<many of them>'.
At this point things hang, although if you have a shutdown watchdog
set up, it will force a reboot and recover the system. Based on
comments on my second entry, systemd-shutdown doing this is (now)
seen as a problem and it's been changed in the upstream version of
systemd, although only very recently (eg this commit only landed at
the end of August).
Unfortunately this doesn't seem to be the sole cause of our shutdown
hangs. We appear to have had at least one reboot hang while systemd
attempted to swapoff the server's swap space, before it entered the
late stage of reboot. This particular server has a lot of inactive
user processes because it hosts our user-managed web servers, and
(at the time) they weren't being killed early in system shutdown,
so turning off swap space presumably had to page a lot of things
back into RAM. This may not have actually hung as such, but if so
it was sufficiently slow as to be unacceptable, and we force-rebooted
the server in question after a minute or two.
We're currently using multiple ways to hopefully reduce the chances
of hangs at reboot time. We've put all user cron jobs into systemd
user slices so that systemd will kill them early, although this
doesn't always work and we may need some clever way of dealing with
the remaining processes. We've also enabled a shutdown watchdog
timer with a relatively short timeout, although this only helps if
the system makes it to the second stage, when it runs systemd-shutdown;
a 'hang' before then in swapoff won't be interrupted.
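For concreteness, the shutdown watchdog is a setting in systemd's main configuration file; a sketch of what we mean, with an illustrative timeout rather than our actual one (on 16.04-era systemd the option is called ShutdownWatchdogSec; much later versions renamed it):

```ini
# /etc/systemd/system.conf (excerpt)
[Manager]
# If the final (systemd-shutdown) stage of a reboot hangs, the hardware
# or VM watchdog force-reboots the machine after this long.
# The 3min value is an illustration, not our actual setting.
ShutdownWatchdogSec=3min
```

This only arms the watchdog for the final stage, which is exactly why it doesn't help with a stall in swapoff during the main stage.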
In the future we may enable a relatively short JobTimeoutSec on
reboot.target, in the hope that this does some good. I've considered
changing Ubuntu's cron.service to KillMode=control-group and then
holding the package to prevent surprise carnage during package
upgrades, but this seems to be a little too much hassle and danger
for an infrequent thing that is generally merely irritating.
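If we do go the JobTimeoutSec route, it would look something like this hypothetical drop-in (the file name and the timeout value are illustrative assumptions, not something we've deployed):

```ini
# /etc/systemd/system/reboot.target.d/timeout.conf (hypothetical drop-in)
[Unit]
# If the reboot job itself stalls in the main stage, give up after
# this long and reboot immediately instead of hanging forever.
JobTimeoutSec=10min
JobTimeoutAction=reboot-force
```

Unlike the shutdown watchdog, this applies to the main stage of shutdown, so in theory it could cover a swapoff stall; whether it does in practice on 16.04's systemd is exactly the sort of thing we'd have to test.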
As a practical matter, this entry is probably the end of the saga.
This is not a particularly important thing for us and I've already
discovered that there are no simple, straightforward, bug-free fixes
(and as usual the odds are basically zero that
Ubuntu will fix bugs here). If we're lucky, Ubuntu 18.04 will include
a version of systemd with the systemd-shutdown
NFS mount fixes
in it and perhaps pam_systemd will be more reliable for @reboot
cron jobs. If we're not lucky, well, we'll keep having to trek down
to the machine room when we reboot servers. Fortunately it's not
something we do very often.