2021-05-17
Unix job control has some dark corners and challenging cases, illustrated
Recently I learned that the common Linux version of vipw does some odd things with your terminal's foreground process group, one of the cores of Unix job control, due to a change made in late 2019. It does this to fix a problem that's described in the issue and sort of in the commit, but neither the issue nor the commit discuss the larger context of why all of this is needed. Also, the fix has a small omission which can cause problems in some circumstances. So how did we get here?
Vipw's job is to let you safely edit /etc/passwd (or with the right option /etc/shadow), but it's not an editor itself; it's a front-end on some arbitrary editor of your preference. It works by setting things up and then running the editor of your choice. This creates three different potential interactions with job control and the terminal.
First, infrequently vipw will be running an editor that doesn't take over the terminal or otherwise do anything with job control (perhaps someone likes ed). When you type Ctrl-Z, the terminal sends the entire foreground process group SIGTSTP and everything stops. Second, ideally vipw will be running an editor that takes over the terminal and handles Ctrl-Z itself, but the editor carefully sends SIGSTOP (or SIGTSTP) to the entire foreground process group. Third, it may be running an editor that takes over the terminal, handles Ctrl-Z itself, but only SIGSTOPs the editor itself, not the entire process group.
(A version of the third case is that someone manually sends SIGSTOP to the editor.)
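To make the difference between the second and third cases concrete, here is a minimal Python sketch of the two ways a program that handles Ctrl-Z itself might suspend. The function names are hypothetical and this is not taken from any particular editor; a real editor would also restore the terminal's modes first.

```python
import os
import signal


def suspend_whole_group():
    """Second case: stop every process in our process group.

    A front-end such as vipw that shares our process group is stopped
    along with us, so the shell sees the whole job suspend.
    """
    os.killpg(os.getpgrp(), signal.SIGSTOP)


def suspend_self_only():
    """Third case: stop only this process.

    A front-end that is merely waiting for us to exit keeps running,
    so from the shell's point of view nothing has been suspended.
    """
    os.kill(os.getpid(), signal.SIGSTOP)
```

The only difference is the target of the signal, which is part of why it's so easy to pick the second form without noticing any problem in everyday use.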
This third case is the initial problem. If vipw does nothing about it and you type Ctrl-Z to such an editor, your vipw session will appear to hang until you type Ctrl-Z a second time. The editor has stopped itself, but vipw is only waiting for it to finish and exit, so from the shell's perspective it seems that vipw is still the active foreground program. Your second Ctrl-Z will suspend vipw, now that the editor is no longer trapping Ctrl-Z.
That it's so easy for editors to do the wrong thing here is the first of our dark corners of job control. An editor that does this wrong (or 'wrong') will work almost all of the time, because people are usually not running it with a front-end like vipw or from inside a script (another situation where suspending only yourself is the wrong answer).
To handle this third case, vipw needs to listen for the editor being suspended and then suspend itself too, passing the suspension through to the shell. But now the first and the second cases are a problem. When either the TTY or the editor suspends the entire process group, a notification that the editor process has been suspended may be queued up for vipw. Vipw can't tell this apart from the third case, so when you un-suspend vipw (and the editor) again it will see that the editor was suspended and immediately re-suspend everything. This is the issue report.
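As an illustration of what 'listen for the editor being suspended and then suspend itself' might look like, here is a rough Python sketch. It is not vipw's actual code (which is C); the function names and the `suspend_self` parameter are my own inventions for the example.

```python
import os
import signal


def _stop_self():
    # Stop this process; execution resumes here after the shell's SIGCONT.
    os.kill(os.getpid(), signal.SIGSTOP)


def wait_passing_suspension(pid, suspend_self=_stop_self):
    """Wait for child `pid` to exit, mirroring each stop of the child.

    Whenever the child is reported as stopped, the front-end suspends
    itself too (handing control back to the shell); once the front-end
    is continued, it continues the child. Returns the final wait status.
    """
    while True:
        # WUNTRACED makes waitpid() also report children that have stopped,
        # not just ones that have exited.
        _, status = os.waitpid(pid, os.WUNTRACED)
        if os.WIFSTOPPED(status):
            suspend_self()
            os.kill(pid, signal.SIGCONT)  # we were resumed; wake the child too
        else:
            return status
```

If the front-end and the editor share a process group, a group-wide stop freezes the front-end before it can call `waitpid()`. When it is resumed, the already-pending 'child stopped' status is the first thing it sees, so it promptly suspends everything again, which is the double suspension described above.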
The fix vipw made (and in my opinion the correct one) was to put the editor in a new process group and make this new process group the terminal's foreground process group (this is how process groups interact with job control). With the editor isolated in its own process group, this essentially reduces everything to the third case. Vipw will be unaffected whether the TTY suspends the process group, the editor suspends the process group, or the editor just suspends itself, and in all cases vipw will notice that the editor has been suspended and suspend itself, turning control back over to the shell.
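The general shape of 'new process group, made the foreground one' can be sketched in Python as follows. This is an approximation of the technique under my own assumptions, not a transcription of the vipw change; `run_in_own_group`, `set_foreground` and the `tty_fd` parameter are hypothetical names.

```python
import os
import signal


def set_foreground(tty_fd, pgid):
    """Make process group `pgid` the foreground group of the terminal."""
    # A process that is not in the foreground group gets SIGTTOU when it
    # calls tcsetpgrp(), unless it ignores or blocks that signal.
    old = signal.signal(signal.SIGTTOU, signal.SIG_IGN)
    try:
        os.tcsetpgrp(tty_fd, pgid)
    finally:
        signal.signal(signal.SIGTTOU, old)


def run_in_own_group(argv, tty_fd=None):
    """Fork and exec `argv` in a brand new process group; return its pid.

    If `tty_fd` is given, the new group is also made the terminal's
    foreground process group.
    """
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)  # child: become leader of a new process group
            if tty_fd is not None:
                set_foreground(tty_fd, os.getpgrp())
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed
    # The parent repeats the setpgid() so the group exists no matter which
    # process runs first; failure here means the child got there already
    # (or has already gone away).
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass
    if tty_fd is not None:
        set_foreground(tty_fd, pid)
    return pid
```

Doing the `setpgid()` in both parent and child is the usual way job control shells avoid a race between the two processes; whether vipw does exactly this is not something I'm asserting here.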
However, this change introduced a new bug, which is that when the editor finally finishes and everything is exiting, vipw doesn't change the terminal's foreground process group back to what it used to be before vipw itself exits. That this bug is almost invisible (and thus easy to introduce without noticing) is the second dark corner of job control.
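The missing step amounts to remembering the terminal's foreground process group before changing it and putting it back before exiting. A minimal Python sketch of that idea, with a hypothetical `foreground_restored` helper of my own naming, might be:

```python
import contextlib
import os
import signal


@contextlib.contextmanager
def foreground_restored(tty_fd):
    """Save the terminal's foreground process group; restore it on exit.

    Yields the saved process group ID, or None if `tty_fd` is not a
    terminal (in which case nothing is saved or restored).
    """
    if not os.isatty(tty_fd):
        yield None
        return
    saved = os.tcgetpgrp(tty_fd)
    try:
        yield saved
    finally:
        # By now we are probably not in the foreground group ourselves, so
        # ignore SIGTTOU while we hand the terminal back.
        old = signal.signal(signal.SIGTTOU, signal.SIG_IGN)
        try:
            os.tcsetpgrp(tty_fd, saved)
        finally:
            signal.signal(signal.SIGTTOU, old)
```

A front-end would wrap the whole 'run the editor in its own foreground process group and wait for it' step in `with foreground_restored(0):`, so that the restore happens even if something goes wrong along the way.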
The reason the bug is almost invisible is that almost all shells today are job control shells and a job control shell is basically indifferent to what you leave the foreground process group set to. Because a job control shell always changes the foreground process group away from itself when it starts or resumes a job, it always has to set it back to itself when it regains control after a program exits. This constant shuffling of foreground process groups is intrinsic to how job control works.
(Even the Almquist shell, used as a relatively minimal sh on FreeBSD and some Linuxes, has job control. OpenBSD's sh is ksh, and it too has job control.)
PS: In self-defense, my non-job-control shell has evolved to generally reset the foreground process group to itself when it seems necessary; in fact it started doing this in 2000 (really). However, the code path for readline support didn't do this until I stumbled over this vipw issue. GNU Readline itself will clean up a number of things about the terminal it's attempting to deal with, but not this (which is sensible).