The history of sending signals to Unix process groups

September 5, 2022

All (Unix) processes are members of some process group. Process groups go very far back in Unix; they're present at least as far back as Fourth Edition (V4) Unix. However, these early versions aren't really "process groups" in the modern sense, as we can see from the relevant proc struct field being called p_ttyp. Instead they were used primarily to send signals to your terminal processes when various things happened (see dmr/tty.c and dmr/dc.c), and the 'process group number' was the address of the 'struct tty' for your terminal.

In V7, h/proc.h renamed the p_ttyp field to p_pgrp and now calls it the 'process group leader'. However, there's (still) no way to send a signal to a process group from user code, although various tools know about the idea of process groups and will report them to user level (for example pstat.1m, which gets this information in the traditional Unix approach of reading kernel memory, per cmd/pstat.c). V7 is also where the 'process group' number becomes the process ID of the first process to open a (serial) tty after it's been closed.

(The V6 ps is aware of p_ttyp and uses it to report the controlling terminal, but I don't think it prints it. In any case the specific value of the 'process group' in V6 isn't very meaningful, since it's still the address of a kernel structure instead of the PID of the process group leader.)

The inability to send signals to process groups changed, apparently independently, in System III and 4BSD. In System III, kill(2) documents the modern approach of sending a signal to a process group by using a negative 'PID' in the kill(2) system call. System III also has an explicit getpgrp(2) system call and supports setpgrp(2). According to intro.2, System III claims to differentiate between the 'process group' and the 'tty group'; however, proc.h only has the V7 p_pgrp, and the code to do things like handle control-C (in tt0.c) uses p_pgrp (via signal() in sig.c). I don't know enough to say why System III decided to let process groups change and be exposed explicitly.
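To make the negative 'PID' convention concrete, here is a minimal sketch of how it looks on a modern Unix, written in Python rather than taken from any System III source. The helper names spawn_in_new_group() and signal_group() are made up for illustration, and using start_new_session=True (which calls setsid() and so does more than just create a new process group) is simply a convenient way to get a child whose process group ID equals its PID.

```python
import os
import signal
import subprocess
import sys


def spawn_in_new_group():
    """Start a sleeping child that leads its own process group.

    start_new_session=True makes the child call setsid() before exec,
    which also makes it a process group leader (PGID == PID).
    """
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )


def signal_group(pgid, sig):
    """Send sig to every process in group pgid via a negative 'PID'."""
    os.kill(-pgid, sig)


if __name__ == "__main__":
    child = spawn_in_new_group()
    signal_group(child.pid, signal.SIGTERM)
    # A negative return code means the child was terminated by that signal.
    print(child.wait())
```

The child is put in its own group so that the signal doesn't also hit the process that is sending it.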

In 4BSD the reason for a change is much simpler, because 4BSD introduced job control. Job control intrinsically involves multiple process groups, which requires exposing them to user level code and providing user level code ways to send signals to entire process groups. As I mentioned in yesterday's entry, 4BSD implements the ability to signal process groups in a different way from System III. Although 4BSD has a separate killpg(2j) function that calls itself a system call, the actual implementation uses the kill(2) system call with the signal number negated instead of the process ID (see the code for kill() in sys4.c, and also killpg.s). By 4.1c BSD there's an actual killpg() system call, although kern_sig.c calls it temporary. Only in 4.3 BSD does the behavior of negative PIDs appear in kill(2), and even then kill.2 says that it's for compatibility with System V. 4.3 BSD is also where the kill() system call stops supporting the 4BSD behavior of sending signals to process groups instead of PIDs through negative signal numbers (see kern_sig.c).
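The difference between the two conventions is easier to see side by side. What follows is only a toy model of how the arguments to kill() get interpreted, not a transcription of either kernel's code; the Target type and both function names are invented for this sketch, and the special values 0 and -1 are deliberately left out.

```python
from typing import NamedTuple


class Target(NamedTuple):
    kind: str   # "process" or "group"
    ident: int  # process ID or process group ID
    signo: int  # signal number that is actually delivered


def decode_negative_pid(pid, sig):
    """System III (and modern) convention: a negative 'PID' names a group."""
    if pid in (0, -1):
        raise ValueError("special PID values are not modelled in this sketch")
    if pid < 0:
        return Target("group", -pid, sig)
    return Target("process", pid, sig)


def decode_negative_signal(pid, sig):
    """4BSD convention: a negated signal number makes 'pid' a group ID."""
    if pid <= 0:
        raise ValueError("special PID values are not modelled in this sketch")
    if sig < 0:
        return Target("group", pid, -sig)
    return Target("process", pid, sig)


if __name__ == "__main__":
    # Both calls describe "send signal 15 to process group 1234".
    print(decode_negative_pid(-1234, 15))
    print(decode_negative_signal(1234, -15))
```

Both conventions smuggle one extra bit of information ("this is a group") through the sign of an argument that is otherwise always positive; they just picked different arguments.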

Before I started down this rabbit hole I would have assumed that you could send signals to process groups as far back as at least V7, and that it would have been done in the modern way. I wouldn't have guessed that signaling process groups was developed separately in both main branches of Unix (AT&T and BSD), and that they initially used different APIs.

Since I just looked it up, POSIX standardized both killpg() and the modern version of kill(). You can of course implement your killpg() through a POSIX standard kill(), so you don't need both as actual system calls.
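As a sketch of what that looks like (in Python, and with a made-up function name, since Python already provides os.killpg()), a killpg() built on top of kill() is just a negation plus a guard. The guard reflects my reading of POSIX, which only defines killpg() for process group IDs greater than 1; raising ValueError for other values is this sketch's choice, not something the standard requires.

```python
import os


def killpg_via_kill(pgid, sig):
    """Send sig to process group pgid using only kill()."""
    if pgid <= 1:
        # POSIX leaves killpg() undefined here, and negating 0 or 1
        # would hit kill()'s special cases, so this sketch refuses.
        raise ValueError("process group ID must be greater than 1")
    os.kill(-pgid, sig)
```

For example, killpg_via_kill(pgid, 0) uses signal 0 to check whether a process group still exists (and is signalable by you) without actually sending it anything.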

Last modified: Mon Sep 5 22:41:31 2022