My interesting experience with rapid repeated PID rollover on Linux
In the normal course of events, PID rollover is one of those things that I know about but that I don't really expect to ever actively observe, especially on a workstation style machine. Linux may roll over PIDs after reaching PID 32767 (by default), but to do this in even four days requires a consistent process creation and recycling rate of over five a minute for those four days, and my machines aren't that busy, at least not unless there's something very unusual going on. So I was kind of surprised when one day I noticed in passing that my office machine had reached PID 4,097,894 despite it only having been up for a relatively short time. I felt that clearly something was wrong, probably in the kernel, and so at first my suspicions fell on ZFS on Linux kernel threads, because that's what stood out. Since I use the ZoL development git tip, it also matched my initial belief that the issue had only appeared recently. Both parts of this turned out to be wrong.
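As a rough check on that "over five a minute" figure, here is a small Python sketch of the arithmetic. It assumes the default kernel.pid_max of 32768 (so the highest PID is 32767) and a perfectly steady creation rate; the function name is just something made up for illustration.

```python
def rollover_rate_per_minute(pid_max=32768, days=4.0):
    """Average number of new PIDs per minute needed to go through
    the whole PID space once in the given number of days."""
    minutes = days * 24 * 60
    return pid_max / minutes


if __name__ == "__main__":
    # With the defaults this works out to a bit under 5.7 PIDs a minute.
    print(f"{rollover_rate_per_minute():.2f} PIDs per minute")
```

A machine that mostly sits there running a desktop and a few daemons is unlikely to sustain that rate for days on end, which is why seeing a rollover at all would normally be unusual.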
To cut a long story short, it turned out that building Go from source routinely churns through 200,000 or more PIDs (mostly from running its self-tests). Go's build process has probably been doing this for some time, but I didn't notice until a package raised the kernel.pid_max sysctl and so preserved the high PIDs that resulted from me routinely rebuilding Go. Before kernel.pid_max was raised from Linux's default of 32K, the Go build process had been silently causing PID rollovers at a rate of about once a minute during tests. I also do full builds of Firefox from source every so often, and while they only churn through about 20,000 PIDs, that's probably still enough to provoke a PID rollover on a reasonably common basis.
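One way to put numbers on this sort of PID churn is to look at the "processes" line in /proc/stat, which counts forks since boot (as far as I know this includes new threads as well as new processes). The following Python sketch is not how the figures above were obtained; it's a minimal illustration with made-up function names, and since the counter is system-wide, anything else running on the machine at the same time gets counted too.

```python
import subprocess
import sys

DEFAULT_PID_MAX = 32768  # Linux's default kernel.pid_max


def parse_fork_count(stat_text):
    """Pull the 'processes N' counter out of the text of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")


def read_fork_count(path="/proc/stat"):
    with open(path) as f:
        return parse_fork_count(f.read())


def rollovers(pids_used, pid_max=DEFAULT_PID_MAX):
    """How many times this many PIDs would wrap a PID space of pid_max."""
    return pids_used / pid_max


def measure(cmd):
    """Run cmd and return how many forks happened system-wide meanwhile."""
    before = read_fork_count()
    subprocess.run(cmd, check=False)
    return read_fork_count() - before


if __name__ == "__main__":
    if len(sys.argv) > 1:
        used = measure(sys.argv[1:])
        print(f"{used} PIDs used, about {rollovers(used):.1f} rollovers "
              f"at pid_max={DEFAULT_PID_MAX}")
```

By this arithmetic, a build that goes through 200,000 PIDs wraps a 32K PID space about six times, while 20,000 PIDs covers roughly 60% of it, so whether any particular Firefox build causes a rollover depends on where the PID counter happened to be when it started.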
What's striking to me about this experience is that I didn't notice anything. Until I saw the chronyd log entry with its startlingly high PID, I had no idea this was happening, and it had been happening for months and months on a routine basis. I didn't notice during the PID rollovers and I didn't notice afterward, which means that nothing broke in any visible way from rapid, frequent PID rollovers. As someone who's been using Unix for some time, this is both pleasant and a bit startling. It's certainly how things are supposed to work, but it hasn't necessarily been how they actually worked in practice. I guess programs have just evolved to the point where they're basically fine with both PID rollover and use of high PIDs.
Sidebar: What's behind this PID rollover
The difference in PID churn between Firefox's build process and Go's build and test process is instructive. Firefox churns through 20,000 PIDs, apparently mostly by running tons of compiles; at a rough ballpark estimate it's using slightly under one PID per CPU per second over its entire build process (although some of it is irritatingly serialized). This is more or less the traditional highly parallel build experience.
Go builds about three to four times faster and doesn't seem to be starting lots of processes during it, despite churning through ten times as many PIDs. Instead, it appears to use a great deal of internal parallelism during both compilation and especially testing. Go's test phase not only saturates your CPUs, it also appears to go through (real) threads at a relatively ferocious rate. Since creating and destroying threads within a process is much faster than starting new processes, Go can churn through PIDs here at a rate that building Firefox can't even come close to.
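The reason threads matter here is that on Linux, every thread gets its own kernel thread ID, and those IDs are allocated from the same number space as PIDs. This can be seen with a short Python sketch (Python rather than Go only to keep it brief; the function name is hypothetical). It relies on threading.get_native_id(), available since Python 3.8, which on Linux reports the kernel thread ID.

```python
import os
import threading


def collect_thread_ids(count=4):
    """Start `count` threads and return the native thread ID of each."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(count)

    def worker():
        tid = threading.get_native_id()
        with lock:
            ids.append(tid)
        # Wait until every thread has recorded its ID, so that all of
        # them are alive at once and no ID can be reused among them.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("process PID:", os.getpid())
    print("thread IDs: ", sorted(collect_thread_ids()))
```

On a quiet Linux machine the thread IDs will typically come out as a run of numbers just above the process's PID, each one a slot in the PID space that a new process can't use until the thread has exited.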
(This is a little bit surprising to me. Go code makes heavy use of goroutines, but they're normally multiplexed onto a much smaller number of real threads and my impression was that real threads are generally preserved, not aggressively created and discarded. It's possible that various Go tests cause large sudden surges in the need for real threads and then after such a test finishes the Go runtime decides it has too many threads for its current needs (with a new set of less demanding concurrent tests) and throws most of them away.)