Wandering Thoughts archives


What 'PID rollover' is on Unix systems

On Unix, everything is a process (generally including the threads inside processes, because that makes life simpler), and all processes have a PID (Process ID). In theory, the only special PID is PID 1, which is init, which has various jobs and which often causes your system to reboot if it dies (which isn't required even if most Unixes do it). Some Unixes also have a special 'PID 0', which is a master process in the kernel (on Illumos PID 0 is sched, and on FreeBSD it's called [kernel]). PIDs run from PID 1 upward to some maximum PID value and traditionally they're used strictly sequentially, so PID X is followed by PID X+1 and PID X+2 (even if some of the processes may be very short-lived).

(OpenBSD uses randomized PIDs by default; FreeBSD can turn them on by setting the kern.randompid sysctl, at least according to Internet searches. Normal Linux and Illumos are always sequential.)

Once, a very long time ago, Unix was a small thing and it ran on small, slow machines that liked to use 16-bit integers, ie the DEC PDP-11 series that was the home of Research Unix up through V7. In V7, PIDs were C shorts, which meant that they had a natural maximum value of 32767, and the kernel further constrained their maximum value to be 29,999. What happened when you hit that point? Well, let's just quote from newproc() in slp.c:

    * First, just locate a slot for a process
    * and copy the useful info from this process into it.
    * The panic "cannot happen" because fork has already
    * checked for the existence of a slot.
    if(mpid >= 30000) {
           mpid = 0;
           goto retry;

(The V7 kernel had a lot of gotos.)

This is PID rollover, or rather the code for it.

The magical mpid is a kernel global variable that holds the last PID that was used. When it hits 30,000, it rolls back over to 0, gets incremented to be 1, and then we'll find that PID 1 is in use already and try again (there's another loop for that). Since V7 ran on small systems, there was no chance that you could have 30,000 processes in existence at once; in fact the kernel had a much smaller hardcoded limit called NPROC, which was usually 150 (see param.h).

Ever since V7, most Unix systems have kept the core of this behavior. PIDs have a maximum value, often still 30,000 or so by default, and when your sequential PID reaches that point you go back to starting from 1 or a low number again. This reset is what we mean by PID rollover; like an odometer rolling over, the next PID rolls over from a high value to a low value.

(I believe that it's common for modern Unixes to reset PIDs to something above 1, so that the very low numbered PIDs can't be reused even if there's no process there any more. On Linux, this low point is a hardcoded value of 300.)

Since Unix is no longer running on hardware where you really want to use 16-bit integers, we could have a much larger maximum PID value if we wanted to. In fact I believe that all current Unixes use a C type for PIDs that's at least 32 bits, and perhaps 64 (both in the kernel and in user space). Sticking to signed 32 bit integers but using the full 2^31-1 integer range would give us enough PIDs that it would take more than 12 years of using a new PID every 500 microseconds before we had a PID rollover. However, Unixes are startlingly conservative so no one goes this high by default, although people have tinkered with the specific numbers.

(FreeBSD PIDs are officially 0 to 99999, per intro(2). For other Unixes, see this SE question and its answers.)

To be fair, one reason to keep PIDs small is that it makes output that includes PIDs shorter and more readable (and it makes it easier to tell PIDs apart). This is both command output, for things like ps and top, and also your logs when they include PIDs (such as syslog). Very few systems can have enough active or zombie processes that they'll have 30,000 or more PIDs in use at the same time, and for the rest of us, having a low maximum PID makes life slightly more friendly. Of course, we don't have to have PID rollover to have low maximum PIDs; we can just have PID randomization. But in theory PID rollover is just as good and it's what Unix has always done (for a certain value of 'Unix' and 'always', given OpenBSD and so on).

In the grand Unix tradition, people say that PID rollover doesn't have issues, it just exposes issues in other code that isn't fully correct. Such code includes anything that uses daemon PID files, code that assumes that PID numbers will always be ascending or that if process B is a descendant of process A, it will have a higher PID, and code that is vulnerable if you can successfully predict the PID of a to-be-created process and grab some resource with that number in it. Concerns like these are at least part of why OpenBSD likes PID randomization.

(See this interesting stackexchange answer about how Unixes behave and when they introduced randomization options.)

PidRollover written at 23:51:18; Add Comment


The history of terminating the X server with Ctrl + Alt + Backspace

If your Unix machine is suitably configured, hitting Ctrl + Alt + Backspace will immediately terminate the X server, or more accurately will cause the X server to immediately exit. This is an orderly exit from the server's perspective (it will do things like clean up the graphics state), but an abrupt one for clients; the server just closes their connections out of the blue. It turns out that the history of this feature is a bit more complicated than I thought.

Once upon a time, way back when, there was the X releases from the (MIT) X Consortium. These releases came with a canonical X server, with support for various Unix workstation hardware. For a long time, the only way to get this server to terminate abruptly was to send it a SIGINT or SIGQUIT signal. In X11R4, which I believe was released in 1989, IBM added a feature to the server drivers for their hardware (and thus to the X server that would run on their AIX workstations); if you hit Control, Alt, and Backspace, the server would act as if it had received a SIGINT signal and immediately exit.

(HP Apollo workstations also would immediately exit the X server if you hit the 'Abort/Exit' key that they had on their custom keyboard, but I consider this a different sort of thing since it's a dedicated key.)

In X11R5, released in 1991, two things happened. First, IBM actually documented this key sequence in server/ddx/ibm/README (previously it was only mentioned in the server's IBM-specific usage messages). Second, X386 was included in the release, and its X server hardware support also contained a Ctrl + Alt + Backspace 'terminate the server' feature. This feature was carried on into XFree86 and thus the version of the X server that everyone ran on Linux and the *BSDs. The X386 manpage documents it this way:

Immediately kills the server -- no questions asked. (Can be disabled by specifying "dontzap" in the configuration file.)

I never used IBM workstations, so my first encounter with this was with X on either BSDi or Linux. I absorbed it as a PC X thing, one that was periodically handy for various reasons (for instance, if my session got into a weird state and I just wanted to yank the rug out from underneath it and start again).

For a long time, XFree86/Xorg defaulted to having this feature on. Various people thought that this was a bad idea, since it gives people an obscure gun to blow their foot off with, and eventually these people persuaded the Xorg people to change the default. In X11R7.5, released in October of 2009, Xorg changed things around so that C-A-B would default to off in a slightly tricky way and that you would normally use an XKB option to control this; see also the Xorg manpage.

(You can set this option by hand with setxkbmap, or your system may have an xorg.conf.d snippet that sets this up automatically. Note that running setxkbmap by hand normally merges your changes with the system settings; see its manpage.)

Sidebar: My understanding of how C-A-B works today

In the original X386 implementation (and the IBM one), the handling of C-A-B was directly hard-coded in the low level keyboard handling. If the code saw Backspace while Ctrl and Alt were down, it called the generic server code's GiveUp() function (which was also connected to SIGINT and SIGQUIT) and that was that.

In modern Xorg X with XKB, there's a level of indirection involved. The server has an abstracted Terminate_Server event (let's call it that) that triggers the X server exiting, and in order to use it you need to map some actual key combination to generate this event. The most convenient way to do this is through setxkbmap, provided that all you want is the Ctrl + Alt + Backspace combination, but apparently you can do this with xmodmap too and you'll probably have to do that if you want to invoke it through some other key combination.

The DontZap server setting still exists and still defaults to on, but what it controls today is whether or not the server will pay attention to a Terminate_Server event if you generate one. This is potentially useful if you want to not just disable C-A-B by default but also prevent people from enabling it at all.

I can see why the Xorg people did it this way and why it makes sense, but it does create extra intricacy.

XBackspaceTerminateHistory written at 23:52:57; Add Comment


Default X resources are host specific (which I forgot today)

I've been using X for a very long time now, which means that over the years I've built up a large and ornate set of X resources, a fair number of which are now obsolete. I recently totally overhauled how my X session loads my X resources, and in the process I went through all of them to cut out things that I didn't need any more. I did this first on my home machine, then copied much of the work over to my office machine; my settings are almost identical on the two machines anyway, and I didn't feel like doing all of the painstaking reform pass a second time.

Xterm has many ornate pieces, one of which is an elaborate system for customizing character classes for double click word selection. The default xterm behavior for this is finely tuned for use on Unix machines; for example, it considers each portion of a path to be a separate word, letting you easily select out one part of it. Way back in Fedora Core 4, the people packaging xterm decided to change this behavior to one that is allegedly more useful in double-clicking on a URL. I found this infuriating and promptly changed it back by setting the XTerm*charClass X resource to xterm's normal default, and all was good. Or rather I changed it on my work machine only, because Fedora apparently rethought the change so rapidly that I never needed to set the charClass resource on my home machine.

(My old grumblings suggest that perhaps I skipped Fedora Core 4 entirely on my own machines, and only had to deal with it on other machines that I ran X programs on. This is foreshadowing.)

When I was reforming my X resources, I noticed this difference between home and work and as a consequence dropped the charClass setting on my work machine because clearly it wasn't necessary any more. Neatening up my X resources and deleting obsolete things was the whole point, after all. Then today I started a remote xterm on an Ubuntu 16.04 machine, typed a command line involving a full path, wanted to select a filename out of the path, and my double-click selected the whole thing. That's when I was pointedly reminded that default X resources for X programs are host specific. Your explicitly set X resources are held by the server, so all programs on all hosts will see them and use them, but if you don't set some resource the program will go look in its resource file on the host it's running on. There is no guarantee that the XTerm resource file on Fedora is the same as the XTerm resource file on Ubuntu, and indeed they're not.

(It turns out that way back in 2006, Debian added a patch to do this to their packaging of xterm 208. They and Ubuntu seem to have been faithfully carrying it forward ever since. You can find it mentioned in things like this version of the Debian package changelog.)

In summary, just because some X resource settings are unnecessary on one machine doesn't mean that they're unnecessary on all machines. If they're necessary anywhere, you need to set them in as X resources even if it's redundant for most machines. You may even want to explicitly set some X resources as a precaution; if you really care about some default behavior happening, explicitly setting the resource is a guard against someone (like Debian) getting cute and surprising you someday.

(There's another use of setting X resources to the default values that the program would use anyway, but it's slightly tricky and perhaps not a good idea, so it's for another entry.)

The reason I never had this problem at home, despite not setting the XTerm*charClass resource, is that I almost never use remote X programs at home, especially not xterm. Instead I start a local xterm and run ssh in it, because in practice that's faster and often more reliable (or at least it was). If I run a remote xterm on an Ubuntu machine from home, I have the problem there too, and so I should probably set XTerm*charClass at home just in case.

PS: To add to the fun of checking this stuff, different systems keep the default X resource files in different places. On Fedora you find them in /usr/share/X11/app-defaults, on Ubuntu and probably Debian they're in /etc/X11/app-defaults, and on FreeBSD you want to look in at least /usr/local/lib/X11/app-defaults.

(On OmniOS and other Illumos based systems it's going to depend on where you installed xterm from, since it's not part of the base OS and there are multiple additional package sources and even package systems that all put things in different places. I recommend using find, which is honestly how I found out most of these hiding places even on Linux and FreeBSD.)

XResourcesPerHost written at 01:17:44; Add Comment


The history of Unix's confusing set of low-level ways to allocate memory

Once upon a time, the Unix memory map of a process was a very simple thing. You had text, which was the code of the program (and later read-only data), the stack, which generally started at the top of memory and grew down, initialized data right after the text, and then bss (for variables and so on that started out as zero). At the very top of the bss, ie the highest address in the process's data segment, was what was called the program break (or, early on, just the break). The space between the program break and the bottom of the stack was unused and not actually in your process's address space, or rather it started out as unused. If you wanted to get more free memory that your program could use, you asked the operating system to raise this point, with what were even in V7 two system calls: brk() and sbrk(). This is directly described in the description of brk():

char *brk(addr) [...]

Brk sets the system's idea of the lowest location not used by the program (called the break) to addr (rounded up to the next multiple of 64 bytes on the PDP11, 256 bytes on the Interdata 8/32, 512 bytes on the VAX-11/780). Locations not less than addr and below the stack pointer are not in the address space and will thus cause a memory violation if accessed.

Unix programs used brk() and sbrk() to create the heap, which is used for dynamic memory allocations via things like malloc(). The heap in classical Unix was simply the space you'd added between the top of the bss and the current program break. Usually you didn't call brk() yourself but instead left it to the C library's memory allocation functions to manage for you.

(There were exceptions, including the Bourne shell's very creative approach to memory management.)

All of this maintained Unix's simple linear model of memory, even as Unix moved to the fully page-based virtual memory of the DEC Vax. When functions like malloc() ran out of things on their free list of available space, they'd increase the break, growing the process's memory space up, and use the new memory as more space. If you free()d the right things to create a block of unused space at the top of the break, malloc() and company might eventually call brk() or sbrk() to shrink the program's break and give the memory back to the OS, but you probably didn't want to count on that.

This linear memory simplicity had its downsides. For example, fragmentation was a real worry and unless you felt like wasting memory it was difficult to have different 'arenas' for different sorts of memory allocation. And, as noted, Unix programs rarely shrank the amount of virtual memory that they used, which used to matter a lot.

Then, in SunOS 4, Unix got mmap(), which lets people add (and remove) pages of virtual memory anywhere in the process's memory space, not just right above the program's break (or just below the bottom of the stack). This includes anonymous mappings, which are just pages of memory exactly like the pages of memory that you add to the heap by calling sbrk(). It didn't take the people writing implementations of malloc() very long to realize that they could take advantage of this in various ways; for example, they could mmap() several different chunks of address space and use them for arenas, or they could directly allocate sufficiently large objects by direct mmap() (and then directly free them back to the operating system by dropping the mappings). Pretty soon people were using mmap() not just to map files into memory but also to allocate general dynamic memory (which was still called the 'heap', even if it was no longer continuous and linear).

Over time, there's been a tendency for more and more memory allocation libraries and systems to get most or all of their memory from Unix through mmap(), not by manipulating the old-school heap by using sbrk() to change the program break. Often using mmap() only is simpler, and it's also easier to coexist with other memory allocation systems because you're not all fighting over the program break; each mmap() allocation can be manipulated separately by different pieces of code, and all you have to do is worry about not running out of address space (which is generally not a worry on modern 64-bit systems).

(For example, the Go runtime allocates all of its memory through mmap().)

Today, it's generally safe to assume that the memory for almost any large single memory allocation will be obtained from the Unix kernel by mmap(), not by growing the classical heap through sbrk(). In some C libraries and some environments, smaller memory allocations may still come from the classical heap; if you're curious, you can tell by pointing a system call tracer at a program to see if it even calls sbrk() or brk(). How frequently used brk() is probably depends on the Unix (and on Linux, on the C library). I know that GNU libc does use brk() based allocation for programs that only make small allocations, for example /usr/bin/echo.

(Using the classical heap through brk() has some minor advantages, including that it usually doesn't create an additional kernel virtual memory area and those usually have some costs associated with them.)

The current state of low-level Unix memory allocation has thus wound up being somewhat confusing, but that's Unix for you; our current situation is the result of a complicated historical evolution that has been surprisingly focused on backward compatibility. I don't think anyone has ever seriously proposed throwing out brk() entirely, although several of the BSDs call it a historical curiosity or a legacy interface (OpenBSD, FreeBSD), and I suspect that their C libraries never use them for memory allocation.

(This entry was sparked by reading Povilas Versockas' Go Memory Management.)

SbrkVersusMmap written at 01:15:20; Add Comment

By day for June 2018: 7 15 17 29; before June; after June.

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.