Wandering Thoughts archives

2012-10-21

Why fork() is a good API

Back in this entry, I claimed that fork() is a good API. This may strike some people as a crazy thing to say, given that fork() is widely considered undesirable and something to be papered over with higher-level APIs like posix_spawn().

The first thing to understand about fork() is that, like a number of other Unix APIs, it is primarily a kernel API, the interface between the kernel and user-level code, not necessarily an interface that ordinary programmers writing ordinary code are expected to use directly. Unix has a number of these, most famously the split between the kernel read() and write() APIs and standard IO.

A kernel API is subject to different demands and tensions than a user-facing API. User-facing APIs are generally created to be convenient and easy to use, and sometimes to make it hard to get things wrong (shielding you from common sorts of errors). Kernel APIs worry about efficiency, minimalism, and flexibility. They are much more willing to push work up to user-level code, and indeed they often prefer to do this.

From this perspective, fork() is a great API with two great virtues, one of them obvious and one of them more subtle. The first virtue is that fork() is about the most minimal process creation primitive you could imagine. We can see this in how few arguments it takes: none at all (even revised and generalized into clone(), it takes only a handful). If the kernel directly supported an operation like, say, posix_spawn(), the system call involved would need to take much more complex arguments and do much more complex things, things that can perfectly well be delegated to user-level code (library or otherwise). The other way to view fork()'s minimalism is that fork() does only the bits that require kernel privileges. By splitting the 'spawn' operation into two system calls, Unix could let user-level code handle any number of things in the middle that don't need kernel privileges (some of them done with the help of existing system calls).

(In the process Unix enabled significant flexibility and power for programs that wanted to do things other than 'spawn a separate command'.)
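
To make the split concrete, here's a minimal sketch of the classic pattern; the error handling is abbreviated, and the user-level setup in the middle is only hinted at:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* The classic fork()/exec() split: the two privileged primitives
       come from the kernel, and everything in between is ordinary
       user-level code. */
    int run_command(char *const argv[])
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* Child: any user-level setup (chdir, redirections,
               environment changes, ...) goes here, before the exec. */
            execvp(argv[0], argv);
            perror("execvp");       /* only reached if exec fails */
            _exit(127);
        }
        /* Parent: wait for the command to finish. */
        int status;
        return waitpid(pid, &status, 0) < 0 ? -1 : status;
    }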

The subtle virtue is how future-proof fork() is because of how little it does. Unix has steadily acquired more and more per-process state over time, state whose aspects you may well want to change for a command that you run. If the kernel implemented a 'spawn' system call that combined creating a new process, changing various aspects of its state, and then executing a command, that system call would have needed its arguments steadily revised to let you change all of these new bits of state. But because changing state is up to user-level code, fork() needs no changes at all to support new state; all you need to do is revise user-level code to do whatever you need. Do you need to change the security label, decrease memory limits, or restrict the processor cores that the command will run on? All of that is up to you. In turn this has made it much easier to add such new state to Unix (and to specific Unixes), because you can just add the state and some system calls to change it; you don't need to make any changes to the arguments a hypothetical 'spawn' system call takes.
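
As an illustration, here's a sketch of doing two of those things between fork() and exec(). The limits and the command are made up for the example, and sched_setaffinity() is Linux-specific:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* Lower the address space limit for the command
               (512 MiB is an arbitrary example value). */
            struct rlimit rl = { .rlim_cur = 512UL << 20,
                                 .rlim_max = 512UL << 20 };
            setrlimit(RLIMIT_AS, &rl);

            /* Pin the command to CPU 0 (sched_setaffinity() is
               Linux-specific; other Unixes have their own calls). */
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(0, &set);
            sched_setaffinity(0, sizeof(set), &set);

            execlp("sort", "sort", "bigfile", (char *)NULL);
            perror("execlp");
            _exit(127);
        }
        /* Parent: wait for the command, or carry on, as usual. */
        return 0;
    }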

The other aspect to this is that the creators of a hypothetical 'spawn' system call would have had to try to anticipate every change to process state that people could want to make when they spawn commands. This matters because best practices in this area have changed and grown more elaborate over time as people gained more experience with what caused problems on Unix.

(I was going to use the example of file descriptor leaks from parents into commands that they run, which used to be common but has been more or less stamped out today. However, this is a bad example for a hypothetical spawn system call because the problem is actually solved in a different way than closing file descriptors after fork(), for good reasons.)
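
(A minimal sketch of what that different way presumably looks like, assuming it's the modern approach of setting close-on-exec when descriptors are created:)

    #include <fcntl.h>

    /* Open a file with the close-on-exec flag set from the start, so
       it never leaks into any command the program later runs; no
       post-fork() cleanup pass (and no race with other threads) is
       needed. */
    int open_private(const char *path)
    {
        return open(path, O_RDONLY | O_CLOEXEC);
    }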

ForkGoodAPI written at 00:39:25

2012-10-19

An old Unix trick for saving databases

Suppose that you have a program with a significant in-memory database, or just in-memory state of some sort. You need to save or checkpoint the database every so often, but the database is big enough that the program will pause visibly while you write everything out. In the modern era, people might reach for threads and start spraying locking over their in-memory data structures and so on. But in the old days you didn't have threads, so you couldn't do it the hard way; instead, you had to do it the easy way.

The traditional easy Unix way of handling this is to just fork(). The parent process is the main process; it carries on as normal, serving clients and answering connections and so on. The child process takes as long as it needs to write out the database and then quietly exits. The parent and child don't have to do any new locking of the database, since each has its own logical copy.

(Because the database-saving child has to be deliberately created by the main process, the main process can generally guarantee that the database is in a consistent state before it fork()'s.)
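
In code, the whole trick is about this small (a sketch; save_database() is hypothetical, standing in for whatever save routine the program already has):

    #include <stdio.h>
    #include <unistd.h>

    /* save_database() stands in for whatever routine the program
       already uses to write its state out at shutdown. */
    extern void save_database(void);

    void checkpoint(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");     /* couldn't checkpoint; try later */
            return;
        }
        if (pid == 0) {
            /* Child: works from a frozen copy-on-write snapshot of
               the database; write it out and quietly exit. */
            save_database();
            _exit(0);
        }
        /* Parent: carries on immediately. The child still has to be
           reaped, e.g. with a SIGCHLD handler or an occasional
           waitpid(-1, NULL, WNOHANG). */
    }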

This approach has the great advantage that it's generally dirt simple to implement (and then relatively bombproof in operation). You probably already have a routine to save the database when the program shuts down, so a basic version is to fork, call this routine, and then exit. However, it has at least two disadvantages, one semi-recent and one that's been there right from the start.

The semi-modern problem is that fork() generally doesn't play well with things like threading and various other forms of asynchronous activity. This wasn't a problem in the era that this trick dates from, because those things didn't yet really exist, but it may complicate adding this trick to a modern program. The always-present problem is that doing this with a big in-memory database, and thus a big process, has always given the virtual memory accounting system a little heart attack, because in theory you are doubling the memory usage of a large program. As a result it's one of the hardest cases of fork() for strict overcommit to handle, and it's not amenable to going away with an API change.

(After all, snapshotting the memory is the entire point of this trick.)

ForkAndStateDumping written at 01:40:08

