An old Unix trick for saving databases

October 19, 2012

Suppose that you have a program with a significant in-memory database, or just in-memory state of some sort. You need to save or checkpoint the database every so often, but you have a big enough database that the program will pause visibly while you write everything out. In the modern era, people might reach for threads and start spraying locking over their in-memory data structures and so on. But in the old days you didn't have threads so you couldn't do it the hard way; instead, you had to do it the easy way.

The traditional easy Unix way of handling this is to just fork(). The parent process is the main process; it can continue on as normal, serving people and answering connections and so on. The child process takes as long as it needs to in order to write out the database and then quietly exits. The parent and child don't have to do any new locking of the database, since they each have their own logical copy.

(Because the database-saving child has to be deliberately created by the main process, the main process can generally guarantee that the database is in a consistent state before it fork()'s.)

This approach has the great advantage that it's generally dirt simple to implement (and then relatively bombproof in operation). You probably have a routine to save the database when the program shuts down, so a basic version is to fork, call this routine, and then exit. However it has at least two disadvantages, one semi-recent and one that's always been there right from the start.

The semi-modern problem is that fork() generally doesn't play well with things like threading and various other forms of asynchronous activity. This wasn't a problem in the era that this trick dates from, because those things didn't yet really exist, but it may complicate trying to add this trick to a modern program. The always-present problem is that doing this with a big in-memory database and thus a big process has always given the virtual memory accounting system a little heart attack because you are, in theory, doubling the memory usage of a large program. As a result it's one of the hardest uses of fork() for strict overcommit, and it's not amenable to going away with an API change.

(After all, snapshotting the memory is the entire point of this trick.)

Comments on this page:

From at 2012-10-19 09:59:01:

Nowadays an MMU is standard and fork() implementations do CoW more or less by definition, so it seems to me the only way for this to actually double the amount of committed memory very quickly would be if the database data set had so much churn that the pre- and post-fork copies diverge faster than the snapshot process can store it and exit. (In that case this trick essentially amounts to making the “copy the database quickly” requirement into the virtual memory manager’s problem.)

But if you had a database with such an enormous rate of churn, surely it is not going to be useful to snapshot it in the first place?

So is the VMM heart attack a problem that can actually arise in real-world relevant work loads?

Aristotle Pagaltzis

By cks at 2012-10-19 10:43:12:

The problem is the distinction between the amount of memory used and the amount of memory that the operating system is committed to supplying if it's asked for. In this case generally very little extra memory is actually used (only the memory for the data changed in the in-memory database before the write-out finishes and the child exits), but the fork() immediately commits the kernel to an entire extra copy of what's usually a fairly large amount of virtual memory. Strict overcommit thus causes artificial failures here.
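As an illustration of where this accounting lives (assuming Linux; the knob and field names here are Linux-specific, and other Unixes have their own equivalents):

```shell
# Linux overcommit policy:
#   0 = heuristic overcommit (the default), 1 = always allow,
#   2 = strict accounting -- the mode under which this fork() can fail
cat /proc/sys/vm/overcommit_memory

# Under mode 2 the kernel charges the full size of the forked copy
# against the commit limit at fork() time, so a big process can get
# ENOMEM from fork() even though copy-on-write means almost none of
# that duplicated memory will ever actually be written to.
grep -i commit /proc/meminfo     # shows CommitLimit / Committed_AS
```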

My vague and fading memory is that this did wind up sometimes causing problems back in the days when the trick was common, because they were also the days of very simple methods of allocating swap space.

From at 2012-10-23 03:47:21:

"(After all, snapshotting the memory is the entire point of this trick.)"

One of my debugging tricks for daemons is

  fork() || abort();

being added at "interesting" points in the code. You do lose the context of other threads, but you have a nice clean core file with almost no disruption to the service. - Icarus
