Garbage collection and modern virtual memory systems

March 24, 2012

Every so often I run across a piece of writing that makes me slap my forehead because it points out something obvious that I'd still totally missed. Today's is Pat Shaughnessy's discussion of Ruby 2.0's new garbage collection system, in which he casually explains the problem with traditional non-compacting mark and sweep garbage collection in a modern Unix environment. As usual, I am now going to rephrase the general issue to make sure I have it straight in my head.

Mark and sweep GC works in two phases. In the first phase, it traverses the graph of live objects and marks all of them as still alive with a mark bit; in the second phase, it sweeps all of memory to free all unmarked objects and resets all of the marks. The problem is that a traditional implementation puts the mark bit in the object itself, which means that you write to each live object during a garbage collection pass. This is a great way to dirty a lot of virtual memory pages (and cache lines). As Shaughnessy notes, this is especially unfortunate in any copy-on-write environment (such as a fork()-based server on Unix), because dirtying pages just to set and unset the mark bits also forces them to be copied (and unshares them).
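To make the problem concrete, here's a minimal sketch (in Python, purely illustrative, not a real collector) of traditional mark and sweep with the mark bit stored in the object itself. The important detail is that mark() writes to every live object it visits, which is exactly what dirties pages:

```python
# Minimal mark-and-sweep sketch: the mark bit lives inside each object,
# so marking WRITES to every live object (dirtying its memory page).

class Obj:
    def __init__(self, *children):
        self.marked = False          # mark bit stored in the object itself
        self.children = list(children)

def mark(obj):
    if obj.marked:
        return
    obj.marked = True                # a write to an otherwise read-only object
    for child in obj.children:
        mark(child)

def sweep(heap):
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False             # resetting the mark is another write
    return live                      # everything unmarked is "freed"

def collect(heap, roots):
    for root in roots:
        mark(root)
    return sweep(heap)
```

Notice that a full collection touches every live object twice (once to mark, once to unmark), even though the objects themselves never logically changed.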

(How bad this is depends partly on how densely packed your objects are in memory, which has a lot of variables.)

The fundamental problem is that simple mark and sweep dirties otherwise read-only objects, and it does so in scattered locations across memory. In the old days of simple memory systems (both physical and virtual), this was not such a big deal. In a modern memory system this can matter quite a bit.

(Note that a compacting mark and sweep GC has even worse problems from this perspective. Not only does it mark live objects, but it's probably going to move them around in memory and thus dirty even more virtual memory pages.)

While it's tempting to praise reference counting GC for avoiding this, that's not quite accurate; reference counting only avoids these issues for live objects whose reference count never changes. Many otherwise read-only objects will have their reference count fluctuate while your code is running, which means writes to the object to update the count, which means all of the same bad things with copy on write and so on.
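You can watch this happen in CPython, where the reference count lives in the object's header, so merely taking a new reference to an "immutable" object writes to it:

```python
import sys

# In CPython the reference count is stored in the object header, so
# every new reference writes to the object's own memory page, even
# for immutable, logically read-only objects.
data = "an immutable, logically read-only string " * 10

before = sys.getrefcount(data)
alias = data                     # taking a reference modifies the object
after = sys.getrefcount(data)

# after is before + 1: the object itself was written to in order to
# record the new reference.
```

(This is exactly why simply binding a shared object to a new name in a forked CPython worker can unshare the page it lives on.)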

(In other words, Python people should not feel smug here.)

In a way this whole issue should not be surprising. We've known for a while that a lot of classic memory allocation related algorithms were not well suited for modern memory systems; for example, a while back there was a bunch of work on virtual memory friendly memory allocators (ones that looked at and dirtied as few pages as possible during operation). That simple classical garbage collection algorithms also have these issues is not particularly novel or startling.

As a pragmatic matter, this suggests that you should normally force a garbage collection pass immediately before any fork() that will not shortly be followed by an exec(). If you're running in an environment where the fork() happens outside of your control, force a GC pass after you've initialized your system and are ready to run.
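In Python, this pragmatic pattern might look something like the following sketch (assumed names; and note that in CPython reference counting will still dirty pages afterwards, as discussed above, so this mostly helps runtimes where the GC pass is the main source of writes):

```python
import gc
import os

# Sketch of the suggested pattern for a fork()-based Unix server:
# force a full collection just before fork() so that children start
# from pages that are as clean and shareable as possible.

def spawn_worker(serve):
    gc.collect()                 # force a GC pass immediately before fork()
    pid = os.fork()
    if pid == 0:                 # child: pages are shared copy-on-write
        serve()
        os._exit(0)
    return pid                   # parent: the child's pid
```

If the fork() happens somewhere you don't control, the equivalent move is a single gc.collect() call at the end of your initialization.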


Comments on this page:

From 87.194.56.231 at 2012-03-24 07:08:01:

There's a good talk from PyCon 2012 by Brandon Rhodes explaining how memory works on modern operating systems: http://pyvideo.org/video/717/python-linkers-and-virtual-memory

At the end he compares PyPy and CPython and notes that CPython's reference counting dirties everything, with the consequence that two processes running the same program cannot share their pages.

David B.

There's nothing about the mark-and-sweep algorithm that forces you to put the mark bit in the object being marked. You always have the option of putting the mark bit somewhere else. In the "bitmap marking" technique, the allocator gets hold of contiguous chunks of address space from the operating system, and for each chunk it reserves a portion (1/65th in a 64-bit system) for a table of mark bits. We use this technique for the mark-sweep pools in the Memory Pool System. Alternatively, there may be a hash table or search tree that finds the bit table corresponding to an address. This is used by the Boehm–Demers–Weiser collector.
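The bitmap-marking idea this comment describes can be sketched roughly as follows (illustrative Python, not how the MPS or the Boehm–Demers–Weiser collector is actually written): mark bits go in a side table, so marking never writes to the pages holding the objects themselves.

```python
# Sketch of "bitmap marking": mark bits live in a side bitmap keyed by
# the object's position in its chunk, so marking writes only to the
# bitmap and leaves the objects' own pages untouched (and shareable).

class Heap:
    def __init__(self, objects):
        self.objects = objects                           # the "chunk"
        self.index = {id(o): i for i, o in enumerate(objects)}
        self.marks = bytearray((len(objects) + 7) // 8)  # side bitmap

    def mark(self, obj):
        i = self.index[id(obj)]
        byte, bit = divmod(i, 8)
        if self.marks[byte] & (1 << bit):
            return
        self.marks[byte] |= 1 << bit                     # write hits the bitmap
        for child in getattr(obj, "children", ()):
            self.mark(child)

    def sweep(self):
        live = []
        for i, obj in enumerate(self.objects):
            byte, bit = divmod(i, 8)
            if self.marks[byte] & (1 << bit):
                live.append(obj)
        self.marks = bytearray(len(self.marks))          # reset marks in the bitmap
        return live
```

A real implementation computes the object's index from its address within the chunk instead of keeping a dictionary, but the page-dirtying behaviour is the point: all GC writes are concentrated in the small bitmap.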


