Garbage collection and modern virtual memory systems
Every so often I run across a piece of writing that makes me slap my forehead because it points out something obvious that I'd still totally missed. Today's is Pat Shaughnessy's discussion of Ruby 2.0's new garbage collection system, in which he casually explains the problem with traditional non-compacting mark and sweep garbage collection in a modern Unix environment. As usual, I am now going to rephrase the general issue to make sure I have it straight in my head.
Mark and sweep GC works in two phases. In the first phase, it traverses
the tree of live objects and marks all of them as still alive with a
mark bit; in the second phase, it sweeps all of memory to free all
unmarked objects and resets all of the marks. The problem is that a
traditional implementation puts the mark bit in the object itself, which
means that you write to each live object during a garbage collection
pass. This is a great way to dirty a lot of virtual memory pages (and
cache lines). As Shaughnessy notes, this is especially unfortunate in
any copy on write environment (such as a
fork()-based server on Unix),
because dirtying pages just to set and unset the mark bits also forces
them to be copied (and unshares them).
(How bad this is depends partly on how densely packed your objects are in memory, which has a lot of variables.)
The fundamental problem is that simple mark and sweep dirties otherwise read-only objects, and it does so in scattered locations across memory. In the old days of simple memory systems (both physical and virtual), this was not such a big deal. In a modern memory system this can matter quite a bit.
(Note that a compacting mark and sweep GC has even worse problems from this perspective. Not only does it mark live objects, but it's probably going to move them around in memory and thus dirty even more virtual memory pages.)
While it's tempting to praise reference counting GC for avoiding this, it's not quite accurate; reference counting only avoids similar issues for live objects that don't change their usage count. Many otherwise read-only objects will have their usage count fluctuate while your code is running, which means writes to the object to track it, which means all of these bad things with copy on write and so on.
(In other words, Python people should not feel smug here.)
In a way this whole issue should not be surprising. We've known for a while that a lot of classic memory allocation related algorithms were not well suited for modern memory systems; for example, a while back there was a bunch of work on virtual memory friendly memory allocators (ones that looked at and dirtied as few pages as possible during operation). That simple classical garbage collection algorithms also have these issues is not particularly novel or startling.
As a pragmatic matter, this suggests that you should normally force a
garbage collection pass immediately before any
fork() that will not
shortly be followed by an
exec(). If you're running in an environment
fork() happens outside of your control, force a GC pass
after you've initialized your system and are ready to run.