The problem with nondeterministic garbage collection

December 11, 2013

Yesterday I mentioned in passing that I think nondeterministic garbage collection is a significant mistake. Today it's time to expand on that, and the first step is defining my terms. By nondeterministic garbage collection I mean GC that collects garbage objects only at some unpredictable amount of time after they become unused. This is in contrast to deterministic, prompt garbage collectors, which collect straightforward garbage objects immediately or almost immediately after they become unused.

(I believe that prompt GC is almost always based on reference counting.)
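To make 'prompt' concrete, here is a minimal sketch that checks whether dropping the last reference to an object finalizes it on the spot. The Resource class and the function name are made up for illustration. On a reference-counting implementation such as CPython I would expect this to report True; on an implementation with a nondeterministic collector it may well report False.

```python
import sys
import weakref


class Resource:
    """Hypothetical stand-in for an object that holds something expensive."""


def finalized_right_after_del():
    """Return True if dropping the last reference finalizes the object at once."""
    events = []
    obj = Resource()
    # weakref.finalize calls events.append("finalized") when obj is collected.
    weakref.finalize(obj, events.append, "finalized")
    del obj  # drop the only strong reference
    # With prompt GC the callback has already run by the time we get here.
    return events == ["finalized"]


if __name__ == "__main__":
    print(sys.implementation.name, finalized_right_after_del())
```

The point of the check is that the answer depends on the implementation, not on anything the program itself controls.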

The problem with nondeterministic GC can be illustrated in two Python examples. First, the version with prompt GC:

data = open("/some/file", "r").read()

Then a correctly written version in the face of nondeterministic GC:

fp = open("/some/file", "r")
data = fp.read()
fp.close()
del fp    # to clean up buffers

In short, the problem with nondeterministic garbage collection is that it forces you to do manual storage management. You can't rely on garbage collection if you care about memory usage or if object lifetime has side effects (such as keeping files open), because GC may be arbitrarily delayed; you must explicitly do cleanup and try to destroy objects yourself. Instead of being a great simplification, GC turns into something that handles only trivial objects (or what you hope are trivial objects) and objects with complex lifetimes.
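For what it's worth, the usual Python shorthand for the explicit-close version is the with statement. The sketch below is only an illustration: read_all and demo are made-up names, and the scratch file stands in for "/some/file". It is still manual lifetime management, just written more compactly; the file gets closed at the end of the block regardless of when the file object itself is collected.

```python
import os
import tempfile


def read_all(path):
    """Read a whole file; the file is closed when the with block exits."""
    with open(path, "r") as fp:
        data = fp.read()
    # fp is closed here no matter when (or whether) the GC reclaims the object.
    return data, fp.closed


def demo():
    # Hypothetical scratch file standing in for "/some/file".
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as out:
            out.write("hello\n")
        return read_all(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Note that this only deals with the open file; it says nothing about when the memory for the file object or its buffers is actually reclaimed.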

Actually it's even worse than I've shown here. In a nondeterministic GC environment there is absolutely no guarantee that my 'del fp' does anything to clean up fp on the spot. It may well not. If it doesn't, then short of forcing garbage collection passes (if I can) there is nothing I can do to control memory usage by promptly reclaiming now-unused large objects and the like. The best I can do is try to eliminate any state associated with fp by explicitly calling fp.close().
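One way to see a 'del' that reclaims nothing, even in CPython, is an object caught in a reference cycle, which reference counting alone can't free. This is a sketch under that assumption; the Node class and demo function are invented for illustration, and automatic cycle collection is switched off briefly so that the only collection pass is the one forced with gc.collect().

```python
import gc
import weakref


class Node:
    """Hypothetical object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.me = self


def demo():
    """Return (alive after del, alive after a forced collection pass)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic cycle collection from running mid-demo
    try:
        node = Node()
        probe = weakref.ref(node)  # lets us observe the object without keeping it alive
        del node  # the name is gone, but the cycle keeps the object alive
        alive_after_del = probe() is not None
        gc.collect()  # force a full collection pass
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


if __name__ == "__main__":
    print(demo())
```

Here the 'del' removes the name but leaves the object sitting in memory until a collection pass happens to run, which is the situation a nondeterministic GC puts every object in.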

(The implementation of file objects can't help me out because it too doesn't have any magic way of destroying any internal buffers fp is maintaining. It can make them unused, but that doesn't GC them any more than my 'del fp' did.)

Manual storage management and object lifetime management suck. They're what garbage collection is supposed to get us away from. Moving back to them is not progress for any language that is supposed to be biased towards convenience.

(I believe that people like nondeterministic GC because reference counting GC has performance issues with updating reference counts all the time, especially in threaded environments.)

I'm sure this observation is not original to me, and in fact I may have read a version of it in my random walk through the multi-faceted Internet.
