2013-12-11
The problem with nondeterministic garbage collection
Yesterday I mentioned in passing that I think that nondeterministic garbage collection is a significant mistake. Today it's time to expand on that, and the first step is defining my terms so that people can understand me. By nondeterministic garbage collection I mean GC that collects garbage objects only at some unpredictable time after they become unused. This is in contrast to deterministic, prompt garbage collectors, which collect straightforward garbage objects immediately or almost immediately after they become unused.
(I believe that prompt GC is almost always based on reference counting.)
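As a rough sketch of what 'prompt' looks like in practice, here is one way to observe it. This assumes CPython, which uses reference counting; the Tracked class and the reclaimed_on_del() helper are names made up for illustration. A weak reference lets the code check whether the object is gone without keeping it alive.

```python
import weakref


class Tracked:
    """Hypothetical stand-in for any ordinary object."""


def reclaimed_on_del():
    obj = Tracked()
    probe = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                   # drop the only strong reference
    return probe() is None    # True if the object has already been reclaimed


print(reclaimed_on_del())
```

With a reference counting collector I would expect this to report that the object is gone right after the del. Under a nondeterministic collector the same code may well report that it is still around.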
The problem with nondeterministic GC can be illustrated in two Python examples. First, the version with prompt GC:
data = open("/some/file", "r").read()
Then a correctly written version in the face of nondeterministic GC:
fp = open("/some/file", "r")
data = fp.read()
fp.close()
del fp  # to clean up buffers
In short, the problem with nondeterministic garbage collection is that it forces you to do manual storage management. You can't rely on garbage collection if you care about memory usage or if object lifetime has side effects (such as keeping files open), because GC may be arbitrarily delayed; instead you must explicitly do cleanup and try to destroy objects. Instead of being a great simplification, GC turns into something that handles only trivial objects (or what you hope are trivial objects) and objects with complex lifetimes.
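For what it's worth, the usual Python way to spell the explicit cleanup is a with block, which closes the file when the block exits whether or not any garbage collection ever happens. This is still manual lifetime management, just tidier. The sketch below is only an illustration: read_whole_file() is a made-up helper, and it uses a throwaway temporary file instead of /some/file so that it can run anywhere.

```python
import os
import tempfile


def read_whole_file(path):
    # The with block closes fp on exit, independent of when (or whether)
    # the garbage collector gets around to the file object.
    with open(path, "r") as fp:
        data = fp.read()
    return data, fp.closed


# Set up a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("hello\n")

data, closed = read_whole_file(path)
os.remove(path)
print(repr(data), closed)
```

Note that this only deals with the open file descriptor. It does nothing about when the memory behind fp or data is actually reclaimed.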
Actually it's even worse than I've shown here. In a nondeterministic GC environment there is absolutely no guarantee that my 'del fp' does anything to clean up fp on the spot. It may well not. If it doesn't then there is nothing I can do to control memory usage by promptly reclaiming now-unused large objects and so on short of forcing garbage collection passes (if I can). The best I can do is try to eliminate any state associated with fp by explicitly calling fp.close().
(The implementation of file objects can't help me out because it too doesn't have any magic way of destroying any internal buffers fp is maintaining. It can make them unused, but that doesn't GC them any more than my 'del fp' did.)
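To make 'forcing garbage collection passes' concrete, here is a small sketch using Python's gc module. CPython normally frees objects promptly through reference counting, so to imitate delayed collection I am assuming a reference cycle, which reference counting alone can't reclaim. The Buffer class and the demo() function are hypothetical names for illustration, and the automatic cycle collector is switched off for the duration so that the only collection is the explicit one.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large object."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.me = self  # reference cycle: refcounting alone can't free this


def demo():
    gc.disable()  # keep the automatic cycle collector out of the picture
    try:
        buf = Buffer(1024 * 1024)
        probe = weakref.ref(buf)  # lets us check liveness without a strong ref
        del buf                   # removes the name, but the cycle remains
        alive_after_del = probe() is not None
        gc.collect()              # explicitly force a collection pass
        alive_after_collect = probe() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_collect


print(demo())
```

The del by itself reclaims nothing here; the memory only comes back once a collection pass runs. Not every environment gives you a way to force such a pass, or promises that a forced pass reclaims everything.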
Manual storage management and object lifetime management suck. They're what garbage collection is supposed to get us away from. Moving back to them is not progress for any language that is supposed to be biased towards convenience.
(I believe that people like nondeterministic GC because reference counting GC has performance issues with updating reference counts all the time, especially in threaded environments.)
I'm sure this observation is not original to me, and in fact I may have read a version of it in my random walk through the multi-faceted Internet.