2005-10-03
Some important notes on getting all objects in Python
It turns out that I'm wrong about several things I mentioned in GetAllObjects, although the code there is still useful and as correct as you can reasonably get. However, it does have a few limitations and may miss objects under some circumstances.
First, gc.get_objects actually returns all container objects.
Specifically, it returns all objects that can participate in reference
cycles; this necessarily includes all container objects (dicts, tuples,
and lists), but includes other types as well. (My code that seemed
to say otherwise was in error; I didn't do a proper breadth-first
traversal of the list.)
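A minimal sketch of what the expansion looks like, assuming a reasonably recent CPython. The helper names expand_objects and contains are made up for illustration and are not part of the gc module. The walk starts from gc.get_objects and follows gc.get_referents breadth-first, so non-container objects such as floats, which gc.get_objects never lists directly, are picked up through the containers that refer to them.

```python
import gc
from collections import deque

def expand_objects(roots):
    """Breadth-first walk from roots through gc.get_referents().

    Hypothetical helper for illustration; returns every object reached.
    """
    seen = {}                # id(obj) -> obj; also keeps objects alive
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        queue.extend(gc.get_referents(obj))
    return list(seen.values())

def contains(objs, target):
    """Identity-based membership test (avoids calling __eq__)."""
    return any(o is target for o in objs)

value = float("1234.5")      # a float cannot be part of a reference cycle
holder = [value]             # a list can, so the collector tracks it

tracked = gc.get_objects()
print(contains(tracked, holder))                  # the list is listed
print(contains(tracked, value))                   # the float is not
print(contains(expand_objects(tracked), value))   # found via the list
```

Keying the seen table by id() rather than by the objects themselves matters here, since many of the objects reached are unhashable.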
Second, even the expanded gc.get_objects list may not include
all objects. The main way this can happen is that gc.get_objects
can't see objects that are only referred to from C code, for example if
a compiled extension module is holding on to an object for later use
without creating a visible name binding. (One example of this is the
signal module, which holds an internal reference to any function set
as a signal handler.)
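The signal module's hidden reference can be observed with a weak reference. This is only a sketch, assuming CPython's reference counting and that the code runs in the main thread (signal.signal requires that); the function name hidden_signal_reference is invented for the example. Note that a function is itself a collector-tracked object, so the handler would still show up in gc.get_objects; what this shows is only that nothing visible from Python is keeping it alive.

```python
import gc
import signal
import weakref

def hidden_signal_reference(signum=signal.SIGINT):
    """Show that the signal module alone can keep a handler alive.

    Returns (alive_while_installed, alive_after_restore).
    """
    def handler(sig, frame):
        pass

    probe = weakref.ref(handler)     # does not keep handler alive itself
    previous = signal.signal(signum, handler)
    if previous is None:
        # Handler was installed from outside Python; fall back to default.
        previous = signal.SIG_DFL
    try:
        del handler                  # drop the only Python-level name
        gc.collect()
        alive_while_installed = probe() is not None
    finally:
        signal.signal(signum, previous)   # hand the signal back
    gc.collect()
    alive_after_restore = probe() is not None
    return alive_while_installed, alive_after_restore

print(hidden_signal_reference())
```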
If you need a completely accurate count, you need to use a debug build
of Python. This keeps an internal list of all live dynamically allocated
Python objects and makes it available via some additional functions in
the sys module. (Naturally this slows the interpreter down and makes
it use more memory.)
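A hedged sketch of probing for this at runtime. The function in question is sys.getobjects, which as far as I know exists only when CPython is compiled with reference tracing (Py_TRACE_REFS); in recent versions that is a separate configure option from the plain debug build. The wrapper name debug_build_objects is made up, and the meaning of a limit of 0 is stated as an assumption in the code.

```python
import sys

def debug_build_objects(limit=0):
    """Return the interpreter's own list of live heap objects, or None.

    sys.getobjects() is only present in reference-tracing builds;
    ordinary builds lack it, in which case this returns None.
    """
    getobjects = getattr(sys, "getobjects", None)
    if getobjects is None:
        return None
    return getobjects(limit)     # assumption: 0 means "no limit"

objs = debug_build_objects()
if objs is None:
    print("not a reference-tracing build; sys.getobjects is unavailable")
else:
    print(len(objs), "live heap objects")
```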
Even this has an omission: it lists only 'heap' objects, those that
have been dynamically allocated. Python has a certain number of 'static'
objects, such as type objects defined in the C code (instead of being
created at runtime, they are just registered with the Python
interpreter). There are also static plain objects, for example True,
False, and None.
However, many of these static objects will appear on the expanded
gc.get_objects list. This is because they are referred to by live
objects and gc.get_referents is happy to include them in its
results. (This may not be too useful for object usage counting, since
you can't get rid of static objects anyways.)
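A small sketch of that behaviour, using None as the static object. It assumes CPython, where None is not tracked by the collector; the variable names are only for illustration.

```python
import gc

holder = [None, True, False]

# None is not tracked by the collector, so gc.get_objects() never lists it.
none_listed = any(o is None for o in gc.get_objects())

# gc.get_referents() reports whatever the container points at, static or not.
referents = gc.get_referents(holder)
none_reached = any(o is None for o in referents)

print(gc.is_tracked(None), none_listed, none_reached)
```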
I owe a debt of thanks to Martin v. Löwis, who graciously took the time to correct my misconceptions and errors, and explain things to me. (Any remaining errors are of course my fault.)
(The charm of blogging is that I get to make mistakes like this in public. On the upside, I now know a bunch more about the insides of the CPython implementation than I used to.)