Getting a list of all objects in Python

September 16, 2005

One of the most 'interesting' issues in most garbage-collected languages is memory usage analysis: figuring out why your program is using so much memory, and where that memory is going. Often this winds up enmeshed in tricky issues of object lifetime, retained references, and so on.

One of the first steps in this sort of work is simply figuring out what live objects you have in memory and what they are. Jonathan Ellis recently ran into this issue and wound up asking an interesting question: how do you get a list of all live objects in a Python program?

Fortunately for my ability to look clever on short notice, I had already wrestled with this very question while developing a long-running network daemon (our SMTP and NNTP frontend). (Memory usage analysis in Python is a big subject; I hope to write about other facets of it in later entries.)

Part of Python's good introspection support is the gc module, which pokes into the internal garbage collector. gc.get_objects() looks like just what we want, but unfortunately (as Jonathan Ellis found out) it doesn't return a complete object list. In particular, it seems to skip objects that don't contain other objects (and not all container objects seem to be on it, either).

(Important update: I was wrong about several things to do with gc.get_objects, including it not including all container objects. See GetAllObjectsII for the corrections and qualifications to the above.)
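As a quick illustration of the gap described above, here is a minimal sketch that checks whether a given object shows up in gc.get_objects(). The Node class and the in_gc_list helper are hypothetical names made up for this demo, and exactly which objects the collector tracks is a CPython implementation detail that has varied between versions, so treat this as an experiment rather than a specification.

```python
import gc

class Node(object):
    """Hypothetical container-like object used only for this demo."""
    def __init__(self, label):
        self.label = label

def in_gc_list(obj):
    """Return True if obj itself appears in gc.get_objects()."""
    return any(o is obj for o in gc.get_objects())

node = Node("only referenced from node")

# The instance can hold references, so the collector tracks it.
print(in_gc_list(node))
# The string can't refer to anything else, so it is not on the list,
# even though it is very much alive.
print(in_gc_list(node.label))
```

The string is still reachable from the instance (gc.get_referents() will find it), which is why expanding the initial list gives a fuller picture.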

To get a full list, you need to recursively expand the initial gc.get_objects() list, while keeping track of what objects you've already expanded. This avoids both duplicating objects that are referred to from multiple places and looping forever on circular references. To save you the effort of writing this code, here's my version:

import gc
# Expand slist's objects into olist,
# using seen (a set of id()s) to track
# already processed objects. This uses
# an explicit stack instead of recursion
# so that deep chains of references
# can't hit the recursion limit.
def _getr(slist, olist, seen):
  stack = [slist]
  while stack:
    for e in stack.pop():
      if id(e) in seen:
        continue
      seen.add(id(e))
      olist.append(e)
      tl = gc.get_referents(e)
      if tl:
        stack.append(tl)

# The public function.
def get_all_objects():
  """Return a list of all live Python
  objects, not including the list itself."""
  gcl = gc.get_objects()
  olist = []
  seen = set()
  # Just in case:
  seen.add(id(gcl))
  seen.add(id(olist))
  seen.add(id(seen))
  # _getr does the real work.
  _getr(gcl, olist, seen)
  return olist

(Disclaimer: this code is not going to be completely accurate in a threaded program unless you figure out how to stop all the other threads from modifying container objects while it runs. Even then, the result is only a snapshot of the moment when it ran.)
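Once you have a list of objects, a natural first step is to see what kinds of objects dominate it. The following is a minimal sketch of that, not part of the code above; count_by_type is a hypothetical helper name, and it uses gc.get_objects() as stand-in input only so that the example runs by itself.

```python
import gc
from collections import Counter

def count_by_type(objects, top=10):
    """Return (type name, count) pairs for the `top` most common types."""
    counts = Counter(type(o).__name__ for o in objects)
    return counts.most_common(top)

# Stand-in input so the sketch is self-contained; in practice you
# would pass the result of get_all_objects() instead.
for name, count in count_by_type(gc.get_objects(), top=5):
    print("%-20s %d" % (name, count))
```

Comparing two such summaries taken some time apart is a cheap way to spot which types are accumulating, subject to the same snapshot caveats as above.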

Written on 16 September 2005.
