Wandering Thoughts archives

2005-09-16

Getting a list of all objects in Python

One of the most 'interesting' issues in most garbage collected languages is memory usage analysis: figuring out why your program is using so much memory, and where. Often this winds up enmeshed in tricky issues of object lifetime, retained references, and so on.

One of the first steps in this sort of work is simply figuring out what live objects you have in memory and what they are. Jonathan Ellis recently ran into this issue and wound up asking an interesting question: how do you get a list of all live objects in a Python program?

Fortunately for my ability to look clever on short notice, I had already wrestled with this very question while developing a long-running network daemon (our SMTP and NNTP frontend). (Memory usage analysis in Python is a big subject; I hope to write about other facets of it in later entries.)

Part of Python's good introspection support is the gc module, which pokes into the internal garbage collector. gc.get_objects() looks like just what we want, but unfortunately (as Jonathan Ellis found out) it doesn't return a complete object list. In particular, it seems to skip objects that don't contain other objects (and not all container objects are on it, either).

(Important update: I was wrong about several things to do with gc.get_objects, including the claim that it leaves out some container objects. See GetAllObjectsII for the corrections and qualifications to the above.)
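
As a small illustration of the basic behaviour, here is a sketch that assumes CPython; the names marker, holder, and in_gc_list are made up for this example. It checks by identity whether a list and a string held inside it show up in gc.get_objects(), and whether the string can still be reached through gc.get_referents().

```python
import gc

# A container (tracked by the collector) holding a non-container.
# The string is built at run time so it is a fresh, unshared object.
marker = "".join(["not", "-", "tracked"])
holder = [marker]

def in_gc_list(obj):
    """True if obj itself (by identity) appears in gc.get_objects()."""
    return any(o is obj for o in gc.get_objects())

holder_listed = in_gc_list(holder)
marker_listed = in_gc_list(marker)
# The string is still reachable by asking the list what it refers to.
marker_reachable = any(o is marker for o in gc.get_referents(holder))

print(holder_listed, marker_listed, marker_reachable)
```

On CPython this should print True False True: the list is tracked, the string is not, and the string only turns up once you follow references out of the list.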

To get a full list, you need to recursively expand the initial gc.get_objects() list, while keeping track of which objects you've already expanded. This avoids duplicating objects that are referred to from multiple places and avoids looping forever on circular references. To save you the effort of writing this code, here's my version:

import gc

# Expand slist's objects into olist, using seen (a set of
# id() values) to track already processed objects. The walk
# uses an explicit stack rather than recursive calls, so that
# deeply nested structures can't hit Python's recursion limit.
def _getr(slist, olist, seen):
  stack = [slist]
  while stack:
    for e in stack.pop():
      if id(e) in seen:
        continue
      seen.add(id(e))
      olist.append(e)
      tl = gc.get_referents(e)
      if tl:
        stack.append(tl)

# The public function.
def get_all_objects():
  """Return a list of all live Python
  objects, not including the list itself."""
  gcl = gc.get_objects()
  olist = []
  seen = set()
  # Just in case:
  seen.add(id(gcl))
  seen.add(id(olist))
  seen.add(id(seen))
  # _getr does the real work.
  _getr(gcl, olist, seen)
  return olist

(Disclaimer: this code is not going to be completely accurate in a threaded program unless you figure out how to stop all the other threads from modifying container objects while it runs. Even then it'll just be a snapshot of one moment when it started.)
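
Once you have a list of objects, an obvious first step in memory usage analysis is to tally them by type. The following is only a sketch: count_types is a hypothetical helper name, it assumes a Python with collections.Counter, and it uses gc.get_objects() as stand-in input so that it runs on its own. For a fuller picture you would pass it the result of get_all_objects() instead.

```python
import gc
from collections import Counter

def count_types(objs):
    """Return (type name, count) pairs, most common first."""
    counts = Counter(type(o).__name__ for o in objs)
    return counts.most_common()

# Stand-in input; a fuller picture would come from get_all_objects().
for name, n in count_types(gc.get_objects())[:10]:
    print("%-25s %d" % (name, n))
```

Running this periodically and comparing the counts is a crude but often effective way to see which kinds of objects are accumulating.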

python/GetAllObjects written at 23:56:31;

Web browsers make bad text editors

As editors, web browsers have all sorts of problems, such as a narrow view of the text (squeezed into a small to tiny text box), only (very) basic editing operations, and a severe lack of features. Some of the missing features can be partially supplied by the website, like spellchecking and saving drafts in progress, but even then they tend to be awkward. (You can read one person's grumbles about the problem in the context of blogging here.)

This makes it puzzling that more and more people are designing systems that call for web browsers to fill the role of text editors. Often the web browsers are the only available text editors. 'Web-based' is big (blogs, wikis, bug tracking systems, and so on) and all too often web-based means 'only accessible through the web'.

Every time I see this, I wince.

Bad software creates a kind of friction. In the face of friction, people have to work harder and be more motivated in order to use your software. Some of them won't bother; some of them will wind up grumpy. The more friction your systems have, the larger this effect.

Most people have a finite amount of energy and time that they're willing to devote to writing things. The more work the text editing takes, the less they have left to spend on creating and refining the actual content. And the content is the important thing, so effort spent merely editing it is basically lost.

(Certainly this is the case for me. More than once I've concluded that fighting text editing in my browser would simply take more energy than I have available for writing at the moment, and so I have not written comments on this or that.)

In many cases the people writing the content are probably the most important users of your system (especially in the case of bug tracking systems). The corollaries are obvious.

DWiki has deliberately adopted the contrarian position of making file editing with a real editor the primary (and so far only) way to work on pages. I feel strongly that this is a big part of why I've been willing to keep writing at least an entry a day for almost three months now; it is simply that much easier.

(The extra features enabled by real file editing are very nice, too, like drafts and notes and outlines that stick around as long as I want them, and an ideas file.)

(Updated: my apologies to people who are seeing this twice. I realized I had given this entry the wrong name, and in DWiki changing an entry's name also changes its identity in syndication feeds.)

web/BrowsersMakeBadEditors written at 01:20:57;



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.