Some hints on debugging memory leaks in Python programs

Programs written in garbage collected languages like Python aren't immune to memory leaks (except in a picky technical sense), just vastly less prone to them. Unfortunately, this rarity can leave you struggling with the problem should it come up.

In languages like C, memory leaks generally happen because you've forgotten about a piece of dynamically allocated memory. In Python, it's the opposite problem: you get memory leaks when you don't forget about objects, when your program holds references to objects that stop them from being garbage collected.

One easy way to see if you have a real Python-level memory leak is to use the code from GetAllObjects to count how many objects are in the system. If this keeps growing, you have a problem. (If your program's total memory usage keeps climbing, you also have a problem but it may or may not be due to a memory leak in your code.)

Given the nature of memory leaks in GC languages, a good first place to look for retained references is caches (which are there explicitly to remember things). Make sure your caches have aging policies and that they work right, and watch out for caching unexpectedly large objects.

(In long running programs, you need to make sure that long lived objects aren't unnecessarily large and don't hold too many references to other objects. This can call for things like slimmed-down variants of objects, or deliberately destroying some of the references on an object when it's going into long-life mode.)

Another thing to look at is cyclic data structures or groups of data structures with cross-references; they make it much easier to indirectly keep a large data structure live without noticing. A specific case of this is tree-like data structures where the 'flow' of references is bidirectional; for example, a tree where nodes hold references to parents as well as children. In such cases, a live reference to any node can keep the entire tree alive.
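To make the tree case concrete, here is a small sketch of how a single retained leaf keeps a whole tree alive when parent links are ordinary references. It also shows one possible mitigation that the text above does not itself propose: holding the parent through the standard weakref module. The class and function names (StrongNode, WeakNode, root_survives) are made up for illustration, and the immediate cleanup seen here relies on CPython's reference counting.

```python
import gc
import weakref


class StrongNode:
    """Tree node whose parent link is an ordinary (strong) reference."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class WeakNode:
    """Tree node whose parent link is a weak reference (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self._parent = weakref.ref(parent) if parent is not None else None
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Gives None for the root, or once the parent has been collected.
        return self._parent() if self._parent is not None else None


def root_survives(node_cls):
    """Build a tiny tree, keep only one leaf, and report if the root is still alive."""
    root = node_cls("root")
    leaf = node_cls("leaf", parent=root)   # the one node we hold on to
    node_cls("sibling", parent=root)       # only reachable through the root
    probe = weakref.ref(root)              # observes the root without keeping it alive
    del root                               # drop our own reference to the root
    gc.collect()                           # also sweep up any reference cycles
    return probe() is not None


if __name__ == "__main__":
    print("strong parent links, root alive:", root_survives(StrongNode))
    print("weak parent links, root alive:", root_survives(WeakNode))
```

With strong parent links, the retained leaf reaches the root and, through it, every other node. With weak parent links, the leaf no longer pins its ancestors, at the price that code walking upward has to cope with parent being None.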
There are more obscure ways to hold references alive, including:
Sometimes the memory leaks aren't because you have more objects, but because the size of the objects is growing. One common case is an ever growing string buffer, partly because strings are one of the few variable sized non-container objects. Counting objects won't turn this up; to find it you'll need to check the total length of strings you have.

The gc module and the code from GetAllObjects can be used to browse around your program's object state to hunt for clues. Obvious starting points are questions like 'how many objects of class X exist', but you can also do things like use

Additional resources
The Zope project has a TrackRefs class that is part of their test program, but it apparently requires a debug build of the main Python interpreter. If this sounds interesting to you, visit their SVN repository, navigate to Zope/trunk, and pick up test.py. (I'd give a direct URL, but I'm not sure how to give a stable one into a SVN repository.)

Sidebar: Python before Python 2.0

If you're targeting a version of Python before 2.0, you need to more or less completely avoid circular references. Before 2.0, Python used only reference counting to collect garbage, causing any circular or cyclic references to make all of the objects involved immortal (as their reference counts would never go to zero because of the reference cycle).

Sidebar: the other cause of memory usage growth

The other way your program's memory use can keep growing is if your object usage pattern is fragmenting the interpreter's usage of system memory. One discussion of part of this issue is in this blog entry on Python memory management.

And all of this assumes that you're not having to deal with a compiled extension module that has memory management problems of its own. Some XML modules are apparently well known to leak memory if not used exactly right, and there's always outright bugs.
This is part of CSpace, and is written by ChrisSiebenmann.