Minimizing object churn to optimize Python code

November 5, 2005

One of the interesting things about writing code in high level garbage collected languages is how your optimization concerns can change. One optimization in all GC'd languages, Python included, is minimizing the creation of new objects and maximizing the reuse of old ones; the fewer new objects you create, the fewer cycles garbage collection eats, the less memory gets fragmented, and you often even use less memory overall.

Minimizing object churn is behind several well-known Python (anti-)patterns, such as 'avoid repeated string concatenation to build a string'.

(The simple approach to accumulating one big string from a bunch of function calls is to concatenate each value to the building result string as you get it. But this churns objects (and copies data) at every concatenation; the right pattern is to put all the function results in a list and then use "".join(reslst) at the end.)

A lot of object churn avoidance can be summed up as leave objects alone if at all possible. This is especially important for large immutable objects like strings; any time you fiddle with a string you wind up with a new object. This means that you want to push fiddling with objects as close to their creation as possible, because this maximizes the chance for sharing the same object.

When you write code, look at it with an eye to how many objects it churns through, and how necessary those are. Some techniques include:

  • if you always modify constants in some way, pre-compute the modified version and store it too.
  • batch up things and do them all at once ("".join() is an example of this).
  • use things like list comprehensions and string .join(); being built in to the interpreter, they can optimize in ways not possible for your Python code.
  • it's better to return an existing constant string than the 'same' string (same value) that you parsed from another string. (This generalizes to all objects.)
  • if you have to lowercase strings that are usually lower-cased (or upper-cased, or trimmed of surrounding whitespace, and so on) to start with, it make be worthwhile to do something like 'nstr = ostr.lower(); if nstr == ostr: nstr = ostr'. This doesn't reduce object churn much, but it will save memory and it minimizes fragmentation by destroying any unnecessary nstr's almost immediately. (This too can be generalized beyond strings.)
  • consider caches; one of their side effects is reducing object churn by returning references to existing objects. Of course, the devil is in the details, including cache aging.

Sidebar: saving string memory with intern

Python lets you 'intern' strings via the built-in intern() function, which maps duplicate strings into the same object. This makes it the easy way to eliminate memory wasted due to duplicate strings in your program, at the cost of a hash lookup in intern() for every string that you want to de-duplicate. (Really, the documentation explains this better than I do.)

You really want to be using Python 2.3 or later before you use this much, because before Python 2.3, any interned strings were immortal. (The interning code held a reference to them so they never got garbage collected.)


Comments on this page:

From 71.141.238.136 at 2005-11-05 13:41:01:

Is memory churn the only reason for avoiding repeated concatenation? It's also a well-known anti-pattern in many other environements because it's O(N^2) instead of O(N). (The two are, in fact, related: repeated concatenation generates N intermediate objects, and copying the data between the intermediate objects takes time. Also, I'm playing fast and loose with 'N' here, in a way normally used for lists, not strings.)

The only way to avoid the excess copying and memory usage while still using repeated concatenation in a GC environment is to use a system which (a) reference counts and (b) checks that the concatenate operation is being stored into the sole pointer to the LHS. I'm not sure what systems exist that fit (a)--Perl, I think?, and possibly a few other scripting languages--and I don't know of any that do (b).

The reason I think of this as transcending the GC issue is because it's even an anti-pattern in C; building a long string via repeated strcat() is equally inefficient, even if done in place, because C strings lack length information (c.f. your post about Bstrings).

From 70.17.58.246 at 2005-11-06 02:48:17:

First off, many GC systems reference count, in addition to whatever other tricks may be about. (Java's doesn't, but many other do, including, I believe, the current python garbage collector) There's nothing that prevents a mark-and-sweep GC system from also using reference counting as a supplement. You're correct that perl's current GC is purely refcount-based.

Secondly, I believe that perl does in fact do the optimization you mention, in terms of checking that the RHS and the LHS of an operation are the same spot, and if so, not creating a temporary object. In fact, I believe that CPython as of 2.4 does this as well; see optimizations.

As for concatenating a list of size M to a list of size N being a O(N) operation (making the concatenation of N lists of approximately the same size a O(N^2) operation), that's a consequence of a singly-linked-list approach to lists, which is very uncommon outside functional languages (e.g. Haskell, lisp) these days. With doubly-linked lists, a single concatenation is a O(1) operation, assuming that you don't need to copy either list off into a temporary variable because something is still referencing the un-linked-together portion. (c.f. object churn) I'm not sure what the big-O order is for ArrayList-like implementations, but there you get into the problem I often have with most big-O analyses of micro-benchmarks like list concatenation or sorting algorithms, in that they tend to focus on cpu arithmetic ops and stores or reads from memory which, while occasionally significant, are often dwarfed by memory re-allocation. (ArrayList performance is going to be determined by the nature and frequency of the re-allocations of the array to bigger sizes)

Strings are in most languages (python, perl, Java, etc.) represented by ArrayList-like lists of characters.

Please don't interpret this as advocating for other list implementations in functional languages; singly linked lists have other advantages that it's not worth giving up for the sake of faster concatenation. For example, singly linked lists can share structure if you have two lists with a common tail. However, if you're using singly linked lists in C or java, that's just silly.

From 71.141.238.136 at 2005-11-06 11:58:07:

I guess I played too fast and loose with O(N) to be clear.

Concatenating N strings each of M characters requires O(NM) time, and concatenating N lists of 1 item requires O(N^2) time, if these operations are done by repeated concatenation. Remember, that's the case you were concerned about due to excess garbage (the N-1 intermediate entities); the point here is that building each of those entities takes time, wasted time, in the old 1+2+3+4+...+N pattern.

It is interesting that Perl and Python do make that particular optimization. Of course, that means that those cases aren't problematic for generating excessive temporaries, as well.

--nothings

From 70.17.58.246 at 2005-11-06 22:05:09:

Again, asserting that building a list by repeated concatenation is a O(N^2) operation depends on the list being implemented via singly linked lists. This is not true in python. Python lists are not implemented in terms of singly linked lists, and it is not in Python a O(N) operation to add something to the end of a list, but rather O(1). (Remember that python lists are explicitly mutable, so that concatenation doesn't need to copy the whole pre-existing list in order to add 1 element - see APythonCodingMistake)

In a way, it's unfortunate that Haskell lists and python lists are both called "lists" - they really are different data structures which, although they share a common interface, have very different performance characteristics.

As for concatenating N character strings requiring time on the order of the size of the result, this applies only in the best case scenario, where that optimization you mentioned in your first post is followed so that you don't need to re-allocate space, and where you don't need to travel to the end of the string first a character at a time, as you do when using strcat in C.

Of course, in python you can't depend on that optimization being around (it was introduced only in the C implementation of python 2.4, and even Debian's unstable branch isn't using 2.4 as the default Python yet).

All this is to say that in Python it is reasonable to use a bunch of list appends followed by a ''.join(list) because in Python extending a list by one element is a O(1) or at worst O(log N) operation, and not a O(N) operation. (The O(log N) is my best guess at the performance of ArrayList-like list implementations) Furthermore, the reason you don't want to do it by simple string concatenation (pre-2.4) is related to frequent object allocation more than to copying the data from one object to another.

The reasons that appending a new element to an existing size N list in other languages is O(N) is not, I think, related.

-- DanielMartin, who keeps forgetting to sign his posts

From 71.141.238.136 at 2005-11-07 18:29:45:

"Remember that python lists are explicitly mutable"

I couldn't remember that, since I never knew it--I'm not a python programmer. As you say, bad python for calling them that. But ok, I see your point then.

But yeah, I was primarily commenting on strings-in-python (since that was the specific example), and just the trend in general.

--nothings

From 71.141.238.136 at 2005-11-07 18:33:02:

"(The O(log N) is my best guess at the performance of ArrayList-like list implementations.)"

I'm not sure what an ArrayList is, but, as cks mentions, normal malloc/realloc() style allocation is designed to use memory doubling specifically so that reallocating an object to size N only does at most O(N) copying, regardless of the number and size of the intervening objects.

N-sized list via repeated concatenation: O(N^2), N-1 temps N-sized array via repeated extension: O(N), 0 temps

--nothings

Written on 05 November 2005.
« Modifying Linux kernel configuration options the easy way
Weekly spam summary on November 5th, 2005 »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Nov 5 03:17:32 2005
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.