Wandering Thoughts archives

2009-02-15

How CPython optimizes allocations for some built-in types

Due to writing yesterday's entry, I got curious enough to go see how CPython handles the creation of some of the built-in types, and what sorts of optimizations it does. The necessary disclaimer is that this is for CPython 2.5.1; it doesn't necessarily apply to other versions of CPython, and it probably doesn't apply at all to other Python implementations (although I expect that they have their own tricks). Thus, the best use for this is as an illustration of the sorts of optimizations that can be done and are often done in this area.

First is a somewhat subtle advantage that built-in types have: there are specific bytecode operations for directly creating new tuples, lists, and dicts, so creating any of them doesn't require the regular name dispatch process. (Well, assuming that you use the direct syntax instead of perversely calling 'list()' or the like; if you invoke the type, it is a regular function invocation.)
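In a modern CPython you can see these dedicated opcodes with the dis module (the exact opcode names vary between interpreter versions; this sketch assumes a recent Python 3):

```python
import dis

# The list display syntax compiles to a dedicated BUILD_LIST opcode,
# with no name lookup involved:
dis.dis(compile("[a, b]", "<demo>", "eval"))

# Calling list() instead goes through the regular name lookup
# (LOAD_NAME) followed by a general function call:
dis.dis(compile("list((a, b))", "<demo>", "eval"))
```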

After that there are a number of type-specific optimizations:

  • tuples have a fixed empty tuple that is returned for all uses of '()', and also have a cache of already-allocated C-level tuple structures for tuples with between 1 and 20 elements; each element count can have up to 2,000 such tuples ready to be used.

    (Since tuples are used heavily inside the interpreter, I assume that this optimization is a useful win.)

  • lists keep a 'recently released' list of up to 80 C-level list structures that can be reused without having to allocate memory. Since empty lists don't require anything besides the basic C-level list structure, this means that repeatedly allocating and releasing an empty list probably won't churn memory.

    You do churn memory if you are allocating and releasing non-empty lists, because the array of element pointers (the list's actual contents) is stored in a separate memory block that is allocated and released every time through.

  • dicts keep a 'recently released' list of up to 80 C-level dict structures. I believe that these are fully ready-to-go empty dictionaries, with everything necessary allocated.

  • plain (byte) strings specifically intern both the empty string and all single-character strings.

  • unicode strings intern the empty string and single-character strings that are in the Latin-1 range (duplicating what plain strings do). They also keep a 'recently released' list of up to 1024 C-level unicode string structures; unlike lists, the memory blob used for the actual string contents is not released if it's 9 Unicode characters or less.
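Some of these optimizations are directly visible from Python code, although they are CPython implementation details rather than language guarantees (so the identity checks below should not be relied on in real programs):

```python
# The shared empty tuple: '()' and tuple() hand back the same object.
a = ()
b = tuple()
print(a is b)          # True in CPython

# Single-character strings in the Latin-1 range are interned, so
# building one "fresh" still returns the cached object:
c = chr(97)
d = chr(97)
print(c is d)          # True in CPython
```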

A number of other built-in types, such as integers and floats, have special allocation schemes and caches of their own, but that's beyond the scope of this entry. (Plus, I do not feel like skimming through the C implementation of every built-in type.)

BuiltinCreationOptimizations written at 01:56:22

2009-02-14

Some of my assumptions about Python object allocation

In a reddit comment thread I was accurately dinged for being a bit casual in how I talked about object allocation at the end of my previous entry. As the comment notes, calling '.setdefault(k, [])' creates and throws away a new object each time the key already exists, just as '.setdefault(k, SomeClass())' would.

Well, sort of. One of the things I assume about Python is that allocating and freeing new empty instances of built-in types (such as '[]') is highly optimized and in practice is very cheap, while allocating and freeing instances of Python-level classes is relatively expensive both in memory and in time. I am thus much more casual about churning simple primitive instances than I am about churning user written classes.

(Things like zero-length strings and empty tuples can be optimized even further; since they are unchangeable once created, you can have a single null instance of each and just hand out references to it.)
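One quick way to check this assumption is a timeit comparison; the Minimal class and the exact numbers here are made up for illustration, but on CPython creating and discarding empty lists is reliably much cheaper than creating instances of even a tiny Python-level class:

```python
import timeit

class Minimal(object):
    # A deliberately tiny Python-level class for comparison purposes.
    def __init__(self):
        self.x = 0

n = 200000
t_list = timeit.timeit("[]", number=n)
t_cls = timeit.timeit("Minimal()", globals={"Minimal": Minimal}, number=n)
print(t_list, t_cls)   # the class version is several times slower
```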

I think of instances of Python-level classes as expensive to create and destroy for two reasons. First, even a very minimal class instance has a not insignificant amount of overhead; unless it uses __slots__, it has at least a dict instance for its __dict__ and a C-level class instance structure of some sort. Second, if the class has an __init__ method, it must be called, with the overhead in time and object creation and churn that that implies. (And of course the __init__ can wind up attaching more data on the new instance, making it more expensive.)

ObjectAllocationAssumption written at 00:41:13

2009-02-13

The accumulator mini-pattern and .setdefault()

What I'm going to call the accumulator (mini-)pattern is a common operation when summarizing a stream of data: you have some keys, which can repeat, and you want to accumulate some data each time each key comes up, to count it or sum it all up or keep a list of all of the data for each key.

In pretty much every language that has them, this pattern is done with dictionaries (or hashes or the language's equivalent). In Python, this creates the minor annoyance of initializing a key's entry the first time you see the key, so you wind up with annoying code that looks like this:

store = {}
def accum(k, v):
    if k not in store:
        store[k] = []
    store[k].append(v)

(In awk, one of the Unix progenitors of this pattern, unset elements default to zero so you can usually write just 'store[$1] = store[$1] + $2' or the like.)

There are a number of variations on the same basic idea; I have seen people write 'store[k] = store.get(k, 0) + v', for example. Which one you settle on depends partly on what operation you're doing (a default-value .get() is convenient for math, an 'if not in, add it' bit of code is convenient for data structures) and partly on which particular idiom feels natural to you.
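For example, a summing accumulator with the default-value .get() looks like this:

```python
counts = {}

def accum_sum(k, v):
    # .get() supplies 0 the first time a key is seen, so no separate
    # initialization step is needed.
    counts[k] = counts.get(k, 0) + v

accum_sum("a", 2)
accum_sum("a", 3)
print(counts["a"])   # 5
```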

For the 'if not in, add it' case one can often use the dict .setdefault() method to shorten the code:

def accum(k, v):
    store.setdefault(k, []).append(v)

(Opinions may be divided on whether this is uglier and more complicated in practice than the more verbose version.)

As it happens, I have to remind myself of .setdefault() every so often, and I've seen other people miss it too. I'm not sure why .setdefault() keeps slipping out of my mind; it may partly be because it has such an odd name for the operation it does, although I have to admit that coming up with a better one would be a challenge.

There is at least one case where .setdefault() is clearly worse. Consider:

def accum(k, v):
    if k not in store:
        store[k] = SomeClass()
    store[k].op(v)

If you wrote this with .setdefault(), you would be creating and then throwing away a SomeClass object every time the key had already been seen before, churning memory in the process. The more verbose code avoids this by only creating SomeClass objects when you actually need them.
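You can see the churn directly with a class that counts how many times it gets instantiated (the Tracker class here is a made-up example):

```python
class Tracker(object):
    created = 0
    def __init__(self):
        Tracker.created += 1
        self.items = []
    def op(self, v):
        self.items.append(v)

store = {}

def accum(k, v):
    # The default argument is evaluated on every call, whether or not
    # the key is already present in the dictionary.
    store.setdefault(k, Tracker()).op(v)

for k in ("a", "a", "a", "b"):
    accum(k, 1)

print(Tracker.created)   # 4 instances created for only 2 distinct keys
```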

AccumulatorSetdefault written at 00:39:02

