Why [], {}, and () are cheap in Python

June 12, 2011

Here's a trick question: are the following two lines semantically equivalent?

a = []
a = list()

In a normal Python environment they certainly have the same effect, but the answer is that no, they are not semantically equivalent because there is always the possibility that the name list has been rebound in the current namespace. As a result, when Python executes the second line it's obliged to go through the bother of looking up list's name to determine that it actually is the built-in. By contrast, [] here can only ever mean one thing in Python; its semantics are fixed and so CPython is free to fast-path things based on that.

(And it does. [] compiles straight down to a specific bytecode operation, BUILD_LIST. list() compiles to a lookup and a function call.)

That's half of the reason that [] is a cheap operation. The other half is that CPython keeps a stash of preallocated list objects ready to be reused. As a result, CPython needs to do no memory allocations if it needs an empty list when this stash is not empty.

(For a non-empty list, it needs to allocate and zero the memory block for the array of pointers to list items.)

Dictionaries have similar optimizations, although the dictionary setup code has slightly more work to do in order to give you a valid empty dictionary.

Tuple creation has even more optimization. CPython keeps a much larger stash of various sized tuples so that it can almost certainly immediately give you, eg, a 4-element tuple without any fuss. Since tuples are immutable, () can be a very special case:

>>> () is ()

In other words, there is only one empty tuple in CPython. Once something creates it, it will live forever. Giving you an empty tuple is as simple as increasing this one empty tuple's reference count and returning it.

(This entry was sparked by a recent comment on this entry; it caused me to get curious about just how efficient things like '[]' were. I had assumed that CPython had optimized them, but why assume when I could go find out.)

Sidebar: how much this matters

; python -m timeit 'a = ()'
10000000 loops, best of 3: 0.0315 usec per loop
; python -m timeit 'a = []'
10000000 loops, best of 3: 0.0573 usec per loop
; python -m timeit 'a = list()'
1000000 loops, best of 3: 0.198 usec per loop

And here's some more, with some interesting results:

; python -m timeit -s 'l = list' 'a = l()'
10000000 loops, best of 3: 0.176 usec per loop
; python -m timeit -s 'class A(object): pass' 'a = A()'
10000000 loops, best of 3: 0.16 usec per loop

If you give the A class a do-nothing __init__, it slows down to around 0.38 usec per loop. I have no idea why 'A()' is so fast without an __init__; before I measured, I would have expected it to be clearly slower than list().

(This is all from a 64-bit Fedora 14 machine with Python 2.7.)

PS: this is where Zed Shaw gets very irritated with me, so don't bet the farm on these timing results.

Written on 12 June 2011.
« Some notes on __slots__ and class hierarchies
Noting some wandering thoughts on the occasion of an anniversary »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Jun 12 01:13:05 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.