2011-06-12
Noting some wandering thoughts on the occasion of an anniversary
I'm a peculiar person, and so although I wrote my first entries here on June 11th, I consider June 12th to be WanderingThoughts' real anniversary because it's when I decided to do it again and thus made the first day not a one-off thing. That was six years ago, which in some ways really boggles me.
I'm not someone for anniversaries; I wrote something for the first anniversary, but never bothered noting the others (or other significant numbers, like writing 1,000 entries or 2,000 entries).
I'm a different person than I was six years ago, but then almost everyone is. I don't know how much WanderingThoughts has contributed to the changes, though; most of what I think of as the big changes have happened outside it and have had nothing to do with it. Still, I probably write better and clearer, I'm pretty sure that I both type and write faster than I used to, and I've probably benefited from thinking through various things in the process of writing entries.
And it's been fun (well, enjoyable). That's really why I've done it for six years and counting, that and the fact that I'm terribly stubborn and very set in my habits once they've been established.
Consider this a marker, dropped in the stream of time. I was here, now, looking both forward and backwards.
(If nothing else, looking backwards is one benefit of blogs.)
Why []
, {}
, and ()
are cheap in Python
Here's a trick question: are the following two lines semantically equivalent?
a = [] a = list()
In a normal Python environment they certainly have the same effect, but
the answer is that no, they are not semantically equivalent because
there is always the possibility that the name list
has been rebound
in the current namespace. As a result, when Python executes the second
line it's obliged to go through the bother of looking up list
's name
to determine that it actually is the built-in. By contrast, []
here
can only ever mean one thing in Python; its semantics are fixed and so
CPython is free to fast-path things based on that.
(And it does. []
compiles straight down to a specific bytecode
operation, BUILD_LIST
. list()
compiles to a lookup and a function
call.)
That's half of the reason that []
is a cheap operation. The other half
is that CPython keeps a stash of preallocated list objects ready to be
reused. As a result, CPython needs to do no memory allocations if it
needs an empty list when this stash is not empty.
(For a non-empty list, it needs to allocate and zero the memory block for the array of pointers to list items.)
Dictionaries have similar optimizations, although the dictionary setup code has slightly more work to do in order to give you a valid empty dictionary.
Tuple creation has even more optimization. CPython keeps a much larger
stash of various sized tuples so that it can almost
certainly immediately give you, eg, a 4-element tuple without any fuss.
Since tuples are immutable, ()
can be a very special case:
>>> () is () True
In other words, there is only one empty tuple in CPython. Once something creates it, it will live forever. Giving you an empty tuple is as simple as increasing this one empty tuple's reference count and returning it.
(This entry was sparked by a recent comment on this entry; it caused me to get curious about just how
efficient things like '[]
' were. I had assumed that CPython had
optimized them, but why assume when I could go find out.)
Sidebar: how much this matters
; python -m timeit 'a = ()' 10000000 loops, best of 3: 0.0315 usec per loop ; python -m timeit 'a = []' 10000000 loops, best of 3: 0.0573 usec per loop ; python -m timeit 'a = list()' 1000000 loops, best of 3: 0.198 usec per loop
And here's some more, with some interesting results:
; python -m timeit -s 'l = list' 'a = l()' 10000000 loops, best of 3: 0.176 usec per loop ; python -m timeit -s 'class A(object): pass' 'a = A()' 10000000 loops, best of 3: 0.16 usec per loop
If you give the A class a do-nothing __init__
, it slows down to
around 0.38 usec per loop. I have no idea why 'A()
' is so fast without
an __init__
; before I measured, I would have expected it to be
clearly slower than list()
.
(This is all from a 64-bit Fedora 14 machine with Python 2.7.)
PS: this is where Zed Shaw gets very irritated with me, so don't bet the farm on these timing results.