2012-08-12
A nice illustration of the cost of creating and destroying objects in Python
Here are two functions (they may look familiar):
def xiter(max):
    for _ in xrange(max):
        pass
from itertools import repeat

def riter(max):
    for _ in repeat(None, max):
        pass
The first thing that makes these functions interesting is that xiter()
takes about 1.25 times as long to run as riter(), once you test with a
large enough max to drown out startup effects. This is interesting
because these functions are almost identical, both in generated bytecode
and in what happens at runtime. Both xrange() and repeat() create
and return iterator objects, which the for loop then traverses. Both
iterator objects are implemented in C, as are xrange() and repeat()
themselves, and the other operations are the same between both
functions.
In short, the difference between these two functions is how the
two iterator objects behave. And the big difference here is object
allocation and deallocation; repeat()'s iterator returns the same
pre-existing object max times while xrange()'s iterator returns
max different integer objects, most of which are created and then
destroyed. While it's possible that the C code for one is much worse
than the C code for the other, it's reasonable to assume that this 1.25
times performance difference between the two functions is entirely
due to the extra overhead of allocating and deallocating all of those
integer objects. This is about as pure an illustration of the cost
of object creation and deletion as you could ask for.
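A quick way to see the 'same pre-existing object' behaviour for yourself is from the interactive interpreter:

from itertools import repeat

# Every value produced by repeat(None, 5) is literally the same object
# (here None), so collecting the id()s yields a single value.
ids = set(id(x) for x in repeat(None, 5))
print len(ids)    # prints 1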
(Note that the performance difference is not due to the overhead of
having a lot of integer objects around. Only one integer object is alive
at any given time in the xiter() version.)
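If you want to reproduce the timing, here is a minimal sketch using the standard timeit module (it assumes xiter() and riter() are defined in the script being run; the exact numbers and ratio will vary with your machine and CPython version):

import timeit

# Time each function with a large max so per-iteration costs dominate
# over startup overhead.
for name in ("xiter", "riter"):
    secs = timeit.timeit("%s(10**6)" % name,
                         setup="from __main__ import %s" % name,
                         number=10)
    print "%s: %.3f seconds" % (name, secs)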
The CPython bytecode difference between iteration and looping
Because I feel like it, today I'm going to show you the difference in CPython bytecodes between the two variants of 'repeat N times' that I talked about in my entry on the periodic strangeness of idiomatic Python. Both are going to be in functions (because that's the common case of where Python code is).
First, the iteration version:
def iter(max):
    for _ in range(max):
        pass
The core bytecode of the function looks like the following (note that I have simplified the actual bytecode disassembly to make it easier to read):
0 SETUP_LOOP
1 LOAD_GLOBAL    range
2 LOAD_FAST      max
3 CALL_FUNCTION
4 GET_ITER
5 FOR_ITER       (to 8)
6 STORE_FAST     _
7 JUMP_ABSOLUTE  (to 5)
8 POP_BLOCK
Bytecodes 1 through 4 are the initial setup, i.e. the call to range(max)
and turning the result into an iterator (the GET_ITER). The actual for
loop over the iterator is three bytecode instructions, 5, 6, and 7; if
the loop body did real work, its bytecodes would appear between 6 and 7.
(When reading the bytecode, it helps to know that CPython bytecode
is stack based; arguments to operations are put on the stack (eg,
by various LOAD instructions) and then popped off as part of other
operations (eg by STORE instructions). The FOR_ITER bytecode
includes information about where to resume execution when the iterator
is exhausted, in this case at bytecode 8.)
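If you want to look at the real, unsimplified bytecode yourself, the standard dis module will print a full disassembly (the exact output varies from CPython version to version):

import dis

def iter(max):
    for _ in range(max):
        pass

# Prints offsets, opcode names, and arguments for each bytecode.
dis.dis(iter)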
The explicit loop version is:
def loop(max):
    i = 0
    while i < max:
        i += 1
There is a bunch more bytecode here:
 0 LOAD_CONST         0
 1 STORE_FAST         i
 2 SETUP_LOOP
 3 LOAD_FAST          i
 4 LOAD_FAST          max
 5 COMPARE_OP         <
 6 POP_JUMP_IF_FALSE  (to 12)
 7 LOAD_FAST          i
 8 LOAD_CONST         1
 9 INPLACE_ADD
10 STORE_FAST         i
11 JUMP_ABSOLUTE      (to 3)
12 POP_BLOCK
Setup takes only two bytecodes, to initialize i to 0. But the actual
loop involves many more bytecodes; four bytecodes (3-6) are needed to
handle the loop condition and it takes another four (7-10) to increment
i (and then a final bytecode, 11, to close the loop). So this code is
processing 9 bytecodes every time through the loop, plus it's doing a
bunch more stack manipulation than the iteration loop.
Now, you might sensibly ask if this difference in how many bytecodes
there are makes any real performance difference; after all, an efficient
interpreter can run bytecodes pretty fast. The short answer is that
it does; on a 64-bit Linux machine with Python 2.7.3, the loop-based
version is around 3.4 times slower than the iteration-based version. I
can make the difference even bigger by changing range() to xrange()
in the iteration version, since Python 2's range() materializes an
entire list of integers up front while xrange() does not.
(I timed with a max of 10000, which artificially lowers the setup
overhead and increases the actual loop overhead.)
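Here is a minimal sketch of that measurement with timeit (again assuming iter() and loop() are defined in the script being run; your exact ratio will differ):

import timeit

setup = "from __main__ import iter, loop"
t_iter = timeit.timeit("iter(10000)", setup=setup, number=1000)
t_loop = timeit.timeit("loop(10000)", setup=setup, number=1000)
print "loop/iter ratio: %.2f" % (t_loop / t_iter)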
Does this performance difference matter? Probably not. In real code you're likely to be looping only for a relatively short number of iterations and the real work you're doing on each iteration will probably dwarf the loop overhead.
PS: thanks to a comment by David B on the original entry, the most efficient way to do this turns out to be:
from itertools import repeat

def itools(max):
    for _ in repeat(None, max):
        pass
This beats even the xrange()-based version, and not by a little
bit. Why it's better is an interesting thing to think about.
2012-08-06
The theoretical legality of shadowing builtins in Python
Here is a variant of an example I wrote a few years ago:
eval = eval

class A(object):
    file = file
This creates a module-level eval name binding that is the same as the
builtin eval, and a class variable A.file that is the same as the
builtin file. All of this works in CPython because of how names and
scopes are used in the CPython bytecode.
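To make this concrete, here is a sketch that checks that both rebindings really did find the builtins (Python 2, since the file builtin no longer exists in Python 3; __builtin__ is the real module behind the builtin namespace there):

import __builtin__

eval = eval          # module level: the right side finds the builtin

class A(object):
    file = file      # class level: likewise

# Both names are now bound to the original builtin objects.
assert eval is __builtin__.eval
assert A.file is __builtin__.file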
Which leads to the theoretical question: is this actually 'portable' in
the sense that this behavior is required by the Python specification?
(As a practical matter I think that any alternate Python interpreter will include this behavior, making it portable in practice; I believe that a certain amount of code out there in the world relies on it.)
I will cut to the chase: the real result of this exercise is that the Python language reference is essentially an informal document, not a standard. You can't use it for language lawyering, not only for the pragmatic reasons mentioned above but also because it's not an attempt at a complete formal specification of Python for implementors; it is more an attempt at some sort of semantic description for Python programmers (combined with a grammar). The rest of this entry is an illustration of that.
The place to look for the answer to our question is the Naming and Binding section of the Language Reference (Python 3 version). Having peered into the Python 2 version, as far as I can tell this behavior is ambiguous for module-level code but apparently theoretically not correct in class-level code. For class-level code, the crucial two sentences are:
Each assignment or import statement occurs within a block defined by a class or function definition or at the module level (the top-level code block).
If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. [...]
The second sentence is only correct in CPython for function code
blocks; it's false for other blocks, as we can see in the example with
class A. The case of module-level code is more ambiguous, because the
same section contains a description of the global statement which
includes:
[...] Names are resolved in the top-level namespace by searching the global namespace, i.e. the namespace of the module containing the code block, and the builtins namespace, [...]
Although this is in a paragraph about global, it's sensible to read
it as a general description of how names are resolved in the top-level
(module) namespace. One reading of this, combined with the 'name
binding' sentence, allows for module-level rebinding: in 'eval = eval',
the right side eval may be a reference to the version in the
module-level block scope, but the lookup rules for such references
allow you to find the builtin eval. Another reading is that the two
sentences simply contradict each other.
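By contrast, the quoted name binding rule really does hold for function code blocks, as a quick experiment shows:

def f():
    # Because 'eval' is assigned somewhere in this function, CPython
    # treats it as a local name throughout the entire block, so the
    # right side lookup fails instead of finding the builtin.
    eval = eval

f()   # raises UnboundLocalError: local variable 'eval' referenced before assignment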
By the way, this shows one of the problems with standards in practice: you have to read most actual standards for complex things extremely closely and carefully in order to get the right answers. Doing this is unnatural and hard, even more so than reading Unix manpages; mistakes are easy to make and the consequences potentially significant (and hard to test).
PS: given this view of the language reference, you might wonder why I want it to include a description of the attribute lookup order. My answer is that such a description is useful for a Python programmer, if only to put all of the pieces in one place. By contrast painstaking and nitpicking descriptions of arcane bits of namespace oddness are not so useful.