2015-03-06
The simple way CPython does constant folding
When I looked into CPython's constant folding as part of writing yesterday's entry, I expected to find something clever and perhaps intricate, involving examining the types of constants and knowing about special combinations and so on. It turns out that CPython doesn't bother with all of this and it has a much simpler approach.
To do constant folding, CPython basically just calls the same general
code that it would call to do, say, '+' on built-in types during
execution. There is no attempt to recognize certain type combinations or
operation combinations and handle them specially; any binary operation
on any suitable constant will get tried. This includes combinations that
will fail (such as '1 + "a"') and combinations that you might not
expect to succeed, such as '"foo %d" % 10'.
(Note that there are some limits on what will get stored as a new
folded constant, including that the result must not be too long.
'"a" * 1000' won't be constant folded, for example, although
'"a" * 5' will.)
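A minimal sketch of observing this through co_consts, assuming a reasonably modern CPython. The exact behavior is version-dependent: Python 2's peephole optimizer had small size limits and would fold string '%', while newer CPythons fold at the AST level with larger limits and deliberately skip folding '%' on strings, so the sketch only checks the cases that hold across versions.

```python
# Observing constant folding by inspecting a function's co_consts.
# Which expressions get folded (and the size limits) varies across
# CPython versions, so only the small, safe cases are checked here.

def concat():
    return "fo" + "o"    # folded into the single constant "foo"

def small_repeat():
    return "a" * 5       # small enough to be folded in any version

def big_repeat():
    return "a" * 1000    # may or may not be folded, depending on version

for fn in (concat, small_repeat, big_repeat):
    print(fn.__name__, fn.__code__.co_consts)
```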
What counts as a suitable constant is both easy and hard to define.
The easy definition is that a suitable constant is anything that winds up in
func_code.co_consts and so is accessed with a LOAD_CONST
bytecode instruction. Roughly speaking this is any immutable basic
type, which I believe currently is essentially integers, strings,
and floating point numbers. In Python 3, tuples of these types will
also be candidates for taking part in constant folding.
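You can see the LOAD_CONST view of this directly with the dis module. A small sketch, with the caveat that the exact opcode names in the disassembly differ across CPython versions (newer ones may use RETURN_CONST instead of LOAD_CONST plus RETURN_VALUE):

```python
import dis

def mul():
    return 2 * 3

# The multiplication is folded at compile time, so the bytecode just
# loads (or directly returns) the constant 6.
dis.dis(mul)

# A version-independent check: 6 shows up as an instruction argument,
# and 2 and 3 are never loaded separately.
print(any(ins.argval == 6 for ins in dis.get_instructions(mul)))
```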
At first this approach to constant folding seemed alarming to me, since CPython is calling general purpose evaluation code, code that's normally used much later during bytecode execution. But then I realized that CPython is only doing this in very restricted circumstances; since it's only doing this with a very few types of immutable objects, it knows a lot about what C code is actually getting called here (and this code is part of the core interpreter). The code for these basic types has an implicit requirement that it can also be called as bytecode is being optimized, and the Python authors can make sure this is respected when things change. This would be unreasonable for arbitrary types, even arbitrary C level types, but is perfectly rational here and is beautifully simple.
In short, CPython has avoided the need to write special constant folding evaluation code by making sure that its regular evaluation code for certain basic types can also be used in this situation and then just doing so. In the process it opened up some surprising constant folding opportunities.
(And it can automatically open up more in the future, since anything
that winds up in co_consts is immediately a candidate for constant
folding.)
Sidebar: What happens with tuples in Python 2
In the compiled bytecode, tuples of constant values do not actually
start out as constants; instead they start out as a series of 'load
constant' bytecodes followed by a BUILD_TUPLE instruction. Part of
CPython's peephole optimizations is to transform this sequence into a
new prebuilt constant tuple (and a LOAD_CONST instruction to access
it).
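The end result is easy to see from Python code. A quick sketch, noting that in Python 2 this transformation happened in the peephole optimizer while newer CPythons fold literal tuples earlier, at the AST level; the observable result is the same either way:

```python
def tup():
    return (1, 2, 3)

# The literal tuple ends up as a single prebuilt constant in
# co_consts rather than being assembled at runtime with BUILD_TUPLE.
print(tup.__code__.co_consts)
```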
In Python 2, the whole peephole optimizer apparently effectively doesn't
reconsider the instruction stream after doing this optimization. So if
you have '(1, 2) + (3, 4)' you get a first transformation to turn the
two tuples into constants, but CPython never goes on to do constant
folding for the + operation itself; by the time + actually has
two constant operands, it's apparently too late. In Python 3, this
limitation is gone and so the + will be constant folded as well.
(Examining __code__.co_consts in Python 3 shows that the
intermediate step still happens; the co_consts for a function that
just has a 'return' of this is '(None, 1, 2, 3, 4, (1, 2), (3, 4),
(1, 2, 3, 4))', where we see the intermediate tuples being built before
we wind up with the final version. In general constant folding appears
to leave intermediate results around, eg for '10 + 20 + 30 + 40' you
get several intermediate constants as well as 100.)
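A sketch of checking both cases in Python 3. Whether the intermediate constants linger in co_consts depends on the version (the old peephole optimizer left them behind; newer AST-level folding generally does not), so only the final folded values are checked:

```python
def add_tuples():
    return (1, 2) + (3, 4)

def add_ints():
    return 10 + 20 + 30 + 40

# In Python 3 both expressions fold down to a single constant; the
# intermediate constants may or may not also appear in co_consts,
# depending on the CPython version.
print(add_tuples.__code__.co_consts)
print(add_ints.__code__.co_consts)
```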
2015-03-05
An interesting excursion with Python strings and is
Let's start with the following surprising interactive example from @xlerb's tweet:
>>> "foo" is ("fo" + "o")
True
>>> "foo" is ("fo".__add__("o"))
False
>>> "foo" == ("fo".__add__("o"))
True
The last two cases aren't surprising at all; they demonstrate that
equality is bigger than mere object identity, which is what is
tests (as I described in my entry on Python's two versions of
equality). The surprising case is the first
one; why do the two sides of that result in exactly the same object?
There turn out to be two things going on here, both of them quite
interesting.
The first thing going on is that CPython does constant folding on
string concatenation as part of creating bytecode. This means that
the '"fo" + "o"' turns into a literal "foo" in the actual
bytecodes that are executed. On the surface, this is enough to
explain the is check succeeding in some contexts. To make life
simpler while
simultaneously going further down the rabbit hole, consider a
function like the following:
def f():
    return "foo" is ("fo"+"o")
Compiled functions have (among other things) a table of strings and
other constants used in the function. Given constant folding and
an obvious optimization, you would expect "foo" to appear in this
table exactly once. Well, actually, that's wrong; here's what
func_code.co_consts is for this function in Python 2:
(None, 'foo', 'fo', 'o', 'foo')
(It's the same in Python 3, but now it's in __code__.co_consts.)
Given this we can sort of see what happened. Probably the bytecode was
originally compiled without constant folding and then a later pass
optimized the string concatenation away and added the folded version to
co_consts, operating on the entirely rational assumption that it
didn't duplicate anything already there. This would be a natural fit
for a simple peephole optimizer, which is in fact exactly what we find
in Python/peephole.c in the CPython 2 source code.
But how does this give us object identity? The answer has to be
that CPython interns
at least some of the literal strings used in CPython code. In fact,
if we check func_code.co_consts for our function up above, we
can see that both "foo" strings are in fact already the same
object even though there's two entries in co_consts. The effect
is actually fairly strong; for example, the same literal string as
in two different modules can be interned to be the same object.
I haven't been able to find the CPython code that actually does this,
so I can't tell you what the exact conditions are.
(Whether or not a literal string is interned appears to depend partly on whether or not it has spaces in it. This rabbit hole goes a long way down.)
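One way to poke at this is to compile two snippets separately and compare the identity of their string constants. This sketch assumes the CPython implementation detail (not a language guarantee) that the compiler interns constant strings that look like identifiers, i.e. contain only letters, digits, and underscores:

```python
# Identifier-like string constants get interned, so two independent
# compilations share the same object.
c1 = compile('"foo"', '<module-a>', 'eval')
c2 = compile('"foo"', '<module-b>', 'eval')
print(c1.co_consts[0] is c2.co_consts[0])   # identifier-like: interned

# A string with a space doesn't look like an identifier, so it is
# typically not interned and each compilation gets its own object.
c3 = compile('"foo bar"', '<module-a>', 'eval')
c4 = compile('"foo bar"', '<module-b>', 'eval')
print(c3.co_consts[0] is c4.co_consts[0])
```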
PS: I believe that this means I was wrong about some things I said in my entry on instance dictionaries and attribute names, in that more things get interned than I thought back then. Or maybe CPython grew more string interning optimizations since then.