What 'is' translates to in CPython bytecode
The main implementation of Python, usually called CPython, translates
Python source code into bytecode before interpreting it. How this
translation is done can make some things fast; the way local variables
are implemented is one example. When I wrote in yesterday's entry that
having 'is' as a keyword can make it faster than if it were a built-in
function, because a keyword doesn't have to be looked up all the time
just in case you changed it, I wondered how CPython actually translated
'a is b' to bytecode. The answer turns out to be somewhat more
interesting than I expected.
(Bytecode can be most conveniently inspected with the dis module, and
the module's documentation helpfully explains a fair bit about what
the disassembled representation means.)
Let's define a little function:
    def f(a):
        return a is 10
Now we can disassemble this with 'dis.dis(f)':
  2           0 LOAD_FAST                0 (a)
              2 LOAD_CONST               1 (10)
              4 COMPARE_OP               8 (is)
              6 RETURN_VALUE
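If you want to look at the same thing programmatically, here is a minimal sketch using 'dis.get_instructions()'. As an assumption for illustration, it uses a two-argument variant of the function ('a is b' rather than 'a is 10'), because recent CPython versions emit a SyntaxWarning for 'is' with a literal. The exact opcodes depend on the CPython version: CPython 3.9 and later use a dedicated IS_OP bytecode for 'is' instead of COMPARE_OP, so the sketch accepts either.

```python
import dis

# Two-argument variant of the entry's function (an assumption made here
# to avoid the SyntaxWarning for 'is' with a literal on newer CPython).
def f(a, b):
    return a is b

# Print offset, opcode name, raw argument and its human-readable meaning.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)

# Collect whichever bytecode this CPython version uses for the identity
# test: COMPARE_OP on older versions, IS_OP on 3.9 and later.
identity_ops = [
    ins.opname
    for ins in dis.get_instructions(f)
    if ins.opname in ("COMPARE_OP", "IS_OP")
]
print(identity_ops)
```

The printed offsets and argument numbers will probably not match the listing above unless you run this on the same CPython version.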
CPython bytecodes can have an auxiliary value associated with them
(shown here as the rightmost column, along with their meaning for
the particular bytecode operation). Rather than have separate
bytecodes for different comparison operators, all comparisons are
implemented with a single bytecode, COMPARE_OP, that picks which
comparison to do based on the auxiliary value. An 'is' comparison is
just the same as any other; if we used 'return a > 10' in our
function, the only difference in the bytecode would be the auxiliary
value for COMPARE_OP (it would become 4 instead of 8).
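The mapping from auxiliary value to comparison can be inspected through 'dis.cmp_op', a tuple of operator names indexed by that value. This is only a sketch of how to look at the table. On the CPython version disassembled in this entry the table also has 'is' and 'is not' entries, but on CPython 3.9 and later it only lists the six ordering and equality comparisons, and the most recent versions pack extra bits into the raw COMPARE_OP argument.

```python
import dis

# dis.cmp_op maps a comparison index to the operator it stands for.
for index, name in enumerate(dis.cmp_op):
    print(index, name)

# Position of '>' in the table.
greater_than_index = dis.cmp_op.index(">")
print(greater_than_index)
```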
The next obvious question to ask is how '
is not' is implemented,
and the answer is that it's another comparison type. If we change
our function to use '
is not', the only change is this:
              4 COMPARE_OP               9 (is not)
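A small sketch that checks this claim in a version-tolerant way: it finds the instruction that does the identity test in an 'is' function and an 'is not' function, then compares them. The helper name 'identity_instruction' is made up for this example, and the functions take two arguments (an assumption to sidestep the SyntaxWarning for 'is' with a literal). On older CPython the instruction is COMPARE_OP; on 3.9 and later it is IS_OP.

```python
import dis

def same(a, b):
    return a is b

def different(a, b):
    return a is not b

def identity_instruction(func):
    """Return the instruction that performs the identity test, or None."""
    for ins in dis.get_instructions(func):
        if ins.opname == "IS_OP":
            return ins
        if ins.opname == "COMPARE_OP" and ins.argrepr in ("is", "is not"):
            return ins
    return None

is_ins = identity_instruction(same)
is_not_ins = identity_instruction(different)

# Same bytecode in both cases; only the auxiliary value differs.
print(is_ins.opname, is_ins.arg)
print(is_not_ins.opname, is_not_ins.arg)
```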
CPython has one last trick up its sleeve. If we write 'not a is 10',
CPython specifically recognizes this and, rather than translating it
as a COMPARE_OP followed by a UNARY_NOT, translates it straight into
the 'is not' comparison. This isn't a general transformation, for
various reasons; 'return not a > 10' won't be similarly translated to
the bytecode equivalent of 'return a <= 10'.
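To see the difference between the two cases on your own interpreter, the following sketch lists the opcode names for a 'not ... is ...' function and a 'not ... > ...' function. The function and helper names are invented for this example. I would expect a UNARY_NOT to show up only for the '>' version, but how and where CPython does the 'is' transformation is an implementation detail that has moved around between versions, so treat the printed output as the authority.

```python
import dis

def not_is(a, b):
    return not a is b

def not_greater(a, b):
    return not a > b

def opnames(func):
    """List the opcode names in a function's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]

print(opnames(not_is))
print(opnames(not_greater))

# The '>' comparison keeps an explicit UNARY_NOT after it.
not_greater_has_unary_not = "UNARY_NOT" in opnames(not_greater)
print(not_greater_has_unary_not)
```

One plausible reason the '>' case is left alone is that rich comparisons are user-definable, so 'not a > b' and 'a <= b' need not agree for arbitrary objects.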
(CPython does go the extra distance to translate 'not a is not 10'
into 'a is 10'. I'm a little bit surprised, since I wouldn't expect
people to write that very often.)
PS: One advantage of 'is' being a keyword is that it allows CPython
to do this transformation, since CPython always knows what 'is' does
here. It wouldn't be safe to transform a hypothetical
'isidentity(a, 10)' in the same way, since what 'isidentity' does
could always be changed by rebinding the name.
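As a sketch of why rebinding matters, here is the hypothetical 'isidentity' written as an ordinary function, plus a caller named 'check' (also invented for this example). The caller's result changes once the global name is rebound, which is exactly what a compile-time rewrite of 'not isidentity(a, b)' could not account for.

```python
# Hypothetical function standing in for a non-keyword version of 'is'.
def isidentity(a, b):
    return a is b

def check(a, b):
    # 'isidentity' is looked up by name every time check() runs.
    return not isidentity(a, b)

obj = object()
before = check(obj, obj)

# Rebind the global name to something with different behaviour.
def isidentity(a, b):
    return False

after = check(obj, obj)
print(before, after)
```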