What 'is' translates to in CPython bytecode

February 4, 2020

The main implementation of Python, usually called CPython , translates Python source code into bytecode before interpreting it. How this translation happens can make some things fast, such as how local variables are implemented. When I wrote in yesterday's entry that having 'is' as a keyword can make it faster than if it was a built-in function because as a keyword it doesn't have to be looked up all the time just in case you changed it, I wondered how CPython actually translated 'a is b' to bytecode. The answer turns out to be somewhat more interesting than I expected.

(Bytecode can be most conveniently inspected with the dis module, and the module's documentation helpfully explains a fair bit about what the disassembled representation means.)

Let's define a little function:

def f(a):
   return a is 10

Now we can disassemble this with 'dis.dis(f.__code__)' and get:

2   0 LOAD_FAST      0 (a)
    2 LOAD_CONST     1 (10)
    4 COMPARE_OP     8 (is)

CPython bytecodes can have an auxiliary value associated with them (shown here as the rightmost column, along with their meaning for the particular bytecode operation). Rather than have separate bytecodes for different comparison operators, all comparisons are implemented with a single bytecode, COMPARE_OP, that picks which comparison to do based on the auxiliary value. The 'is' comparison is just the same as any other; if we used 'return a > 10' in our function, the only difference in the bytecode would be the auxiliary value for COMPARE_OP (it would become 4 instead of 8).

The next obvious question to ask is how 'is not' is implemented, and the answer is that it's another comparison type. If we change our function to use 'is not', the only change is this:

    4 COMPARE_OP     9 (is not)

CPython has one last trick up its sleeve. If we write 'not a is 10', CPython specifically recognizes this and rather than translating it as a COMPARE_OP followed by a UNARY_NOT, translates it straight into the 'is not' comparison. This isn't a general transformation, for various reasons; 'return not a > 10' won't be similarly translated to the bytecode equivalent of 'return a <= 10'.

(CPython does go the extra distance to translate 'not a is not 10' into 'a is 10'. I'm a little bit surprised, since I wouldn't expect people to write that very often.)

PS: One advantage of 'is' being a keyword is that it allows CPython to do this transformation, since CPython always knows what 'is' does here. It wouldn't be safe to transform a hypothetical 'not isidentity(a, 10)' in the same way, since what isidentity does could always be changed by rebinding the name.

Written on 04 February 2020.
« The place of the 'is' syntax in Python
The drawback of having a dynamic site with a lot of URLs on today's web »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Tue Feb 4 21:08:03 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.