Wandering Thoughts archives

2020-02-04

What 'is' translates to in CPython bytecode

The main implementation of Python, usually called CPython , translates Python source code into bytecode before interpreting it. How this translation happens can make some things fast, such as how local variables are implemented. When I wrote in yesterday's entry that having 'is' as a keyword can make it faster than if it was a built-in function because as a keyword it doesn't have to be looked up all the time just in case you changed it, I wondered how CPython actually translated 'a is b' to bytecode. The answer turns out to be somewhat more interesting than I expected.

(Bytecode can be most conveniently inspected with the dis module, and the module's documentation helpfully explains a fair bit about what the disassembled representation means.)

Let's define a little function:

def f(a):
   return a is 10

Now we can disassemble this with 'dis.dis(f.__code__)' and get:

2   0 LOAD_FAST      0 (a)
    2 LOAD_CONST     1 (10)
    4 COMPARE_OP     8 (is)
    6 RETURN_VALUE

CPython bytecodes can have an auxiliary value associated with them (shown here as the rightmost column, along with their meaning for the particular bytecode operation). Rather than have separate bytecodes for different comparison operators, all comparisons are implemented with a single bytecode, COMPARE_OP, that picks which comparison to do based on the auxiliary value. The 'is' comparison is just the same as any other; if we used 'return a > 10' in our function, the only difference in the bytecode would be the auxiliary value for COMPARE_OP (it would become 4 instead of 8).

The next obvious question to ask is how 'is not' is implemented, and the answer is that it's another comparison type. If we change our function to use 'is not', the only change is this:

    4 COMPARE_OP     9 (is not)

CPython has one last trick up its sleeve. If we write 'not a is 10', CPython specifically recognizes this and rather than translating it as a COMPARE_OP followed by a UNARY_NOT, translates it straight into the 'is not' comparison. This isn't a general transformation, for various reasons; 'return not a > 10' won't be similarly translated to the bytecode equivalent of 'return a <= 10'.

(CPython does go the extra distance to translate 'not a is not 10' into 'a is 10'. I'm a little bit surprised, since I wouldn't expect people to write that very often.)

PS: One advantage of 'is' being a keyword is that it allows CPython to do this transformation, since CPython always knows what 'is' does here. It wouldn't be safe to transform a hypothetical 'not isidentity(a, 10)' in the same way, since what isidentity does could always be changed by rebinding the name.

python/IsCPythonBytecode written at 21:08:03; Add Comment

The place of the 'is' syntax in Python

Over on Twitter, I said:

A Python cold take (given how long it's taken me to arrive at it): 'is' should not be a keyword, it should be a built-in function that you're discouraged from using unless you really know what you're doing. As a keyword it's too tempting.

Python has two versions of equality, in ==, which is plain equality, and is, which is object identity; 'a is b' is true if and only if a and b refer to the same object. Since the distinction between names and values is fundamental to Python, we definitely need a way of testing this (for example, to explore a puzzling mistake I once made). However, I'm not so sure it should be a language keyword.

The issue with 'is' as a language keyword is that it makes using object identity temptingly easy; after all, there's a keyword for it, part of the language syntax. It's as if you're supposed to use it. The first problem with this is simply that object identity is a relatively advanced Python concept, one that's a bit tricky to get your head around. Python code that genuinely needs to use is instead of == is almost invariably doing something tricky, and we should generally avoid inviting people to routinely write code that at least looks like tricky code. The second problem is that in practice object identity can be tricky because Python implementations (especially CPython) can quietly make objects be the same thing (and thus 'a is b' will be true) when you didn't expect them to be. It's possible to write safe code that uses 'is', but you need to know a fair bit about what you're doing; perfectly sensible looking code can conceal subtle bugs.

(When Python will give you the same object for two apparently different things depends on the specific version of (C)Python and also sometimes the exact way that you created the objects. It can get quite weird and involved.)

There are at least two reasons I can think of to still have is as a keyword. The first is that as a keyword, what it does is guaranteed by the language and is not subject to being modified by people who play games with namespaces in the way that, say, isinstance() can be changed. Changing what isinstance() does by defining your own version is probably a terrible idea, but you can do it if you feel the urge. Meanwhile, is is beyond the reach of anything but bytecode rewriting. The second is that because is is part of the language and isn't subject to being changed, it can be implemented in a way that makes it faster than a built-in function. Built-in functions need to go through a global name lookup when they're used, just in case, while is can be just done directly since it's part of the language.

(Local variables are fast because they avoid this lookup.)

PS: Of course by now all of this is entirely theoretical. It's entirely too late for Python to drop 'is' as a keyword, and even thinking about it is a bit silly. But I apparently twitch a bit when I see 'is' casually used in code examples, and that's sort of what inspired the tweet that led to this entry.

python/IsSyntaxPlace written at 00:28:49; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.