The place of the 'is' syntax in Python

February 4, 2020

Over on Twitter, I said:

A Python cold take (given how long it's taken me to arrive at it): 'is' should not be a keyword, it should be a built-in function that you're discouraged from using unless you really know what you're doing. As a keyword it's too tempting.

Python has two versions of equality: '==', which is plain equality, and 'is', which is object identity. 'a is b' is true if and only if a and b refer to the same object. Since the distinction between names and values is fundamental to Python, we definitely need a way of testing this (for example, to explore a puzzling mistake I once made). However, I'm not so sure it should be a language keyword.
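As a minimal sketch of the distinction (the variable names here are purely illustrative), two lists with equal contents compare equal with '==' but are not the same object, while a second name bound to one of them is:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list that happens to have equal contents
c = a           # a second name for the very same list object

equal_values = (a == b)      # compares contents
same_object_ab = (a is b)    # compares identity
same_object_ac = (a is c)

print(equal_values, same_object_ab, same_object_ac)   # True False True

# Mutating through one name shows why identity matters:
c.append(4)
print(a)   # [1, 2, 3, 4]  (a and c are the same object)
print(b)   # [1, 2, 3]     (b is a different object)
```

Lists are used deliberately, because every list display creates a fresh mutable object, so the results above do not depend on the Python implementation.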

The issue with 'is' as a language keyword is that it makes using object identity temptingly easy; after all, there's a keyword for it, part of the language syntax. It's as if you're supposed to use it. The first problem with this is simply that object identity is a relatively advanced Python concept, one that's a bit tricky to get your head around. Python code that genuinely needs to use 'is' instead of '==' is almost invariably doing something tricky, and we should generally avoid inviting people to routinely write code that at least looks like tricky code. The second problem is that in practice object identity can be tricky because Python implementations (especially CPython) can quietly make objects be the same thing (and thus 'a is b' will be true) when you didn't expect them to be. It's possible to write safe code that uses 'is', but you need to know a fair bit about what you're doing; perfectly sensible-looking code can conceal subtle bugs.

(When Python will give you the same object for two apparently different things depends on the specific version of (C)Python and also sometimes the exact way that you created the objects. It can get quite weird and involved.)
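One way to poke at this is to build the "same" value twice at run time and ask whether the two results are one object. This is only a sketch: the helper same_object() and the particular values are my own choices for illustration, and apart from the list case the answers are implementation details that may differ between Python versions and implementations, which is the whole point.

```python
def same_object(make):
    """Call make() twice and report whether both results are one object."""
    return make() is make()

# Each entry builds its value at run time, so the compiler cannot
# merge constants ahead of time.
checks = {
    "small int": lambda: int("7"),
    "large int": lambda: int("123456789012345678901234567890"),
    "empty tuple": lambda: tuple([]),
    "empty string": lambda: "".join([]),
    "list": lambda: [],
}

results = {name: same_object(make) for name, make in checks.items()}

for name, shared in results.items():
    print(f"{name}: {'same object' if shared else 'distinct objects'}")
```

Only the list line is guaranteed to say 'distinct objects'. For the immutable values, the implementation is free to hand back a shared object or a fresh one, so code that relies on either outcome is fragile.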

There are at least two reasons I can think of to still have 'is' as a keyword. The first is that as a keyword, what it does is guaranteed by the language and is not subject to being modified by people who play games with namespaces in the way that, say, isinstance() can be changed. Changing what isinstance() does by defining your own version is probably a terrible idea, but you can do it if you feel the urge. Meanwhile, 'is' is beyond the reach of anything but bytecode rewriting. The second is that because 'is' is part of the language and isn't subject to being changed, it can be implemented in a way that makes it faster than a built-in function. Built-in functions need to go through a global name lookup every time they're used, just in case someone has rebound the name, while 'is' can simply be done directly since it's part of the language.
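To make the namespace games concrete, here is a small sketch of shadowing isinstance() at module level. The names looks_like_int() and fake_isinstance() are hypothetical, invented for this illustration, and the shadow is installed through globals() and then removed again so the damage stays contained.

```python
def looks_like_int(value):
    # The name 'isinstance' is resolved when this line runs:
    # module globals first, then the builtins module.
    return isinstance(value, int)

def fake_isinstance(obj, cls):
    # Hypothetical stand-in that claims everything matches.
    return True

before = looks_like_int("a string")        # real built-in: False

globals()["isinstance"] = fake_isinstance  # shadow the built-in in this module
try:
    during = looks_like_int("a string")    # the shadow answers: True
finally:
    del globals()["isinstance"]            # remove the shadow again

after = looks_like_int("a string")         # built-in is visible again: False

print(before, during, after)               # False True False
```

There is no equivalent trick for 'is'. Since it is syntax rather than a name, there is nothing in any namespace to rebind.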

(Local variables are fast because they avoid this lookup.)
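The difference in lookups can be seen by disassembling two tiny functions with the standard dis module. This is a sketch under assumptions: is_same() is a hypothetical built-in that does not exist (the function using it is compiled but never called), and the exact opcode names are CPython details that change between versions.

```python
import dis

source = '''
def with_keyword(a, b):
    return a is b

def with_function(a, b):
    return is_same(a, b)   # is_same is a hypothetical built-in; never called here
'''

namespace = {}
exec(source, namespace)    # define both functions in a throwaway namespace

def opnames(func):
    """Return the CPython opcode names for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

keyword_ops = opnames(namespace["with_keyword"])
function_ops = opnames(namespace["with_function"])

print("keyword: ", keyword_ops)
print("function:", function_ops)
```

On CPython, the function version should include a global name load and a call instruction, while the keyword version should only load its two local arguments and apply the identity test directly.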

PS: Of course by now all of this is entirely theoretical. It's entirely too late for Python to drop 'is' as a keyword, and even thinking about it is a bit silly. But I apparently twitch a bit when I see 'is' casually used in code examples, and that's sort of what inspired the tweet that led to this entry.

Written on 04 February 2020.


Last modified: Tue Feb 4 00:28:49 2020