The place of the 'is' syntax in Python
Over on Twitter, I said:
A Python cold take (given how long it's taken me to arrive at it): 'is' should not be a keyword, it should be a built-in function that you're discouraged from using unless you really know what you're doing. As a keyword it's too tempting.
Python has two versions of equality, in '==', which is plain equality,
and 'is', which is object identity; 'a is b' is true if and only if
'a' and 'b' refer to the same object. Since the distinction between
names and values is fundamental to Python, we definitely need a way
of testing this (for example, to explore a puzzling mistake I once
made). However, I'm not so sure it should be a language keyword.
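To make the distinction concrete, here is a minimal sketch. The variable names are made up for the illustration; the behaviour shown follows from lists being mutable objects that are created fresh each time.

```python
# Two separately built lists with equal contents.
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # a second name for the same object that 'a' refers to

equal_but_distinct = (a == b, a is b)
same_object = (a == c, a is c)

# A change made through one name is visible through another name
# only when both names refer to the same object.
c.append(4)

print(equal_but_distinct)  # (True, False)
print(same_object)         # (True, True)
print(a, b)                # [1, 2, 3, 4] [1, 2, 3]
```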
The issue with 'is' as a language keyword is that it makes using
object identity temptingly easy; after all, there's a keyword for
it, part of the language syntax. It's as if you're supposed to use
it. The first problem with this is simply that object identity is
a relatively advanced Python concept, one that's a bit tricky to
get your head around. Python code that genuinely needs to use 'is'
instead of '==' is almost invariably doing something tricky, and
we should generally avoid inviting people to routinely write code
that at least looks like tricky code. The second problem is that
in practice object identity can be tricky because Python implementations
(especially CPython) can quietly make objects be the same thing
(and thus 'a is b' will be true) when you didn't expect them to
be. It's possible to write safe code that uses 'is', but you need
to know a fair bit about what you're doing; perfectly sensible
looking code can conceal subtle bugs.
(When Python will give you the same object for two apparently different things depends on the specific version of (C)Python and also sometimes the exact way that you created the objects. It can get quite weird and involved.)
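A small sketch of how you might probe this yourself. The helper name 'same_object' and the sample values are my own choices for the illustration, and the code deliberately doesn't claim what most of the answers will be, since they are implementation details:

```python
def same_object(make):
    """Call make() twice and report whether both calls returned one object."""
    first = make()
    second = make()
    return first is second

# Each entry builds an equal value twice at run time. Whether the two
# results are the same object is up to the implementation, except for
# the list: every call to list() must produce a new, distinct object.
results = {
    "small int": same_object(lambda: int("7")),
    "large int": same_object(lambda: int("70000")),
    "empty tuple": same_object(lambda: tuple([])),
    "new list": same_object(lambda: list()),
}

for label, shared in results.items():
    print(f"{label}: {shared}")
```

On the CPython versions I'm aware of, the small integer and the empty tuple tend to come back as shared objects and the large integer does not, but nothing in the language promises any of this. Only the result for the fresh list is guaranteed. Comparing against a true singleton such as None is the usual safe case for 'is'.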
There are at least two reasons I can think of to still have 'is' as a
keyword. The first is that as a keyword, what it does is guaranteed
by the language and is not subject to being modified by people who
play games with namespaces in the way that, say, isinstance() can
be changed. Changing what isinstance() does by defining your own
version is probably a terrible idea, but you can do it if you feel the
urge. Meanwhile, 'is' is beyond the reach of anything but bytecode
rewriting. The second is that because 'is' is part of the language and
isn't subject to being changed, it can be implemented in a way that
makes it faster than a built-in function. Built-in functions need to
go through a global name lookup when they're used, just in case, while
'is' can be just done directly since it's part of the language.
(Local variables are fast because they avoid this lookup.)
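Both points can be sketched with the standard 'dis' module. This is an illustration rather than anything definitive: the function names and 'fake_isinstance' are hypothetical, and the 'LOAD_GLOBAL' operation name is a CPython bytecode detail that could differ in other implementations.

```python
import dis

def uses_builtin(x):
    return isinstance(x, int)   # 'isinstance' is looked up by name at call time

def uses_keyword(x, y):
    return x is y               # no name lookup is needed for 'is' itself

def opnames(func):
    """Return the set of bytecode operation names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}

builtin_needs_lookup = "LOAD_GLOBAL" in opnames(uses_builtin)
keyword_needs_lookup = "LOAD_GLOBAL" in opnames(uses_keyword)

def fake_isinstance(obj, cls):
    # Hypothetical replacement, purely to show that shadowing works.
    return "hijacked"

# A module-level name 'isinstance' shadows the built-in for code in
# that module, because globals are searched before built-ins.
module_globals = uses_builtin.__globals__
module_globals["isinstance"] = fake_isinstance
try:
    shadowed_result = uses_builtin(3)
finally:
    del module_globals["isinstance"]   # fall back to the real built-in
restored_result = uses_builtin(3)

print(builtin_needs_lookup, keyword_needs_lookup)
print(shadowed_result, restored_result)
```

The same trick has no equivalent for 'x is y'; there is no name there for a namespace to redefine.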
PS: Of course by now all of this is entirely theoretical. It's
entirely too late for Python to drop 'is' as a keyword, and even
thinking about it is a bit silly. But I apparently twitch a bit
when I see 'is' casually used in code examples, and that's sort
of what inspired the tweet that led to this entry.