An interesting issue around using is with a literal in Python
For reasons outside of the scope of this entry, I recently installed Ubuntu's package (for Ubuntu 20.04) of the Python netaddr module on a system of ours. When I did, I got an interesting Python warning that I hadn't seen before:
SyntaxWarning: "is not" with a literal. Did you mean "!="?
I was curious enough to look up the code in question, which boils down to something that looked like this:
    def int_to_bits(int_val, word_size, num_words, word_sep=''):
        [...]
        if word_sep is not '':
            [...]
(The current code replaces this with a '!=' comparison, which is what
the other similar code in that file uses. Ubuntu being Ubuntu, they will
probably never update or fix the 20.04 'python3-netaddr' package.)
The intention of this code is clear; it wants to check if you supplied
your own word_sep argument. On the one hand, using 'is not' here is not
the correct thing to do. When you use 'is not' this way you need to have
a sentinel object, not a sentinel value, and this code uses the value '',
the empty string. On the other hand, this code actually works, for at
least three reasons. One of them might be slightly surprising.
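For contrast, here is a minimal sketch of what checking for a caller-supplied argument with a real sentinel object could look like. The names `_UNSET` and `join_words` are made up for illustration and are not taken from netaddr.

```python
# Hypothetical names; this is not netaddr's code.
_UNSET = object()  # a unique object that no caller can pass by accident


def join_words(words, word_sep=_UNSET):
    """Join words, validating word_sep only if the caller supplied one."""
    if word_sep is _UNSET:
        # No argument given: fall back to the empty-string default.
        word_sep = ''
    elif not isinstance(word_sep, str):
        raise ValueError("word_sep must be a string")
    return word_sep.join(words)
```

Because `_UNSET` is an object created once in this module, 'is' compares against exactly that object, which is the situation 'is' is meant for.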
The first reason the code works is mechanical, because I left out
the body of the if and the rest of the code that actually uses
word_sep. Here is the almost full code:
    if word_sep is not '':
        if not _is_str(word_sep):
            raise ValueError(...)
    return word_sep.join(bit_words)
So the only thing the code does differently if it thinks that it has a
non-default word_sep is check that it really is a string. Since the
empty string passes that check, everything is fine. Given this, the if
isn't all that necessary; you could just as well always check to see
that word_sep is a string. However this first reason is specific
to the code itself.
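As a rough sketch of the "always check" version, assuming a standalone helper rather than netaddr's real function: `join_bit_words` is a hypothetical name, and `isinstance()` stands in for netaddr's `_is_str()` helper.

```python
# Hypothetical simplification, not netaddr's actual code.
def join_bit_words(bit_words, word_sep=''):
    # Always validate; the empty-string default passes this check too.
    if not isinstance(word_sep, str):
        raise ValueError("word_sep must be a string")
    return word_sep.join(bit_words)
```

With the check made unconditional, there is no comparison against the default at all, so the 'is not' question never comes up.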
The second and third reasons are general, and would happen regardless
of what use the code made of word_sep and what it did in the if.
I'll start by presenting the second reason in illustrated form:
    >>> def a(b=''):
    ...     return b is not ''
    ...
    <stdin>:2: SyntaxWarning: "is not" with a literal. Did you mean "!="?
    >>> a()
    False
    >>> a(b='')
    False
In CPython, a number of specific strings and other (immutable)
values are what is called interned. Regardless of
how many times they're used in different places all over your Python
code, there's only ever one instance of these values. For instance,
there is only one instance of an empty tuple, '()', and only one
instance of many small integers. Integers are especially useful to
illustrate this vividly, because you can manipulate current ones
to create new values:
    >>> a = 10
    >>> b = 5
    >>> c = 4
    >>> (b+c+1) is a
    True
If you change a to be 300 and b to be 295, this will be False
(as of Python 3.8.7).
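A small script version of the same experiment, as a sketch. The helper name `same_object` is invented here, and building the integers from strings is my own way of making sure both values are created at runtime. The results in the comments are what I would expect on CPython; other implementations may differ.

```python
def same_object(n):
    """Build the integer n twice at runtime and report whether the
    two results are one and the same object."""
    first = int(str(n))
    second = int(str(n))
    assert first == second  # the values are always equal
    return first is second


small = same_object(10)    # True on CPython: small integers are shared
large = same_object(300)   # False on CPython: two separate objects
print(small, large)
```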
The empty string, '', is one of those interned (string) values. All
copies of the empty string are the same object, regardless of where
they come from. Because they're the same object, you can use 'is not'
(and 'is') to compare values to them and it will always work. This is
of course not guaranteed by the language specification or by CPython,
but it's such a fundamental optimization that it would be very unusual
if it ever stopped being the case. Still, you should use '!=' and not
be so tricky.
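To see this for yourself, here is a short sketch that produces empty strings in a few different ways and compares each one to a reference empty string. The variable names are arbitrary, and what 'is' reports is an implementation detail, so I have left the output out.

```python
empty = ''

# Three empty strings produced in different ways.
candidates = {
    "join": ''.join([]),
    "slice": 'abc'[0:0],
    "str()": str(),
}

for how, value in candidates.items():
    # '==' is the comparison the language guarantees; 'is' only
    # reports what this particular interpreter happens to do.
    print(how, value == empty, value is empty)
```

Comparing against the name `empty` instead of a literal also avoids the SyntaxWarning, which only fires for 'is' with a literal.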
The third reason is best presented in illustrated form again:
    >>> def a(b=3000):
    ...     return b is 3000
    [...]
    >>> a()
    True
    >>> a(b=3000)
    False
This is another CPython optimization, but it's an optimization
within a single function. When CPython is generating the bytecode
for a function it's smart enough to only keep one copy of every
constant value, and this merging of constants includes the default
arguments. So within the a function, the integer '3000' of the b
default value and the integer literal '3000' from the code are the
same object and 'is' will tell you this. However, an integer of
'3000' that comes from the outside is a different object (since
3000 is a large enough integer that Python doesn't intern it).
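One way to look at this from the outside is to compare the function's default value against the constants stored in its code object. This is only a sketch: it assigns the literal to a local name (`limit`, my own addition) so that 'is' is applied to a name rather than a literal, and it builds the "outside" 3000 with `int('3000')` so that it cannot come from the compiler's constants. I have not listed the output because it depends on the CPython version.

```python
def a(b=3000):
    limit = 3000          # same literal as the default value
    return b is limit     # a name, not a literal, so no SyntaxWarning


# Is the default argument object one of the function's own constants?
default = a.__defaults__[0]
shared = any(const is default for const in a.__code__.co_consts)

print(shared)             # whether the compiler merged the two 3000s
print(a())                # called with the default object
print(a(int('3000')))     # called with a 3000 built at runtime
```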
This optimization is probably going to stay in CPython, but I would
strongly suggest that you not take advantage of it in your code. Just do
as the warning says and don't use 'is' or 'is not' on literals. The
very slight performance improvement you might get from exploiting this
isn't worth the confusion you're going to create.