An interesting issue around using 'is' with a literal in Python

February 12, 2021

For reasons outside the scope of this entry, I recently installed Ubuntu's package (for Ubuntu 20.04) of the Python netaddr module on a system of ours. When I did, I got an interesting Python warning that I hadn't seen before:

SyntaxWarning: "is not" with a literal. Did you mean "!="?

I was curious enough to look up the code in question, which boiled down to something like this:

def int_to_bits(int_val, word_size, num_words, word_sep=''):
    [...]
    if word_sep is not '':
        [...]

(The current code replaces this with a '!=' comparison, which is what the other similar code in that file uses. Ubuntu being Ubuntu, they will probably never update or fix the 20.04 'python3-netaddr' package.)

The intention of this code is clear; it wants to check if you supplied your own word_sep argument. On the one hand, using 'is not' here is not the correct thing to do. When you use 'is not' this way you need to have a sentinel object, not a sentinel value, and this code uses the value '', the empty string. On the other hand, this code actually works, for at least three reasons. One of them might be slightly surprising.
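
For contrast, a sentinel object version of this pattern might look like the following minimal sketch. The names _NO_SEP and describe_sep are made up for illustration and are not from netaddr.

```python
# _NO_SEP is a hypothetical module-private sentinel object. No caller can
# accidentally pass something that is this exact object.
_NO_SEP = object()

def describe_sep(word_sep=_NO_SEP):
    # An identity check is the right tool here, because we are comparing
    # against one specific object rather than against a value.
    if word_sep is not _NO_SEP:
        return "caller supplied %r" % (word_sep,)
    return "default"

print(describe_sep())
print(describe_sep(''))
```

With a sentinel object, an explicitly passed '' can be told apart from no argument at all, which a sentinel value can't reliably give you.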

The first reason the code works is mechanical, and it isn't visible above because I left out the body of the if and the rest of the code that actually uses word_sep. Here is almost the full code:

if word_sep is not '':
    if not _is_str(word_sep):
        raise ValueError(...)

return word_sep.join(bit_words)

So the only thing the code does differently if it thinks that it has a non-default word_sep is check that it really is a string. Since the empty string passes that check, everything is fine. Given this, the if isn't really necessary; you could just as well always check that word_sep is a string. However, this first reason is specific to the code itself.
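
As a sketch of what always checking could look like: join_bit_words is a hypothetical stand-in for the tail end of int_to_bits, and I'm assuming that a plain isinstance() check is an acceptable substitute for netaddr's _is_str helper.

```python
def join_bit_words(bit_words, word_sep=''):
    # Hypothetical stand-in, not netaddr's real code. The check runs
    # unconditionally, so there's no need to detect the default at all.
    if not isinstance(word_sep, str):
        raise ValueError("word separator is not a string: %r" % (word_sep,))
    return word_sep.join(bit_words)

print(join_bit_words(['0101', '1100']))
print(join_bit_words(['0101', '1100'], word_sep='.'))
```

The default empty string simply passes the check like any other string would.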

The second and third reasons are general, and would happen regardless of what use the code made of word_sep and what it did in the if. I'll start by presenting the second reason in illustrated form:

>>> def a(b=''):
...   return b is not ''
...
<stdin>:2: SyntaxWarning: "is not" with a literal. Did you mean "!="?
>>> a()
False
>>> a(b='')
False

In CPython, a number of specific strings and other (immutable) values are what is called interned. No matter how many times they're used in different places all over your Python code, there's only ever one instance of each of these values. For instance, there is only one instance of the empty tuple, '()', and only one instance of each of many small integers. Integers are especially useful for illustrating this vividly, because you can do arithmetic on existing ones to create new values:

>>> a = 10
>>> b = 5
>>> c = 4
>>> (b+c+1) is a
True

If you change a to be 300 and b to be 295, this will be False (as of Python 3.8.7).
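
If you want to try this outside of the interactive interpreter, here is a small sketch. The same_object function is a name I made up; it runs int() on a string so that both values are created at run time instead of being constants that the compiler could merge. What it prints is a property of the CPython implementation, not of the language.

```python
def same_object(text):
    # Build the same integer value twice, at run time.
    x = int(text)
    y = int(text)
    return x is y

# On CPython I'd expect True for a small integer such as 10, which comes
# from the cache, and False for 300, which gets a fresh object each time.
print(same_object("10"))
print(same_object("300"))
```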

The empty string, '', is one of those interned (string) values. All copies of the empty string are the same object, regardless of where they come from. Because they're the same object, you can use 'is not' (and 'is') to compare values to it and it will always work. This is of course not guaranteed by the language specification or by CPython, but it's such a fundamental optimization that it would be very unusual if it ever stopped being the case. Still, you should use '!=' and not be so tricky.
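
Here is a sketch that builds empty strings in several different ways at run time and compares them both ways. EMPTY and empties are names I made up for this example, and I compare against a name rather than a literal so that 'is' doesn't draw the SyntaxWarning.

```python
EMPTY = ''

def empties():
    # Several different run-time ways of ending up with an empty string.
    return [str(), ''.join([]), 'abc'[:0], ' '.strip()]

for s in empties():
    # The == result is guaranteed by the language. The 'is' result is a
    # CPython implementation detail, although I'd expect it to be True too.
    print(repr(s), s == EMPTY, s is EMPTY)
```

The '==' column is the one you can actually rely on, which is why '!=' is the right fix for the original code.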

The third reason is best presented in illustrated form again:

>>> def a(b=3000):
...    return b is 3000
[...]
>>> a()
True
>>> a(b=3000)
False

This is another CPython optimization, but it's an optimization within a single function. When CPython generates the bytecode for a function, it's smart enough to keep only one copy of every constant value, and this merging of constants includes the default arguments. So within the a function, the integer '3000' of the b default value and the integer literal '3000' from the code are the same object, and 'is' will tell you this. However, an integer of '3000' that comes from the outside is a different object (since 3000 is a large enough integer that CPython doesn't intern it).
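
One way to look at this from the inside is to compare the stored default against the constants in the function's code object. This is only a sketch: report is a name I made up, I bind the literal to a local name so that 'is' doesn't trigger the warning, and whether the constants are actually merged depends on the CPython version you run it on.

```python
def a(b=3000):
    c = 3000  # the same literal value as the default, bound to a name
    return b is c

def report():
    default = a.__defaults__[0]
    # The integer constants equal to 3000 that the function body uses.
    body_consts = [k for k in a.__code__.co_consts
                   if type(k) is int and k == 3000]
    merged = any(k is default for k in body_consts)
    # int("3000") is built at run time, so it comes from outside the
    # compiled constants.
    outside = int("3000")
    return merged, a(), a(outside)

print(report())
```

If the first two values printed are True, the default and the literal in the body were merged into one object; the third shows what happens with a 3000 that was created elsewhere.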

This optimization is probably going to stay in CPython, but I would strongly suggest that you not take advantage of it in your code. Just do as the warning says and don't use 'is' or 'is not' on literals. The very slight performance improvement you might get from exploiting this isn't worth the confusion you're going to create.
