A bit on unspecified unique objects in Python

February 20, 2023

In Why Aren't Programming Language Specifications Comprehensive? (via), Laurence Tratt shows the following example of a difference in behavior between CPython and PyPy:

$ cat diffs.py
print(str(0) is str(0))
$ python3 diffs.py
False
$ pypy diffs.py
True

Tratt notes that Python's language specification doesn't specify the behavior here, so both implementations are correct. Python does this to preserve the ability of implementations to make different choices, and Tratt goes on to use the example of __del__ destructors. This might leave a reader who is willing to accept the difference in destructor behavior to wonder why Python doesn't standardize object identity here.

Since this code uses 'is', the underlying reason for the difference in behavior is whether two invocations of 'str(0)' in one expression result in the same actual object. In CPython 3, they don't; in PyPy, they do. On the one hand, making these two invocations create the same object is an obvious win, since you're creating less objects and thus less garbage. A Python implementation could do this by knowing that using str() on a constant results in a constant result so it only needs one object, or it could intern the results of expressions like 'str(0)' so that they always return the same object regardless of where they're invoked. So allowing this behavior is good for Python environments that want to be nicely optimized, as PyPy does.

On the other hand, doing either of these things (or some combination of them) is extra work and complexity in an implementation. Depending on the path taken to this optimization, you have to either decide what to intern and when, then keep track of it all, or build in knowledge about the behavior of the built in str() and then verify at execution time that you're using the builtin instead of some clever person's other version of str(). Creating a different str() function or class here would be unusual but it's allowed in Python, so an implementation has to support it. You can do this analysis, but it's extra work. So not requiring this behavior is good for implementations that don't want to have the code and take the (extra) time to carefully do this analysis.

This is of course an example of a general case. Languages often want to allow but not require optimizations, even when these optimizations can change the observed behavior of programs (as they do here). To allow this, careful language specifications set up explicit areas where the behavior isn't fixed, as Python does with is (see the footnote). In fact, famously CPython doesn't even treat all types of objects the same:

$ cat diff2.py
print(int('0') is int('0'))
$ python3 diff2.py
True
$ pypy diff2.py
True

Simply changing the type of object changes the behavior of CPython. For that matter, how we create the object can change the behavior too:

$ cat diff3.py
print(chr(48) == str(0))
print(chr(48) is chr(48))
print(chr(48) is str(0))
$ python3 diff3.py
True
True
False

Both 'chr(48)' and 'str(0)' create the same string value, but only one of them results in the same object being returned by multiple calls. All of this is due to CPython's choices about what it optimizes and what it doesn't. These choices are implementation specific and also can change over time, as the implementation's views change (which is to say as the views of CPython's developers change).

Written on 20 February 2023.
« Using web server reverse proxying to deal with file access permissions
Grafana Loki doesn't compact log chunks and what this means for you »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Feb 20 23:16:25 2023
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.