2023-02-20
A bit on unspecified unique objects in Python
In Why Aren't Programming Language Specifications Comprehensive? (via), Laurence Tratt shows the following example of a difference in behavior between CPython and PyPy:
    $ cat diffs.py
    print(str(0) is str(0))
    $ python3 diffs.py
    False
    $ pypy diffs.py
    True
Tratt notes that Python's language specification doesn't specify
the behavior here, so both implementations are correct. Python does
this to preserve the ability of implementations to make different
choices, and Tratt goes on to use the example of __del__
destructors.
This might leave a reader who is willing to accept the difference in
destructor behavior to wonder why Python doesn't standardize object
identity here.
Since this code uses 'is', the underlying reason for the difference
in behavior is whether two invocations of 'str(0)' in one expression
result in the same actual object. In CPython 3, they don't; in PyPy,
they do. On the one hand, making these two invocations create the
same object is an obvious win, since you're creating fewer objects
and thus less garbage. A Python implementation could do this by
knowing that using str() on a constant results in a constant result
so it only needs one object, or it could intern the
results of expressions like 'str(0)' so that they always return the
same object regardless of where they're invoked. So allowing this
behavior is good for Python environments that want to be nicely
optimized, as PyPy does.
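As a rough illustration of the interning approach, here is a minimal sketch written in Python itself. The names '_interned' and 'interned_str' are made up for this example, and this is not how PyPy or CPython actually implement anything; it only shows how a cache can make repeated calls return the same object.

```python
# Hypothetical sketch of interning; not taken from any real implementation.
_interned = {}


def interned_str(value):
    """Return str(value), reusing one string object per distinct text."""
    text = str(value)
    # setdefault() stores the first object seen for this text and returns
    # that stored object on every later call with an equal text.
    return _interned.setdefault(text, text)


a = interned_str(0)
b = interned_str(0)
print(a == b)  # the values are equal
print(a is b)  # and both names refer to the one cached object
```

Note that this toy cache never forgets anything, which hints at the bookkeeping a real implementation would have to take on.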
On the other hand, doing either of these things (or some combination of them) is extra work and complexity in an implementation. Depending on the path taken to this optimization, you have to either decide what to intern and when, then keep track of it all, or build in knowledge about the behavior of the built-in str() and then verify at execution time that you're using the builtin instead of some clever person's other version of str(). Creating a different str() function or class here would be unusual, but it's allowed in Python, so an implementation has to support it. You can do this analysis, but it's extra work. So not requiring this behavior is good for implementations that don't want to carry the code and spend the (extra) time to carefully do this analysis.
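To make the "clever person's other version of str()" concrete, here is a small sketch. The function 'demo' and the replacement 'str' inside it are invented for illustration; the point is only that the name can be rebound to something with side effects, so an implementation can't blindly fold two 'str(0)' calls into one constant.

```python
import builtins


def demo():
    """Call a locally shadowed str() twice and record what it saw."""
    calls = []

    def str(value):  # deliberately shadows the builtin inside demo()
        calls.append(value)  # a side effect that constant folding would lose
        return builtins.str(value) + "!"

    first = str(0)
    second = str(0)
    return first, second, calls


first, second, calls = demo()
print(first, second, calls)
```

Both calls have to actually run here, because each one appends to 'calls' and the result isn't what the built-in str() would have produced.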
This is of course an example of a general case. Languages often
want to allow but not require optimizations, even when these
optimizations can change the observed behavior of programs (as they
do here). To allow this, careful language specifications set up
explicit areas where the behavior isn't fixed, as Python does
with 'is' (see the footnote).
In fact, CPython famously doesn't even treat all types of objects
the same:
    $ cat diff2.py
    print(int('0') is int('0'))
    $ python3 diff2.py
    True
    $ pypy diff2.py
    True
Simply changing the type of object changes the behavior of CPython. For that matter, how we create the object can change the behavior too:
    $ cat diff3.py
    print(chr(48) == str(0))
    print(chr(48) is chr(48))
    print(chr(48) is str(0))
    $ python3 diff3.py
    True
    True
    False
Both 'chr(48)' and 'str(0)' create the same string value, but only one of them results in the same object being returned by multiple calls. All of this is due to CPython's choices about what it optimizes and what it doesn't. These choices are implementation specific and also can change over time, as the implementation's views change (which is to say as the views of CPython's developers change).
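If you want to see what your own interpreter does, a small script along these lines gathers the expressions above in one place. The function name 'identity_report' is invented for this sketch. No expected output is shown for the 'is' lines, because those results depend on the implementation and version you run; only the '==' comparison is pinned down by the language.

```python
import platform


def identity_report():
    """Evaluate the identity checks from this entry on the running interpreter."""
    return {
        "str(0) is str(0)": str(0) is str(0),
        "int('0') is int('0')": int('0') is int('0'),
        "chr(48) is chr(48)": chr(48) is chr(48),
        "chr(48) is str(0)": chr(48) is str(0),
        # Value equality is specified behavior, so this one is always True.
        "chr(48) == str(0)": chr(48) == str(0),
    }


print(platform.python_implementation(), platform.python_version())
for expression, result in identity_report().items():
    print(f"{expression}: {result}")
```

The practical upshot is the usual one: compare strings and numbers with '==', and keep 'is' for cases where object identity is what you actually mean.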