== An interesting excursion with Python strings and _is_

Let's start with the following surprising interactive example from [[@xlerb's tweet https://twitter.com/xlerb/status/573253877337747456]]:

.pn prewrap on

> >>> "foo" is ("fo" + "o")
> True
> >>> "foo" is ("fo".__add__("o"))
> False
> >>> "foo" == ("fo".__add__("o"))
> True

The last two cases aren't surprising at all; they demonstrate that equality is broader than mere object identity, which is what _is_ tests (as I described in my entry on [[Python's two versions of equality TwoEqualitiesAndHash]]). The surprising case is the first one; why do the two sides of that comparison result in exactly the same object? There turn out to be two things going on here, both of them quite interesting.

The first thing going on is that CPython does constant folding on string concatenation as part of creating bytecode. This means that the '_"fo" + "o"_' turns into a literal _"foo"_ in the actual bytecodes that are executed. On the surface, this is enough to explain the check succeeding in some contexts. To make life simpler while simultaneously going further down the rabbit hole, consider a function like the following:

> def f():
>     return "foo" is ("fo"+"o")

Compiled functions have (among other things) a table of strings and other constants used in the function. Given constant folding and an obvious optimization, you would expect _"foo"_ to appear in this table exactly once. Well, actually, that's wrong; here's what ((func_code.co_consts)) is for this function in Python 2:

> (None, 'foo', 'fo', 'o', 'foo')

(It's the same in Python 3, but now it's in ((__code__.co_consts)).)

Given this we can sort of see what happened. Probably the bytecode was originally compiled without constant folding and then a later pass optimized the string concatenation away and added the folded version to ((co_consts)), operating on the entirely rational assumption that it didn't duplicate anything already there.
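
If you want to look at the constants table and the folded bytecode yourself, here is a minimal Python 3 sketch. The helper name ((compile_example)) is made up for illustration, and the function is compiled from a source string only so that the _SyntaxWarning_ that Python 3.8 and later issue for _is_ with a literal can be silenced. Newer CPython versions fold constants at a different stage than the peephole optimizer, so expect the tuple you get to differ from the Python 2 one shown above.

```python
import dis
import warnings

SOURCE = '''
def f():
    return "foo" is ("fo" + "o")
'''


def compile_example(source=SOURCE):
    """Compile the source text and return the function f defined in it."""
    namespace = {}
    with warnings.catch_warnings():
        # Python 3.8+ warns about "is" with a literal. That comparison is
        # exactly what we want to study, so silence the warning here.
        warnings.simplefilter("ignore", SyntaxWarning)
        exec(compile(source, "<example>", "exec"), namespace)
    return namespace["f"]


f = compile_example()
consts = f.__code__.co_consts

print(consts)               # the constants table; its contents vary by version
print(consts.count("foo"))  # how many "foo" entries this interpreter kept
dis.dis(f)                  # the concatenation should no longer appear here
```

The disassembly is the more reliable thing to read, since it shows directly whether any add instruction survived compilation.
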
A later optimization pass of this kind would be a natural fit for a simple peephole optimizer, which is in fact exactly what we find in Python/peephole.c in the CPython 2 source code.

But how does this give us object identity? The answer has to be that CPython [[interns https://en.wikipedia.org/wiki/String_interning]] at least some of the literal strings used in the Python code it compiles. In fact, if we check ((func_code.co_consts)) for our function up above, we can see that both _"foo"_ strings are already the same object even though there are two entries in ((co_consts)). The effect is actually fairly strong; for example, the same literal string used in two different modules can be interned to be the same object.

I haven't been able to find the CPython code that actually does this, so I can't tell you what the exact conditions are.

(Whether or not a literal string is interned appears to depend partly on whether or not it has spaces in it. This rabbit hole goes a long way down.)

PS: I believe that this means I was wrong about some things I said in [[my entry on instance dictionaries and attribute names InstanceStringUsage]], in that more things get interned than I thought back then. Or maybe CPython grew more string interning optimizations since then.
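
The interning behavior described above can be poked at empirically. The following is a rough Python 3 sketch, not a description of CPython's actual rules: the helper ((literal_object)) is hypothetical, and two separate ((compile())) calls are assumed to be a fair stand-in for the same literal appearing in two different modules. What gets printed for the literal with a space in it may vary between CPython versions and builds, so no particular result is promised for it.

```python
import sys


def literal_object(literal_source):
    """Compile a string literal on its own and return the resulting str."""
    code = compile(literal_source, "<example>", "eval")
    # Pick the string out of the constants table rather than assume its index.
    return next(c for c in code.co_consts if isinstance(c, str))


# Two separate compilations stand in for two different modules.
plain_a = literal_object('"foo"')
plain_b = literal_object('"foo"')
spaced_a = literal_object('"foo bar"')
spaced_b = literal_object('"foo bar"')

print(plain_a is plain_b)    # literal made only of identifier characters
print(spaced_a is spaced_b)  # literal with a space; result may vary

# A string assembled at run time is not a literal, so nothing interns it
# for us; sys.intern() asks for the shared copy explicitly.
built = "".join(["fo", "o"])
print(built == plain_a)
print(built is plain_a)
print(sys.intern(built) is sys.intern(plain_a))
```

Only the last line is guaranteed by the documented behavior of ((sys.intern())); the identity results above it are CPython implementation details, which is one more reason not to use _is_ to compare strings.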