Wandering Thoughts archives

2022-11-19

Python dictionaries and floating point NaNs as keys

Like Go, Python's floating point numbers support NaNs with the usual IEEE-754 semantics, including not comparing equal to each other. Since Python will conveniently produce them for us, we can easily demonstrate this:

>>> k = float('nan')
>>> k == k
False
>>> k is k
True

Yesterday, I discovered that Go couldn't delete 'NaN' keys from maps (the Go version of dicts). If you initially try this in Python, it may look like it works:

>>> d = {k: "Help"}
>>> d
{nan: 'Help'}
>>> d[k]
'Help'
>>> del d[k]
>>>

However, all is not what it seems:

>>> d = {k: "Help", float('nan'): "Me"}
>>> d
{nan: 'Help', nan: 'Me'}
>>> d[float('nan')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan

What's going on here is that Python dict indexing has a fast path for object identity, which comes into play when you look up something using exactly the same object that you used to set an entry. When you set a dict entry, Python saves the object you used as the key. If you ask a dict to look up an entry using that exact object, Python doesn't even bother calling the object's equality operation (what would be used for an '==' check); it just returns the value. This means that floating point NaNs have no chance to object that they're never equal to each other, and lookup will succeed. However, if you use a different object that is also a NaN, the lookup will fail because two NaNs never compare equal to each other.

This use of object identity in dict lookups does mean that the Python equivalent of iterating a Go map will always work:

>>> for k in d.keys():
...   d[k]
...
'Help'
'Me'

When you ask a dictionary for its keys, you of course get the literal Python objects that are the keys, which can always be used to look up the corresponding entry in the dict even if they're NaNs or otherwise uncomparable or inequal under normal circumstances.

One of the other things that this starts to show us is that Python is not making any attempt to intern NaNs, unlike things like True, False, and small integers. Let's show that more thoroughly:

>>> k2 = float('nan')
>>> k is k2
False
>>> k is math.nan
False

It might be hard to make all NaNs generated through floating point operations be the same interned object, but it would be relatively straightforward to make 'float("nan")' always produce the same Python object and for that Python object to also be math.nan. But Python doesn't do either of those; every NaN is a unique object. Personally I think that this is the right choice (whether or not it's deliberate); NaNs are supposed to all be different from each other anyway, so using separate objects is slightly better.

(I suspect that Python doesn't intern any floating point numbers, but I haven't checked the source code. On a quick check it doesn't intern 0.0 or +Inf; I didn't try any others. In general, I expect that interning floating point numbers makes much less sense and would result in much less object reuse than interning small integers and so on does.)

python/DictsAndNaNKeys written at 22:12:29; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.