Python dictionaries and floating point NaNs as keys

November 19, 2022

Like Go, Python's floating point numbers support NaNs with the usual IEEE-754 semantics, including not comparing equal to each other. Since Python will conveniently produce them for us, we can easily demonstrate this:

>>> k = float('nan')
>>> k == k
False
>>> k is k
True

Yesterday, I discovered that Go couldn't delete 'NaN' keys from maps (the Go version of dicts). If you initially try this in Python, it may look like it works:

>>> d = {k: "Help"}
>>> d
{nan: 'Help'}
>>> d[k]
'Help'
>>> del d[k]
>>>

However, all is not what it seems:

>>> d = {k: "Help", float('nan'): "Me"}
>>> d
{nan: 'Help', nan: 'Me'}
>>> d[float('nan')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan

What's going on here is that Python dict indexing has a fast path for object identity, which comes into play when you look up something using exactly the same object that you used to set an entry. When you set a dict entry, Python saves the object you used as the key. If you ask a dict to look up an entry using that exact object, Python doesn't even bother calling the object's equality operation (what would be used for an '==' check); it just returns the value. This means that floating point NaNs have no chance to object that they're never equal to each other, and lookup will succeed. However, if you use a different object that is also a NaN, the lookup will fail because two NaNs never compare equal to each other.

This use of object identity in dict lookups does mean that the Python equivalent of iterating a Go map will always work:

>>> for k in d.keys():
...   d[k]
...
'Help'
'Me'

When you ask a dictionary for its keys, you of course get the literal Python objects that are the keys, which can always be used to look up the corresponding entry in the dict even if they're NaNs or otherwise uncomparable or inequal under normal circumstances.

One of the other things that this starts to show us is that Python is not making any attempt to intern NaNs, unlike things like True, False, and small integers. Let's show that more thoroughly:

>>> k2 = float('nan')
>>> k is k2
False
>>> k is math.nan
False

It might be hard to make all NaNs generated through floating point operations be the same interned object, but it would be relatively straightforward to make 'float("nan")' always produce the same Python object and for that Python object to also be math.nan. But Python doesn't do either of those; every NaN is a unique object. Personally I think that this is the right choice (whether or not it's deliberate); NaNs are supposed to all be different from each other anyway, so using separate objects is slightly better.

(I suspect that Python doesn't intern any floating point numbers, but I haven't checked the source code. On a quick check it doesn't intern 0.0 or +Inf; I didn't try any others. In general, I expect that interning floating point numbers makes much less sense and would result in much less object reuse than interning small integers and so on does.)


Comments on this page:

Interning NaN would come at some cost because there isn’t just one of them: any floating point value with an all-1s exponent is a NaN.

I was vaguely recollecting that there might be many positive and negative infinity values too, but looking it up just now to refresh my memory, it turns out those are represented as two very specific NaNs, namely the two NaNs with an all-0s fractional part.

For 0.0, any value with an all-0s fractional part and at least one 0 bit in the exponent could be a valid representation; It’s not clear to me at a glance whether that’s true or the non-canonical representations are errors (much less whether any operation would ever produce such a value in practice, even if it were accepted as input).

If positive and negative infinity are all you can intern without twiddling the value, that seems… rather too niche to justify the effort. Even if 0.0 can also be included it just doesn’t feel remotely worth bothering.

Written on 19 November 2022.
« Go 1.21 may have a clear(x) builtin and there's an interesting reason why
Floating point NaNs as map keys in Go give you weird results »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Nov 19 22:12:29 2022
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.