More on equality in Python (well, mostly Python 2.7)

April 4, 2012

There are undoubtedly some languages where if you create a new class, instances of the class don't support equality checks at all until you tell the language how to do this. Python is friendlier, so it gives all classes a default idea of ordinary equality, the equality of ==. This default is object identity; for an uncustomized class, 'a == b' is the same as 'a is b'. This is kind of minimal but it's hard to argue that an object is not equal to itself.
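To make the default concrete, here is a minimal sketch; the class name Plain and its value attribute are made up purely for illustration. Two instances with identical contents are still not equal, because nothing has told Python to look at the contents.

```python
class Plain(object):
    """A hypothetical class with no equality customization at all."""
    def __init__(self, value):
        self.value = value

a = Plain(10)
b = Plain(10)   # same contents, but a different object
c = a           # just another name for the same object

print(a == b)   # False: default equality is identity
print(a == c)   # True: a and c are the same object
print((a == b) == (a is b))   # True: == and 'is' agree here
```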

(There are some situations where this is not true, or at least not desired, but we won't go there for now.)

Python has two ways of customizing equality checks, the old one and the new one. The old one is to create a general __cmp__ method that returns the same sort of result as the cmp() builtin does. The new one is to define __eq__ and __ne__ special methods, which are called for == and != comparisons respectively. Note that in Python 2 you need to define both special methods; Python 2 will not infer one from the other, so whichever operation you leave out silently falls back to object identity.

(Personally I disagree with this lack of inference. Defining __eq__ but not __ne__ is both an easy mistake to make and a great way to shoot yourself in the foot in a far from obvious way.)
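A minimal sketch of the new style might look like the following. The Point class is a hypothetical example, not anything from a real library, and delegating __ne__ to __eq__ is just one reasonable way to keep the two in sync. (Python 3 derives != from __eq__ by itself, so there the explicit __ne__ is redundant but harmless.)

```python
class Point(object):
    """Hypothetical value class that compares by contents."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Decline to compare against unrelated types.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Written in terms of __eq__ so the two can never disagree.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

p1 = Point(1, 2)
p2 = Point(1, 2)
p3 = Point(3, 4)

print(p1 == p2)   # True
print(p1 != p2)   # False
print(p1 != p3)   # True
```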

Discussions of __eq__ and __ne__ generally lump them in with the other comparison special methods (this happens even in the official documentation) and some also imply that you should implement all of these special methods if you have any. In my opinion this is a mistake; there are plenty of situations where equality and inequality between entities is well defined but you do not necessarily have an order (or at least any order that you make up will be either arbitrary or very complex or both). For a concrete example, consider sets. Equality and inequality of sets is clear and obvious (it's whether they contain the same elements), but what does it mean to compare two unrelated sets to see if one is greater than or less than the other? And how often are such comparisons going to actually be useful?

(Even if you can handwave an ordering, there is such a thing as being too clever.)

It's my strong opinion that you should not define comparison operators unless you can actually define an ordering for your class and make that ordering coherent with respect to equality. Let's use sets again. If we take 's1 > s2' to mean 's1 is bigger (has more elements) than s2', then we can come up with a straightforward implementation of the comparison methods. But now we have a problem, because we've created a situation where it's possible for 's1 < s2', 's1 > s2', and 's1 == s2' to all be False at once (s1 and s2 have the same number of elements but the elements are different). This is at least very odd and may have consequences for other code.
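Here is a sketch of that size-based ordering, to show the incoherence directly. SizeOrderedSet is a hypothetical wrapper invented for this illustration; it is not how Python's built-in sets behave.

```python
class SizeOrderedSet(object):
    """Hypothetical set wrapper: equality by contents, ordering by size."""
    def __init__(self, items):
        self.items = frozenset(items)

    def __eq__(self, other):
        if not isinstance(other, SizeOrderedSet):
            return NotImplemented
        return self.items == other.items

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __lt__(self, other):
        if not isinstance(other, SizeOrderedSet):
            return NotImplemented
        return len(self.items) < len(other.items)

    def __gt__(self, other):
        if not isinstance(other, SizeOrderedSet):
            return NotImplemented
        return len(self.items) > len(other.items)

# Same number of elements, but different elements.
s1 = SizeOrderedSet([1, 2])
s2 = SizeOrderedSet([3, 4])

print(s1 < s2, s1 > s2, s1 == s2)   # False False False
```

Any code that assumes 'not less than and not greater than' means 'equal' will quietly do the wrong thing with these objects.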

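In Python it's legitimate to define __eq__ without also defining a custom __hash__; nothing forces 'a == b' to mean that 'hash(a) == hash(b)'. There are many classes where it will not, for instance pretty much any sort of mutable thing. The catch is that dictionaries and sets do rely on equal objects having equal hashes, so instances of such a class should not be used as dictionary keys or set members.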
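As a sketch of a class with __eq__ and no __hash__, consider the hypothetical Bag below. What hash() does with it depends on the Python version: a new-style class in Python 2.7 keeps the inherited default hash, while Python 3 sets __hash__ to None for any class that defines __eq__ alone, which makes instances unhashable. The code probes for this rather than assuming either behavior.

```python
class Bag(object):
    """Hypothetical mutable container with content-based equality."""
    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Bag):
            return NotImplemented
        return self.items == other.items

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

b1 = Bag([1, 2])
b2 = Bag([1, 2])
print(b1 == b2)   # True: equal by contents

# Probe whether instances can be hashed at all on this interpreter.
try:
    hash(b1)
    hashable = True
except TypeError:
    hashable = False
print(hashable)   # False under Python 3
```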

Sidebar: default comparisons

In Python 2.x, class instances have a default idea of ordering that is based on id(); in an uncustomized class, 'a < b' means 'id(a) < id(b)'. This gives you a comparison result and allows you to do things like sort class instances into a consistent and stable ordering (at least in CPython), but it's not actually very meaningful (since id() is just the address in memory of the (C-level) instance object, which is highly unpredictable). In Python 3, this default has been eliminated and attempts to compare uncustomized class instances this way will get you a TypeError about unorderable types.
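The difference is easy to see with a small probe; Thing is a hypothetical empty class. The code catches the TypeError so that it runs on either version, and only the Python 3 outcome is asserted in the comment.

```python
class Thing(object):
    """Hypothetical class with no comparison methods at all."""
    pass

t1, t2 = Thing(), Thing()

try:
    result = t1 < t2
    # Python 2.x gets here, with an answer that carries no real meaning.
    outcome = "ordered by default: %r" % (result,)
except TypeError as exc:
    # Python 3 gets here instead.
    outcome = "TypeError: %s" % (exc,)

print(outcome)   # starts with "TypeError:" under Python 3
```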

Note that there is no general guarantee that the id() results for a given object are stable over time. They are stable in CPython mostly because CPython does not relocate objects when it does garbage collection. Some other Python implementations do have relocating garbage collectors and so in them, 'id(obj)' may well change over time.

(The id() help text is very explicit that the id() result only has to be unique among all currently existing objects right now. Even in CPython, the id() of a newly created object may be the same as the old id() of an old and now garbage collected object. In short, id() is not a unique serial number for objects.)

Written on 04 April 2012.

Last modified: Wed Apr 4 01:16:17 2012