More on equality in Python (well, mostly Python 2.7)
There are undoubtedly some languages where if you create a new class,
instances of the class don't support equality checks at all until you
tell the language how to do this. Python is friendlier, so it gives all
classes a default idea of ordinary equality, the equality of ==. This
default is object identity; for an uncustomized class, 'a == b' is the
same as 'a is b'. This is kind of minimal but it's hard to argue that an
object is not equal to itself.
(There are some situations where this is not true, or at least not desired, but we won't go there for now.)
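As a small illustration of this default (the Plain class is a made-up
example, not anything from the standard library), two distinct instances
with identical contents are not equal, while an object is always equal
to itself:

```python
class Plain(object):
    """Hypothetical uncustomized class: no __eq__, __ne__, or __cmp__."""

    def __init__(self, value):
        self.value = value


a = Plain(1)
b = Plain(1)   # same contents as a, but a different object
alias = a      # a second name for the very same object

same_object_equal = (a == alias)   # True, because 'a is alias'
lookalike_equal = (a == b)         # False, because 'a is not b'
```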
Python has two ways of customizing equality checks, the old one and the
new one. The old one is to create a general __cmp__ method that returns
the same sort of result as the cmp() builtin does. The new one is to
define __eq__ and __ne__ special methods, which are called for == and !=
comparisons respectively. Note that you need to define both special
methods; Python 2 will not infer one from the other and will instead
fall back to object identity for whichever one you left out.
(Personally I disagree with this lack of inference. Defining __eq__ but
not __ne__ is both an easy mistake to make and a great way to shoot
yourself in the foot in a far from obvious way.)
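A minimal sketch of the new style, using a hypothetical Point class.
Both methods are defined by hand, and __ne__ is written in terms of
__eq__ so that the two cannot drift apart. Returning NotImplemented for
foreign types is one common convention, not the only reasonable choice:

```python
class Point(object):
    """Hypothetical value class that compares by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            # Let Python try the other operand, then its own fallback.
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Defined explicitly so that != always agrees with ==.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


p = Point(1, 2)
equal_to_twin = (p == Point(1, 2))       # True
unequal_to_twin = (p != Point(1, 2))     # False
unequal_to_other = (p != Point(3, 4))    # True
```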
Discussions of __eq__ and __ne__ generally lump them in with
the other comparison special methods (this happens even in the official
documentation) and some also imply that you should implement all of
these special methods if you have any. In my opinion this is a mistake;
there are plenty of situations where equality and inequality between
entities is well defined but you do not necessarily have an order (or
at least any order that you make up will be either arbitrary or very
complex or both). For a concrete example, consider sets. Equality and
inequality of sets is clear and obvious (it's whether they contain the
same elements), but what does it mean to compare two unrelated sets to
see if one is greater than or less than the other? And how often
are such comparisons going to actually be useful?
(Even if you can handwave an ordering, there is such a thing as being too clever.)
It's my strong opinion that you should not define comparison operators
unless you can actually define an ordering for your class and make that
ordering coherent with respect to equality. Let's use sets again. If
we take 's1 > s2' to mean 's1 is bigger (has more elements) than s2',
then we can come up with a straightforward implementation of the
comparison methods. But now we have a problem, because we've created a
situation where it's possible for 's1 < s2', 's1 > s2', and 's1 == s2'
to all be False at once (s1 and s2 have the same number of elements
but the elements are different). This is at least very odd and may have
consequences for other code.
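The incoherence is easy to reproduce. Here is a sketch with a
hypothetical SizeOrderedSet wrapper (the name and the size-based
ordering are assumptions made up for this illustration; real Python sets
order by subset and superset instead):

```python
class SizeOrderedSet(object):
    """Hypothetical set wrapper: equality by contents, ordering by size."""

    def __init__(self, items):
        self.items = frozenset(items)

    def __eq__(self, other):
        return self.items == other.items

    def __ne__(self, other):
        return not (self == other)

    def __lt__(self, other):
        return len(self.items) < len(other.items)

    def __gt__(self, other):
        return len(self.items) > len(other.items)


s1 = SizeOrderedSet([1, 2])
s2 = SizeOrderedSet([3, 4])   # same size as s1, different elements

# Neither less than, nor greater than, nor equal.
results = (s1 < s2, s1 > s2, s1 == s2)
```

Here results comes out as (False, False, False), which is exactly the
odd state of affairs described above.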
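In Python 2 it's legitimate to define __eq__ without defining __hash__
to have a custom hash function; the language does not enforce that
'a == b' implies 'hash(a) == hash(b)'. There are many classes where it
will not, for instance pretty much any sort of mutable thing. Note that
dictionaries and sets do assume that equal objects have equal hashes, so
instances of such classes are not reliable as dictionary keys or set
members.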
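A sketch of what this looks like in practice, using a hypothetical Box
class. The explicit '__hash__ = object.__hash__' line is an assumption
added so the example behaves the same everywhere: Python 2 new-style
classes keep the identity-based hash on their own, while Python 3 would
otherwise mark a class that defines only __eq__ as unhashable.

```python
class Box(object):
    """Hypothetical mutable container with custom equality only."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Box) and self.value == other.value

    def __ne__(self, other):
        return not (self == other)

    # Assumption for portability: keep the default identity-based hash
    # explicitly instead of writing a custom hash function.
    __hash__ = object.__hash__


a = Box(1)
b = Box(1)

equal = (a == b)                       # True: same contents
hashes_match = (hash(a) == hash(b))    # False in CPython: hash follows id()
found_as_key = (b in {a: "payload"})   # False: the lookup never reaches __eq__
```

The last line is the practical cost: an equal object is not found when
it is used as a dictionary key, because the hashes are compared first.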
Sidebar: default comparisons
In Python 2.x, class instances have a default idea of ordering that is
based on id(); in an uncustomized class, 'a < b' means 'id(a) < id(b)'.
This gives you a comparison result and allows you to do things like sort
class instances into a consistent and stable ordering (at least in
CPython), but it's not actually very meaningful (since id() is just the
address in memory of the (C-level) instance object, which is highly
unpredictable). In Python 3, this default has been eliminated and
attempts to compare uncustomized class instances this way will get you
a TypeError about unorderable types.
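A quick way to see which behaviour you have is to try the comparison and
catch the error. This is only a sketch; the Plain class is a made-up
example and the variable names are arbitrary:

```python
import sys


class Plain(object):
    """Hypothetical uncustomized class with no comparison methods."""


a, b = Plain(), Plain()

try:
    a < b
    ordering_supported = True    # Python 2: falls back to the default ordering
except TypeError:
    ordering_supported = False   # Python 3: no default ordering at all

running_python2 = sys.version_info[0] < 3
```

On either major version, ordering_supported should match
running_python2.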
Note that id() results are only guaranteed to be stable for the lifetime
of a given object, not beyond it. They are trivially stable in CPython
because CPython does not relocate objects when it does garbage
collection. Some other Python implementations do have relocating garbage
collectors, so in them 'id(obj)' cannot simply be a memory address and
has to be tracked some other way.
(The id() help text is very explicit that the id() result only has to be
unique among all currently existing objects right now. Even in CPython,
the id() of a newly created object may be the same as the old id() of an
old and now garbage collected object. In short, id() is not a unique
serial number for objects.)
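The two halves of that statement can be poked at with a short
experiment. This is a sketch with a hypothetical Thing class. The first
check is guaranteed by the language; the second depends on the
implementation and its memory allocator, so no particular outcome is
claimed for it:

```python
class Thing(object):
    """Hypothetical throwaway class, used only to create objects."""


# Objects that are alive at the same time always have distinct ids.
live = [Thing() for _ in range(1000)]
ids_unique_while_alive = len(set(id(t) for t in live)) == len(live)

# Objects whose lifetimes do not overlap may end up sharing an id.
seen_ids = set()
id_was_reused = False
for _ in range(1000):
    t = Thing()
    if id(t) in seen_ids:
        id_was_reused = True
    seen_ids.add(id(t))
    del t   # drop the only reference before creating the next object
```

In CPython id_was_reused will very likely end up True, since freed
memory is handed out again quickly, but nothing promises that.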