Why I don't like Python 3 dropping the comparison function for sorting

December 30, 2011

One of the changes that Python 3 has made is that, to quote the documentation:

builtin.sorted() and list.sort() no longer accept the cmp argument providing a comparison function. Use the key argument instead. [...]

I feel unreasonably annoyed about this change. At least on the surface there's no obvious reason why; basically all of the uses of a comparison function I've ever used are to pick a specific field out, and that's handled much better by the key argument. However, I've recently figured out what irritates me about this: it couples data and behavior too closely.

In the new world, there are three ways to create a sort ordering. If your ordering depends on explicit fields (possibly modified), you can use a straightforward key function. If the ordering of a data element is strictly computable from a single element (for example, a 'distance' metric that's easy to determine), you can use a key function which synthetically computes an element's ordering and returns it. And if neither of these holds and you can only really determine a relative ordering, you can define a __lt__ method on your objects.

The problem with the last approach is that, of course, you can only have one __lt__ method and thus only one sort ordering. What's happened is that you've been forced to couple the raw data with the behavior of a particular sort ordering. Getting around this requires various hacks, such as synthetic wrapper objects with different __lt__ functions.

(The other problem is that your data needs to be actual objects. While this is usually the case for anything complex enough that you only can do a relative ordering, sometimes you're getting the data from an outside source and it would be handy to leave it in its native form.)

While this is only a theoretical concern for me, it still irritates me a bit that Python 3 has chosen to move towards closer, less flexible coupling between data and ordering. I maintain that the two are separate and we can see this in the fact that there are many possible orderings for complex data depending on what you want to do with it.

By the way, I can see several reasons why Python 3 did this and I sympathize with them (even if I still don't like dropping cmp). The Python 3 documentation notes that key is more efficient since it's called only once per object you're sorting. On top of that, it's relatively easy to make mistakes with complex cmp functions that create inconsistent ordering, which potentially causes sorting algorithms to malfunction mysteriously.

Written on 30 December 2011.
« An empirical exploration of whether spammers take Christmas off
My thoughts on the mockist versus classicalist testing approaches »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Dec 30 02:03:10 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.