Why I don't like Python 3 dropping the comparison function for sorting

December 30, 2011

One of the changes that Python 3 has made is that, to quote the documentation:

builtin.sorted() and list.sort() no longer accept the cmp argument providing a comparison function. Use the key argument instead. [...]

I feel unreasonably annoyed about this change. At least on the surface there's no obvious reason why; basically all of the uses of a comparison function I've ever used are to pick a specific field out, and that's handled much better by the key argument. However, I've recently figured out what irritates me about this: it couples data and behavior too closely.

In the new world, there are three ways to create a sort ordering. If your ordering depends on explicit fields (possibly modified), you can use a straightforward key function. If the ordering of a data element is strictly computable from a single element (for example, a 'distance' metric that's easy to determine), you can use a key function which synthetically computes an element's ordering and returns it. And if neither of these holds and you can only really determine a relative ordering, you can define a __lt__ method on your objects.

The problem with the last approach is that, of course, you can only have one __lt__ method and thus only one sort ordering. What's happened is that you've been forced to couple the raw data with the behavior of a particular sort ordering. Getting around this requires various hacks, such as synthetic wrapper objects with different __lt__ functions.

(The other problem is that your data needs to be actual objects. While this is usually the case for anything complex enough that you only can do a relative ordering, sometimes you're getting the data from an outside source and it would be handy to leave it in its native form.)

While this is only a theoretical concern for me, it still irritates me a bit that Python 3 has chosen to move towards closer, less flexible coupling between data and ordering. I maintain that the two are separate and we can see this in the fact that there are many possible orderings for complex data depending on what you want to do with it.

By the way, I can see several reasons why Python 3 did this and I sympathize with them (even if I still don't like dropping cmp). The Python 3 documentation notes that key is more efficient since it's called only once per object you're sorting. On top of that, it's relatively easy to make mistakes with complex cmp functions that create inconsistent ordering, which potentially causes sorting algorithms to malfunction mysteriously.

Comments on this page:

From at 2011-12-31 20:48:50:

I challenge you to find a case in any python code you have lying around where you can't replace a cmp= argument to sort with a key=. (That's a good python style check in any case - it's amazing how much faster using key= than cmp= can be, and it will generally make your code significantly more readable)

While I still believe that it's possible such a case could be constructed, I'm having trouble doing so at the moment. And even if I were to see such a thing, I don't think this wrapper is really such a great burden:

def sort_cmp(s, cmpfn):
  class Tmp(object):
    def __init__(self, selem):
      self.elem = selem
    def __lt__(self, other):
      return 0 > cmpfn(self.elem, other.elem)
  return sort(s, key=Tmp)

I am not overly pleased with every python 3 change, but this one seems a strange one to get hung up on.

-- DanielMartin

From at 2011-12-31 21:11:49:

Okay, posting that comment triggered Murphy's law and my brain immediately unjammed itself and came up with a few different possible situations in which a cmp function is not overly convoluted to begin with, and yet a simple key function is elusive. (e.g.: sort by .stringA in reverse alphabetical order, then by .stringB in normal alphabetical order)

Still, in those rare cases I don't find a wrapper like the one above overly burdensome. (or a key function made with a revorder wrapper that returns an object that compares backwards)

From at 2012-01-07 18:55:26:

Can't you use a property?

Written on 30 December 2011.
« An empirical exploration of whether spammers take Christmas off
My thoughts on the mockist versus classicalist testing approaches »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Dec 30 02:03:10 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.