The clarity drawback of allowing comparison functions for sorting

October 23, 2014

I've written before about my unhappiness that Python 3 dropped support for using a comparison function. Well, let me take that back a bit, because I've come around to the idea that there are some real drawbacks to supporting a comparison function here. Not drawbacks in performance (which are comparatively unimportant here) but drawbacks in code clarity.

DWiki's code is sufficiently old that it uses only .sort() cmp functions simply because, well, that's what I had (or at least that's what I was used to). As a result, in two widely scattered spots in different functions its code base contains the following lines:

def func1(...):
    ....
    dl.sort(lambda x,y: cmp(y.timestamp, x.timestamp))
    ....

def func2(...):
    ....
    coms.sort(lambda x,y: cmp(x.time, y.time))
    ....

Apart from the field name, did you see the difference there? I didn't today while I was doing some modernization in DWiki's codebase and converted both of these to the '.sort(key=lambda x: x.FIELD)' form. The difference is that the first is a reverse sort, not a forward sort, because it flips x and y in the cmp().

(This code predates .sort() having a reverse= argument or at least my general awareness and use of it.)

And that's the drawback of allowing or using a sort comparison function: it's not as clear as directly saying what you mean. Small things in the comparison function can have big impacts and they're easy to overlook. By contrast, my intentions and what's going on are clearly spelled out when these things are rewritten into the modern form:

   dl.sort(key=lambda x: x.timestamp, reverse=True)
   coms.sort(key=lambda x: x.time)

Anyone, a future me included, is much less likely to miss the difference in sort order when reading (or skimming) this code.

I now feel that in practice you want to avoid using a comparison function as much as possible even if one exists for exactly this reason. Try very hard to directly say what you mean instead of hiding it inside your cmp function unless there's no way out. A direct corollary of this is that sorting interfaces should try to let you directly express as much as possible instead of forcing you to resort to tricks.

(Note that there are some cases where you must use a comparison function in some form (see especially the second comment).)

PS: I still disagree with Python 3 about removing the cmp argument entirely. It hasn't removed the ability to have custom sort functions; it's just forced you to write a lot more code to enable them and the result is probably even less efficient than before.


Comments on this page:

Is there a reason you're not using operator.attrgetter for the key functions? It's faster than a lambda.

By cks at 2014-10-31 00:14:38:

Belatedly: one reason is that I didn't know about operator.attrget until you mentioned it, but another is that I'm not convinced I like it in general except as an optimization. I wrote this up at more length in AttrgetterVsLamba.

Written on 23 October 2014.
« Exim's (log) identifiers are basically unique on a given machine
In Go I've given up and I'm now using standard packages »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Oct 23 00:14:32 2014
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.