Wandering Thoughts archives

2014-10-23

The clarity drawback of allowing comparison functions for sorting

I've written before about my unhappiness that Python 3 dropped support for passing a comparison function to .sort(). Well, let me take that back a bit, because I've come around to the idea that there are some real drawbacks to supporting a comparison function here. Not drawbacks in performance (which is comparatively unimportant here) but drawbacks in code clarity.

DWiki's code is sufficiently old that it uses only .sort() cmp functions simply because, well, that's what I had (or at least that's what I was used to). As a result, in two widely scattered spots in different functions its code base contains the following lines:

def func1(...):
    ....
    dl.sort(lambda x,y: cmp(y.timestamp, x.timestamp))
    ....

def func2(...):
    ....
    coms.sort(lambda x,y: cmp(x.time, y.time))
    ....

Apart from the field name, did you see the difference there? I didn't notice it today while I was doing some modernization in DWiki's code base and converted both of these to the '.sort(key=lambda x: x.FIELD)' form. The difference is that the first is a reverse sort, not a forward sort, because it flips x and y in the cmp().

(This code predates .sort() having a reverse= argument or at least my general awareness and use of it.)
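To make the flip concrete, here is a minimal sketch that runs under Python 3. The Entry type and the sample data are made up for illustration and are not DWiki's actual code; cmp() is a small stand-in for the Python 2 builtin, and functools.cmp_to_key is used only because Python 3's sort no longer takes a comparison function directly.

```python
from collections import namedtuple
from functools import cmp_to_key


def cmp(a, b):
    """Stand-in for the Python 2 cmp() builtin, which Python 3 removed."""
    return (a > b) - (a < b)


# Hypothetical record type; only the 'timestamp' field matters here.
Entry = namedtuple("Entry", ["name", "timestamp"])

entries = [Entry("b", 20), Entry("a", 10), Entry("c", 30)]

# Arguments passed to cmp() in parameter order: ascending sort.
forward = sorted(
    entries, key=cmp_to_key(lambda x, y: cmp(x.timestamp, y.timestamp))
)

# Arguments swapped inside cmp(): descending sort.
backward = sorted(
    entries, key=cmp_to_key(lambda x, y: cmp(y.timestamp, x.timestamp))
)

forward_names = [e.name for e in forward]
backward_names = [e.name for e in backward]
print(forward_names)   # ['a', 'b', 'c']
print(backward_names)  # ['c', 'b', 'a']
```

The two lambdas differ by nothing but the order of two short names, which is exactly the sort of thing the eye slides over.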

And that's the drawback of allowing or using a sort comparison function: it's not as clear as directly saying what you mean. Small things in the comparison function can have big impacts and they're easy to overlook. By contrast, my intentions and what's going on are clearly spelled out when these things are rewritten into the modern form:

   dl.sort(key=lambda x: x.timestamp, reverse=True)
   coms.sort(key=lambda x: x.time)

Anyone, a future me included, is much less likely to miss the difference in sort order when reading (or skimming) this code.

For exactly this reason, I now feel that in practice you want to avoid using a comparison function as much as possible, even when the sorting interface offers one. Try very hard to directly say what you mean instead of hiding it inside your cmp function unless there's no way out. A direct corollary of this is that sorting interfaces should try to let you directly express as much as possible instead of forcing you to resort to tricks.
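As a rough illustration of saying what you mean, here is a sketch of how far key= and reverse= can go on their own. The Comment type and its data are hypothetical. The second half relies on Python's sort being stable (including with reverse=True), so a sort with mixed directions can be written as two passes, secondary key first.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type and data, for illustration only.
Comment = namedtuple("Comment", ["author", "time"])

comments = [
    Comment("bob", 5),
    Comment("alice", 5),
    Comment("carol", 9),
    Comment("alice", 2),
]

# Direction is stated outright instead of being implied by argument order.
newest_first = sorted(comments, key=attrgetter("time"), reverse=True)

# Mixed directions: newest first, ties broken by author in ascending order.
# Sort on the secondary key first; the stable second pass keeps that order
# among entries with equal times.
by_author = sorted(comments, key=attrgetter("author"))
newest_then_author = sorted(by_author, key=attrgetter("time"), reverse=True)

print([c.time for c in newest_first])          # [9, 5, 5, 2]
print([c.author for c in newest_then_author])  # ['carol', 'alice', 'bob', 'alice']
```

Whether the two-pass version counts as clear or as a trick of its own is a judgment call; it does at least keep each direction visible at the call site.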

(Note that there are some cases where you must use a comparison function in some form (see especially the second comment).)

PS: I still disagree with Python 3 about removing the cmp argument entirely. It hasn't removed the ability to have custom sort functions; it's just forced you to write a lot more code to enable them and the result is probably even less efficient than before.
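For reference, one way to keep a custom comparison function under Python 3 is the standard library's functools.cmp_to_key, which wraps a two-argument comparison function so that it can be passed as key=. The sketch below is only an example under assumptions: the cmp() helper, the compare_pages() function, and the tuple data are all invented here.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Stand-in for the Python 2 cmp() builtin, which Python 3 removed."""
    return (a > b) - (a < b)


def compare_pages(x, y):
    """Order by directory ascending, then by title descending.

    Note the deliberately swapped arguments in the second cmp() call;
    that swap is what makes the title ordering descending.
    """
    return cmp(x[0], y[0]) or cmp(y[1], x[1])


# Hypothetical (directory, title) pairs.
pages = [
    ("blog", "apple"),
    ("archive", "zebra"),
    ("blog", "mango"),
    ("archive", "alpha"),
]

ordered = sorted(pages, key=cmp_to_key(compare_pages))
print(ordered)
# [('archive', 'zebra'), ('archive', 'alpha'), ('blog', 'mango'), ('blog', 'apple')]
```

The comparison function itself is unchanged from what Python 2 would have accepted; the extra cost is the wrapper object that cmp_to_key creates for every element being sorted.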

python/SortCmpFunctionClarityIssue written at 00:14:32

