Remembering that Python lists can use tuples as the sort keys

July 10, 2018

I was recently moving some old Python 2 code to Python 3 (due to a recent decision). This particular code is sufficiently old that it has (or had) a number of my old Python code habits, and in particular it made repeated use of list .sort() with comparison functions. Python 3 doesn't support this; instead you have to tell .sort() what key to use to sort the list. For a lot of the code the conversion was straightforward and obvious because it was just using a field from the object as the sort key. Then I hit a comparison function that looked like this:

def _pricmp(a, b):
  apri = a.prio or sys.maxint
  bpri = b.prio or sys.maxint
  if apri != bpri:
      return cmp(apri, bpri)
  return cmp(a.totbytes, b.totbytes)

I stared at this with a sinking feeling, because this comparison function wasn't just picking a field, it was expressing logic. Losing complex comparison logic is a long standing concern of mine, so I was worried that I'd finally run into a situation where I would be forced into unpleasant hacks.

Then I remembered something obvious: Python supports sorting on tuples, not just single objects. Sorting on tuples compares the two tuples field by field, so you can easily implement the same sort of tie-breaking secondary comparison that I was doing in _pricmp. So I wrote a simple function to generate the tuple of key fields:

def _prikey(a):
  apri = a.prio or sys.maxint
  return (apri, a.totbytes)

Unsurprisingly, this just worked (including the tie-breaking, which actually comes up fairly often in this particular comparison). It's probably even somewhat clearer, and it certainly avoids some potential comparison function mistakes

(It's also shorter, but that's not necessarily a good thing.)

PS: Python has supported sorting tuples for a long time but I don't usually think about it, so things had to swirl around in my head for a bit before the light dawned about how to solve my issue. There's a certain mental shift that you need to go from 'the key= function retrieves the key field' to 'the key= function creates the sort key, but it's usually a plain field value'.

Comments on this page:

By Phillip at 2018-07-10 01:47:57:

I am pretty sure that sort does not have any special code for tuples. Rather, tuples are orderable (by the inherited order retrieved by comparing their fields) and the thing you've discovered is that there is no need for the key function to return the same type as went into it.

By Clément at 2018-07-10 02:57:29:

I think the move to a tuple-based key is a net improvement. That being said, if you're in the process of porting old code, you should look at ; it takes a comparison function and returns the corresponding key function (it works by creating a custom key type K whose comparison operators are defined so that K(x) < K(y) iff cmp(x, y) < 0).

By Mike at 2018-08-09 02:59:15:

For prio-queues in particular (given the example), you might also find somewhat-obscure heapq module in python useful.

Allows to use any list as such, using heapq.heappush(heap, (prio, value)) to add new element there, heappop() to pull highest-prio one from it and heap[0] to peek, with all the necessary ordering done efficiently and on-the-fly.

Written on 10 July 2018.
« TLS Certificate Authorities and 'trust'
Some thoughts on performance shifts in moving from an iSCSI SAN to local SSDs »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Tue Jul 10 00:43:25 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.