Some notes on lifting Python 2 code into Python 3 code

July 23, 2018

We have a set of Python programs that are the core of our ZFS spares handling system. The production versions are written in Python 2 and run on OmniOS on our ZFS fileservers, but we're moving to ZFS-based Linux fileservers, so this code needed a tune-up to cope with the change in environment. As part of our decision to use Python 3 for future tools, I decided to change this code over to Python 3 (partly because I needed to write some completely new Python code to handle Linux device names).

This is not a rewrite or even a port; instead, let's call it lifting code from Python 2 up to Python 3. Mechanically what I did is similar to the first time I did this sort of shift, which is that I changed the '#!/usr/bin/python' at the start of the programs to '#!/usr/bin/python3' and then worked to fix everything that Python 3 complained about. For this code, there have only been a few significant things so far:

  • changing all tabs to spaces, which I did with expand (and I think I overdid it, since I didn't use 'expand -i').

  • changing print statements into print() calls. I learned the hard way to not overlook bare 'print' statements; in Python 2 that produces a newline, while in Python 3 it's still valid but does nothing, because it merely names the print function without calling it.

  • converting 'except CLS, VAR:' statements to the modern form, as this code was old enough to have a number of my old Python 2 code habits.

  • taking .sort()s that used comparison functions and figuring out how to creatively generate sort keys that gave the same results. This opened my mind up a bit, although there are still nuances that using sort keys can't easily capture.

  • immediately list()-ifying most calls of adict.keys(), because the assumption that .keys() returns a real list was all over my code. There were a couple of cases where perhaps I could have deferred the list-ification to later (if at all), but this 'lifting' is intended to be brute force.

    (I didn't list-ify cases where I was clearly immediately iterating, such as 'for ... in d.keys()' or 'avar = [x for ... in d.keys()]'. But any time I assigned .keys() to a name or returned it, it got list-ified.)

  • replacing use of optparse with argparse. This wasn't strictly necessary (Python 3 still has optparse), but argparse is the future, so I figured I'd fix things while I was working on the code anyway.
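
To make a few of these changes concrete, here is a minimal sketch of the Python 3 forms of the 'except', .keys(), and comparison-function items. The data and names (pools, by_count_then_name) are made up for illustration and are not from the actual spares handling code.

```python
# Hypothetical data; not taken from the real spares handling code.
pools = {"tank": 3, "fs2": 1, "fs10": 3}

# Python 2: 'except KeyError, e:'    Python 3: 'except KeyError as e:'
try:
    pools["missing"]
except KeyError as e:
    missing = e.args[0]

# Python 2: names = pools.keys(); names.sort()
# In Python 3, .keys() returns a view with no .sort(), so list-ify it first.
names = list(pools.keys())
names.sort()

# Python 2: names.sort(lambda a, b: cmp(pools[b], pools[a]) or cmp(a, b))
# That is: highest count first, then name ascending. As a sort key, negate
# the numeric field to reverse it and let tuple comparison break ties.
def by_count_then_name(name):
    return (-pools[name], name)

ordered = sorted(pools, key=by_count_then_name)
print(missing, names, ordered)
```

The negation trick only works because the reversed field is numeric. Reversing a string field inside a multi-field key has no equally simple form, which is one example of the kind of nuance that is awkward to express with sort keys.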

Although these tools do have a certain amount of IO, I could get away with relying on Python 3's default character set conversion rules; in practice they should only ever be dealing with ASCII input and output, and if they aren't something has probably gone terribly wrong (eg our ZFS status reporting program has decided to start spraying out binary garbage). This is fairly typical of internal-use system tools but not necessarily of other things, which can expose interesting character set conversion questions.

(My somewhat uninformed view is that character set conversion issues are where moving from Python 2 to Python 3 gets exciting. If you can mostly ignore them, as I could here, you have a much easier time. If you have to consider them, it's probably going to be more porting than just casually lifting the code into Python 3.)

For the most part this 2-to-3 lifting went well and was straightforward. It would have gone better if I had meaningful tests for this code, but I've always had problems writing tests for command line programs (and some of this code is unusually complex to test). I used pyflakes to try to help find Python 3 issues that I'd overlooked; it found some issues but not all of them, and it at least feels less thorough than pychecker used to be. What I would really like is something that's designed to look for lingering Python 2-isms that either definitely don't work in Python 3 or that might be signs of problems, but I suspect that no such tool exists.

(I tried pylint very briefly, but stopped when it had an explosion of gripes with no obvious way to turn off most of them. I don't care about style 'issues' in this code; I want to know about actual problems.)

I'm a bit concerned that there are lingering problems in the code, but this is basically the tradeoff I get to make for taking the approach of 'lifting' instead of 'porting'. Lifting is less work if everything is straightforward and goes well, but it's not as thorough as carefully reading through everything and porting it piece by carefully considered piece (or using tests on everything). I had to stumble over a few .sort()s with comparison functions and un-listified .keys(), especially early on, which has made me conscious that there could be other 2-to-3 issues I just haven't hit in my test usage of the programs. That's one reason I'd like a scanner; it would know what to look for (probably better than I do right now) and as a program, it would look in all of the code's corners.
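
As a rough illustration of what such a scanner could look for, here is a small sketch built on the standard ast module. It is an assumption-laden toy, not an existing tool: the function name find_suspects and the three checks are invented here, it only works on code that already parses as Python 3, and its heuristics will produce false positives (for instance, it flags any .sort() call with a positional argument, whatever the object is).

```python
import ast

def find_suspects(source, filename="<string>"):
    """Return sorted (lineno, message) pairs for possible Python 2 leftovers."""
    suspects = []
    for node in ast.walk(ast.parse(source, filename)):
        # A bare 'print' on its own line: an expression statement that only
        # names the function and never calls it.
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Name)
                and node.value.id == "print"):
            suspects.append((node.lineno, "bare 'print' does nothing"))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # x.sort(func) or x.sort(cmp=func): list.sort() in Python 3 only
            # accepts the keyword arguments 'key' and 'reverse'.
            if node.func.attr == "sort" and (
                    node.args or any(k.arg == "cmp" for k in node.keywords)):
                suspects.append((node.lineno, ".sort() with a comparison function?"))
        elif isinstance(node, (ast.Assign, ast.Return)):
            # A .keys() result that is stored or returned may later be
            # treated as a list.
            v = node.value
            if (isinstance(v, ast.Call) and isinstance(v.func, ast.Attribute)
                    and v.func.attr == "keys" and not v.args):
                suspects.append((node.lineno, ".keys() view kept; list() needed?"))
    return sorted(suspects)

# Hypothetical sample input, written as Python 3 source text.
sample = """\
print("header")
print
items.sort(compare_names)
items.sort(key=len)
names = d.keys()
"""
for lineno, message in find_suspects(sample):
    print(lineno, message)
```

Because this walks the syntax tree rather than executing anything, it looks in all of the code's corners, including paths that test runs never reach. It cannot see problems that depend on runtime types.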

PS: I remember having a so-so experience with 2to3 many years in the past, but writing this entry got me to see what it did to the Python 2 versions. For the most part it was an okay starting point, but it didn't even flag uses of .sort() with a comparison function and it did significant overkill on list-ifying adict.keys(). Still, reading its proposed diffs just now was interesting. Probably not interesting enough to get me to use it in the future, though.


Comments on this page:

I was a bit confused about a "bare print statement" doing nothing, because print() prints the newline. Then I finally realized you meant that the former is an expression evaluating to <built-in function print>, without side effects.

For character encodings, it's my current model that Python will use some default encoding, and raise an exception if that doesn't work losslessly. Apparently that default depends on the system and environment, but if it's ASCII-compatible like UTF-8 or ISO-8859-1, then ASCII will work fine.
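
A small sketch of that model, assuming CPython's usual behaviour for files opened without an explicit encoding. The sample strings are made up, and the standard streams may be configured with error handlers other than the strict one shown here.

```python
import locale

# The encoding Python 3 uses by default for text files opened without an
# explicit 'encoding=' argument; it depends on the system and environment.
default = locale.getpreferredencoding(False)
print("default text encoding:", default)

text = "pool tank: 2 spares"
# Pure-ASCII text encodes identically under any ASCII-compatible encoding.
same = text.encode("ascii") == text.encode("utf-8") == text.encode("iso-8859-1")

# Bytes that are not valid in the chosen encoding raise an exception when
# decoded strictly, instead of being silently mangled.
garbage = b"status: \xff\xfe"
try:
    garbage.decode("ascii")
except UnicodeDecodeError as e:
    problem = str(e)
    print("decode failed:", problem)
```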

The final point I want to bring up is the peculiarly Pythonic "decorate-sort-undecorate" (aka DSU) idiom. It looks something like:

r = [t[-1] for t in
    sorted((a.prio or sys.maxsize, a.totbytes, a)
           for a in athing)]

So working bottom up, we go over the thing to sort, make a tuple of (k1, k2, value) from each item, sort by that tuple, and finally convert the tuple back into the value alone. Making/removing the tuple is the decorate/undecorate part.

I think they missed the "there should be one obvious way to do it" thing there, since most other languages I use take the "pass a comparison function to sort" approach. And I assume that is more efficient than allocating tuples to fake it. But DSU is an option, I suppose.

By cks at 2018-07-24 10:59:13:

I probably should have given an example of the bare print thing, but you have it right. I had a few bits of code where things went basically (in Python 2):

print "..."
for ...:
   print "more stuff"
...
print

When I went over the whole thing, I basically did an editor search for 'print ' (with the trailing space), which missed the last one. In Python 3, 'print ...' is a syntax error, but the last line by itself is just a more minimal version of '_ = print', which is silently not a syntax error.
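
A minimal Python 3 sketch of the difference, capturing standard output so the effect is visible. The strings are placeholders.

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("more stuff")
    print          # an expression naming the function; nothing is called
    print()        # the Python 3 equivalent of Python 2's bare 'print'

output = buf.getvalue()
print(repr(output))
```

Only the print() call contributes the blank line; the bare 'print' in the middle leaves no trace in the output.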

As far as keys versus comparison functions goes, my understanding is that getting keys can be more efficient, especially if the key is just a field or element (which it often is), because you can then reuse the same key across multiple comparisons instead of starting from scratch each time. I believe that my sorting cases here are relatively unusual.

By Clément at 2018-07-24 13:37:26:

"although there are still nuances that using sort keys can't easily capture."

That's incorrect: any comparison function can be converted into a sort key, and there's even a standard library module that makes it trivial. From the docs: Use functools.cmp_to_key() to convert an old-style cmp function to a key function.
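
A short sketch of what that looks like in practice. The comparison function and device names are hypothetical, chosen only to show the mechanics.

```python
from functools import cmp_to_key

def compare_devices(a, b):
    """Old-style comparison: shorter names first, then alphabetical."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)   # Python 3 replacement for cmp(a, b)

disks = ["sdab", "sdb", "sdaa", "sda"]
disks.sort(key=cmp_to_key(compare_devices))
print(disks)
```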

By Twirrim at 2018-07-24 16:14:34:

I'm pretty much a fan of pylint. Catches / resolves so many things. That said, in tox.ini I've got it configured:

ignore = E501, E402, E722, E127, E128 # ignore line too long, imports not at top of file, ignore bare except, ignore 2x visual indentation complaints

That deals with about 95% of the annoying junk and just leaves the valuable stuff.

If you fancy a look, there's an interesting uncompromising code formatter for Python 3 called "Black" https://github.com/ambv/black. It aims to be the equivalent of go-fmt. It makes some odd choices from time to time, some of which enter the territory of "blocker" for me, but some of our coding needs that trigger them are somewhat unusual.

sapphirepaw:

  1. Key-based sorting is DSU – the key is what you would compute in the decoration phase – except that the wrapping and unwrapping is taken care of for you implicitly, without you having to write it out. Explicit DSU under these circumstances is just extra verbosity for no reason. So explicit DSU is obsolete in Python 3.

  2. Since DSU and key-based sorting are the same thing, DSU is not a solution to any problem anyone has with key-based sorting (or vice versa).
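
To illustrate the equivalence, here is a sketch that sorts the same data both ways. The Thing record type and its values are invented stand-ins for the items in the earlier example, and sys.maxsize is used because sys.maxint does not exist in Python 3.

```python
import sys
from collections import namedtuple

# Hypothetical record type standing in for the items in the earlier example.
Thing = namedtuple("Thing", ["name", "prio", "totbytes"])
things = [Thing("c", 0, 10), Thing("a", 2, 50), Thing("b", 2, 5)]

def sort_key(a):
    # 'prio or sys.maxsize' sorts a prio of 0 (or None) last.
    return (a.prio or sys.maxsize, a.totbytes)

# Explicit decorate-sort-undecorate. The index is included as a tie-breaker
# so that the items themselves are never compared.
decorated = [(sort_key(a), i, a) for i, a in enumerate(things)]
decorated.sort()
dsu_result = [t[-1] for t in decorated]

# Key-based sorting: the same decoration, done implicitly by sorted().
key_result = sorted(things, key=sort_key)
print([t.name for t in key_result])
```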

Clément:

If what you want is to know how to translate a comparator into a derivation of a value that naturally compares the same way as the comparator function would compare the original values, then cmp_to_key is less than useful.

The function it returns does not somehow derive such a value, it just puts each list element in a wrapper that attaches the comparator function to it. These values compare with each other like the comparator would compare the list elements… tautologically, because the comparator is still getting called every time to compare them.

So you pay the overhead for the implicit DSU in key-based sorting… just so you can waste it, and in fact you pay some extra just for the privilege of wasting it, in the form of overhead for reaching through the wrapper.

As the cmp_to_key docs say, it’s meant mostly as a transition aid for old code.
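
A small sketch that makes the difference observable by counting calls. The word list and the two counting functions are made up for the demonstration; the exact number of comparator calls depends on the input and the sort implementation, so it is not stated here.

```python
from functools import cmp_to_key

calls = {"cmp": 0, "key": 0}

def compare_len(a, b):
    calls["cmp"] += 1          # runs once per comparison the sort makes
    return len(a) - len(b)

def key_len(a):
    calls["key"] += 1          # runs once per list element
    return len(a)

words = ["spare", "a", "mirror", "zfs", "pool", "raidz2", "ok", "degraded"]

by_cmp = sorted(words, key=cmp_to_key(compare_len))
by_key = sorted(words, key=key_len)
print(calls)
```

Both calls produce the same ordering, but the real key function runs exactly once per element, while the wrapped comparator runs on every comparison the sort performs.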

Written on 23 July 2018.

Last modified: Mon Jul 23 23:56:44 2018