Some notes on lifting Python 2 code into Python 3 code

July 23, 2018

We have a set of Python programs that are the core of our ZFS spares handling system. The production versions are written in Python 2 and run on OmniOS on our ZFS fileservers, but we're moving to ZFS-based Linux fileservers, so this code needed a tune-up to cope with the change in environment. As part of our decision to use Python 3 for future tools, I decided to change this code over to Python 3 (partly because I needed to write some completely new Python code to handle Linux device names).

This is not a rewrite or even a port; instead, let's call it lifting code from Python 2 up to Python 3. Mechanically, what I did was similar to the first time I did this sort of shift: I changed the '#!/usr/bin/python' at the start of the programs to '#!/usr/bin/python3' and then worked to fix everything that Python 3 complained about. For this code, there have only been a few significant things so far:

  • changing all tabs to spaces, which I did with expand (and I think I overdid it, since I didn't use 'expand -i').

  • changing print statements into print() calls. I learned the hard way to not overlook bare 'print' statements; in Python 2 that prints a newline, while in Python 3 it's still valid (it's just an expression naming the print function) but does nothing.

  • converting 'except CLS, VAR:' statements to the modern form, as this code was old enough to have a number of my old Python 2 code habits.

  • taking .sort()s that used comparison functions and figuring out how to creatively generate sort keys that gave the same results. This opened my mind up a bit, although there are still nuances that using sort keys can't easily capture.

  • immediately list()-ifying most calls of adict.keys(), because the assumption that .keys() returns a list was all over my code. There were a couple of cases where I could perhaps have deferred the list-ification to later (if it was needed at all), but this 'lifting' is intended to be brute force.

    (I didn't list-ify cases where I was clearly immediately iterating, such as 'for ... in d.keys()' or 'avar = [x for ... in d.keys()]'. But any time I assigned .keys() to a name or returned it, it got list-ified.)

  • replacing uses of optparse with argparse. This wasn't strictly necessary (Python 3 still has optparse), but argparse is the future so I figured I'd fix things while I was working on the code anyway.
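
To make a few of these items concrete, here is a minimal sketch. The data, the comparison function, and the variable names are all invented for illustration and are not from the spares code. functools.cmp_to_key() is included as the brute-force alternative to hand-built sort keys, not as the approach described above.

```python
import sys
from functools import cmp_to_key

# Hypothetical data: (pool name, number of spares) pairs.
pools = [("tank2", 1), ("tank1", 3), ("tank3", 1)]

# A Python 2 style comparison function: most spares first, then by name.
def by_spares_then_name(a, b):
    if a[1] != b[1]:
        return b[1] - a[1]
    return (a[0] > b[0]) - (a[0] < b[0])

# Python 2 was: pools.sort(by_spares_then_name)
# Brute force: wrap the old comparison function unchanged.
wrapped = sorted(pools, key=cmp_to_key(by_spares_then_name))

# Sort key version: negating the count reverses just that field.
keyed = sorted(pools, key=lambda p: (-p[1], p[0]))

# .keys() is a view in Python 3, so list-ify it before using list methods.
d = {"b": 2, "a": 1}
names = list(d.keys())
names.sort()

# Python 2 was: except ValueError, e:
try:
    int("not a number")
except ValueError as e:
    print("bad value:", e, file=sys.stderr)

# Python 2 was a bare 'print'; without the () nothing is written.
print()
```

The negation trick only works for numeric fields; reversing a single string field inside a multi-field ordering is one of the cases where a sort key is awkward and cmp_to_key() may be the simpler route.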

Although these tools do have a certain amount of IO, I could get away with relying on Python 3's default character set conversion rules; in practice they should only ever be dealing with ASCII input and output, and if they aren't something has probably gone terribly wrong (eg our ZFS status reporting program has decided to start spraying out binary garbage). This is fairly typical of internal-use system tools but not necessarily of other things, which can expose interesting character set conversion questions.
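
If you did want to make the 'ASCII only' assumption explicit instead of relying on the defaults, one possible sketch is below. The function name, the sample input, and the choice to fail loudly on non-ASCII bytes are assumptions for illustration; the spares code itself simply uses Python 3's default behavior.

```python
import io

def read_status_lines(binary_stream):
    """Decode a binary stream as strict ASCII and return its lines."""
    text = io.TextIOWrapper(binary_stream, encoding="ascii", errors="strict")
    return [line.rstrip("\n") for line in text]

# Simulated program output; real code might pass a subprocess's stdout.
good = io.BytesIO(b"pool: tank1\nstate: ONLINE\n")
lines = read_status_lines(good)

# Binary garbage raises an exception instead of being silently decoded.
bad = io.BytesIO(b"pool: \xff\xfe\n")
try:
    read_status_lines(bad)
    garbage_detected = False
except UnicodeDecodeError:
    garbage_detected = True
```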

(My somewhat uninformed view is that character set conversion issues are where moving from Python 2 to Python 3 gets exciting. If you can mostly ignore them, as I could here, you have a much easier time. If you have to consider them, it's probably going to be more porting than just casually lifting the code into Python 3.)

For the most part this 2-to-3 lifting went well and was straightforward. It would have gone better if I had meaningful tests for this code, but I've always had problems writing tests for command line programs (and some of this code is unusually complex to test). I used pyflakes to try to help find Python 3 issues that I'd overlooked; it found some issues but not all of them, and it at least feels less thorough than pychecker used to be. What I would really like is something that's designed to look for lingering Python 2-isms that either definitely don't work in Python 3 or that might be signs of problems, but I suspect that no such tool exists.

(I tried pylint very briefly, but stopped when it had an explosion of gripes with no obvious way to turn off most of them. I don't care about style 'issues' in this code; I want to know about actual problems.)

I'm a bit concerned that there are lingering problems in the code, but this is basically the tradeoff I get to make for taking the approach of 'lifting' instead of 'porting'. Lifting is less work if everything is straightforward and goes well, but it's not as thorough as carefully reading through everything and porting it piece by carefully considered piece (or using tests on everything). I had to stumble over a few .sort()s with comparison functions and un-listified .keys(), especially early on, which has made me conscious that there could be other 2-to-3 issues I just haven't hit in my test usage of the programs. That's one reason I'd like a scanner; it would know what to look for (probably better than I do right now) and as a program, it would look in all of the code's corners.
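
As a rough idea of what such a scanner could look like for just the issues mentioned in this entry, here is a small sketch built on the standard ast module. The function name and messages are made up, it only works on code that already parses as Python 3, and it is nowhere near a complete tool (for instance, it will also flag .sort() calls on non-list objects that legitimately take arguments).

```python
import ast

def scan_for_py2isms(source, filename="<string>"):
    """Return sorted (line, message) pairs for a few suspicious constructs."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        # A bare 'print' as a statement: valid Python 3, but does nothing.
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Name)
                and node.value.id == "print"):
            found.append((node.lineno, "bare print does nothing"))

        # x.sort(func) or x.sort(cmp=func): list.sort() accepts neither
        # positional arguments nor cmp= in Python 3.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "sort"
                and (node.args or any(k.arg == "cmp" for k in node.keywords))):
            found.append((node.lineno, ".sort() with comparison function"))

        # A .keys() result assigned to a name or returned without list().
        if isinstance(node, (ast.Assign, ast.Return)):
            value = node.value
            if (isinstance(value, ast.Call)
                    and isinstance(value.func, ast.Attribute)
                    and value.func.attr == "keys"):
                found.append((node.lineno, ".keys() kept without list()"))
    return sorted(found)

# Hypothetical half-lifted code to scan.
sample = '''\
print
names = d.keys()
names.sort(compare_names)
for k in d.keys():
    print(k)
'''
issues = scan_for_py2isms(sample)
for lineno, message in issues:
    print("line %d: %s" % (lineno, message))
```

The 'for ... in d.keys()' loop is deliberately not flagged, matching the rule of only worrying about .keys() results that get stored or returned.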

PS: I remember having a so-so experience with 2to3 many years ago, but writing this entry got me to run it on the Python 2 versions to see what it would do. For the most part it was an okay starting point, but it didn't even flag uses of .sort() with a comparison function and it did significant overkill on list-ifying adict.keys(). Still, reading its proposed diffs just now was interesting. Probably not interesting enough to get me to use it in the future, though.

Last modified: Mon Jul 23 23:56:44 2018