2011-12-30
Why I don't like Python 3 dropping the comparison function for sorting
One of the changes that Python 3 has made is that, to quote the documentation:
builtin.sorted() and list.sort() no longer accept the cmp argument providing a comparison function. Use the key argument instead. [...]
I feel unreasonably annoyed about this change. At least on the surface there's no obvious reason why; basically every comparison function I've ever written just picks a specific field out, and that's handled much better by the key argument. However, I've recently figured out what irritates me about this: it couples data and behavior too closely.
In the new world, there are three ways to create a sort ordering. If your ordering depends on explicit fields (possibly modified), you can use a straightforward key function. If the ordering of a data element is strictly computable from a single element (for example, a 'distance' metric that's easy to determine), you can use a key function which synthetically computes an element's ordering and returns it. And if neither of these holds and you can only really determine a relative ordering, you can define a __lt__ method on your objects.
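As a rough sketch of these three approaches (the sample data, the Job class, and all of the names here are made up for illustration):

```python
import math

# Hypothetical sample data: (name, x, y) records.
hosts = [("delta", 3.0, 4.0), ("Alpha", 6.0, 8.0), ("charlie", 0.0, 1.0)]

# 1. Ordering on an explicit field, slightly modified (case-insensitive name).
by_name = sorted(hosts, key=lambda h: h[0].lower())

# 2. Ordering computed from a single element: distance from the origin.
by_distance = sorted(hosts, key=lambda h: math.hypot(h[1], h[2]))


# 3. Relative ordering only: the objects themselves define __lt__.
class Job:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

    def __lt__(self, other):
        # A simple stand-in for a pairwise rule; a real case would be
        # logic that can't easily be reduced to a per-element key.
        if self.priority != other.priority:
            return self.priority > other.priority  # higher priority first
        return self.name < other.name


jobs = [Job("backup", 1), Job("alert", 5), Job("audit", 1)]
by_builtin_order = sorted(jobs)  # sorted() only needs __lt__
```

The first two leave the data alone and keep the ordering in a separate function. The third bakes one ordering into the Job class itself.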
The problem with the last approach is that, of course, you can only have one __lt__ method and thus only one sort ordering. What's happened is that you've been forced to couple the raw data with the behavior of a particular sort ordering. Getting around this requires various hacks, such as synthetic wrapper objects with different __lt__ functions.
(The other problem is that your data needs to be actual objects. While this is usually the case for anything complex enough that you can only do a relative ordering, sometimes you're getting the data from an outside source and it would be handy to leave it in its native form.)
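Here is a minimal sketch of the wrapper-object hack, using plain tuples as the 'native form' data. The ByPriority and ByName classes and the sample records are hypothetical. The last part uses functools.cmp_to_key(), which is in the standard library as of Python 3.2 and which builds this sort of wrapper object around an old-style comparison function for you.

```python
import functools

# Hypothetical records left in their native tuple form: (name, priority).
jobs = [("backup", 1), ("alert", 5), ("audit", 1)]


class ByPriority:
    """Wrapper that gives a record one particular ordering."""

    def __init__(self, record):
        self.record = record

    def __lt__(self, other):
        return self.record[1] > other.record[1]  # higher priority first


class ByName:
    """A second wrapper, giving the same records a different ordering."""

    def __init__(self, record):
        self.record = record

    def __lt__(self, other):
        return self.record[0] < other.record[0]


# The wrapper class is used as the key function, so the records
# themselves are never modified and come back in their native form.
by_priority = sorted(jobs, key=ByPriority)
by_name = sorted(jobs, key=ByName)


def cmp_name_length(a, b):
    """Old-style comparison: negative, zero, or positive."""
    return len(a[0]) - len(b[0])


# cmp_to_key() wraps each record in an object whose comparison
# methods call cmp_name_length().
by_length = sorted(jobs, key=functools.cmp_to_key(cmp_name_length))
```

This works and the data stays decoupled from any one ordering, but every ordering needs its own small class (or a trip through cmp_to_key()) instead of just a function passed to sort.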
While this is only a theoretical concern for me, it still irritates me a bit that Python 3 has chosen to move towards closer, less flexible coupling between data and ordering. I maintain that the two are separate and we can see this in the fact that there are many possible orderings for complex data depending on what you want to do with it.
By the way, I can see several reasons why Python 3 did this and I sympathize with them (even if I still don't like dropping cmp). The Python 3 documentation notes that key is more efficient since it's called only once per object you're sorting. On top of that, it's relatively easy to make mistakes with complex cmp functions that create inconsistent ordering, which potentially causes sorting algorithms to malfunction mysteriously.
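The efficiency point can be seen by counting calls. This is only a sketch: the data is arbitrary, and since Python 3 has no cmp argument it uses functools.cmp_to_key() as a stand-in for the old behavior. The exact number of comparisons depends on the input and on the sort implementation, so the code doesn't assume a particular figure.

```python
import functools

data = [5, 2, 9, 1, 7, 3, 8, 6, 4, 0]
calls = {"key": 0, "cmp": 0}


def counting_key(value):
    calls["key"] += 1
    return value


def counting_cmp(a, b):
    calls["cmp"] += 1
    return (a > b) - (a < b)  # negative, zero, or positive


sorted(data, key=counting_key)
sorted(data, key=functools.cmp_to_key(counting_cmp))

# The key function runs exactly once per element. The comparison
# function runs once per comparison the sort makes, which is
# typically more than the number of elements on unsorted input.
print("key calls:", calls["key"])
print("cmp calls:", calls["cmp"])
```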
2011-12-27
Python 3 from the perspective of someone writing new Python code
I've talked about Python 3 from the perspective of a Unix sysadmin and Python 3 from the perspective of someone with existing Python code; now it's time for the final viewpoint, that of someone writing new code.
There are a bunch of practical difficulties with this, things like having Python 3 installed on machines and third party modules being ported to Python 3, but they're either gone or going away (and most of what I write doesn't depend on third party modules). Ignoring those issues as ultimately unimportant, I don't think there's any reason not to write new, non-sysadmin code in Python 3. It's clearly the future of Python and although I may grump about some decisions, there's a fair amount to like about it. Yes it's different but much of that difference is good.
(I've made a vaguely similar transition in Python programming before, when I moved from 1.x to 2.x. It was a more backwards compatible change and I felt it was less wrenching, but it had just the same sort of generally neat new things in the new version. Today, for example, if I write an old-style class it's by accident.)
I have to admit that this is a theoretical view right now, because I haven't tried to write anything new in Python 3 yet. Most of what I've written recently is sysadmin tools and those need to be in Python 2 for the foreseeable future. But the next time I come up with a Python program to write I'm going to keep this in mind and try to write it in Python 3 instead of Python 2, no matter what my inertia is saying.
(A good step would be to make sure that as many of our machines as possible actually have Python 3 installed. Now that I look, some of them don't have it installed by default, which isn't going to help Python 3's adoption any.)
PS: the one Python 3 change that's going to be irritating me for years is the whole Unicode-ification of everything in sight. This deserves a longer discussion than fits within the margins of this entry and besides, this entry is a positive one. Also, I suspect that once I start actually using Python 3, the Unicode stuff will prove to be less of a pain than I currently expect it to be.
2011-12-21
Python 3 from the perspective of someone with existing Python code
Last time, I talked about Python 3 from the perspective of a Unix sysadmin. Today I want to talk about Python 3 from the perspective of someone who has a not insignificant amount of current Python code. I don't have huge (by Python standards) programs, but I do have various things (not all large) currently running live, for real, doing things that I care about.
Recently I read Armin Ronacher's Thoughts on Python 3, where he wrote (among other things):
Because as it stands, Python 3 is the XHTML of the programming language world. It's incompatible to what it tries to replace but does not offer much besides being more "correct".
I'm kind of sad to say this, but what he said (down to the comparison with XHTML).
Some of my code has a decent amount of tests but not all of it, and all of it currently works. Migrating it to Python 3 requires a significant amount of effort and testing, even for the code that has tests, and in exchange I get basically nothing except a warm fuzzy feeling that I am 'modern'. It would be pure make-work. Worse, it would be make-work that runs a good risk of destabilizing working code.
There are two aspects to the problem. The first is simply that Python 3 is a big change from Python 2. I'm willing to make small or moderate changes purely for compatibility purposes, but I've certainly been left with the impression that Python 3 requires some significant changes (even if a number of them will work in Python 2.7, the issue is the number of changes to the current code). The second is that Python 3's handling of strings and Unicode demands an architectural change in code that is currently ignoring the issue and just shoving around plain byte strings, which describes all of my current code. Part of this is just switching to Unicode by itself, but part of it is that since conversions to and from Unicode can fail, I now need to find all of the places where they happen and figure out what I want to do in each of them.
(This also increases the risk of the changes. If I miss a place where a conversion can fail, my code may blow up at some point in the future with uncaught exceptions in a situation where it works today. This is not really an attractive selling point and yes, I would rather have mojibake than explosive failures. Among other reasons, to a first order approximation mojibake is caused by someone else's mistake while uncaught exceptions are clearly my fault.)
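To make the failure mode concrete, here is a small sketch with a made-up byte string. It shows a strict conversion blowing up, a conversion that substitutes a replacement character instead, and how mojibake comes about when bytes are decoded with the wrong encoding.

```python
# Latin-1 encoded bytes for "cafe" with an e-acute; not valid UTF-8.
raw = b"caf\xe9"

# A strict conversion raises an exception on bad input.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A lenient conversion never raises; the bad byte becomes U+FFFD.
replaced = raw.decode("utf-8", errors="replace")

# Mojibake: valid UTF-8 bytes decoded as Latin-1. Nothing fails,
# but the e-acute (U+00E9) turns into two wrong characters.
mojibake = "caf\u00e9".encode("utf-8").decode("latin-1")
```

Code that only shoves byte strings around never hits the first case at all, which is the behavior my existing Python 2 code depends on.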
The result is that I can't possibly justify migrating any significant amount of my current code to Python 3 (either to myself or to others). It will remain Python 2 code unless and until I have no choice, and if I stop having a choice I'm going to fiercely resent it.
(This is entirely apart from any pragmatic issues such as dependencies that haven't yet been ported to Python 3. Most of my code doesn't use third-party modules or code anyways, just standard library stuff.)
2011-12-17
Python 3 from the perspective of a Unix sysadmin
I've been thinking about Python 3 for a while, mulling over things like how I feel about it and how likely I am to use it, and I've decided that one reason my feelings are complex is that I have three different views of it, from three different perspectives. Today is the day for the first perspective: Python 3 from the perspective of a Unix sysadmin who uses Python to program important parts of our systems.
I don't have any way to put this nicely, so I'll say it right up front: for a Unix sysadmin, Python 3 is currently highly radioactive and should be completely avoided. Our current systems are written in Python 2; there is no prospect of this changing and I am going to keep writing sysadmin things in Python 2 for the indefinite future. I will stop this only when the systems we use stop packaging Python 2, and I certainly hope that that doesn't happen for, oh, a decade or more.
The fundamental problem is that Python 3 wants the operating system environment to be Unicode, and Unix is not. When Python 3 comes into contact with messy reality, bad things happen and things fail. These failures are vaguely tolerable for ordinary user programs; they are intolerable for programs used for system management. I cannot afford to write programs that silently omit names from os.listdir()'s results, that don't see some environment variables sometimes, or that die with mysterious error messages if given the wrong arguments. There are workarounds for some of these issues (but none yet for the sys.argv issue), but they are limited in scope and unlikely to be pervasive (in, eg, third party modules that I want to use).
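As an illustration of the kind of workaround involved for filenames, here is a minimal sketch that assumes Python 3.2 or later on a Unix system (os.fsencode() and os.fsdecode() don't exist before 3.2). The helper names and the sample filename are hypothetical. The idea is to ask for bytes so that nothing has to be decoded, and to convert to text only for display.

```python
import os


def list_names_as_bytes(directory):
    """Return directory entries as raw bytes, with no decoding step."""
    # Passing a bytes path to os.listdir() gets bytes names back.
    return os.listdir(os.fsencode(directory))


def printable(name):
    """Render a bytes filename for display without raising."""
    return name.decode("utf-8", errors="replace")


# A filename that is not valid UTF-8 (perfectly legal on Unix).
raw_name = b"report-\xff.txt"

# os.fsdecode() smuggles the bad byte through as a lone surrogate,
# and os.fsencode() restores the original bytes on the way back.
as_text = os.fsdecode(raw_name)
round_trip = os.fsencode(as_text)
```

The catch is the one already mentioned: this only helps in code that I write myself and that deliberately works in bytes, not in third party modules that assume everything is a str.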
So long as Python 3 is busy denying Unix reality (and causing all sorts of complications as a result of this), the sysadmin side of me can't and isn't going to touch it. I doubt that the Python 3 developers care about this and I doubt that anything is going to change in Python 3, which is kind of a pity.
(I could probably write system tools in Python 3 if I wanted to and tried hard enough and had to, but I don't see any reason to do so given that Python 2 is there and going to be there for a long time to come. Python 2 works, it works without huge contortions, and I don't really see anything compelling in Python 3 so far.)
Sidebar: on the long term availability of Python 2
At this point in time I see essentially no prospect of Python 2 being removed from Linux distributions in the next five years (minimum). The very first step along the long path of removing Python 2 would be for distributions to migrate Python based system tools from Python 2 to Python 3, and that hasn't even started yet (distributions are just now starting to talk about maybe moving some of their Python-based tools to Python 3 for their next release).
The chances of Python 2 disappearing any time soon from more conservative and slow moving Unixes like FreeBSD and Solaris (and Mac OS X) are best described as 'laughable'.
2011-12-13
DWiki's code is now on Github (among other things)
As a followup to my first experiment with coding in public, I've put a few other Python projects up on Github. They are:
- dwiki, the code for DWiki itself (the software that runs this blog), plus the basic page templates and so on that I use. I'm not entirely happy with the actual organization of the code, but I have no energy to reform it at this point (or, more likely, rewrite it from scratch).
(At the moment the specific additional templates for WanderingThoughts are not bundled in.)
- portnanny is a powerful inetd-like frontend for a single TCP service, with a great deal of filtering power. It's also the Python code that I'm probably most proud of, since I think I did a decent job of structuring it and writing tests.
(The quality of its code may be related to the fact that it was a total rewrite of an earlier attempt.)
- python-netblock is a Python module for dealing with sets of IP address ranges; as part of this it has a module for sets of integer ranges in general. It comes with a command line netblock calculator that I use all the time (although there's no manpage for it right now).
I've made an index page for all of my Github things that I intend to keep up to date, or you can of course just look at things on Github.