Python 3's core compatibility problem and decision

January 2, 2014

In light of the still ongoing issues with Python 3, an interesting question is what makes moving code to Python 3 so difficult. Python has made transitions before, even transitions with little or no backwards compatibility, and given that there is very little real difference between Python 2 and 3 you would normally expect people to have no real problems shifting code across to Python 3. After all, things like changing from using print to using print() are not really a big deal.

In my view almost all of the Python 3 issues come down to one decision (or really one aspect of that decision): making all strings Unicode strings, with no real backwards path to working with bytes. Working with Unicode strings instead of byte blobs is a fundamental change to the structure of most programs. Moving many programs to Python 3 requires changing them into programs that fundamentally work in Unicode. This in turn can require a whole host of changes throughout the program's data structures and interfaces, and it forces you to explicitly consider failure points that you didn't need to consider before. It is this design rethink that is the hard part about moving code, not the mechanical parts of, e.g., changing print to print().
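As a minimal sketch of the kind of new failure point involved (the sample bytes and the try_decode() helper are made up for illustration), consider data that is not valid UTF-8. A program that only moves the bytes around never has to care, while a program that works in Unicode has to decode them and decide what to do when that fails:

```python
# Hypothetical input: Latin-1 encoded bytes, which are not valid UTF-8.
raw = b"caf\xe9"


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are invalid in `encoding`.

    This helper is illustrative only; a real program has to pick its own
    policy (fail, replace characters, guess another encoding, and so on).
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# A byte-oriented program can pass the data along untouched.
passed_through = raw

# A Unicode-oriented program must decode it, and the decode can fail.
as_utf8 = try_decode(raw)

# Decoding only succeeds if you already know (or guess) the right encoding.
as_latin1 = try_decode(raw, "latin-1")
```

Here as_utf8 is None because the decode fails, while the byte-oriented path has no step at which it could fail.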

I think that this is also why there is a significant gulf between different people's experiences of working with Python 3. Some people have already been writing code that worked internally in Unicode, even in Python 2. This code is easy to move to Python 3 because it has already dealt with the major issue of that conversion; all that's left is more or less mechanical issues like print versus print(). Other people, myself included, have a great deal of code that is really character encoding indifferent (well, more or less), and as a result they need to redesign the core of that code for Python 3.

(I think that this also contributes to a certain amount of arrogance on the side of Python 3 boosters, in that they may feel that anyone who was 'sloppily' not working in Unicode already was doing it wrong and so is simply paying the price for their earlier bad habits. To put it one way, this is missing the real problem as usual, never mind the practical issues involved on Unix.)

Honesty requires me to admit that this is a somewhat theoretical view in that I haven't attempted to convert any of my code to Python 3. This is however the aspect of the conversion that I'm most worried about and that strikes me as what would cause me the most problems.

Sidebar: a major contributing factor

I feel that one major factor that makes the Unicode decision a bigger issue than it would otherwise be is that Python 3 doesn't just make literal strings into Unicode, it makes a lot of routines that previously returned bytestrings instead return Unicode strings. Many of these routines are naturally dealing with byte blobs and forced conversion to Unicode thus creates new encoding-related failure points. It is this decision that forces design changes on programs that would otherwise be indifferent to the issues involved because they only shove the byte blobs involved around without looking inside them.

My impression is that Python 3 has gotten somewhat better about this since its initial release, in that many more things are willing to work with bytes or can be coerced to return bytes if you find the right magic options. This still leaves you to go through your code to find all of the places where this is needed (and to hope you don't miss a rarely executed code path), and sometimes to revise code to account for what the new options really mean.
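One documented example of such an option is os.listdir(), which returns str filenames when given a str path and bytes filenames when given a bytes path. The following is only a sketch of that one case; the temporary directory and the file name example.txt are placeholders invented for the demonstration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create one empty file so that the directory has something to list.
    open(os.path.join(tmpdir, "example.txt"), "w").close()

    # A str path gives back str (Unicode) filenames.
    as_text = os.listdir(tmpdir)

    # A bytes path gives back bytes filenames, with no decoding step.
    # os.fsencode() converts the str path to bytes using the filesystem encoding.
    as_bytes = os.listdir(os.fsencode(tmpdir))
```

Code that wants to stay encoding indifferent has to make this sort of choice at every call that touches filenames, which is part of why hunting down all of the relevant places is tedious.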

(For example, you now get bytes out of files by opening them in "rb" mode. However this mode has potentially important behavior differences from the Python 2 plain "r" mode; it does no newline conversion and is buffered differently on ttys.)
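A small sketch of the newline part of this, using only Python 3 (it compares Python 3's "rb" mode with Python 3's own text mode, not with Python 2's "r" mode, and the file contents are made up for the example):

```python
import os
import tempfile

# Hypothetical file contents with Windows-style line endings.
payload = b"line one\r\nline two\r\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode: you get bytes back, exactly as they are stored on disk.
    with open(path, "rb") as f:
        binary_data = f.read()

    # Text mode: the data is decoded to str and, with the default
    # newline handling, "\r\n" is translated to "\n" on reading.
    with open(path, "r", encoding="ascii") as f:
        text_data = f.read()
```

So switching a file to "rb" to get bytes also changes what the program sees for line endings, and any code downstream that assumed translated newlines needs to be checked.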

