Python 3's core compatibility problem and decision
In light of the still ongoing issues with Python 3,
a very interesting question is what makes moving code to Python 3
so difficult. After all, Python has made transitions before, even
transitions with little or no backwards compatibility, and given
that there is very little real difference between Python 2 and 3 you
would normally expect that people would have no real problems shifting
code across to Python 3. After all, things like changing from using
print to using print() are not really a big deal.
In my view almost all of the Python 3 issues come down to one decision
(or really one aspect of that decision): making all strings Unicode
strings, with no real backwards path to working with bytes. Working
with Unicode strings instead of byte blobs is a fundamental change
to the structure of most programs. Moving many
programs to Python 3 requires changing them to be programs that
fundamentally work in Unicode and this itself can require a whole host
of changes throughout the program's data structures and interfaces, as
well as require you to explicitly consider failure points that you
didn't need to before. It is this design rethink that is the hard part
about moving code, not the mechanical parts of, e.g., changing print
to print().
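As a minimal sketch of the kind of new failure point involved (the sample data and the function names shove_bytes and shove_text are hypothetical, made up for illustration): a program that only moves a byte blob around never has to care what is in it, while a program that works in Unicode has to pick an encoding and decide what to do when decoding fails.

```python
# Hypothetical input: a byte string that is not valid UTF-8
# (0xE9 is 'e with acute accent' in Latin-1).
raw = b"caf\xe9"


def shove_bytes(blob):
    """Pass the data along as an opaque byte blob; this cannot fail."""
    return blob + b"\n"


def shove_text(blob, encoding="utf-8"):
    """Pass the data along as text; decoding is a new failure point."""
    try:
        return blob.decode(encoding) + "\n"
    except UnicodeDecodeError:
        # The byte-blob version never had to decide what to do here.
        return None


passed_through = shove_bytes(raw)            # works regardless of content
decoded = shove_text(raw)                    # fails: not valid UTF-8
decoded_latin1 = shove_text(raw, "latin-1")  # works, if the guess is right
```

Returning None on failure is only one possible choice; the point is that the text version forces some such decision while the byte version does not.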
I think that this is also why there is a significant gulf between
different people's experiences of working with Python 3. Some people
have already been writing code that worked internally in Unicode,
even in Python 2. This code is easy to move to Python 3 because it
has already dealt with the major issue of that conversion; all
that's left is more or less mechanical issues like print versus
print(). Other people, with me as an example, have a great deal
of code that is really character encoding indifferent (well, more or less) and
as a result need to redesign its core for Python 3.
(I think that this also contributes to a certain amount of arrogance on the side of Python 3 boosters, in that they may feel that anyone who was 'sloppily' not working in Unicode already was doing it wrong and so is simply paying the price for their earlier bad habits. To put it one way, this is missing the real problem as usual, never mind the practical issues involved on Unix.)
Honesty requires me to admit that this is a somewhat theoretical view, in that I haven't attempted to convert any of my code to Python 3. This is, however, the aspect of the conversion that I'm most worried about and that strikes me as likely to cause me the most problems.
Sidebar: a major contributing factor
I feel that one major factor that makes the Unicode decision a bigger issue than it would otherwise be is that Python 3 doesn't just make literal strings into Unicode, it makes a lot of routines that previously returned bytestrings instead return Unicode strings. Many of these routines are naturally dealing with byte blobs and forced conversion to Unicode thus creates new encoding-related failure points. It is this decision that forces design changes on programs that would otherwise be indifferent to the issues involved because they only shove the byte blobs involved around without looking inside them.
My impression is that Python 3 has gotten somewhat better about this since its initial release in that many more things are willing to work with bytes or can be coerced to return bytes if you find the right magic options. This still leaves you to go through your code to find all of the places that this is needed (and to hope you don't miss a rarely executed code path), and sometimes revising code to account for what the new options really mean.
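One illustration of 'coerced to return bytes' is os.listdir(), which in Python 3 returns Unicode strings when given a Unicode path and bytes when given a bytes path. The sketch below assumes a Python 3 version recent enough to have os.fsencode(); the temporary directory and the file name example.txt are only there to make it self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    # A str path gives back str names.
    as_text = os.listdir(tmp)

    # A bytes path gives back bytes names, with no decoding involved.
    as_bytes = os.listdir(os.fsencode(tmp))
```

Here as_text is a list of str and as_bytes is a list of bytes. Every call site that wants the bytes form still has to be found and changed by hand.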
(For example, you now get bytes out of files by opening them in "rb"
mode. However this mode has potentially important behavior differences
from the Python 2 plain "r" mode; it does no newline conversion and is
buffered differently on ttys.)
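A small sketch of the newline part of this, under the assumption that you are comparing Python 3's own default text mode against "rb" mode (it does not reproduce Python 2's behavior, and it says nothing about the tty buffering difference). The temporary file and its contents are made up for the example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # Write raw bytes with CR LF line endings.
    with os.fdopen(fd, "wb") as f:
        f.write(b"one\r\ntwo\r\n")

    # Default text mode: returns str, with line endings translated to "\n".
    with open(path, "r", encoding="ascii") as f:
        text = f.read()

    # Binary mode: returns bytes, exactly as they are in the file.
    with open(path, "rb") as f:
        data = f.read()
finally:
    os.remove(path)
```

After this runs, text holds "one\ntwo\n" while data still holds the original CR LF bytes, so code that switches to "rb" to get bytes may also have to start handling line endings itself.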