Link: Armin Ronacher's 'More About Unicode in Python 2 and 3'

January 16, 2014

Armin Ronacher's 'More About Unicode in Python 2 and 3' covers the subject in depth, and it comes from someone who works with this stuff all the time and so is much better informed about it in practice than I am. A sample quote:

I will use this post to show that from the pure design of the language and standard library why Python 2 the better language for dealing with text and bytes.

Since I have to maintain lots of code that deals exactly with the path between Unicode and bytes this regression from 2 to 3 has caused me lots of grief. Especially when I see slides by core Python maintainers about how I should trust them that 3.3 is better than 2.7 makes me more than angry.

I learned at least two surprising things from reading this. The first is that string formatting is not available for bytes in Python 3, only for Unicode strings, which I hadn't previously realized. The second is that Mercurial has not been ported to Python 3 and is not being ported. As Ronacher notes, these two issues are not unrelated.

For me, the lack of formatting for bytes is another reason not to use Python 3 even for new code, because it forces me into more Unicode conversion even when I know exactly what I'm doing with the unconverted bytes. Since I use Unix, with its large collection of non-Unicode byte APIs, there are times when this matters.
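To make the complaint concrete, here is a minimal sketch of what building a byte string looks like when the `%` operator is not available for bytes, as was the case in the Python 3 releases current when this was written (PEP 461 later restored it in Python 3.5). The function names and the header-line format are made up for illustration; they do not come from Ronacher's article.

```python
def build_header(name: bytes, value: bytes) -> bytes:
    """Assemble b'name: value\\r\\n' using only bytes operations."""
    # In Python 2 this could be written as '%s: %s\r\n' % (name, value).
    # join() and concatenation work on bytes in every Python 3 release.
    return b"".join([name, b": ", value, b"\r\n"])


def build_header_via_str(name: bytes, value: bytes) -> bytes:
    """The detour through Unicode: decode, format as str, encode again."""
    # latin-1 maps each byte value 0-255 to exactly one code point, so
    # this round trip is lossless even for bytes that are not valid UTF-8.
    text = "%s: %s\r\n" % (name.decode("latin-1"), value.decode("latin-1"))
    return text.encode("latin-1")


# A value that is not valid UTF-8 passes through both versions unchanged.
raw_value = b"caf\xe9"
header = build_header(b"X-Name", raw_value)
header_via_str = build_header_via_str(b"X-Name", raw_value)
```

Both versions produce the same bytes, but the second one only works because the code picks an encoding purely to get at the formatting operator, which is the kind of extra conversion I would rather not do.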

(For instance, it is perfectly sensible to manipulate Unix file paths as bytes without trying to convert them to Unicode. You can split them into path components, add prefixes and suffixes, and so on, all without having to interpret the character sets of the file name components. In fact, in degenerate situations the file name components may be in different character sets, with a directory name in UTF-8 and a file name inside a subdirectory in something else. At that point there is no way to properly decode the entire file path to meaningful Unicode. But I digress from Armin Ronacher's article.)
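As a rough sketch of the path case, the following treats a Unix path purely as bytes. The directory and file names are invented for illustration, and `posixpath` is used instead of `os.path` only so that the example behaves the same on any platform.

```python
import posixpath

# Hypothetical components: a directory named in UTF-8 and a file
# inside it named in Latin-1, the degenerate situation described above.
dirname = "décembre".encode("utf-8")    # b'd\xc3\xa9cembre'
filename = "café.txt".encode("latin-1")  # b'caf\xe9.txt'

# Joining, splitting, and adding a suffix never look at the encoding.
path = posixpath.join(b"/srv", dirname, filename)
components = path.split(b"/")
backup = path + b".bak"
base = posixpath.basename(path)


def decodes_as(data: bytes, encoding: str) -> bool:
    """Report whether data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Each component is fine in its own encoding, but the whole path
# is not valid UTF-8 because of the Latin-1 file name.
whole_path_is_utf8 = decodes_as(path, "utf-8")
dir_is_utf8 = decodes_as(dirname, "utf-8")
```

Decoding the whole path as Latin-1 would not raise an error, since every byte is valid Latin-1, but it would turn the UTF-8 directory name into mojibake, so it is not a meaningful decoding either.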

Written on 16 January 2014.
