One of Python 3's fundamental problems on Unix

December 31, 2008

Here is a thesis that I have been mulling over lately:

One of Python 3's fundamental problems is that it is trying very hard to pretend that Unix is fundamentally a Unicode operating system, so that Python itself can be Unicode-based while still working on Unix. The problem with this is that it is demonstrably false, as seen in the os.listdir() problem; Unix is fundamentally a 'byte strings' environment, and attempts to pretend otherwise can run into problems any time that this pretense runs up against actual reality.

(The os.listdir() problem is actually the smallest one.)
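To make the 'byte strings' point concrete, here is a minimal sketch, not a definitive demonstration. It assumes a Linux filesystem that accepts arbitrary bytes in filenames (some other systems reject them) and a Python 3 recent enough to have os.fsencode() (3.2 and later). The helper names list_raw_names and demo are made up for illustration. It creates a file whose name is not valid UTF-8 and reads the name back through the bytes form of os.listdir():

```python
import os
import tempfile


def list_raw_names(directory):
    """Return directory entries as bytes, the way Unix itself stores them."""
    # Passing a bytes path makes os.listdir() return bytes names,
    # so no decoding is attempted at all.
    return sorted(os.listdir(os.fsencode(directory)))


def demo():
    """Create a file with a non-UTF-8 name and list it back as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        # b"\xff" can never appear in valid UTF-8, but Unix only forbids
        # b"/" and b"\0" in a filename, so this name is perfectly legal.
        raw_name = b"report-\xff.txt"
        path = os.path.join(os.fsencode(tmp), raw_name)
        with open(path, "wb"):
            pass
        return list_raw_names(tmp)


if __name__ == "__main__":
    print(demo())
```

The name comes back intact only because the code never asks Python to treat it as text; any code path that insists on a Unicode string has to decide what to do with the 0xff byte.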

I am not sure why Python 3 has decided to hide this messy reality from programmers (it certainly knows that it exists; the release notes are full of commentary). It seems very atypical for Python, which has always struck me as quite honest about this sort of thing. My only theory is that the Python 3 people felt that programmers would revolt if they were forced to work in Unicode and to still deal explicitly with such conversion issues, so they felt that a flawed pretense that usually worked was absolutely required.

(Possibly this issue has been discussed to death on the Python 3 development lists, but I don't follow Python news in much depth any more; I ran out of time.)

Sidebar: why I care

It is tempting to brush this under the carpet with various excuses, especially because most people will never see this; most people work on Unix systems where everything is correctly encoded in a single encoding. But this is an unstable situation and results in software that merely usually works, if everything goes well. As a system administrator, I am unhappy about software that can be broken by someone doing something peculiar; I want my software to be as resilient as possible.


Comments on this page:

From 87.194.212.65 at 2009-01-01 13:18:09:

Well, there is also a bytes API for listdir.

By cks at 2009-01-02 01:20:49:

os.listdir() is the easy case since as you mention it does have an alternative (although I believe that there are problems with that for portable code, per OsListdirProblem). There are other areas where Python talks to the Unix world that have worse issues.

By cks at 2009-01-03 00:29:23:

Since I've now written up something on it, see ArgvEnvironProblem for the Unicode problems that Python 3 has with os.environ and sys.argv, which are worse than the ones it has with os.listdir().

Written on 31 December 2008.

Last modified: Wed Dec 31 01:05:23 2008