One of Python 3's fundamental problems on Unix

December 31, 2008

Here is a thesis that I have been mulling over lately:

One of Python 3's fundamental problems is that it is trying very hard to pretend that Unix is fundamentally a Unicode operating system, so that Python itself can be Unicode-based while still working on Unix. The problem with this is that it is demonstrably false, as seen in the os.listdir() problem; Unix is fundamentally a 'bytecode strings' environment, and attempts to pretend otherwise can run into problems any time that this pretense runs up against actual reality.

(The os.listdir() problem is actually the smallest one.)

I am not sure why Python 3 has decided to hide this messy reality from programmers (it certainly knows that it exists; the release notes are full of commentary). It seems very atypical for Python, which has always struck me as quite honest about this sort of thing. My only theory is that the Python 3 people felt that programmers would revolt if they were forced to work in Unicode and to still deal explicitly with such conversion issues, so they felt that a flawed pretense that usually worked was absolutely required.

(Possibly this issue has been discussed to death on the Python 3 development lists, but I don't follow Python news in much depth any more; I ran out of time.)

Sidebar: why I care

It is tempting to brush this under the carpet with various excuses, especially because most people will never see this; most people work on Unix systems where everything is correctly encoded in a single encoding. But this is an unstable situation and results in software that merely usually works, if everything goes well. As a system administrator, I am unhappy about software that can be broken by someone doing something peculiar; I want my software to be as resilient as possible.


Written on 31 December 2008.
« ZFS and crazy dates
Certificate authorities seem to be a real weakness in SSL »

These are my WanderingThoughts
(About the blog)

Full index of entries
Recent comments

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks

* * *

Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web

This is a DWiki.
GettingAround
(Help)

Search:

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Dec 31 01:05:23 2008
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.