One of Python 3's fundamental problems on Unix
Here is a thesis that I have been mulling over lately:
One of Python 3's fundamental problems is that it is trying very hard
to pretend that Unix is fundamentally a Unicode operating system, so
that Python itself can be Unicode-based while still working on Unix. The
problem with this is that it is demonstrably false, as seen in the
os.listdir() problem; Unix is fundamentally a
'byte strings' environment, and attempts to pretend otherwise can
run into problems any time that this pretense runs up against actual
reality. (The os.listdir() problem is actually the smallest one.)
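To make the mismatch concrete, here is a minimal sketch of what a byte-string filename looks like from Python 3. It assumes a Python 3 recent enough to have the 'surrogateescape' error handler (added in 3.1 by PEP 383), and the name raw_name is just an illustrative Latin-1 filename that I made up, not anything from a real system. It is meant to show the shape of the problem, not how any particular program should handle it.

```python
import os
import tempfile

# A filename that is perfectly legal on Unix but is not valid UTF-8.
# (Hypothetical example: 'café.txt' as written by a Latin-1 program.)
raw_name = b"caf\xe9.txt"

# Treating the name as Unicode fails outright under strict decoding.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The 'surrogateescape' handler smuggles the bad byte through as a lone
# surrogate code point, so the str can be turned back into the same bytes.
smuggled = raw_name.decode("utf-8", "surrogateescape")
round_trip = smuggled.encode("utf-8", "surrogateescape")

# The smuggled str is not real text, though: encoding it normally
# (for printing, logging, or sending over the network) raises an error.
try:
    smuggled.encode("utf-8")
    printable = True
except UnicodeEncodeError:
    printable = False

# os.listdir() itself can be asked for the honest answer: give it a
# bytes path and it returns bytes names instead of str names.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    as_str = os.listdir(tmp)                  # names as str
    as_bytes = os.listdir(os.fsencode(tmp))   # names as bytes
```

The round trip works, which is why most programs never notice, but the intermediate string is only pretending to be Unicode; anything that tries to treat it as actual text can still blow up.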
I am not sure why Python 3 has decided to hide this messy reality from programmers (it certainly knows that it exists; the release notes are full of commentary about it). It seems very atypical for Python, which has always struck me as quite honest about this sort of thing. My only theory is that the Python 3 people felt that programmers would revolt if they were forced to work in Unicode and still had to deal explicitly with such conversion issues, so they felt that a flawed pretense that usually worked was absolutely required.
(Possibly this issue has been discussed to death on the Python 3 development lists, but I don't follow Python news in much depth any more; I ran out of time.)
Sidebar: why I care
It is tempting to brush this under the carpet with various excuses, especially because most people will never see this; most people work on Unix systems where everything is correctly encoded in a single encoding. But this is an unstable situation and results in software that merely usually works, if everything goes well. As a system administrator, I am unhappy about software that can be broken by someone doing something peculiar; I want my software to be as resilient as possible.