Wandering Thoughts archives

2008-12-31

One of Python 3's fundamental problems on Unix

Here is a thesis that I have been mulling over lately:

One of Python 3's fundamental problems is that it is trying very hard to pretend that Unix is fundamentally a Unicode operating system, so that Python itself can be Unicode-based while still working on Unix. The problem is that this is demonstrably false, as the os.listdir() problem shows; Unix is fundamentally a 'byte string' environment, and any attempt to pretend otherwise runs into trouble whenever the pretense meets actual reality.

(The os.listdir() problem is actually the smallest one.)
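To make the 'byte string' point concrete, here is a minimal sketch. It assumes a typical Linux filesystem such as ext4, which stores names as arbitrary bytes (some macOS filesystems will refuse the name). It creates a file whose name is 'café' encoded in Latin-1, which is not valid UTF-8; the kernel accepts it without complaint, and only code that insists on decoding the name to Unicode has a problem. The name LATIN1_NAME is just an illustrative choice.

```python
import os
import tempfile

# Illustrative name: 'café' encoded in Latin-1. The lone 0xE9 byte is not
# valid UTF-8, but a typical Unix filesystem stores it without complaint.
LATIN1_NAME = b"caf\xe9"

with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)  # handle the directory path as bytes too
    with open(os.path.join(tmp_b, LATIN1_NAME), "wb"):
        pass  # create an empty file with the non-UTF-8 name
    # Listing with a bytes path returns exactly the bytes the kernel stored.
    listed = os.listdir(tmp_b)

print(listed)
try:
    listed[0].decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc)
```

Nothing at the Unix level is wrong here; the name only becomes a problem at the moment something tries to turn it into a Unicode string.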

I am not sure why Python 3 has decided to hide this messy reality from programmers (it certainly knows that it exists; the release notes are full of commentary). It seems very atypical for Python, which has always struck me as quite honest about this sort of thing. My only theory is that the Python 3 people felt that programmers would revolt if they were forced to work in Unicode and to still deal explicitly with such conversion issues, so they felt that a flawed pretense that usually worked was absolutely required.

(Possibly this issue has been discussed to death on the Python 3 development lists, but I don't follow Python news in much depth any more; I ran out of time.)

Sidebar: why I care

It is tempting to brush this under the carpet with various excuses, especially because most people will never see this; most people work on Unix systems where everything is correctly encoded in a single encoding. But this is an unstable situation and results in software that merely usually works, if everything goes well. As a system administrator, I am unhappy about software that can be broken by someone doing something peculiar; I want my software to be as resilient as possible.

Python3UnixProblem written at 01:05:23

2008-12-07

How to help programmers (part 1): the os.listdir() problem

How to help (Unix) programmers: silently omit data under certain circumstances. From the Python 3000 release notes:

Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError.

(Background: in Python 3k, os.listdir() is normally called with a directory name that is a str Unicode object.)

Yes, os.listdir() had problems, but this is not a solution; this is making the problems worse. Before, at least you found out if you had a problem in this area. Now you will get mysterious reports that your program doesn't process all of the files that are there, on some platforms.
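One way to keep finding out about the problem is to list the directory as bytes yourself and decide explicitly what to do with names that don't decode. The sketch below is a hypothetical helper (listdir_checked is not a standard function), and it assumes the names are supposed to be UTF-8; rather than dropping undecodable names, it hands them back separately so the caller can report them.

```python
import os
import tempfile

def listdir_checked(path, encoding="utf-8"):
    """List `path` as raw bytes and split the names by whether they decode.

    Nothing is silently omitted: names that do not decode under `encoding`
    are returned separately, as bytes, so the caller can log or handle them.
    """
    decoded, undecodable = [], []
    for raw in os.listdir(os.fsencode(path)):
        try:
            decoded.append(raw.decode(encoding))  # strict: raises on bad bytes
        except UnicodeDecodeError:
            undecodable.append(raw)
    return sorted(decoded), sorted(undecodable)

# Demonstration in a scratch directory with one good and one bad name.
with tempfile.TemporaryDirectory() as tmp:
    for name in (b"plain.txt", b"caf\xe9"):
        open(os.path.join(os.fsencode(tmp), name), "wb").close()
    good, bad = listdir_checked(tmp)

print("decoded:", good)
for raw in bad:
    print("cannot decode:", raw)
```

The design choice is simply to make the failure visible: a program using this gets a list of problem names it can complain about, instead of a directory that mysteriously seems to have fewer files in it.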

What this suggests is that os.listdir() is actually not a portable interface. On Unix it fundamentally deals with byte-strings, and attempts to paper over that cause explosions; on other platforms, in at least some circumstances, it fundamentally deals with Unicode strings, and you get the same explosions in the other direction. Hiding the explosions doesn't make them go away, it just makes the problem harder to diagnose.

(Of course, the problem is worse than just os.listdir(); all things that take or return filenames on Unix fundamentally deal with byte-strings, not Unicode strings.)
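The same byte-string approach extends to other filename APIs. As a sketch (total_size is a hypothetical helper, and skipping symlinks is just one reasonable policy), passing a bytes path to os.walk() makes it yield bytes directory and file names, and os.path.join() and os.path.getsize() accept those bytes directly, so no name is ever decoded and none can be lost.

```python
import os
import tempfile

def total_size(top):
    """Sum the sizes of regular files under `top`, keeping every path as bytes."""
    top = os.fsencode(top)  # a no-op if `top` is already bytes
    total = 0
    # With a bytes `top`, os.walk yields bytes dirpaths and filenames.
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)  # bytes + bytes -> bytes
            if not os.path.islink(full):  # skip symlinks so targets aren't double-counted
                total += os.path.getsize(full)
    return total

# Demonstration: one non-UTF-8 name at the top, one file in a subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    base = os.fsencode(tmp)
    os.mkdir(os.path.join(base, b"sub"))
    with open(os.path.join(base, b"caf\xe9"), "wb") as f:
        f.write(b"12345")
    with open(os.path.join(base, b"sub", b"x"), "wb") as f:
        f.write(b"abc")
    size = total_size(base)

print(size)
```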

OsListdirProblem written at 18:46:50

One of Python's problems with packages

One of the problems with Python's current approach to CPAN-like packages is, to put it in a particularly blunt way, that it hasn't quite sunk in that not everyone has root.

(I am aware that Python doesn't have an approach to packages as such; it's all done by distutils et al, although distutils is part of the standard library. That's part of the overall problem.)

Oh, sure, you can use distutils to install packages in places besides the system locations. And then your Python programs won't automatically find them so that you can import them; if you want to change this, you have to do various things to your programs or your environment (or both). The net result is that personally installed Python packages are significantly less useful than system-level packages.
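On the program side, the usual fix is for each program to put the personal directory on sys.path itself before importing anything from it. A minimal sketch, assuming a hypothetical personal install directory of ~/lib/python (the directory and the add_personal_lib helper are illustrative, not anything Python sets up for you), using the standard library's site.addsitedir():

```python
import os
import site

# Hypothetical location for personally installed packages; adjust to taste.
PERSONAL_LIB = os.path.expanduser("~/lib/python")

def add_personal_lib(path=PERSONAL_LIB):
    """Make modules installed under `path` importable by this program.

    Returns False (and changes nothing) if the directory does not exist.
    """
    if not os.path.isdir(path):
        return False
    # Appends `path` to sys.path (if it isn't already there) and processes
    # any .pth files in it, the way site-packages directories are handled.
    site.addsitedir(path)
    return True

# Every program that wants the personal packages has to do this itself,
# before the imports that need them.
add_personal_lib()
```

This is exactly the kind of boilerplate the paragraph above complains about: it works, but it has to be copied into every program.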

(Also, even if you have root level access, there is always the CPAN problem.)

That it is non-trivial to use canned packages is, I think, one reason that something like the CPAN culture does not seem to have really caught on in the Python community. My own experience was that I would rather write my own WSGI system (admittedly partly for the learning experience) than go through the hassle of trying to pull in the existing wsgiref module (in the days before it was part of the standard library).

(In theory you can just dump simple modules into your program's or package's directory and import them directly without having to do any extra work. In practice, I want my package directory to just contain my code, so that I can keep track of it all.)

As I hinted earlier, one of the core problems is that handling packages has been left to Python modules. A really useful package system really does require the cooperation and involvement of CPython itself in some way; without this, there's only so much that distutils or anything else can do. However, I have to admit that I don't really know what CPython should do to improve things. It has fairly good reasons for not automatically running user-specific code or searching user package directories every time it starts, and there are mechanisms for overriding this; it's just that they're inconvenient, and I think that the inconvenience is high enough that they create real friction.
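On the environment side, the standard override is the PYTHONPATH environment variable. The sketch below (child_sys_path is a made-up helper for illustration) starts a fresh interpreter with a directory prepended to PYTHONPATH and returns the sys.path that interpreter ends up with. It works, but it also shows the friction: the variable has to be set in every environment the program might be started from.

```python
import os
import subprocess
import sys
import tempfile

def child_sys_path(extra_dir):
    """Start a fresh interpreter with `extra_dir` prepended to PYTHONPATH
    and return the sys.path it ends up with."""
    env = dict(os.environ)
    old = env.get("PYTHONPATH")
    env["PYTHONPATH"] = extra_dir + os.pathsep + old if old else extra_dir
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

with tempfile.TemporaryDirectory() as tmp:
    found = tmp in child_sys_path(tmp)

print(found)
```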

PythonPackagingProblem written at 00:43:53


