Wandering Thoughts archives

2008-12-07

How to help programmers (part 1): the os.listdir() problem

How to help (Unix) programmers: silently omit data under certain circumstances. From the Python 3000 release notes:

Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError.

(Background: in Python 3k, os.listdir() is normally called with a directory name that is a str Unicode object.)

Yes, os.listdir() had problems, but this is not a solution; this is making the problems worse. Before, at least you found out if you had a problem in this area. Now, on some platforms, you will get mysterious reports that your program doesn't process all of the files that are there.

What this suggests is that os.listdir() is actually not a portable interface. On Unix it fundamentally deals with byte-strings, and attempts to paper over that cause explosions; on other platforms, in at least some circumstances, it fundamentally deals with Unicode strings, and you get the same explosions in the other direction. Hiding the explosions doesn't make them go away; it just makes the problem harder to diagnose.

(Of course, the problem is worse than just os.listdir(); all things that take or return filenames on Unix fundamentally deal with byte-strings, not Unicode strings.)
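(As a minimal sketch of the byte-string view, assuming a Unix-like system and a Python 3 recent enough to have os.fsencode(): handing os.listdir() a bytes directory name gets you bytes filenames back, with no decoding step that could drop or reject a name. The helper names list_raw_names and demo are made up for illustration.)

```python
import os
import tempfile


def list_raw_names(directory):
    """Return the entries of `directory` as bytes, sorted, with no decoding."""
    # A bytes argument makes os.listdir() return bytes filenames.
    return sorted(os.listdir(os.fsencode(directory)))


def demo():
    """Create two files in a scratch directory and list them as bytes."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in ("a.txt", "b.txt"):
            open(os.path.join(scratch, name), "w").close()
        return list_raw_names(scratch)


if __name__ == "__main__":
    print(demo())
```

The cost is that everything downstream now has to cope with bytes filenames, which is just the Unix reality showing through.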

python/OsListdirProblem written at 18:46:50

One of Python's problems with packages

One of the problems with Python's current approach to CPAN-like packages is, to put it in a particularly blunt way, that it hasn't quite sunk in that not everyone has root.

(I am aware that Python doesn't have an approach to packages as such; it's all done by distutils et al, although distutils is part of the standard library. That's part of the overall problem.)

Oh, sure, you can use distutils to install packages in places besides the system locations. And then your Python programs won't automatically find them so that you can import them; if you want to change this, you have to do various things to your programs or your environment (or both). The net result is that personally installed Python packages are significantly less useful than system-level packages.
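One of the things you can do to a program is to extend sys.path by hand before importing anything. A minimal sketch, assuming a hypothetical personal install directory of ~/lib/python (both the directory and the add_private_packages name are made up for illustration):

```python
import os
import sys


def add_private_packages(directory="~/lib/python"):
    """Put a personal package directory at the front of the import path."""
    path = os.path.expanduser(directory)
    # Avoid stacking up duplicates if this gets called more than once.
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    add_private_packages()
    # Imports after this point will also search ~/lib/python.
    print(sys.path[0])
```

The environment-side equivalent is setting PYTHONPATH to the same directory before starting Python. Either way, it is extra work that system-level packages never need.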

(Also, even if you have root level access, there is always the CPAN problem.)

That it is non-trivial to use canned packages is, I think, one reason that something like the CPAN culture does not seem to have really caught on in the Python community. My own experience was that I would rather write my own WSGI system (admittedly partly for the learning experience) than go through the hassle of trying to pull in the existing wsgiref module (in the days before it was part of the standard library).

(In theory you can just dump simple modules into your program's or package's directory and import them directly without having to do any extra work. In practice, I want my package directory to just contain my code, so that I can keep track of it all.)

As I hinted earlier, one of the core problems is that handling packages has been left to Python modules. A really useful package system requires the cooperation and involvement of CPython itself in some way; without this, there's only so much that distutils or anything else can do. However, I have to admit that I don't really know what CPython should do to improve things. It has fairly good reasons for not automatically running user-specific code or searching user package directories every time it starts, and there are mechanisms for overriding this; it's just that they're inconvenient, and I think that the inconvenience is high enough to create real friction.

python/PythonPackagingProblem written at 00:43:53

