How to help programmers (part 1): the os.listdir() problem

December 7, 2008

How to help (Unix) programmers: silently omit data under certain circumstances. From the Python 3000 release notes:

Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError.

(Background: in Python 3k, os.listdir() is normally called with a directory name that is a str, which is to say a Unicode string.)

Yes, os.listdir() had problems, but this is not a solution; this is making the problems worse. Before, at least you found out if you had a problem in this area. Now you will get mysterious reports that your program doesn't process all of the files that are there, on some platforms.

What this suggests is that os.listdir() is actually not a portable interface. On Unix it fundamentally deals with byte-strings, and attempts to paper over that cause explosions; on other platforms, in at least some circumstances, it fundamentally deals with Unicode strings, and you get the same explosions in the other direction. Hiding the explosions doesn't make them go away; it just makes the problem harder to diagnose.
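As a rough sketch of how a program could at least find out about the problem names for itself: the same release notes say that os.listdir() returns bytes filenames when it is given a bytes directory name, so you can ask for the raw names and try the decoding yourself. The function names split_decodable and list_with_report are made up for this illustration, and using sys.getfilesystemencoding() as the default encoding is an assumption about what you want to check against.

```python
import os
import sys


def split_decodable(raw_names, encoding):
    """Split byte-string filenames into (decoded names, undecodable raw names)."""
    good, bad = [], []
    for raw in raw_names:
        try:
            # Strict decoding: any invalid byte sequence raises.
            good.append(raw.decode(encoding))
        except UnicodeDecodeError:
            bad.append(raw)
    return good, bad


def list_with_report(raw_dir, encoding=None):
    """List raw_dir (a bytes path) and report names that do not decode.

    Hypothetical helper: passing bytes to os.listdir() gets bytes back,
    so nothing in the directory is dropped before we look at it.
    """
    if encoding is None:
        # Assumption: the filesystem encoding is the one we care about.
        encoding = sys.getfilesystemencoding()
    return split_decodable(os.listdir(raw_dir), encoding)
```

Calling list_with_report(b".") gives you both the names you can treat as text and the raw names you cannot, so at least the program can complain about the second group instead of quietly ignoring it.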

(Of course, the problem is worse than just os.listdir(); all things that take or return filenames on Unix fundamentally deal with byte-strings, not Unicode strings.)
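To illustrate what working with byte-string names throughout looks like, here is a minimal sketch that sums the sizes of the regular files in a directory without ever decoding a filename. It assumes that os.listdir(), os.path.join() and os.lstat() all accept bytes, which the release notes suggest is true of most filename-taking APIs; total_file_size is a hypothetical name, not anything from the standard library.

```python
import os
import stat


def total_file_size(raw_dir):
    """Sum the sizes of regular files directly inside raw_dir.

    raw_dir must be bytes, so every name we get back and pass on
    stays a byte-string and is never decoded.
    """
    if not isinstance(raw_dir, bytes):
        raise TypeError("raw_dir must be a bytes path")
    total = 0
    for raw_name in os.listdir(raw_dir):
        # Joining bytes with bytes yields a bytes path.
        raw_path = os.path.join(raw_dir, raw_name)
        st = os.lstat(raw_path)
        if stat.S_ISREG(st.st_mode):
            total += st.st_size
    return total
```

The obvious cost of this approach is that the names are awkward to show to people or compare against text, which is presumably why the str interface is the default.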

Written on 07 December 2008.

Last modified: Sun Dec 7 18:46:50 2008