How to help programmers (part 1): the os.listdir() problem
How to help (Unix) programmers: silently omit data under certain circumstances. From the Python 3000 release notes:
Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError.
(Background: in Python 3k, os.listdir() is normally called with a
directory name that is a str, which is a Unicode object.)
Yes, os.listdir() had problems, but
this is not a solution; this is making the problems worse. Before, at
least you found out if you had a problem in this area. Now you will get
mysterious reports that your program doesn't process all of the files
that are there, on some platforms.
What this suggests is that os.listdir() is actually not a portable
interface. On Unix it fundamentally deals with byte-strings, and
attempts to paper over that fact cause explosions; on other platforms, in at
least some circumstances, it fundamentally deals with Unicode strings,
and you get the same explosions in the other direction. Hiding the
explosions doesn't make them go away, it just makes the problem harder
to diagnose.
(Of course, the problem is worse than just os.listdir(); all
things that take or return filenames on Unix fundamentally deal with
byte-strings, not Unicode strings.)
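
To make the byte-string point concrete, here is a minimal sketch of listing a directory without going through Unicode decoding at all. It relies on the documented behaviour that os.listdir() returns bytes names when given a bytes path, and it assumes a later Python 3 version that provides os.fsencode(). The helper names list_raw_names() and undecodable_names() are made up for illustration, and the UTF-8 default is an assumption, not something the release notes prescribe.

```python
import os


def list_raw_names(path):
    """Return directory entries as bytes, exactly as the OS reports them.

    Passing a bytes path to os.listdir() makes it return bytes names,
    so no filename can be lost to a decoding failure.
    """
    return os.listdir(os.fsencode(path))


def undecodable_names(path, encoding="utf-8"):
    """Return the raw names that do not decode cleanly in `encoding`.

    The encoding is an assumption; use whatever your program expects
    filenames to be in.
    """
    bad = []
    for raw in list_raw_names(path):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(raw)
    return bad
```

A program that cares about processing every file could call undecodable_names() on a directory and report what it finds, instead of never learning that those files exist.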