How to help programmers (parts 2 and 3): os.environ and sys.argv

January 3, 2009

As it happens, the os.listdir() problem is just the tip of the iceberg of Python 3's Unix problems. Here are two more ways that it helps Unix programmers, from the release notes:

Some system APIs like os.environ and sys.argv can also present problems when the bytes made available by the system is not interpretable using the default encoding. Setting the LANG variable and rerunning the program is probably the best approach.

Since the release notes are not explicit, let me fill them in with what happens in each case.

If you have environment variables with un-decodable contents, Python 3 will pretend that they don't exist (and in fact they don't as far as it is concerned; they never made it into the os.environ data structure). This is worse than the os.listdir() case, because there is no way to work around it in your Python program; the behavior is hard-coded into the C source of the posix module. The only good news is that Python 3 doesn't remove these environment variables from the environment it passes to programs it executes via things like os.system() and os.popen().
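To make the effect concrete, here is a minimal sketch that simulates the behavior described above. It is not the posix module's actual code; the helper name `split_environ`, the sample variables, and the choice of UTF-8 as the default encoding are all assumptions made for illustration.

```python
def split_environ(raw_env, encoding="utf-8"):
    """Split a bytes -> bytes environment into (visible, dropped).

    Hypothetical helper: it mimics an os.environ that silently
    skips any variable that cannot be decoded strictly.
    """
    visible = {}
    dropped = []
    for name, value in raw_env.items():
        try:
            visible[name.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # Strict decoding failed, so the program never sees this variable.
            dropped.append(name)
    return visible, dropped


# Sample environment (made up for this sketch).
raw = {
    b"HOME": b"/home/user",
    b"ODDPATH": b"/tmp/caf\xe9",  # Latin-1 bytes, not valid UTF-8
}
visible, dropped = split_environ(raw)
print(sorted(visible))  # names the program can see
print(dropped)          # names that silently vanished
```

From inside the program, a lookup of the dropped variable looks exactly like a lookup of a variable that was never set, which is what makes the problem hard to notice.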

For sys.argv, any un-decodable command line arguments (such as oddly encoded filenames) cause your Python program to abort with a message like 'Could not convert argument 2 to string'. This happens whether or not you ever import the sys module, as it is hard-coded very early on in CPython's startup. For bonus points, the error message makes no attempt to identify what is producing it (it doesn't even mention that it is being produced by Python 3).
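The following is a rough simulation of that startup behavior, not CPython's real code (which is in C and runs before any Python code does). The function name `decode_argv`, the UTF-8 default, and the assumption that the number in the message is the argument's position counting the program name as 0 are all mine.

```python
def decode_argv(raw_args, encoding="utf-8"):
    """Decode a list of bytes arguments, aborting on the first failure.

    Hypothetical helper: it mimics an interpreter that refuses to
    start if any command line argument cannot be decoded.
    """
    decoded = []
    for index, arg in enumerate(raw_args):
        try:
            decoded.append(arg.decode(encoding))
        except UnicodeDecodeError:
            # The message names no program and no filename, only a position.
            raise SystemExit("Could not convert argument %d to string" % index)
    return decoded


# A made-up command line with one Latin-1 encoded filename.
try:
    decode_argv([b"prog", b"notes.txt", b"caf\xe9.txt"])
except SystemExit as exc:
    message = str(exc)
    print(message)
```

Note that the program's own code never gets a chance to catch the failure in the real case; the try/except here exists only so the sketch can show the message.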

(System administrators and anyone else who deals with complex, multi-layered systems have a special sort of affection for unidentified error messages.)

As Ian Bicking noted in the comments on the os.listdir() problem, the real solution here is alternate bytes-based interfaces to both os.environ and sys.argv that (at least on Unix) would be the 'real' versions. But that would require Python 3 admitting that Unix is not all Unicode, which seems unlikely right now.

Written on 03 January 2009.

Last modified: Sat Jan 3 00:24:02 2009