Wandering Thoughts archives

2014-08-23

Some notes on Python packaging stuff that wasn't obvious to me

A comment by Lars Kellogg-Stedman on this entry of mine wound up with me wanting to try out his lvcache utility, which is a Python program that's packaged with a setup.py. Great, I thought, I know how to install these things.

Well, no, not any more. While I wasn't looking, Python packaging systems have gotten absurdly complex and annoying (and yes, one of the problems is that there is more than one of them). My attempts to install lvcache (either privately or eventually system-wide in a sacrificial virtual machine) failed in various ways. In the process they left me very frustrated because I had very little understanding of what a modern Python setup does and when it does it. Since I now have somewhat more understanding I'm going to write up what I know.

Once upon a time there was just site-packages with .py files and plain directories in it, and life was simple and good. If you wanted to you could augment the standard site-packages by setting $PYTHONPATH; the additional directories would be searched for .py files and plain directories too. Modern Python has added some wrinkles:

  • .pth files list additional paths that things will be imported from (generally relative to the directory you find them in). These additional import paths are visible in sys.path, so if you're not sure if a .pth file is working you can start Python and check what sys.path reports.

    .pth files in standard locations are loaded automatically; this includes your personal 'user' directory (on Unix, generally $HOME/.local/lib/pythonX.Y/site-packages, ie what 'python setup.py install --user' et al will use). However, .pth files in directories that are merely on your $PYTHONPATH are not automatically loaded by Python and must be bootstrapped somehow; if you use easy_install --prefix, it will stick a site.py file in the directory to do this.

    (There are some really weird things that go on with .pth files. See Armin Ronacher.)

  • .egg files are ZIP files, which Python can import code from directly. They contain metadata and a module directory with .py files and normally appear directly on sys.path (ie the .egg file itself is listed). You can inspect .egg file contents with 'unzip -v thing.egg'. Under some circumstances it's possible for the install process to build a .egg that doesn't contain any Python code (or contains incomplete Python code); if you're facing mysterious failures, you may need to check for this.

  • .egg directories are unpacked versions of the ZIP versions above. I don't know when easy_install et al create directories versus files. Like the files they appear on sys.path directly. They can be inspected directly.
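
The .pth behaviour described in the first point can be checked directly. What follows is a minimal sketch, assuming Python 3; the temporary directory, the demo.pth file and the 'extra' subdirectory are all made up for illustration. It uses the standard library's site.addsitedir(), which processes the .pth files in a directory, and contrasts that with merely putting the directory on sys.path (which is roughly what $PYTHONPATH does).

```python
import os
import site
import sys
import tempfile

def pth_demo():
    """Show that a .pth file is only honoured when its directory is
    processed as a site directory, not when it is merely on sys.path."""
    base = tempfile.mkdtemp()                            # hypothetical package directory
    extra = os.path.abspath(os.path.join(base, "extra")) # path the .pth file points at
    os.mkdir(extra)
    with open(os.path.join(base, "demo.pth"), "w") as f:
        f.write("extra\n")          # relative to the .pth file's own directory

    # Merely being on sys.path (as with $PYTHONPATH) does not load the .pth.
    sys.path.append(base)
    before = extra in sys.path

    # site.addsitedir() reads the .pth files in the directory and
    # appends the existing paths that they list.
    site.addsitedir(base)
    after = extra in sys.path
    return before, after

print(pth_demo())
# The per-user site-packages directory that --user installs go to:
print(site.getusersitepackages())
```

The first value returned is False and the second is True, which matches the point that a .pth file in a plain $PYTHONPATH directory needs something to bootstrap it.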

Modern installers no longer just put files and module directories in places. Instead, they make or obtain eggs and install the eggs. The good news is that things like easy_install follow dependencies (assuming that everyone has properly specified them, not always a given). The bad news is that this is much less inspectable than in the old days.

(Okay, the other good news is that you can see which version of what you've installed by hand, instead of having a mess of stuff.)

In a properly functioning installed environment you should be able to fire up an interactive Python session and do 'import <module>' for every theoretically installed module. If this fails, either any .pth files are not getting bootstrapped (which can be checked by looking at sys.path), you don't have a module installed that you think you should, or perhaps the module is empty or damaged.
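
These failure causes can be told apart with a few lines in an interactive session. This is a rough sketch, assuming Python 3.4 or later (for importlib.util.find_spec); the helper names locate_module and egg_python_files are invented here and are not part of any packaging tool.

```python
import importlib.util
import sys
import zipfile

def locate_module(name):
    """Return where a top-level module would be imported from, or None
    if nothing on sys.path provides it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

def egg_python_files(egg_path):
    """List Python source and bytecode files inside a zipped .egg, to
    spot an egg that was built empty or incomplete."""
    with zipfile.ZipFile(egg_path) as zf:
        return [n for n in zf.namelist() if n.endswith((".py", ".pyc"))]

# Zipped eggs (regular files ending in .egg) that appear directly on sys.path.
eggs_on_path = [p for p in sys.path
                if p.endswith(".egg") and zipfile.is_zipfile(p)]
```

If locate_module() returns None, the module is either not installed or its .pth file was never processed, and sys.path will show which. If it returns a path inside an egg, egg_python_files() on that egg shows whether there is any code in it at all.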

I'm sure all of this is documented in one or more places in the official Python documentation, but it is sure not easy to find if it is (and I really don't think there's one place that puts it all together).

PS: if you're installing a local copy of a package's source you want 'easy_install .' (in the source directory), likely with --user or --prefix. At least some of the time, easy_install will insist that you precreate the --prefix directory for it; it will always insist that you add it to $PYTHONPATH.

(The current anarchy of Python packaging and install systems requires another rant but I am too exhausted for it right now.)

PythonPackagingNotes written at 00:32:11

2014-08-18

An example of a subtle over-broad try in Python

Today I wrote some code to winnow a list of users to 'real' users with live home directories that looks roughly like the following:

for uname, hdir in userlist:
   try:
      st = os.stat(hdir)
      if not stat.S_ISDIR(st.st_mode) or \
         stat.S_IMODE(st.st_mode) == 0:
            continue
      # looks good:
      print uname
   except EnvironmentError:
      # accept missing homedir; might be a
      # temporarily missing NFS mount, we
      # can't tell.
      print uname

This code has a relatively subtle flaw because I've accidentally written an over-broad exception catcher here.

As suggested by the comment, when I wrote this code I intended the try block to catch the case where the os.stat failed. The flaw here is that print itself does IO (of course) and so can raise an IO exception. Since I have the print inside my try block, a print-raised IO exception will get caught by it too. You might think that this is harmless because the except will re-do the print and thus presumably immediately have the exception raised again. This contains two assumptions: that the exception will be raised again and that if it isn't, the output is in a good state (as opposed to, say, having written only partial output before an error happened). Neither is an entirely sure thing, and anyway we shouldn't be relying on this sort of thing when it's really easy to fix. Since both branches of the exception end up at the same print, all we have to do is move it outside the try: block entirely (the except case then becomes just 'pass').
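
One way the fix might look is sketched below. It is written with the Python 3 print() function rather than the print statement of the original, the function name print_real_users is made up, and the else clause is an extra narrowing that goes slightly beyond the fix described above (it also moves the mode checks out of the try).

```python
import os
import stat

def print_real_users(userlist):
    """Print the names of users whose home directories look live."""
    for uname, hdir in userlist:
        try:
            # Only the call we expect to fail stays inside the try.
            st = os.stat(hdir)
        except EnvironmentError:   # an alias of OSError in Python 3
            # accept missing homedir; might be a
            # temporarily missing NFS mount, we
            # can't tell.
            pass
        else:
            if not stat.S_ISDIR(st.st_mode) or \
               stat.S_IMODE(st.st_mode) == 0:
                continue
        # Outside the try: an IO error from print is no longer
        # caught and silently retried.
        print(uname)
```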

(My view is that print failing is unusual enough that I'm willing to have the program die with a stack backtrace, partly because this is an internal tool. If that's not okay you'd need to put the print in its own try block and then do something if it failed, or have an overall try block around the entire operation to catch otherwise unexpected EnvironmentError exceptions.)
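
For the first alternative, giving print its own try block, a minimal sketch could look like this. It assumes Python 3; report_user and FailingStream are hypothetical names, and FailingStream exists only to simulate output that fails.

```python
import errno
import sys

class FailingStream:
    """Hypothetical stand-in for an output stream whose writes fail."""
    def write(self, text):
        raise OSError(errno.ENOSPC, "No space left on device")

def report_user(uname, out=None):
    """Print one user name; return False if the output itself failed."""
    if out is None:
        out = sys.stdout
    try:
        print(uname, file=out)
    except EnvironmentError as e:
        # Handle output failure separately from os.stat() failure.
        sys.stderr.write("cannot write output: %s\n" % e)
        return False
    return True
```

Calling report_user("alice", FailingStream()) returns False after complaining on standard error, which keeps an output failure from being mistaken for a missing home directory.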

The root cause here is that I wasn't thinking of print as something that does IO that can throw exceptions. Basic printing is sufficiently magical that it feels different and more ordinary, so it's easy to forget that this is a possibility. It's especially easy to overlook because it's extremely uncommon for print to fail in most situations (although there are exceptions, especially in Python 3). You can also attribute this to a failure to minimize what's done inside try blocks to only things that absolutely have to be there, as opposed to things that are just kind of convenient for the flow of code.

As a side note, one of the things that led to this particular case is that I changed my mind about what should happen when the os.stat() failed because I realized that failure might have legitimate causes instead of being a sign of significant problems with an account that should cause it to be skipped. When I changed my mind I just did a quick change to what the except block did instead of totally revising the overall code, partly because this is a small quick program instead of a big system.

SubtleBroadTry written at 22:34:55

