Wandering Thoughts archives

2010-11-20

More on those Python import oddities

From the reddit discussion and also the comments of my previous entry on import, I learned that people do both of these things: they do 'import os.path' and then use os.whatever without explicitly importing os, and they do 'import os' and then just use os.path.whatever without explicitly importing os.path.

First off, I don't think that you should do either of these even if (and when) they work, for os or any other module. This is a stylistic thing, but when doing imports I prefer to be explicit about what other modules my code uses. Also, I don't like clever tricks like this because they run a high risk of confusing people who read my code, and this includes me in the future if I've forgotten this bit of arcane trivia by then.

Both of these clearly work for os. But are they guaranteed to work in general? The answer is half yes and half no.

As sort of discussed last time, 'import x.y; x.whatever' is guaranteed to work because the semantics of useful multi-level imports require it. 'import x.y' is pointless if you cannot resolve x.y.whatever afterwards, and in order to do that you must have 'x' in your namespace after the dust settles. So Python gives it to you.
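A quick way to see this binding is to run the import in an empty namespace and look at which names show up. This is only a small sketch; ns and bound_names are illustrative names of my own, not part of the import machinery.

```python
# Run 'import os.path' inside a fresh, empty namespace so that we can
# see exactly which names the statement binds.
ns = {}
exec("import os.path", ns)

# exec() adds '__builtins__' on its own, so filter out dunder names.
bound_names = sorted(name for name in ns if not name.startswith("__"))
print(bound_names)  # ['os']

# The submodule is reachable only through the top-level name.
print(ns["os"].path.join("a", "b"))
```

The only name that appears is 'os'; there is no separate binding for the submodule.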

A more interesting question is whether 'import x; x.y.whatever' is guaranteed to work. The short answer is no, although I suspect that it often will for relatively small modules. First off, modules that are implemented as single Python files (as with os) have to make this work if they offer an x.y at all; as discussed last time, x.y must be defined in x's namespace (and registered in sys.modules) by the time x.py has finished executing as part of the import process, because there is no other way for the interpreter to find the x.y module.

For modules that are implemented as directories (with submodules as either files or subdirectories) there is no requirement that the module's __init__.py import the y submodule for you. The tradeoff is that importing submodules automatically makes 'from x import *' work as people expect, at the cost of loading potentially large submodules that people are not going to use; the larger and less used your set of submodules is, the more this matters. So you can sensibly have a module that requires explicit imports of submodules, and indeed there are modules in the Python standard library that work this way (xml is one example).
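To make the directory case concrete, here is a sketch that builds a throwaway package whose __init__.py is empty. The names demo_pkg, sub, and VALUE are hypothetical, made up purely for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names here are hypothetical):
#   demo_pkg/__init__.py   empty, so it does not import its submodule
#   demo_pkg/sub.py        defines VALUE
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

# __init__.py never imported 'sub', so 'import x; x.y.whatever' fails here.
try:
    demo_pkg.sub.VALUE
    implicit_access_worked = True
except AttributeError:
    implicit_access_worked = False
print(implicit_access_worked)  # False

# An explicit import of the submodule is what makes it available.
import demo_pkg.sub
print(demo_pkg.sub.VALUE)  # 42
```

If __init__.py instead contained 'from . import sub', the first access would succeed, at the cost of always loading sub.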

Now we come to a piece of import trivia: importing a submodule will actually modify the parent module's namespace. If you do import x.y (in any variant) and y is not defined in x's namespace, Python adds it for you. Once I thought about it, I realized that it had to work this way if Python wanted to support submodules that had to be loaded by hand, but I find it vaguely interesting that Python is willing to drop things in another module's namespace for you as a result of stuff that you do.

(This happens even if you do not get an explicit reference to x, eg if you do 'import x.y as t'.)
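Here is a sketch of that side effect, again using a made-up package (demo_parent and child are hypothetical names). The import runs in an empty namespace so that it is clear that only 't' gets bound there.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package: demo_parent/__init__.py (empty) and demo_parent/child.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_parent")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "child.py"), "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Import the submodule under another name, in an otherwise empty namespace.
ns = {}
exec("import demo_parent.child as t", ns)

# Only 't' was bound; we never got a name for the parent package.
bound_names = sorted(name for name in ns if not name.startswith("__"))
print(bound_names)  # ['t']

# Even so, the parent module (fetched from sys.modules) has gained 'child'.
parent = sys.modules["demo_parent"]
print(parent.child is ns["t"])  # True
```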

ImportOdditiesII written at 01:39:33

2010-11-13

How os.path exposes some Python import weirdness

For reasons beyond the scope of this entry, I was recently looking at the source code for the os module when a question suddenly struck me: how on earth does 'import os.path' work?

The conventional way to have submodules like this is to have your module be a directory with an __init__.py file, and then the submodule is either a Python file or a subdirectory. However, the os module is not a directory; instead it is a single file, os.py, with no os/path.py for Python to import in any conventional way. So what's going on?
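One way to tell the two kinds of module apart, as far as I know, is to look for a __path__ attribute, which directory based modules (packages) have and plain file modules do not. A small sketch, using xml as a standard library package for comparison:

```python
import os
import xml

# Packages (directory-based modules) carry a __path__ attribute that the
# import system uses to locate submodules; plain modules do not have one.
os_is_package = hasattr(os, "__path__")
xml_is_package = hasattr(xml, "__path__")
print(os_is_package)   # False
print(xml_is_package)  # True
```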

First off, recall that importing a module loads and executes that module's code (or in the case of directory based modules, the code in its __init__.py file). As it turns out, when you do 'import os.path' the os module itself is loaded and executed even though it is a plain file and not a subdirectory, and is also imported into your module. Once you think about it, it's clear that the os module has to actually be imported into your module here: if it wasn't imported, you would have no way to resolve the name os.path.isdir (for example) to an object, because there would be no 'os' name binding in your namespace.

(I would recommend keeping explicit 'import os' statements in your code in the name of being more explicit and having less confusion.)

What the os module does here is essentially:

import sys
import something as path    # posixpath or ntpath, depending on the platform
sys.modules["os.path"] = path

Both parts are necessary. Without an entry for "os.path" in sys.modules after the os module has finished loading, import fails with a 'no module' error. Without a path object in os's namespace, no one could actually look up anything from os.path even though Python would claim that the module had been imported.
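Both halves can be observed from the outside. This sketch just inspects the standard os module and assumes nothing beyond it:

```python
import os
import sys

# The "os.path" entry in sys.modules and the 'path' attribute of os
# are the same module object.
same_object = sys.modules["os.path"] is os.path
print(same_object)  # True

# The module's real name shows what os actually imported 'as path':
# 'posixpath' on Unix-like systems, 'ntpath' on Windows.
print(os.path.__name__)

# That real module is also registered under its own name.
registered_under_real_name = sys.modules[os.path.__name__] is os.path
print(registered_under_real_name)  # True
```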

If you are the right sort of person, you are now wondering what happens if you put the following in your own module, call it my.py:

import sys
import something as path
sys.modules["my.path"] = True

and then start importing my.path. The answer is that things get weird because there are two sorts of names and namespaces involved here, the namespace of module names (the keys of sys.modules) and the namespace in your module, and different sorts of import use them differently.

Here's how it breaks down in my testing:

  • 'import my.path' works because all it cares about is that there is a "my.path" entry in sys.modules when the dust settles. Actual name lookups go through the my module namespace, where path is a real module.
  • 'import my.path as path' works (you get a module instead of True), which surprised me; apparently import actually looks up the path name in my instead of taking the object directly from sys.modules. This is somewhat odd, as the my module is not imported into your namespace.
  • 'from my.path import isdir' fails, which seems odd given the above.
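If you want to repeat the experiment, something like the following sketch should do it. It makes a few assumptions of its own: the module is written out as my_demo.py rather than my.py (to avoid clashing with any real module), posixpath stands in for 'something' so that isdir exists, and try_import is a hypothetical helper. On the CPython 3 versions I am aware of, I would expect the first two forms to bind a module and the third to fail with an ImportError, but given how much this depends on implementation details, treat that as an expectation rather than a promise.

```python
import importlib
import os
import sys
import tempfile

# Write a stand-in for my.py under the hypothetical name my_demo.py,
# using posixpath as the 'something' so that isdir exists.
root = tempfile.mkdtemp()
with open(os.path.join(root, "my_demo.py"), "w") as f:
    f.write(
        "import sys\n"
        "import posixpath as path\n"
        'sys.modules["my_demo.path"] = True\n'
    )
sys.path.insert(0, root)
importlib.invalidate_caches()


def try_import(statement):
    """Run one import statement in a fresh namespace; report what happened."""
    ns = {}
    try:
        exec(statement, ns)
    except ImportError as exc:
        return "failed: " + type(exc).__name__
    bound = {k: v for k, v in ns.items() if not k.startswith("__")}
    # Map each bound name to the type of object it ended up with.
    return {name: type(value).__name__ for name, value in bound.items()}


results = {
    stmt: try_import(stmt)
    for stmt in (
        "import my_demo.path",
        "import my_demo.path as path",
        "from my_demo.path import isdir",
    )
}
for stmt, outcome in results.items():
    print(stmt, "->", outcome)

# The sys.modules entry itself is still the bogus value.
print(sys.modules["my_demo.path"])
```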

I suspect that all of this is subject to change without notice in future CPython versions (although it doesn't seem to have changed even in Python 3.1.1). Still, I find it both interesting and puzzling.

(Possibly all of this is documented somewhere in the Python language specification, but I couldn't spot it. The specification for import doesn't seem to talk about this sort of stuff.)

ImportOddities written at 00:24:51


