Wandering Thoughts archives

2010-11-13

How os.path exposes some Python import weirdness

For reasons beyond the scope of this entry, I was recently looking at the source code for the os module when a question suddenly struck me: how on earth does 'import os.path' work?

The conventional way to have submodules like this is to have your module be a directory with an __init__.py file, and then the submodule is either a Python file or a subdirectory. However, the os module is not a directory; instead it is a single file, os.py, with no os/path.py for Python to import in any conventional way. So what's going on?

First off, recall that importing a module loads and executes that module's code (or, in the case of directory-based modules, the code in its __init__.py file). As it turns out, when you do 'import os.path', the os module itself is loaded and executed even though it is a plain file and not a directory, and the name os is also bound in your module. Once you think about it, it's clear that os has to actually be bound in your module here: if it weren't, you would have no way to resolve the name os.path.isdir (for example) to an object, because there would be no 'os' name binding in your namespace.
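A quick way to see this for yourself is to run the import in an empty namespace and look at what it leaves behind. This is only a small sketch; the names namespace, bound_names and os_module are mine, purely for illustration.

```python
import sys

# Run the import in a fresh, empty namespace so we can see what it binds.
namespace = {}
exec("import os.path", namespace)

# exec() adds __builtins__ on its own; ignore dunder names.
bound_names = sorted(name for name in namespace if not name.startswith("__"))
os_module = namespace["os"]

print(bound_names)                     # the names the import statement bound
print(hasattr(os_module, "__path__"))  # packages have __path__, plain modules do not
print("os.path" in sys.modules)        # the dotted name is registered as a module
```

The only name the statement binds is the top-level one; the dotted name lives in sys.modules, not in your namespace.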

(I would still recommend keeping explicit 'import os' statements in your code; being explicit means less confusion.)

What the os module does here is essentially:

import sys
import something as path    # posixpath or ntpath in the real os.py
sys.modules["os.path"] = path

Both parts are necessary. Without an entry for "os.path" in sys.modules after the os module has finished loading, import fails with a 'no module' error. Without a path object in os's namespace, no one could actually look up anything from os.path even though Python would claim that the module had been imported.
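Both halves can be observed from the outside. The sketch below checks that the sys.modules entry and the attribute in os's namespace are the same object, and shows which real module is standing in for os.path; the variable names are just for illustration.

```python
import os
import sys

# The entry in the module-name namespace and the attribute in os's own
# namespace refer to one and the same module object.
registry_entry = sys.modules["os.path"]
same_object = registry_entry is os.path

# The module's real name is the platform-specific one (posixpath or ntpath).
real_name = os.path.__name__

print(same_object, real_name)
```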

If you are the right sort of person, you are now wondering what happens if you put the following in your own module, call it my.py:

import sys
import something as path
sys.modules["my.path"] = True

and then start importing my.path. The answer is that things get weird because there are two sorts of names and namespaces involved here, the namespace of module names (the keys of sys.modules) and the namespace in your module, and different sorts of import use them differently.

Here's how it breaks down in my testing:

  • 'import my.path' works because all it cares about is that there is a "my.path" entry in sys.modules when the dust settles. Actual name lookups go through the my module namespace, where path is a real module.
  • 'import my.path as path' works (you get a module instead of True), which surprised me; apparently import actually looks up the path name in my instead of taking the object directly from sys.modules. This is somewhat odd, as the my module is not imported into your namespace.
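  • 'from my.path import isdir' fails, which seems odd given the above; presumably this form does take the object directly from sys.modules, and True has no isdir attribute.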
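If you want to repeat the experiment on whatever interpreter you have, something like the following sketch should do it. It writes a throwaway my.py into a temporary directory, using posixpath as a stand-in for 'something', and tries each form of import in a fresh namespace. The probe helper and the results dictionary are my own scaffolding, not anything from the standard library, and the outcomes may well differ between CPython versions.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical contents for my.py; posixpath stands in for "something".
MY_SOURCE = """\
import sys
import posixpath as path
sys.modules["my.path"] = True
"""


def probe(statement, expression):
    """Run an import statement in a fresh namespace and describe the result."""
    # Drop any earlier copies so every statement triggers a real import.
    sys.modules.pop("my", None)
    sys.modules.pop("my.path", None)
    namespace = {}
    try:
        exec(statement, namespace)
    except ImportError:
        return "ImportError"
    # Report the type of whatever the expression resolves to.
    return type(eval(expression, namespace)).__name__


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "my.py"), "w") as f:
        f.write(MY_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        results = {
            "import my.path": probe("import my.path", "my.path"),
            "import my.path as path": probe("import my.path as path", "path"),
            "from my.path import isdir": probe("from my.path import isdir", "isdir"),
        }
        # What the module-name namespace holds after the last attempt.
        registry_entry = sys.modules.get("my.path")
    finally:
        # Clean up so the experiment leaves no trace in this process.
        sys.path.remove(tmp)
        sys.modules.pop("my", None)
        sys.modules.pop("my.path", None)

for statement, outcome in results.items():
    print(f"{statement!r}: {outcome}")
```

Each entry in results is either the type name of the object you ended up with or "ImportError", which makes it easy to compare runs across Python versions.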

I suspect that all of this is subject to change without notice in future CPython versions (although it doesn't seem to have changed even in Python 3.1.1). Still, I find it both interesting and puzzling.

(Possibly all of this is documented somewhere in the Python language specification, but I couldn't spot it. The specification for import doesn't seem to talk about this sort of stuff.)

python/ImportOddities written at 00:24:51


