How os.path exposes some Python import weirdness

November 13, 2010

For reasons beyond the scope of this entry, I was recently looking at the source code for the os module when a question suddenly struck me: how on earth does 'import os.path' work?

The conventional way to have submodules like this is to have your module be a directory with an __init__.py file, and then the submodule is either a Python file or a subdirectory. However, the os module is not a directory; instead it is a single file, os.py, with no os/path.py for Python to import in any conventional way. So what's going on?

First off, recall that importing a module loads and executes that module's code (or, for directory-based modules, the code in its __init__.py file). As it turns out, when you do 'import os.path' the os module itself is loaded and executed (if it has not been loaded already) even though it is a plain file and not a directory, and the name 'os' is then bound in your module's namespace. Once you think about it, it's clear that os has to be bound in your namespace here: if it weren't, you would have no way to resolve the name os.path.isdir (for example) to an object, because there would be no 'os' name binding in your namespace.
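A quick way to check this is to run the import inside a scratch namespace and look at what it leaves behind. This is only a sketch against a current CPython 3; the names namespace, bound_names and os_module are mine, purely for illustration.

```python
import sys

# Run the import in an empty scratch namespace so we can see what it binds.
namespace = {}
exec("import os.path", namespace)

# Ignore __builtins__, which exec() adds on its own.
bound_names = sorted(name for name in namespace if not name.startswith("__"))
print(bound_names)                               # only 'os' is bound, not 'path'

os_module = namespace["os"]
# Packages (directory-based modules) have a __path__ attribute; os does not.
print(hasattr(os_module, "__path__"))
# The submodule name in sys.modules and the attribute on os are the same object.
print(sys.modules["os.path"] is os_module.path)
```

The only name that shows up is os; os.path is reachable purely as an attribute of that module.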

(I would recommend keeping explicit 'import os' statements in your code in the name of being more explicit and having less confusion.)

What the os module does here is essentially:

import sys
import something as path    # 'something' stands in for the real path module
sys.modules["os.path"] = path

Both parts are necessary. Without an entry for "os.path" in sys.modules after the os module has finished loading, import fails with a 'no module' error. Without a path object in os's namespace, no one could actually look up anything from os.path even though Python would claim that the module had been imported.
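To see each half fail on its own, here is a rough sketch that writes two throwaway single-file modules into a temporary directory. The module names only_name and only_entry are hypothetical, posixpath is just a convenient stand-in for "something", and the outcomes in the comments are what I would expect on CPython 3.6 or later (where the 'no module' error is a ModuleNotFoundError).

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical single-file modules, each doing only one half of the trick.
SOURCES = {
    # Binds "path" in its own namespace but never registers "only_name.path".
    "only_name": "import posixpath as path\n",
    # Registers "only_entry.path" but has no "path" attribute of its own.
    "only_entry": "import sys, posixpath\n"
                  "sys.modules['only_entry.path'] = posixpath\n",
}

def try_import(module_name):
    """Run 'import <module_name>.path', then look up <module_name>.path.isdir."""
    namespace = {}
    try:
        exec(f"import {module_name}.path", namespace)
        namespace[module_name].path.isdir
        return "ok"
    except (ImportError, AttributeError) as exc:
        return type(exc).__name__

outcomes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, source in SOURCES.items():
        Path(tmp, name + ".py").write_text(source)
    sys.path.insert(0, tmp)
    try:
        for name in SOURCES:
            outcomes[name] = try_import(name)
    finally:
        # Clean up so the throwaway modules do not linger in this process.
        sys.path.remove(tmp)
        for name in SOURCES:
            sys.modules.pop(name, None)
            sys.modules.pop(name + ".path", None)

print(outcomes)
# only_name: the import itself fails (ModuleNotFoundError)
# only_entry: the import succeeds, but the attribute lookup fails (AttributeError)
```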

If you are the right sort of person, you are now wondering what happens if you put the following in your own module, call it my.py:

import sys
import something as path
sys.modules["my.path"] = True    # deliberately not the path module

and then start importing my.path. The answer is that things get weird because there are two sorts of names and namespaces involved here, the namespace of module names (the keys of sys.modules) and the namespace in your module, and different sorts of import use them differently.

Here's how it breaks down in my testing:

  • 'import my.path' works because all it cares about is that there is a "my.path" entry in sys.modules when the dust settles. Actual name lookups go through the my module namespace, where path is a real module.
  • 'import my.path as path' works (you get a module instead of True), which surprised me; apparently import actually looks up the path name in my instead of taking the object directly from sys.modules. This is somewhat odd, as the my module is not imported into your namespace.
  • 'from my.path import isdir' fails, which seems odd given the above.
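The three cases above can be re-run with a small script along these lines. It is a sketch, not a definitive test: it assumes a current CPython 3, uses posixpath as the stand-in for "something", and the results dictionary and scratch namespaces are my own scaffolding.

```python
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical my.py from the text, with posixpath as "something".
MY_SOURCE = (
    "import sys\n"
    "import posixpath as path\n"
    "sys.modules['my.path'] = True\n"
)

results = {}
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "my.py").write_text(MY_SOURCE)
    sys.path.insert(0, tmp)
    try:
        # Case 1: plain 'import my.path'; lookups go through the my module.
        ns = {}
        exec("import my.path", ns)
        results["import my.path"] = type(ns["my"].path).__name__

        # Case 2: 'import my.path as path'; what object does 'path' end up as?
        ns = {}
        exec("import my.path as path", ns)
        results["import my.path as path"] = type(ns["path"]).__name__

        # Case 3: 'from my.path import isdir'; this one starts from sys.modules.
        try:
            exec("from my.path import isdir", {})
            results["from my.path import isdir"] = "ok"
        except ImportError:
            results["from my.path import isdir"] = "ImportError"
    finally:
        # Remove the throwaway entries again.
        sys.path.remove(tmp)
        sys.modules.pop("my", None)
        sys.modules.pop("my.path", None)

for statement, outcome in results.items():
    print(f"{statement!r}: {outcome}")
```

On the CPython 3 versions I am aware of, the first two cases report a module and the third reports an ImportError, which matches the breakdown above.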

I suspect that all of this is subject to change without notice in future CPython versions (although it doesn't seem to have changed even in Python 3.1.1). Still, I find it both interesting and puzzling.

(Possibly all of this is documented somewhere in the Python language specification, but I couldn't spot it. The specification for import doesn't seem to talk about this sort of stuff.)


Comments on this page:

From 76.112.222.160 at 2010-11-14 09:29:59:

I'm guessing one of the reasons it does this is to use a different path module depending on which platform you are on (ntpath for Windows, posixpath for *nix).

-steve

From 173.13.183.233 at 2010-11-14 09:49:08:

A minor clarification or three.

First, os.path is a special case. There are several implementations of "os.path", and os.py decides at runtime which to use. Python uses "ntpath.py" for Windows, there are some custom implementations for some oddball legacy platforms (OS2 EMX, RISC OS, pre-OS X Macintosh), and for everything else Python uses "posixpath.py".

Second, in CPython you can always rely on os.path being importable. As you note os.path is installed by os.py. Well, os.py is imported by site.py, and Python is defined to always import site.py as part of interpreter startup. So if you open a new Python instance and execute

    >>> import sys
    >>> sorted(sys.modules.keys())

you'll see that os and os.path are both already present, along with a bunch of other gunk. So, technically, your statement "when you do 'import os.path' the os module itself is loaded" is incorrect, because it's already been loaded.

Now, this isn't defined behavior, exactly. But it's such a widely-held assumption that I bet other implementations have to emulate it as a compatibility measure.

Third, I wanted to point out something you glossed over, for the benefit of your readers: playing these games with sys.modules is the exception rather than the rule. And, while your demonstration is accurate--you can deliberately create a disconnect between the sys.modules and the actual recursively-defined relevant object--in practice nobody does this.

So it's nonsense to advise your readers to have 'import os' even when all they wanted was os.path. The rule is, you should explicitly import all the leaves of the module tree you desire, as that will implicitly import all their parents and create the root-level module objects in the current scope. If you "import xml.etree.ElementTree" in a fresh interpreter, this first imports xml (and adds it to your module's global scope), then imports xml.etree (and adds it to the xml object), then imports xml.etree.ElementTree (and adds it to the xml.etree object). So you get xml and xml.etree for free. You and your readers can safely say "import os.path" or "import xml.etree.ElementTree" without fear.

By cks at 2010-11-14 11:38:16:

My note about still doing 'import os' was badly written. What I meant was that if you wanted to use things from both os and os.path, you should still explicitly import os as well as os.path, instead of just doing 'import os.path' and relying on the side effects to import os as well.

Written on 13 November 2010.

Last modified: Sat Nov 13 00:24:51 2010