2010-11-13
How os.path exposes some Python import weirdness
For reasons beyond the scope of this entry, I was recently looking at the source code for the os module when a question suddenly struck me: how on earth does 'import os.path' work?
The conventional way to have submodules like this is to have your module be a directory with an __init__.py file, and then the submodule is either a Python file or a subdirectory. However, the os module is not a directory; instead it is a single file, os.py, with no os/path.py for Python to import in any conventional way. So what's going on?
First off, recall that importing a module loads and executes that module's code (or, in the case of directory-based modules, the code in its __init__.py file). As it turns out, when you do 'import os.path', the os module itself is loaded and executed even though it is a plain file and not a directory, and is also imported into your module. Once you think about it, it's clear that the os module has to actually be imported into your module here: if it wasn't imported, you would have no way to resolve the name os.path.isdir (for example) to an object, because there would be no 'os' name binding in your namespace.
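The name binding is easy to observe by running the import statement in a fresh namespace with `exec()`. This is a sketch for CPython 3; `ns` is just an illustrative empty dict standing in for your module's namespace.

```python
import sys

# An empty dict standing in for a fresh module namespace.
ns = {}
exec("import os.path", ns)

# 'import os.path' binds the top-level name 'os', not 'path'.
# (exec() adds '__builtins__' itself, so we ignore it.)
bound_names = sorted(name for name in ns if name != "__builtins__")
print(bound_names)  # ['os']

# Lookups like os.path.isdir then go through the os module object.
print(ns["os"].path is sys.modules["os.path"])  # True
```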
(I would recommend keeping explicit 'import os' statements in your code in the name of being more explicit and causing less confusion.)
What the os module does here is essentially:
import sys
import posixpath as path   # 'something': posixpath on Unix, ntpath on Windows
sys.modules["os.path"] = path
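Both halves of this are visible on a running interpreter. A minimal check, assuming CPython 3 on a Unix-like system or Windows (the variable names are illustrative):

```python
import os
import sys

# Part 1: os has an ordinary attribute named 'path' bound to a module.
path_attr = os.path

# Part 2: the same module object is registered under the dotted name.
registered = sys.modules["os.path"]

print(path_attr is registered)  # True

# The module's real name reveals the platform-specific implementation.
print(path_attr.__name__)       # 'posixpath' on Unix-like systems, 'ntpath' on Windows
```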
Both parts are necessary. Without an entry for "os.path" in sys.modules after the os module has finished loading, import fails with a 'no module' error. Without a path object in os's namespace, no one could actually look up anything from os.path, even though Python would claim that the module had been imported.
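One way to see both failure modes is to write two throwaway single-file modules that each do only half of what os does, then import from them. This is a sketch assuming CPython 3.6 or later (for `ModuleNotFoundError`); the module names `onlyattr` and `onlyentry` and the helper `try_both_halves` are hypothetical, and the temporary directory is just a convenient place to put the files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module that binds 'path' but never registers "onlyattr.path".
ONLY_ATTR = "import posixpath as path\n"

# Hypothetical module that registers "onlyentry.path" but binds no 'path' name.
ONLY_ENTRY = (
    "import sys\n"
    "import posixpath\n"
    "sys.modules['onlyentry.path'] = posixpath\n"
)


def try_both_halves():
    """Import from each half-done module placed in a temporary directory."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "onlyattr.py").write_text(ONLY_ATTR)
        Path(tmp, "onlyentry.py").write_text(ONLY_ENTRY)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            try:
                exec("import onlyattr.path", {})
                results["onlyattr"] = "imported"
            except ImportError as exc:
                # No sys.modules entry, and onlyattr is not a package.
                results["onlyattr"] = type(exc).__name__

            ns = {}
            exec("import onlyentry.path", ns)  # succeeds: the entry exists
            results["onlyentry_imported"] = True
            # ...but there is no 'path' attribute to look anything up through.
            results["onlyentry_has_path"] = hasattr(ns["onlyentry"], "path")
        finally:
            # Undo our changes to the import system's global state.
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            for name in ("onlyattr", "onlyentry", "onlyentry.path"):
                sys.modules.pop(name, None)
    return results


halves = try_both_halves()
print(halves)
# {'onlyattr': 'ModuleNotFoundError', 'onlyentry_imported': True, 'onlyentry_has_path': False}
```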
If you are the right sort of person, you are now wondering what happens if you put the following in your own module, call it my.py:
import sys
import posixpath as path   # any real module will do for 'something'
sys.modules["my.path"] = True
and then start importing my.path. The answer is that things get weird because there are two sorts of names and namespaces involved here: the namespace of module names (the keys of sys.modules) and the namespace of your module, and different sorts of import use them differently.
Here's how it breaks down in my testing:
- 'import my.path' works because all it cares about is that there is a "my.path" entry in sys.modules when the dust settles. Actual name lookups go through the my module namespace, where path is a real module.
- 'import my.path as path' works (you get a module instead of True), which surprised me; apparently import actually looks up the path name in my instead of taking the object directly from sys.modules. This is somewhat odd, as the my module is not imported into your namespace.
- 'from my.path import isdir' fails, which seems odd given the above.
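The experiment above can be reproduced with a small script that writes my.py into a temporary directory and tries each form of import in a fresh namespace. This is a sketch assuming CPython 3; posixpath stands in for the unnamed 'something', and the helpers `outcome` and `my_path_experiment` are hypothetical names for illustration. Results reflect CPython implementation behaviour and may differ between versions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# my.py from the text; posixpath stands in for the unnamed 'something'.
MY_SOURCE = (
    "import sys\n"
    "import posixpath as path\n"
    "sys.modules['my.path'] = True\n"
)

# Each import form, paired with the name we then try to evaluate.
CASES = [
    ("import my.path", "my.path"),
    ("import my.path as path", "path"),
    ("from my.path import isdir", "isdir"),
]


def outcome(statement, expr):
    """Run one import in a fresh namespace; report the type expr evaluates to."""
    ns = {}
    try:
        exec(statement, ns)
    except ImportError as exc:
        return f"fails ({type(exc).__name__})"
    return type(eval(expr, ns)).__name__


def my_path_experiment():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "my.py").write_text(MY_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            for statement, expr in CASES:
                results[statement] = outcome(statement, expr)
            results["sys.modules['my.path']"] = sys.modules["my.path"]
        finally:
            # Don't leave a non-module (True) behind in sys.modules.
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            for name in ("my", "my.path"):
                sys.modules.pop(name, None)
    return results


observed = my_path_experiment()
for key, value in observed.items():
    print(f"{key!r}: {value!r}")
```

On a current CPython 3 this should report a module for the first two forms and an ImportError for the from-import, while sys.modules itself still holds True, which matches the list above.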
I suspect that all of this is subject to change without notice in future CPython versions (although it doesn't seem to have changed even in Python 3.1.1). Still, I find it both interesting and puzzling.
(Possibly all of this is documented somewhere in the Python language specification, but I couldn't spot it. The specification for import doesn't seem to talk about this sort of stuff.)