2010-11-20
More on those Python import oddities
From the reddit discussion
and also the comments of my previous entry on import,
I learned that people do both of these things: 'import os.path' and
then using os.whatever without explicitly importing os, and 'import os'
and then using os.path.whatever without explicitly importing os.path.
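For concreteness, both patterns look like this (shown with os, since
that is the module people do it with):

```python
# Pattern 1: 'import os.path', then use plain os.* attributes.
import os.path

print(os.getcwd())               # 'os' is bound in our namespace as a side effect

# Pattern 2: 'import os', then use os.path.* without importing os.path.
import os

print(os.path.isdir(os.getcwd()))  # True; os.py binds 'path' itself
```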
First off, I don't think that you should do either of these even if
(and when) they work, for os or any other module. This is a stylistic
thing, but when doing imports I prefer to be explicit about what other
modules my code uses. Also, I don't like clever tricks like this because
they run a high risk of confusing people who read my code, and this
includes me in the future if I've forgotten this bit of arcane trivia by
then.
Both of these clearly work for os. But are they guaranteed to work in
general? The answer is half yes and half no.
As sort of discussed last time, 'import x.y;
x.whatever' is guaranteed to work because the semantics of useful
multi-level imports require it. 'import x.y' is pointless if you cannot
resolve x.y.whatever afterwards, and in order to do that you must have
'x' in your namespace after the dust settles. So Python gives it
to you.
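This is easy to see with any multi-level import from the standard
library (logging.handlers here):

```python
# 'import x.y' has to leave 'x' bound in our namespace afterwards;
# otherwise a later x.y.whatever lookup could never get started.
import logging.handlers

print(logging)                         # the logging package, bound as a side effect
print(logging.handlers.SysLogHandler)  # resolved by going logging -> handlers -> SysLogHandler
```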
A more interesting question is whether 'import x; x.y.whatever' is
guaranteed to work. The short answer is no, although I suspect that it
often will for relatively small modules. First off, modules that are
implemented as single Python files (as with os) have to make this
work; as discussed last time, x.y must be defined
after x.py has finished executing as part of the import process, because
there is no other way for the interpreter to find the x.y module.
For modules that are implemented as directories (with submodules as
either files or subdirectories) there is no requirement that the
module's __init__.py import the y submodule for you. The tradeoff
is that importing submodules automatically makes 'from x import
*' work as people expect, at the cost of loading potentially large
submodules that people are not going to use; the larger and less used
your set of submodules is, the more this matters. So you can sensibly
have a module that requires explicit imports of submodules, and indeed
there are modules in the Python standard library that work this way
(xml is one example).
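xml demonstrates this nicely; in a fresh interpreter, a plain 'import
xml' does not give you any of its submodules:

```python
import xml

# The xml package's __init__.py does not import submodules for you,
# so right after 'import xml' there is no xml.dom attribute.
print(hasattr(xml, "dom"))   # False in a fresh interpreter

import xml.dom               # the explicit submodule import is what binds it
print(xml.dom)
```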
Now we come to a piece of import trivia: importing a submodule will
actually modify the parent module's namespace. If you do import x.y
(in any variant) and y is not defined in x's namespace, Python adds
it for you. Once I thought about it, I realized that it had to work
this way if Python wanted to support submodules that had to be loaded by
hand, but I find it vaguely interesting that Python is willing to drop
things in another module's namespace for you as a result of stuff that
you do.
(This happens even if you do not get an explicit reference to x,
eg if you do 'import x.y as t'.)
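A quick way to see this is with xml again, since its __init__.py does
not import any submodules itself:

```python
import xml

# We only ask for 't' here, but as part of the import Python also
# binds 'etree' in the xml module's own namespace.
import xml.etree as t

print(hasattr(xml, "etree"))   # now True
print(t is xml.etree)          # True; 't' is that same module object
```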
2010-11-13
How os.path exposes some Python import weirdness
For reasons beyond the scope of this entry, I was
recently looking at the source code for the os module when a question suddenly struck
me: how on earth does 'import os.path' work?
The conventional way to have submodules like this is to have your module
be a directory with an __init__.py file, and then the submodule is
either a Python file or a subdirectory. However, the os module is not
a directory; instead it is a single file, os.py, with no os/path.py
for Python to import in any conventional way. So what's going on?
First off, recall that importing a module loads and executes that
module's code (or in the case of directory based modules, the code
in its __init__.py file). As it turns out, when you do 'import
os.path' the os module itself is loaded and executed even though it
is a plain file and not a subdirectory, and is also imported into your
module. Once you think about it, it's clear that the os module has
to actually be imported into your module here: if it wasn't imported,
you would have no way to resolve the name os.path.isdir (for example)
to an object, because there would be no 'os' name binding in your
namespace.
(I would recommend keeping explicit 'import os' statements in your
code in the name of being more explicit and having less confusion.)
What the os module does here is essentially:
import something as path
sys.modules["os.path"] = path
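You can reproduce this trick by hand with a synthetic module; 'demo'
and 'demo.sub' here are made-up names built with types.ModuleType
rather than files on disk:

```python
import sys
import types

# Build a fake single-file style module 'demo' that exposes a 'sub'
# submodule the same way os.py exposes os.path.
demo = types.ModuleType("demo")
sub = types.ModuleType("demo.sub")
sub.answer = 42

demo.sub = sub                   # the name binding in demo's namespace
sys.modules["demo"] = demo
sys.modules["demo.sub"] = sub    # the entry that lets 'import demo.sub' succeed

import demo.sub
print(demo.sub.answer)           # -> 42
```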
Both parts are necessary. Without an entry for "os.path" in
sys.modules after the os module has finished loading, import
fails with a 'no module' error. Without a path object in os's
namespace, no one could actually look up anything from os.path even
though Python would claim that the module had been imported.
If you are the right sort of person, you are now wondering what happens
if you put the following in your own module, call it my.py:
import something as path
sys.modules["my.path"] = True
and then start importing my.path. The answer is that things get weird
because there are two sorts of names and namespaces involved here,
the namespace of module names (the keys of sys.modules) and the
namespace in your module, and different sorts of import use them
differently.
Here's how it breaks down in my testing:
- 'import my.path' works because all it cares about is that there is
a "my.path" entry in sys.modules when the dust settles. Actual name
lookups go through the my module namespace, where path is a real
module.
- 'import my.path as path' works (you get a module instead of True),
which surprised me; apparently import actually looks up the path name
in my instead of taking the object directly from sys.modules. This is
somewhat odd, as the my module is not imported into your namespace.
- 'from my.path import isdir' fails, which seems odd given the above.
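The basic split can be recreated without an actual my.py file, using
synthetic modules built by hand (the exact behavior of the odder
variants may differ between CPython versions, so treat this as a
sketch of the experiment rather than a guarantee):

```python
import sys
import types

# 'my' and its contents are synthetic stand-ins for the my.py module
# described above.
my = types.ModuleType("my")
path = types.ModuleType("something")
path.isdir = lambda p: True
my.path = path                   # a real module in my's namespace
sys.modules["my"] = my
sys.modules["my.path"] = True    # a bogus entry in the module-name namespace

import my.path                   # succeeds: this only checks sys.modules
print(my.path is path)           # True; the attribute lookup used my's namespace
```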
I suspect that all of this is subject to change without notice in future CPython versions (although it doesn't seem to have changed even in Python 3.1.1). Still, I find it both interesting and puzzling.
(Possibly all of this is documented somewhere in the Python language
specification, but I couldn't spot it. The specification for import
doesn't seem to talk about this sort of stuff.)