2016-03-16
How 'from module import ...' is not doing what you may expect
There are a number of reasons to avoid things like 'from module import
*'; for instance, it can be confusing later on and you can import more
than you expect. But if you're doing this in the context of, say, just
splitting a big source file apart, it's tempting to say that these are
not really problems. You're not going to be confused about where
things come from because you're only importing everything from your own
source files (and you're not even thinking of them as modules), and it's
perfectly okay for there to be namespace contamination because that's
kind of the point. But even then there are traps, because 'from module
import ...' is not really doing what you might think it's doing.
There are two possible misconceptions here. If you're doing 'from
module import *' within your own code, often what you want is for
there to be one conjoined namespace where everything lives, both
stuff from the other 'module' (really just a file) and stuff from
your 'module' (the current file). If you're doing 'from module
import A', it's easy (and tempting) to think that when you write
plain A in your code, Python is basically automatically rewriting
it to really be 'module.A' for you. Neither is what is actually
going on in Python, although things can often look like it.
What a 'from module' import really does is copy things from one
module namespace into another. More specifically, it copies the
current bindings of names. You can think of 'from module import *'
as doing something roughly like this:
    import module
    _this = globals()
    for n in dir(module):
        _this[n] = getattr(module, n)
    del _this
    del module
(This code does not avoid internal names, doesn't respect __all__,
and so on. It's a conceptual illustration.)
There are still two completely separate module namespaces, yours
and the namespace of module; you have just copied a bunch of
things from the module namespace into yours under the same name
(or just some things, if you're doing 'from module import A, B').
Functions and classes from module are still using their module
namespace, even if a reference to some or all of them has been
copied into your module.
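A small sketch can make the two namespaces visible. It assumes a throwaway module built in memory; the names demo_mod, SECRET, and helper are hypothetical and exist only for this illustration. A function's __globals__ attribute is the namespace dictionary it looks global names up in, so it shows which module the function still belongs to after a from-import.

```python
import sys
import types

# Build a throwaway module in memory. 'demo_mod', 'SECRET', and 'helper'
# are hypothetical names used only for this illustration.
demo_mod = types.ModuleType("demo_mod")
exec(
    "SECRET = 'from demo_mod'\n"
    "def helper():\n"
    "    return SECRET\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

from demo_mod import helper  # copies one binding into this namespace

# A name with the same spelling in the importing file is a different name.
SECRET = "from the importing file"

# helper() still looks SECRET up in demo_mod's namespace, not ours.
result = helper()
same_namespace = helper.__globals__ is demo_mod.__dict__
print(result, same_namespace)
```

Under these assumptions, helper() keeps returning the value from demo_mod no matter what SECRET is bound to in the importing file.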
(As a corollary to this, things from module mostly can't refer
to anything from your module namespace. This is easy to see since
you can't have circular imports; if you're importing module to
get at its namespace, it can't be importing you to get at yours.
(Yes, there are odd ways around this.))
One reason why this matters is that if functions or classes from
module update stuff in their module namespace, you may or may not
pick it up in your own module. For example, consider the following
code in some other module:
    gvar = 10

    def setit(newval):
        global gvar
        gvar = newval
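The gvar that you see in your own module will forever be '10',
no matter what calls to setit() have been made, because your gvar
is a separate binding that was copied at import time and setit()
only rebinds the other module's gvar. However, code in the other
module will see a different value for gvar.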
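As a rough, runnable version of this, the sketch below builds the other module in memory rather than from a file. The module name other_demo and the extra getit() function are assumptions added for the demonstration; gvar and setit() are as in the text.

```python
import sys
import types

# Stand-in for "some other module"; 'other_demo' and getit() are
# hypothetical additions so the example is self-contained.
other = types.ModuleType("other_demo")
exec(
    "gvar = 10\n"
    "def setit(newval):\n"
    "    global gvar\n"
    "    gvar = newval\n"
    "def getit():\n"
    "    return gvar\n",
    other.__dict__,
)
sys.modules["other_demo"] = other

from other_demo import gvar, setit, getit

setit(99)  # rebinds gvar inside other_demo only

ours = gvar            # our copied binding, made at import time
theirs = getit()       # what code inside other_demo sees
via_attr = other.gvar  # what 'module.gvar' style access sees
print(ours, theirs, via_attr)
```

Going through the module attribute (other.gvar here) is the access style that keeps seeing updates, which is one argument for a plain 'import module' when a value may be rebound.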
Not all sorts of updates will do this, of course. If gvar is a
dictionary and code just adds, changes, and deletes keys in it,
everyone will see the same gvar. The illusion of a shared namespace
can hold up, but it is ultimately only an illusion and it can be
fragile. (And unless you already know Python well, it isn't necessarily
easy to see where and when it's going to break down.)
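The difference between mutating a shared object and rebinding a name can be sketched as follows. Everything here (the module shared_demo, the settings dictionary, and the bump() and replace() functions) is hypothetical and only serves to show where the illusion holds and where it breaks.

```python
import sys
import types

# 'shared_demo', 'settings', 'bump', and 'replace' are hypothetical names.
shared = types.ModuleType("shared_demo")
exec(
    "settings = {'level': 1}\n"
    "def bump():\n"
    "    settings['level'] += 1      # mutates the existing dict\n"
    "def replace():\n"
    "    global settings\n"
    "    settings = {'level': 100}   # rebinds the name to a new dict\n",
    shared.__dict__,
)
sys.modules["shared_demo"] = shared

from shared_demo import settings, bump, replace

bump()
after_bump = settings["level"]         # mutation is visible to everyone

replace()
after_replace = settings["level"]      # we still hold the old dict
module_view = shared.settings["level"]  # the module now has a new one
print(after_bump, after_replace, module_view)
```

In this sketch the two modules agree right up until something rebinds the name, at which point they silently diverge.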
Sidebar: An additional bit of possible weirdness
There are some situations where a module's namespace is more or
less overwritten wholesale; the obvious case is reload() of the
module. If you reload() a module that has been the subject of
'from module import ...', all of those bare imports are now broken,
or at least not updated themselves. You can get into very odd
situations this way (especially considering what reloading a
module really does).
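Here is a minimal sketch of the stale-binding effect, assuming Python 3, where reload() is available as importlib.reload(). The module reload_demo and its hello() function are hypothetical; the module is written to a temporary directory because reloading needs a real importable source file.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # 'reload_demo' and hello() are hypothetical, made up for this demo.
    path = os.path.join(tmpdir, "reload_demo.py")
    with open(path, "w") as f:
        f.write("def hello():\n    return 'hi'\n")

    sys.path.insert(0, tmpdir)
    try:
        importlib.invalidate_caches()
        import reload_demo
        from reload_demo import hello

        old_hello = hello
        importlib.reload(reload_demo)  # re-runs the module's code

        # The module now holds a freshly created function object, while
        # our bare name still refers to the one from before the reload.
        stale = reload_demo.hello is not old_hello
        # The reload reused the module's namespace dictionary, so the old
        # function still looks its globals up in that same dictionary.
        same_dict = old_hello.__globals__ is reload_demo.__dict__
    finally:
        sys.path.remove(tmpdir)
        sys.modules.pop("reload_demo", None)

print(stale, same_dict)
```

So after a reload, names obtained through 'from module import ...' keep pointing at the old objects until you redo the from-import yourself.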
I wish I could split up code more easily in Python
This really starts with some tweets:
This Python program has grown to almost 1500 lines. I think I need an intervention, or better data structures, or something.
I also wish it was easier and more convenient to split up a Python program across multiple source files (it's one way Go wins).
The best way to split up a big program is to genuinely modularize it. In other words, find separate pieces of functionality that can be cleanly extracted and turn them into Python modules, in separate files. There are still issues with your main program actually finding the modules, but this can be worked around (even though it is and remains annoying).
However, this assumes that you have a modular structure to start with, with things sensibly separated. If your program started off as a little 200 line thing and then grew step by step into a 1500 line monster (especially iteratively), you may not necessarily have this. That's where Python makes things a little bit awkward. Splitting things up into separate files fundamentally puts them in separate modules and thus separate namespaces; in order to do it, you need to be able to pull your code apart in this way. If your code isn't in this state already you have some degree of rewriting ahead of you, and in the meantime you have a 1500 line Python file.
(In theory you can do 'from modname import *'. In practice this is only faking a single namespace and the fakery can break down in various ways.)
Go may be less elegant here (and Go certainly makes it harder to have separate namespaces), but you can slice a big source file up into several separate ones while keeping them all co-mingled as one module, all using bits and pieces from each other. Sometimes this is more convenient and expedient, even if it may be uglier.
With that said, Python has excellent reasons to require every separate file to be a separate module. To summarize very quickly, it's tied to how you don't just load a file of Python source code, you run it (with things like function and class definitions actually being executable statements, and possibly other interesting things happening). This is a straightforward model that's quite appropriate for an interpreted language, but it imposes certain constraints.
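To illustrate the "you run it" point, the sketch below executes a small piece of module source and shows that a function only exists once the def statement has actually run, and that which def runs can depend on earlier code. Using exec() on a fresh module object is only an approximation of what import does, and the names run_demo, events, and pick are hypothetical.

```python
import types

# Source for a hypothetical module; which 'pick' gets defined depends on
# code that runs earlier in the module body.
source = (
    "events = []\n"
    "events.append('before def')\n"
    "if len(events) == 1:\n"
    "    def pick():\n"
    "        return 'first branch'\n"
    "else:\n"
    "    def pick():\n"
    "        return 'second branch'\n"
    "events.append('after def')\n"
)

mod = types.ModuleType("run_demo")

# 'pick' does not exist until the module body has actually been run.
defined_before = hasattr(mod, "pick")
exec(source, mod.__dict__)
defined_after = hasattr(mod, "pick")

print(defined_before, defined_after, mod.pick(), mod.events)
```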