How 'from module import ...' is not doing what you may expect

March 16, 2016

There are a number of reasons to avoid things like 'from module import *'; for instance, it can be confusing later on and you can import more than you expect. But if you're doing this in the context of, say, just splitting a big source file apart it's tempting to say that these are not really problems. You're not going to be confused about where things come from because you're only importing everything from your own source files (and you're not even thinking of them as modules), and it's perfectly okay for there to be namespace contamination because that's kind of the point. But even then there are traps, because 'from module import ...' is not really doing what you might think it's doing.

There are two possible misconceptions here. If you're doing 'from module import *' within your own code, often what you want is for there to be one conjoined namespace where everything lives, both stuff from the other 'module' (really just a file) and stuff from your 'module' (the current file). If you're doing 'from module import A', it's easy (and tempting) to think that when you write plain A in your code, Python is basically automatically rewriting it to really be 'module.A' for you. Neither is what is actually going on in Python, although things can often look like it.

What a 'from module' import really does is copy things from one module namespace into another. More specifically, it copies the current bindings of names. You can think of 'from module import *' as doing something roughly like this:

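import module
_this = globals()
# Copy each name's current binding from module's namespace into ours.
for n in dir(module):
    _this[n] = getattr(module, n)

# Drop the helper names; the copied bindings remain.
del _this
del module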

(This code does not avoid internal names, doesn't respect __all__, and so on. It's a conceptual illustration.)

There are still two completely separate module namespaces, yours and the namespace of module; you have just copied a bunch of things from the module namespace into yours under the same name (or just some things, if you're doing 'from module import A, B'). Functions and classes from module are still using their module namespace, even if a reference to some or all of them has been copied into your module.
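Here is a small sketch of the 'still using their module namespace' part. Everything in it is made up for illustration: the module name othermod and the functions show() and greeting() are hypothetical, and the module is built in memory with types.ModuleType purely so that the example runs as a single file instead of needing a second source file.

```python
import sys
import types

# Build a stand-in for "another source file" in memory. The name
# 'othermod' and its contents are hypothetical.
othermod = types.ModuleType("othermod")
exec(
    "def greeting():\n"
    "    return 'hello from othermod'\n"
    "\n"
    "def show():\n"
    "    # 'greeting' is looked up in othermod's namespace at call time.\n"
    "    return greeting()\n",
    othermod.__dict__,
)
sys.modules["othermod"] = othermod

from othermod import show

# Defining our own greeting() here does not change what show() calls,
# because show() resolves global names in othermod, not in this module.
def greeting():
    return "hello from the importing module"

result = show()
same_namespace = show.__globals__ is othermod.__dict__

print(result)          # hello from othermod
print(same_namespace)  # True
```

The copied name show lives in our namespace, but the function object it points to still carries othermod's globals around with it.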

(As a corollary to this, things from module mostly can't refer to anything from your module namespace. This is easy to see since you can't have circular imports; if you're importing module to get at its namespace, it can't be importing you to get at yours. (Yes, there are odd ways around this.))

One reason why this matters is that if functions or classes from module update stuff in their module namespace, you may or may not pick it up in your own module. For example, consider the following code in some other module:

gvar = 10
def setit(newval):
    global gvar
    gvar = newval

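If you pull this in with 'from module import *', the gvar that you see in your own module will forever be 10, no matter what calls to setit() have been made. However, code in the other module will see the new value, because setit() rebinds gvar in its own module namespace and your copy of the old binding is untouched.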
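You can watch this happen with a short self-contained sketch. The gvar and setit() here are the ones from the example above; the module name othermod is an assumption, and the module is created in memory with types.ModuleType only so that the whole thing fits in one file.

```python
import sys
import types

# An in-memory stand-in for the other module (hypothetical name 'othermod').
othermod = types.ModuleType("othermod")
exec(
    "gvar = 10\n"
    "def setit(newval):\n"
    "    global gvar\n"
    "    gvar = newval\n",
    othermod.__dict__,
)
sys.modules["othermod"] = othermod

# This copies the current bindings of gvar and setit into our namespace.
from othermod import gvar, setit

setit(20)

print(gvar)           # 10: our copied binding was never touched
print(othermod.gvar)  # 20: the module's own binding was rebound
```

If you had written 'import othermod' and then used othermod.gvar everywhere, you would see 20 in both places, since that looks the name up in the other module's namespace every time.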

Not all sorts of updates will do this, of course. If gvar is a dictionary and code just adds, changes, and deletes keys in it, everyone will see the same gvar. The illusion of a shared namespace can hold up, but it is ultimately only an illusion and it can be fragile. (And unless you already know Python well, it isn't necessarily easy to see where and when it's going to break down.)
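To make the difference between changing an object and rebinding a name concrete, here is a sketch along the same lines. All of the names in it (othermod, settings, enable_debug(), reset_settings()) are invented for the example, and the module is again built in memory just to keep it to one file.

```python
import sys
import types

# Hypothetical in-memory module holding a dictionary of settings.
othermod = types.ModuleType("othermod")
exec(
    "settings = {'debug': False}\n"
    "def enable_debug():\n"
    "    settings['debug'] = True       # mutates the existing dict\n"
    "def reset_settings():\n"
    "    global settings\n"
    "    settings = {'debug': False}    # rebinds the name to a new dict\n",
    othermod.__dict__,
)
sys.modules["othermod"] = othermod

from othermod import settings, enable_debug, reset_settings

enable_debug()
seen_after_mutation = settings["debug"]           # True: same dict object

reset_settings()
seen_after_rebind = settings["debug"]             # still True: we hold the old dict
module_after_rebind = othermod.settings["debug"]  # False: module has a new dict
still_shared = settings is othermod.settings      # False

print(seen_after_mutation, seen_after_rebind, module_after_rebind, still_shared)
```

Nothing at the point of use tells you which kind of update a function does, which is part of why the illusion is fragile.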

Sidebar: An additional bit of possible weirdness

There are some situations where a module's namespace is more or less overwritten wholesale; the obvious case is reload() of the module (importlib.reload() in Python 3). If you reload() a module that has been the subject of 'from module import ...', all of those bare imports still refer to the old objects; they are not updated to match the reloaded module. You can get into very odd situations this way (especially considering what reloading a module really does).
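A rough sketch of the reload() case, assuming Python 3 and importlib.reload(). The module name reloadme and its func() are hypothetical, and the source file is written to a temporary directory only because reloading needs a real file to re-read.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a tiny throwaway module (hypothetical name 'reloadme') to disk.
    with open(os.path.join(tmpdir, "reloadme.py"), "w") as f:
        f.write("def func():\n    return 'result'\n")

    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        import reloadme
        from reloadme import func

        # Reloading re-executes the module's code, so reloadme.func is
        # rebound to a brand new function object.
        importlib.reload(reloadme)

        # Our bare 'func' still points at the function from before the reload.
        stale = func is not reloadme.func
    finally:
        # Clean up so the throwaway module does not linger.
        sys.path.remove(tmpdir)
        sys.modules.pop("reloadme", None)

print(stale)  # True
```

Here the source did not even change between the import and the reload, and the two names have still drifted apart; with classes this can lead to isinstance() checks that mysteriously fail.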


Comments on this page:

By Ewen McNeill at 2016-03-17 04:39:22:

Do you think that "from MYMODULE import *" would be a safer transition step (ie, during refactoring) if the things moved into MYMODULE were only (a) class definitions that were self contained (ie, no reference to globals/module scoped variables) or (b) pure functions (or semi-pure functions that just did input/output based on their function signature arguments)?

From my (C influenced) perspective the risk is holding secondary references to data which may no longer be referencing the current data. (And anything referencing a type that is immutable is most at risk.) So if you could ensure that whatever got put in the MYMODULE helper was self-contained, all those references should be on the "was-1500-lines" side of the refactoring and only referenced/changed via actions originating there.

If so, an initial refactoring step of a random 1500-lines program could be to try to break it into classes, even if those classes were mostly just encapsulating "functions and the data they manipulate". Some of which may, eg, end up effectively singletons. Then try to split it into separate files when, eg, there's no module-wide data left. Effectively you get your namespacing first then your structural separation into files.

Ewen

Written on 16 March 2016.

Last modified: Wed Mar 16 23:59:45 2016