Python scopes and the CPython bytecode opcodes that they use
Since I just got very curious about this, I want to write it down. CPython has four different families of bytecode opcodes for accessing variables, which are used under different circumstances and look in different places. They are:
- *_FAST opcodes are used to access function local variables, which
have a special storage scheme.
They're used in function scope for variables that are not part of a
closure (including inside a closure for variables that are local to
the closure alone).
- *_DEREF opcodes are used to access 'cell variables' that are used as part of the implementation
of closures; these are effectively function local variables with
an extra level of indirection.
They're used in function and closure scope for variables that are
part of a closure.
- *_GLOBAL opcodes are used to access variables that are explicitly
known to be global variables or builtins, for example if you are
in a function and have used 'global' on a variable name.
- *_NAME opcodes are used to access variables in code at the module level
or in a class that's being created, and are also the opcodes that
compile() generates. Effectively they are CPython's generic 'access a variable somehow' opcodes.
(Update: under some relatively rare circumstances, NAME opcodes can also be used in functions. When and why is complicated enough to call for a full entry.)
The first three opcode families look in the obvious places you'd expect. The NAME opcodes are a bit odd and to explain them I need to talk about stack frames briefly.
CPython code always runs in the context of a stack frame. Ignoring
implementation details for a moment, stack frames have three namespaces
associated with them; the frame's locals, the
globals, and the builtins. How
the NAME opcodes work is that the
LOAD_NAME bytecode looks at each
of these namespaces in succession and takes the first instance of the
name it finds, while
STORE_NAME always writes things into the
frame's locals. As the comments in the CPython source code put it:
[...] the name is treated as global until it is assigned to; then it is treated as a local.
(Technically something like this can also happen with the GLOBAL
opcodes, since STORE_GLOBAL will not write to the builtins but
LOAD_GLOBAL will look in them as a fallback.)
When code is running at the module level, the frame's local namespace is the same as the global (module) namespace, but you can still wind up copying things from the builtins namespace to the module namespace. When code is running during class creation, the local namespace is the class-to-be's namespace and the global namespace is the module namespace:
a = 10
print globals() is locals()

class A(object):
    a = a
    print globals() is locals()
(This adds yet another place in CPython that I know of where the left side of an assignment can be in a different namespace than the right side, updating my previous update. One day I will have found them all.)
PS: I suspect that this implementation of the NAME opcodes explains
something I noticed a while ago, which is that the C level frame
structure always has a valid f_locals field. The code of the NAME
opcodes directly uses this f_locals field, so having it always valid
probably simplifies the innards of the bytecode interpreter. In fact
if f_locals is NULL, the NAME opcodes raise an error.
(I'm not certain if it's ever possible to have a NULL f_locals
under normal circumstances, although if you do a web search for 'no
locals found when storing', 'no locals when loading', or 'no locals when
deleting' you will get some hits.)
Sidebar: A bonus trivia contest with exec
If you've managed to follow my writing above, you should now be able to understand and explain what happens in the following code (a variant on yesterday's weirdness):
def exec_and_return(codeobj, x):
    exec codeobj
    return x

a = 5
co = compile("a = a + x; x = a", '<s>', 'exec')
print exec_and_return(co, 1), a
This prints '6 5'.
(This behavior is probably not strictly speaking a bug, and it
would be hard to fix given the constraints on how exec works.)
(Update: the explanation is now here.)