Python scopes and the CPython bytecode opcodes that they use
Since I just got very curious about this, I want to write it down. CPython has four different families of bytecode opcodes for accessing variables, which are used under different circumstances and look in different places. They are:
- *_FAST opcodes are used to access function local variables, which
have a special storage scheme.
They're used in function scope for variables that are not part of a
closure (including inside a closure for variables that are local to
the closure alone).
- *_DEREF opcodes are used to access 'cell variables' that are used as part of the implementation
of closures; these are effectively function local variables with
an extra level of indirection.
They're used in function and closure scope for variables that are
part of a closure.
- *_GLOBAL opcodes are used to access variables that are explicitly
known to be global variables or builtins, for example if you are
in a function and have used global on a variable name.
- *_NAME opcodes are used to access variables in code at the module level
or in a class that's being created, and are also the opcodes that
compile() generates. Effectively they are CPython's generic 'access a
variable somehow' opcodes.
(Update: under some relatively rare circumstances, NAME opcodes can also be used in functions. When and why is complicated enough to call for a full entry.)
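As a rough illustration of the four families, here is a small sketch that uses the dis module to list the opcodes that different kinds of code compile to. It assumes Python 3 (the examples in this entry are Python 2), the function names are made up for the example, and the exact opcode names within each family vary somewhat between CPython versions.

```python
import dis

def opnames(code):
    """Return the set of opcode names used directly in a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}

def plain_local():
    x = 1              # ordinary local, not captured by any closure
    return x

def outer():
    y = 1              # captured by inner(), so it becomes a cell variable
    def inner():
        return y
    return inner

def uses_global():
    global counter     # explicitly declared global
    counter = 1
    return counter

# Code built by compile() in 'exec' mode, as module-level code would be.
module_code = compile("z = 1\nprint(z)", "<example>", "exec")

fast_ops = opnames(plain_local.__code__)
deref_ops = opnames(outer.__code__) | opnames(outer().__code__)
global_ops = opnames(uses_global.__code__)
name_ops = opnames(module_code)

for label, ops in [("local", fast_ops), ("closure", deref_ops),
                   ("global", global_ops), ("module", name_ops)]:
    print(label, sorted(ops))
```

Looking through the printed sets, the local function should show a FAST family opcode, the closure pair should show DEREF opcodes, and so on down the list.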
The first three opcode families look in the obvious places you'd expect. The NAME opcodes are a bit odd and to explain them I need to talk about stack frames briefly.
CPython code always runs in the context of a stack frame. Ignoring
implementation details for a moment, stack frames have three namespaces
associated with them: the frame's locals, the
globals, and the builtins. How
the NAME opcodes work is that the LOAD_NAME bytecode looks at each
of these namespaces in succession and takes the first instance of the
name it finds, while STORE_NAME always writes things into the
frame's locals. As the comments in the CPython source code put it:
[...] the name is treated as global until it is assigned to; then it is treated as a local.
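The lookup order can be watched from the outside by running compile()'d code with separate globals and locals dictionaries. This is only a sketch, assuming Python 3's exec() function; the dictionary and variable names are invented for the example.

```python
# Code from compile() in 'exec' mode uses LOAD_NAME / STORE_NAME.
code = compile("seen = len\nvalue = value + 1", "<example>", "exec")

frame_globals = {"value": 10}   # stands in for the frame's globals
frame_locals = {}               # stands in for the frame's locals

exec(code, frame_globals, frame_locals)

# 'value' was read from the globals but written into the locals.
print(frame_locals["value"])         # 11
print(frame_globals["value"])        # 10, untouched
# 'len' was found in neither dictionary, so it came from the builtins.
print(frame_locals["seen"] is len)   # True
```

Both assignments ended up in the locals dictionary even though neither name was there to start with.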
(Technically something like this can also happen with LOAD_GLOBAL
and STORE_GLOBAL, since STORE_GLOBAL will not write to the
builtins but LOAD_GLOBAL will look in them as a fallback.)
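The GLOBAL version of this can be sketched the same way. This assumes Python 3, and shadow_len is a made-up function that exists only for the illustration; it is run inside a scratch namespace so that nothing real gets shadowed.

```python
source = """
def shadow_len():
    global len
    len = len   # right side: LOAD_GLOBAL, left side: STORE_GLOBAL
"""

module_globals = {}             # scratch 'module' namespace
exec(source, module_globals)

had_len_before = "len" in module_globals
module_globals["shadow_len"]()
has_len_after = "len" in module_globals

# The load fell back to the builtins; the store went to the globals.
same_object = module_globals["len"] is len
print(had_len_before, has_len_after, same_object)
```

After the call, the scratch namespace has its own len entry that refers to the builtin function.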
When code is running at the module level, the frame's local namespace is the same as the global (module) namespace, but you can still copy from the builtins namespace to the module namespace. When code is running during class creation, the local namespace is the namespace of the class being created and the global namespace is the module namespace:
    a = 10
    print globals() is locals()
    class A(object):
        a = a
        print globals() is locals()
This prints True then False.
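For anyone without a Python 2 interpreter to hand, here is a rough Python 3 rendering of the same experiment. It is a sketch under my own assumptions: the source is run through exec() with a scratch dictionary so that it behaves like module-level code wherever it is pasted, and the names module_same and class_same are invented for the example.

```python
source = """
a = 10
module_same = globals() is locals()

class A:
    a = a + 1                          # reads the module's a, writes the class's a
    class_same = globals() is locals()
"""

ns = {}            # scratch 'module' namespace
exec(source, ns)

print(ns["module_same"], ns["A"].class_same)   # True False
print(ns["a"], ns["A"].a)                      # 10 11
```

The module's a is left alone while the class gets its own a, which is the left side of the assignment landing in a different namespace than the right side.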
(This adds yet another place in CPython that I know of where the left side of an assignment can be in a different namespace than the right side, updating my previous update. One day I will have found them all.)
PS: I suspect that this implementation of the NAME opcodes explains
something I noticed a while ago, which is that the C level frame
structure always has a real f_locals dictionary.
The code of the NAME opcodes directly uses this f_locals field, so
having it always valid probably simplifies the innards of the bytecode
interpreter. In fact if f_locals is NULL, the NAME opcodes throw
an exception.
(I'm not certain if it's ever possible to have a NULL f_locals
under normal circumstances, although if you do a web search for 'no
locals found when storing', 'no locals when loading', or 'no locals when
deleting' you will get some hits.)
Sidebar: A bonus trivia contest with compile() again
If you've managed to follow my writing above, you should now be able to understand and explain what happens in the following code (a variant on yesterday's weirdness):
    def exec_and_return(codeobj, x):
        exec codeobj
        return x

    a = 5
    co = compile("a = a + x; x = a", '<s>', 'exec')
    print exec_and_return(co, 1), a
This prints '6 5'.
(This behavior is probably not strictly speaking a bug and it
would be hard to fix given the constraints on compile().)
(Update: the explanation is now here.)