Python scopes and the CPython bytecode opcodes that they use

May 3, 2012

Since I just got very curious about this, I want to write it down. CPython has four different families of bytecode opcodes for accessing variables, which are used under different circumstances and look in different places. They are:

  • *_FAST opcodes are used to access function local variables, which have a special storage scheme. They're used in function scope for variables that are not part of a closure (including inside a closure for variables that are local to the closure alone).

  • *_DEREF opcodes are used to access 'cell variables' that are used as part of the implementation of closures; these are effectively function local variables with an extra level of indirection. They're used in function and closure scope for variables that are part of a closure.

  • *_GLOBAL opcodes are used to access variables that are explicitly known to be global variables or builtins, for example if you are in a function and have used global on a variable name.

  • *_NAME opcodes are used to access variables in code at the module level or in a class that's being created, and are also the opcodes that compile() generates. Effectively they are CPython's generic 'access a variable somehow' opcodes.

    (Update: under some relatively rare circumstances, NAME opcodes can also be used in functions. When and why is complicated enough to call for a full entry.)
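One way to see the four families for yourself is to disassemble small pieces of code. What follows is a minimal sketch for a modern Python 3 (the examples in this entry are Python 2). The helper name opcode_families and the sample functions are made up for illustration. It matches on fragments of opcode names instead of exact names, because the exact opcode names have shifted between CPython versions.

```python
import dis

FAMILIES = ("FAST", "DEREF", "GLOBAL", "NAME")

def opcode_families(code):
    """Return the variable-access opcode families used directly by a code object."""
    found = set()
    for ins in dis.get_instructions(code):
        parts = ins.opname.split("_")
        # Only consider LOAD_*/STORE_*/DELETE_* so that e.g. IMPORT_NAME is ignored.
        if parts[0] in ("LOAD", "STORE", "DELETE"):
            found.update(fam for fam in FAMILIES if fam in parts[1:])
    return found

def plain():
    x = 1            # purely local: *_FAST
    return x

def outer():
    y = 1            # shared with inner(), so a cell variable: *_DEREF
    def inner():
        return y
    return inner

def uses_global():
    global counter   # explicitly global: *_GLOBAL
    counter = 1
    return counter

# Module-level style code, as compile() produces it: *_NAME
module_code = compile("a = 1\nb = a\n", "<s>", "exec")

print(opcode_families(plain.__code__))
print(opcode_families(outer().__code__))
print(opcode_families(uses_global.__code__))
print(opcode_families(module_code))
```

Note that this only inspects the code object it is given; nested functions have their own code objects, which is why inner() is examined by calling outer() first.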

The first three opcode families look in the obvious places you'd expect. The NAME opcodes are a bit odd and to explain them I need to talk about stack frames briefly.

CPython code always runs in the context of a stack frame. Ignoring implementation details for a moment, stack frames have three namespaces associated with them: the frame's locals, the globals, and the builtins. The NAME opcodes work as follows: LOAD_NAME looks in each of these namespaces in that order and takes the first instance of the name it finds, while STORE_NAME always writes to the frame's locals. As the comments in the CPython source code put it:

[...] the name is treated as global until it is assigned to; then it is treated as a local.

(Technically something like this can also happen with LOAD_GLOBAL and STORE_GLOBAL, since STORE_GLOBAL will not write to the builtins but LOAD_GLOBAL will look in them as a fallback.)
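The LOAD_NAME and STORE_NAME behavior can be watched in isolation by running compiled code with separate globals and locals dictionaries. This is a sketch for Python 3, assuming that plain dicts passed to exec() are a fair stand-in for a frame's namespaces; the names frame_globals and frame_locals are invented for the example.

```python
import builtins

# Three distinct namespaces, mirroring a frame's builtins, globals and locals.
frame_globals = {"__builtins__": builtins, "a": 10}
frame_locals = {}

source = """
before = a      # LOAD_NAME: not in locals, found in globals
a = a + 1       # STORE_NAME: always writes to locals
after = a       # LOAD_NAME: now found in locals first
size = len      # LOAD_NAME: falls all the way through to builtins
"""
exec(compile(source, "<s>", "exec"), frame_globals, frame_locals)

print(frame_locals["before"], frame_locals["after"], frame_globals["a"])
```

This prints 10 11 10: the name a was read from the globals until it was assigned, after which the local copy shadows it, and the global a is never modified.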

When code is running at the module level, the frame's local namespace is the same as the global (module) namespace, but you can still copy a name from the builtins namespace into the module namespace. When code is running during class creation, the local namespace is the namespace of the class being created and the global namespace is the module namespace:

a = 10
print globals() is locals()
class A(object):
  a = a
  print globals() is locals()

This prints True then False.
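For readers on Python 3, here is a rough equivalent of the same experiment. It is a sketch that assumes exec() of source text with a single dictionary is a reasonable stand-in for running a module (one dict then serves as both globals and locals); module_ns is a made-up name. It also shows the builtins-to-module copy mentioned above.

```python
import builtins

source = """
a = 10
len = len                      # copies the builtin into the module namespace
module_same = globals() is locals()

class A(object):
    a = a                      # reads the module's a, stores into the class namespace
    class_same = globals() is locals()
"""

module_ns = {}
exec(source, module_ns)        # one dict serves as both globals and locals

print(module_ns["module_same"], module_ns["A"].class_same)
print(module_ns["len"] is builtins.len, module_ns["A"].a)
```

This prints True False and then True 10.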

(This adds yet another place in CPython that I know of where the left side of an assignment can be in a different namespace than the right side, updating my previous update. One day I will have found them all.)

PS: I suspect that this implementation of the NAME opcodes explains something I noticed a while ago, which is that the C level frame structure always has a real f_locals dictionary. The code of the NAME opcodes directly uses this f_locals field, so having it always valid probably simplifies the innards of the bytecode interpreter. In fact, if f_locals is NULL, the NAME opcodes raise an exception.

(I'm not certain if it's ever possible to have a NULL f_locals under normal circumstances, although if you do a web search for 'no locals found when storing', 'no locals when loading', or 'no locals when deleting' you will get some hits.)

Sidebar: A bonus trivia contest with compile() again

If you've managed to follow my writing above, you should now be able to understand and explain what happens in the following code (a variant on yesterday's weirdness):

def exec_and_return(codeobj, x):
  exec codeobj
  return x

a = 5
co = compile("a = a + x; x = a", '<s>', 'exec')

print exec_and_return(co, 1), a

This prints '6 5'.

(This behavior is probably not strictly speaking a bug and it would be hard to fix given the constraints on compile().)
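Python 3 no longer has the exec statement, and exec() inside a function does not write back to the function's real local variables, so the code above can't be run as is there. A rough analogue, assuming that explicit dictionaries can stand in for the function's locals and the module's globals (function_locals and module_globals are hypothetical names), behaves the same way:

```python
# Hypothetical stand-ins for the function's locals and the module's globals.
module_globals = {"a": 5}
function_locals = {"x": 1}

# compile() produces NAME opcodes, so 'a' is read from the globals
# but stored into the locals, and 'x' then reads that local 'a'.
co = compile("a = a + x; x = a", "<s>", "exec")
exec(co, module_globals, function_locals)

print(function_locals["x"], module_globals["a"])
```

This also prints 6 5; the new value of a ends up in the locals dictionary and the global a is untouched.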

(Update: the explanation is now in the next entry, 'Explaining a piece of deep weirdness with Python's exec'.)

