Why using local variables is fast in Python
One piece of Python optimization lore is that access to local variables is very fast, so if a function repeatedly refers to something in a loop, you can speed it up by copying it into a local variable first (this comes up most often when you are repeatedly calling the same function). This fast access is a consequence of some implementation details in CPython.
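As a concrete illustration of the idiom, here is a minimal sketch (the function names are mine):

```python
import math

def without_alias(values):
    # Each iteration looks up the global name "math" and then the
    # attribute "sqrt" on it.
    return [math.sqrt(v) for v in values]

def with_alias(values):
    # Copy the bound function into a local variable once; the loop
    # body then only does a fast local-variable access.
    sqrt = math.sqrt
    return [sqrt(v) for v in values]
```

Both versions compute the same thing; the second just moves the name lookups out of the loop.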
The CPython interpreter doesn't run Python code directly, but first compiles it to bytecode for a relatively simple stack-based virtual machine. You can inspect the actual bytecode of functions with the dis module; most of the bytecodes themselves are documented there.
The bytecode puts local variables (and the function arguments) into a fixed-size array, statically assigning variable names to indexes, instead of a dictionary; it can do this because you can't add local variables to a function on the fly. Getting a local variable just requires bumping the reference count on the object in the appropriate slot, which is about as fast an operation as you can get.
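You can see this in the disassembly of a small function (exact opnames and layout vary between CPython versions, so the comments here describe the general pattern rather than one version's output):

```python
import dis

def f(x):
    # Both the argument x and the local y live in the function's
    # fixed array of local slots; the compiler assigns each name a
    # slot index at compile time.
    y = x + 1
    return y

# The disassembly shows LOAD_FAST/STORE_FAST opcodes for x and y,
# each carrying a slot index rather than a name to look up.
dis.dis(f)
```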
Looking up global names (and attributes, such as self.foo or math.pow) is significantly slower, since it involves at least one and possibly several dictionary lookups. LOAD_GLOBAL is fairly heavily optimized, going so far as to inline a chunk of dictionary lookup code; LOAD_ATTR is far less so, and also has the likely overhead of multiple dictionary lookups.
This is also why the global statement is necessary; stores to global variables are a different bytecode than stores to local variables, so the bytecode compiler has to know which to generate when it compiles the function. If a variable is ever assigned to, the bytecode compiler declares it a local variable and thus generates STORE_FAST instead of anything else.
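The two store bytecodes are easy to see side by side (a minimal sketch; the function names are mine):

```python
import dis

counter = 0

def local_store():
    # "counter" is assigned here with no global declaration, so the
    # compiler treats it as a local and emits STORE_FAST.
    counter = 1
    return counter

def global_store():
    # The global statement tells the compiler to emit STORE_GLOBAL
    # for the same assignment instead.
    global counter
    counter = 1
    return counter

dis.dis(local_store)
dis.dis(global_store)
```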
(This is a followup to KnowingImplementationsMatters.)