To get much faster, an implementation of Python must do less work
Python, like many other dynamically typed languages, is both flexible and what I'll call mutable. Python's dynamic typing and the power it gives you over objects mean that apparently simple actions can unpredictably do complex things.
As an example of what I mean by this, consider the following code:
    def afunc(dct, strct, thing2):
        loc = dct["athing"] + thing2
        return loc + strct.attr
It's possible and perhaps even very likely that this Python code does something very straightforward: dct is a plain dictionary, strct is a plain object with straightforward instance attributes, and all the values are basic built-in Python types (ideally the same type, such as integers) with straightforward definitions of addition. But it's also possible that dct and strct are objects with complex behavior, dct["athing"] winds up returning another complex object with custom addition behavior, and thing2 is yet another complex object with its own behavior that will come into effect when the 'addition' starts happening. On top of that, all of this can change over time; afunc() can be called with different sorts of arguments, and even for the same arguments, their behavior can be mutated between calls and even during a call.
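To make this concrete, here is a small sketch (the class names LoudDict, Wrapped, and Strct are made up for illustration, and afunc() is repeated so the sketch is self-contained) in which the exact same body of afunc() runs arbitrary user code:

```python
def afunc(dct, strct, thing2):
    loc = dct["athing"] + thing2
    return loc + strct.attr

class LoudDict(dict):
    # Hypothetical dict subclass: subscripting runs arbitrary extra code.
    def __getitem__(self, key):
        print("dict lookup:", key)
        return super().__getitem__(key)

class Wrapped:
    # Hypothetical value type with its own idea of what '+' means.
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        print("custom addition")
        if isinstance(other, Wrapped):
            other = other.value
        return Wrapped(self.value + other)

class Strct:
    pass

s = Strct()
s.attr = 10

# Plain case: a dict lookup and two integer additions.
print(afunc({"athing": 1}, s, 2))                # 13

# Complex case: the same expression triggers a custom lookup and
# two rounds of custom addition behavior.
print(afunc(LoudDict(athing=Wrapped(1)), s, 2).value)   # 13, noisily
```

Both calls go through the identical bytecode for afunc(); nothing in the function's text tells the implementation which case it will face.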
A straightforward implementation of Python is going to check for all of these possibilities every time through, and it's going to generate real Python objects for everything, probably including intermediate forms and values. Even when strct really is just a plain object that has some fields but no methods or other behavior, a Python implementation is probably going to use a generic, broad implementation of attribute lookup (even __slots__ is fairly general here; a lot of machinery still runs to look up slot values). All of this is work, and all of this work takes time. Even if some of this work is done inefficiently today in any particular implementation, there is a limit to how much it can be improved and sped up.
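One way to see why attribute lookup has to stay general: even a 'plain' attribute can change meaning between two uses of the same expression. A minimal sketch (Plain is a made-up name):

```python
class Plain:
    def __init__(self):
        self.attr = 10

p = Plain()
print(p.attr)        # 10: an ordinary instance attribute

# The class can later grow a property (a data descriptor), which by
# Python's lookup rules takes precedence over the instance attribute:
Plain.attr = property(lambda self: 42)
print(p.attr)        # 42: same expression, different lookup outcome
```

An implementation that compiled p.attr down to a direct field read would get the second access wrong, so the generic path (or a guard that detects this change) is always needed.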
This leads to a straightforward conclusion: to really get faster, a Python implementation must do less work. It must recognize cases where all of this flexibility and mutability is not being used and then skip it for more minimal implementations that do less.
The ultimate version of this is Python recognizing, for example, when it is only working with plain integers in code like this:
    def func2(upto, alist):
        mpos = -1
        for i in range(upto):
            if (alist[i] % upto) == 0:
                mpos = i
        return mpos
If upto and alist have suitable values, this can turn into pretty fast code. But it can become fast code only when Python can do almost none of the work that it normally would: no iterator created by range() and then traversed by the for loop, no Python integer objects created for i and the % operation, no complex lookup procedure for alist (just a memory dereference at an offset, with the bounds of alist checked once), the % operation being the integer modulo operation, and so on. The most efficient possible implementations of all of those general operations cannot come close to the performance of not doing them at all.
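Conversely, nothing in func2()'s text guarantees the simple case. The made-up WeirdSeq below is a perfectly legal argument (func2() is repeated so the sketch is self-contained), which is why an implementation must verify, and keep guarding, its assumptions before it can compile alist[i] down to a bounds-checked memory load:

```python
def func2(upto, alist):
    mpos = -1
    for i in range(upto):
        if (alist[i] % upto) == 0:
            mpos = i
    return mpos

class WeirdSeq:
    # Hypothetical sequence: indexing computes a value instead of
    # loading one from memory.
    def __getitem__(self, i):
        return i * 3

print(func2(4, [3, 4, 8, 5]))  # 2: the all-plain-integers case
print(func2(4, WeirdSeq()))    # 0: same code, arbitrary __getitem__ behind it
```

The second call is exactly why a specializing implementation emits a fast path for plain lists of integers plus a check that falls back to the general path when the check fails.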
(This is true of more or less all dynamic languages. Implementation tweaks can speed them up to some degree, but to get significant speed improvements they must do less work. In JIT environments, this often goes by the term 'specialization'.)