To get much faster, an implementation of Python must do less work

December 29, 2017

Python, like many other dynamically typed languages, is both flexible and what I'll call mutable. Python's dynamic typing and the power it gives you over objects means that apparently simple actions can unpredictably do complex things.

As an example of what I mean by this, consider the following code:

def afunc(dct, strct, thing2):
  loc = dct["athing"] + thing2
  return loc + strct.attr

It's possible and perhaps even very likely that this Python code does something very straightforward, where dct is a plain dictionary, strct is a plain object with straightforward instance attributes, and all the values are basic built-in Python types (ideally the same type, such as integers) with straightforward definitions of addition. But it's also possible that dct and strct are objects with complex behavior, dct["athing"] winds up returning another complex object with custom addition behavior, and thing2 is another complex object with its own behavior that will come into effect when the 'addition' starts happening. In addition, all of this can change over time; afunc() can be called with different sorts of arguments, and even for the same arguments, their behavior can be mutated between calls and even during a call.

A straightforward implementation of Python is going to go through checking for all of these possibilities every time through, and it's going to generate real Python objects for everything, probably including intermediate forms and values. Even when strct is really just a plain object that has some fields but no methods or other behavior, a Python implementation is probably going to use a generic, broad implementation (even __slots__ is fairly general here; a lot of things still happen to look up slot values). All of this is work, and all of this work takes time. Even if some of this work is done inefficiently today in any particular implementation, there is a limit to how much it can be improved and sped up.

This leads to a straightforward conclusion: to really get faster, a Python implementation must do less work. It must recognize cases where all of this flexibility and mutability is not being used and then skip it for more minimal implementations that do less.

The ultimate version of this is Python recognizing, for example, when it is only working with plain integers in code like this:

def func2(upto, alist):
  mpos = -1
  for i in range(upto):
     if (alist[i] % upto) == 0:
        mpos = i
  return mpos

If upto and alist have suitable values, this can turn into pretty fast code. But it can become fast code only when Python can do almost none of the work that it normally would; no iterator created by range() and then traversed by the for loop, no Python integer objects created for i and the % operation, no complex lookup procedure for alist (just a memory dereference at an offset, with the bounds of alist checked once), the % operation being the integer modulo operation, and so on. The most efficient possible implementations of all of those general operations cannot come close to the performance of not doing them at all.

(This is true of more or less all dynamic languages. Implementation tweaks can speed them up to some degree, but to get significant speed improvements they must do less work. In JIT environments, this often goes by the term 'specialization'.)

Written on 29 December 2017.
« How our IMAP server wound up running out of inodes
Some details of ZFS DVAs and what some of their fields store »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Dec 29 01:37:56 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.