2014-04-27
Thoughts about Python classes as structures and optimization
I recently watched yet another video of a talk on getting good
performance out of Python. One of the things it talked about was
the standard issue of 'dictionary abuse', in this case in the context
of creating structures. If you want a collection of data, the
equivalent of a C struct, things that speed up Python will do much
better if you say what you mean by representing it as a class:
class AStruct(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
Even though Python is a dynamic language and AStruct instances
could in theory be rearranged in many ways, in practice they generally
aren't and when they aren't we know a lot of ways to speed them up
and make them use minimal amounts of memory. If you instead just
throw them into a dictionary, much less optimization is (currently)
done.
(I suspect that many of these dynamic language optimizations could be applied to dictionary usage as well; it's just that people are hoping to avoid it for various reasons.)
My problem with this is that even small bits of extra typing tempt
me into unwise ways to reduce it. In one early example I both skipped
having an __init__ function, directly assigning attributes on new
instances instead, and wrote a generic function to do it (there is
also a better version of it). This is all well and good in ordinary CPython,
but now I have to wonder how far one can go before the various
optimizers and JIT engines will throw up their hands and give up
on clever things.
(I suspect that the straightforward __init__ version is easiest
for optimizers to handle, partly because it's a common pattern that
attributes aren't added to an instance after __init__ finishes.)
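To make the two shortcuts concrete, here is a minimal sketch of what skipping __init__ can look like. The names Struct and make_struct are hypothetical, chosen only for illustration; they are not the code from the earlier example.

```python
class Struct(object):
    """An empty class used purely as an attribute container."""
    pass


def make_struct(**fields):
    # Hypothetical generic helper: create an instance and set each
    # keyword argument as an attribute, with no __init__ involved.
    obj = Struct()
    for name, value in fields.items():
        setattr(obj, name, value)
    return obj


# Shortcut 1: assign attributes directly on a fresh instance.
p = Struct()
p.a = 1
p.b = 2
p.c = 3

# Shortcut 2: do the same thing through the generic helper.
q = make_struct(a=1, b=2, c=3)
```

Both work fine in CPython, but nothing in the class itself says which attributes an instance will end up with, which is presumably the sort of thing that makes life harder for an optimizer.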
It's tempting to ask for standard library support for simple
structures in the form of something that makes them easy to declare.
You could do something like 'AStruct = structs.create('a', 'b', 'c')'
and then everything would work as expected (and optimizers
would have a good hook to latch on to). Unfortunately such a function
is hard to create today in Python, especially in a form that
optimizers like PyPy are likely to recognize and accelerate.
This is probably too petty and limited a wish.
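For illustration only, here is a rough sketch of what such a function might look like in plain Python. There is no structs module in the standard library; create() below is a hypothetical stand-in built on type() and __slots__. It shows the interface, not the hard part, which is getting optimizers to recognize and accelerate the result.

```python
def create(*names):
    """Hypothetical stand-in for structs.create(): build a simple struct class."""

    def __init__(self, *args):
        # Require exactly one positional argument per declared field.
        if len(args) != len(names):
            raise TypeError(
                "expected %d arguments, got %d" % (len(names), len(args))
            )
        for name, value in zip(names, args):
            setattr(self, name, value)

    # type() builds the class dynamically; __slots__ fixes the attribute set.
    return type("Struct", (object,), {"__slots__": names, "__init__": __init__})


AStruct = create('a', 'b', 'c')
s = AStruct(1, 2, 3)
s.b = 20  # fields remain ordinary mutable attributes
```

Every class made this way is called "Struct" and is constructed at runtime, so whether a JIT treats it as well as a hand-written class is an open question.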
PS: of course the simplest and easiest to optimize version today
is a class that has only a __slots__ and no __init__.
PyPy et al are guaranteed that no other attributes will ever be set
on instances, so they can pack things as densely as they want.
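A minimal sketch of the __slots__ version, reusing the AStruct name and fields from the class earlier in this entry:

```python
class AStruct(object):
    # Only these three attributes can ever exist on an instance.
    __slots__ = ('a', 'b', 'c')


s = AStruct()
s.a = 1
s.b = 2
s.c = 3

# Any attribute not listed in __slots__ is refused outright.
try:
    s.d = 4
except AttributeError:
    rejected = True
else:
    rejected = False
```

The cost is that each field has to be assigned by hand after creating the instance, which is exactly the sort of extra typing that is tempting to hide behind a helper function.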