Thoughts about Python classes as structures and optimization

April 27, 2014

I recently watched yet another video of a talk on getting good performance out of Python. One of the things it talked about was the standard issue of 'dictionary abuse', in this case in the context of creating structures. If you want a collection of data, the equivalent of a C struct, things that speed up Python will do much better if you say what you mean by representing it as a class:

class AStruct(object):
  def __init__(self, a, b, c):
    self.a = a
    self.b = b
    self.c = c

Even though Python is a dynamic language and AStruct instances could in theory be rearranged in many ways, in practice they generally aren't, and when they aren't we know a lot of ways to speed them up and make them use minimal amounts of memory. If you instead just throw the same data into a dictionary, much less optimization is (currently) done.
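For contrast, here is a small sketch of the same three fields in both forms. The helper name make_dict_struct is made up for illustration, and nothing here measures speed or memory; it only shows what the two styles look like.

```python
# Hypothetical example: the same data as a dictionary and as a class.

def make_dict_struct(a, b, c):
    # The 'dictionary abuse' form: as far as the interpreter knows,
    # the keys are arbitrary strings that could change at any time.
    return {'a': a, 'b': b, 'c': c}

class AStruct(object):
    # The 'say what you mean' form: every instance gets the same
    # attributes, assigned in the same order, in __init__.
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

d = make_dict_struct(1, 2, 3)
s = AStruct(1, 2, 3)

# Field access differs only in syntax.
total_from_dict = d['a'] + d['b'] + d['c']
total_from_class = s.a + s.b + s.c
```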

(I suspect that many of these dynamic language optimizations could be applied to dictionary usage as well; it's just that people are hoping to avoid it for various reasons.)

My problem with this is that even small bits of extra typing tempt me into unwise ways to reduce it. In one early example I skipped having an __init__ function and just directly assigned attributes on new instances, and I also wrote a generic function to do it (which has a better version). This is all well and good in ordinary CPython, but now I have to wonder how far one can go before the various optimizers and JIT engines throw up their hands and give up on clever things.
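A rough sketch of this kind of shortcut might look like the following. The names Struct and make_struct are invented here for illustration; this is not the code from the earlier entries.

```python
class Struct(object):
    # Deliberately empty: no __init__ and no declared attributes.
    pass

def make_struct(**fields):
    # Hypothetical generic helper: create a bare instance and attach
    # whatever attributes the caller names.
    obj = Struct()
    for name, value in fields.items():
        setattr(obj, name, value)
    return obj

s = make_struct(a=1, b=2, c=3)
s.b = 20    # fields stay mutable, unlike a tuple
```

The set of attributes is only known at run time here, and two calls can produce differently shaped instances of the same class, which is presumably harder for an optimizer to reason about than a fixed __init__.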

(I suspect that the straightforward __init__ version is easiest for optimizers to handle, partly because it's a common pattern that attributes aren't added to an instance after __init__ finishes.)

It's tempting to ask for standard library support for simple structures in the form of something that makes them easy to declare. You could do something like 'AStruct = structs.create('a', 'b', 'c')' and then everything would work as expected (and optimizers would have a good hook to latch on to). Unfortunately such a function is hard to create today in Python, especially in a form that optimizers like PyPy are likely to recognize and accelerate. Probably this is too petty and limited a wish.
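The mechanics of such a function can be approximated with type(). The sketch below is only that: create() is a hypothetical stand-in for the wished-for structs.create(), there is no structs module in the standard library, and the use of __slots__ and positional-only arguments are assumptions made for the example.

```python
def create(*field_names):
    # Hypothetical stand-in for structs.create(); not a real
    # standard library function.
    def __init__(self, *args):
        if len(args) != len(field_names):
            raise TypeError('expected %d arguments, got %d'
                            % (len(field_names), len(args)))
        for name, value in zip(field_names, args):
            setattr(self, name, value)

    namespace = {
        '__slots__': field_names,   # fixes the set of attributes
        '__init__': __init__,
    }
    return type('Struct', (object,), namespace)

AStruct = create('a', 'b', 'c')
s = AStruct(1, 2, 3)
s.a = 10    # fields can be changed after creation
```

This behaves as expected in CPython, but the generic setattr() loop is not the straight-line 'self.a = a' code that an optimizer would see in a hand-written class, so whether PyPy or anything else would accelerate it equally well is an open question.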

PS: of course the simplest and easiest to optimize version today is a class that just has a __slots__ and no __init__. PyPy et al are guaranteed that no other attributes will ever be set on instances, so they can pack things as densely as they want.
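A minimal sketch of the __slots__ version, reusing the AStruct name and the three fields from the example at the top:

```python
class AStruct(object):
    # Only these three attributes may ever exist on an instance.
    __slots__ = ('a', 'b', 'c')

s = AStruct()
s.a = 1
s.b = 2
s.c = 3

try:
    s.d = 4    # not listed in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False
```

One thing to keep in mind is that with no __init__, a slot has no value until something assigns to it; reading s.a before the assignment would raise AttributeError.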

Comments on this page:

By at 2014-04-27 09:43:25:

I'm not sure if this applies to your use-case, but I found namedtuple quite handy in similar situations.

By cks at 2014-04-27 22:49:33:

Unfortunately namedtuple acts too much like a real tuple, in that it doesn't allow you to change the (named) fields after the instance is created. This makes it fine for constant objects but not useful for non-constant ones. I suspect that one could create a namedlist class factory using basically the same code as namedtuple and I honestly wonder why no one has done that.
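A short illustration of the limitation, using collections.namedtuple from the standard library (the type name 'Point' is just an example):

```python
from collections import namedtuple

Point = namedtuple('Point', ['a', 'b', 'c'])
p = Point(1, 2, 3)

try:
    p.a = 10    # fields are read-only
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False

# The closest thing to a change is building a whole new tuple.
p2 = p._replace(a=10)
```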

By Twirrim at 2014-04-28 02:12:47:

I'm guessing this was the video: Alex Gaynor: "Fast Python, Slow Python"

Watching it the other day has got me pondering about some of my coding practices. I still tend to avoid classes, and still tend to think of code procedurally (lots of functions) rather than more OO.

I haven't yet got around to doing some benchmarking, but I figure there is some scope for interesting analysis there. How much faster? How do different engines handle it?

It was an interesting talk, emphasising some good practices, but it felt a little "because I say so" :)

Written on 27 April 2014.

Last modified: Sun Apr 27 03:33:58 2014