Python's data structures problem

April 14, 2013

Python has a problem with data structures. Well, actually it has two, which are closely related to each other.

The first problem is what I illustrated yesterday, namely that there is generally no point in building nice sophisticated data structures in your Python code because they won't perform very well. Priority heaps, linked lists, all sorts of trees (B+, red-black, splay, etc), they're all nifty and nice but generally there's no point in even trying. Unless things are relatively unusual in your problem you'll be just as fast (if not faster) and write less code by just using and abusing Python's native data types. So what if dictionaries and lists (and a few other things that have been implemented at the C level) aren't quite an exact fit for your problem? They're the best you're going to get.

(I've written about this before, although that was more the general version instead of a Python-focused one.)

In theory it might sense to implement your own data structures anyways because they can efficiently support unusual operations that are important to you. In practice my impression is that the performance difference is generally assumed to be large enough that people don't bother doing this unless simpler and more brute force versions are clearly inadequate.

The second problem is that this isn't really true. Data structures implemented in Python code under CPython are slow but other Python implementations can and do make them fast, sometimes even faster than a similar data structure kludged together with native types. But almost everyone writes for CPython and so they're not going to create these alternate data structures that (eg) PyPy could make fast. In fact sometimes they may kludge up data structures that PyPy et al have a very hard time making go fast; they're fine for CPython but pessimal for other environments.

My view is that this matters if we want Python to ever get fast, because getting fast is going to call for data structures that are amenable to optimization instead of what I've seen called hash-map addiction. But I have no (feasible) ideas for how to fix things and I'm pretty sure that asking people to code data structures in CPython isn't feasible until there's a benefit to it even in CPython.

(This is in part a slow reaction to Jason Moiron's What's Going On.)


Comments on this page:

From 139.222.218.211 at 2013-04-14 08:02:42:

My experience is that the few times I've wanted a more complicated data structure I just install a CPython extension module that implements it in C -- although I've never actually done any performance analytics to see whether getting data back out is a bottleneck.

From 76.113.49.212 at 2013-04-14 13:01:38:

So, what's next, if anything? I take it Go wasn't a revolution in the way the migration from C to Python was. Haskell seems too high-brow and a poor match to a common programmer's cogntion.

From 198.189.14.2 at 2013-04-15 18:36:18:

Any thoughts on Lua?

By cks at 2013-04-16 11:47:05:

I haven't looked at Lua for various reasons. I understand that it has a JIT compiler and so might run faster than Python in some situations. I don't know if it has the large ecology that's part of the Python attraction. In general, my feeling right now is that if I hit a problem that's too demanding for Python I'm most likely to turn to Go.

(I'm also hoping that the 'write Python extensions in Go' project gets to a solidly mature state because in a way that would be the best of both worlds.)

Written on 14 April 2013.
« Classic linked lists versus Python's list (array) type
Go's friction points for me (and a comparison to Python) »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Apr 14 01:55:35 2013
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.