Classic linked lists versus Python's list (array) type

April 13, 2013

For reasons beyond the margins of this entry, let's consider a classic linked list implemented in Python. Because I feel like a traditionalist today we'll built it out of Lisp-style cons cells, using about the most minimal and lightweight implementation we can do:

class Cons(object):
    __slots__ = ('car', 'cdr')
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr
    def __str__(self):
        return '(%s, %s)' % (self.car, self.cdr)

Now let's ask a question: how does the memory use and performance of this compare to just using a Python list (which is not a linked list but instead an array)? I'm going to look purely at building a 1,000 element list element-by-element and I'm going to allow each implementation to append in whatever order is fastest for it. The code:

from itertools import repeat
def native(size):
    l = []
    for _ in repeat(None, size):
        l.append(0)
    return l

def conslst(size):
    h = Cons(0, None)
    for _ in repeat(None, size):
        h = Cons(0, h)
    return h

(itertools.repeat is the fastest way to do this loop.)

On a 32-bit machine the 1,000 element native list takes 4,512 bytes. A Cons cell takes 28 bytes (not counting the size of what it points to for either car or cdr) and so 1,000 of them takes 28,000 bytes, a factor of six or so worse than the list.

As for timings, the Cons-based list construction for a thousand elements is about a factor of five worse than Python native lists on my test machine (if I have GC running). Creating the Cons objects appears to be very cheap and what matters for the runtime is all of the manipulation that goes on around them. Creating shorter lists is somewhat better, creating longer ones is worse.

(Since I checked just to be sure, I can tell you that a version of Cons that doesn't use __slots__ runs somewhat more slowly.)

Right now some of my readers are rolling their eyes and telling me that of course the Cons version is worse and always was going to be worse; that's why everyone uses native Python lists. That is sort of the point of this exercise, but that's another entry.

Sidebar: how to make the Cons version go fast

The short answer is to use PyPy. Run under PyPy, conslst() is clearly somewhat faster than native(), even with larger lists. Both versions run drastically faster than the plain Python version (which is what you'd hope for, since both ought to be highly optimizable). Unfortunately there are plenty of environments that are not PyPy-enabled and probably won't be for some time (for example, all embedded web server environments like mod_wsgi and UWSGI).

Written on 13 April 2013.
« My view on software RAID and the RAID write hole
Python's data structures problem »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Apr 13 01:36:18 2013
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.