2013-04-13
Classic linked lists versus Python's list (array) type
For reasons beyond the margins of this entry, let's consider a classic linked list implemented in Python. Because I feel like a traditionalist today we'll built it out of Lisp-style cons cells, using about the most minimal and lightweight implementation we can do:
class Cons(object): __slots__ = ('car', 'cdr') def __init__(self, car, cdr): self.car = car self.cdr = cdr def __str__(self): return '(%s, %s)' % (self.car, self.cdr)
Now let's ask a question: how does the memory use and performance of this compare to just using a Python list (which is not a linked list but instead an array)? I'm going to look purely at building a 1,000 element list element-by-element and I'm going to allow each implementation to append in whatever order is fastest for it. The code:
from itertools import repeatdef native(size): l = [] for _ in repeat(None, size): l.append(0) return l def conslst(size): h = Cons(0, None) for _ in repeat(None, size): h = Cons(0, h) return h
(itertools.repeat
is the fastest way to do this loop.)
On a 32-bit machine the 1,000 element native list takes 4,512 bytes. A
Cons cell takes 28 bytes (not counting the size of what it points to for
either car
or cdr
) and so 1,000 of them takes 28,000 bytes, a factor
of six or so worse than the list.
As for timings, the Cons-based list construction for a thousand elements is about a factor of five worse than Python native lists on my test machine (if I have GC running). Creating the Cons objects appears to be very cheap and what matters for the runtime is all of the manipulation that goes on around them. Creating shorter lists is somewhat better, creating longer ones is worse.
(Since I checked just to be sure, I can tell you that a version of
Cons
that doesn't use __slots__
runs somewhat more slowly.)
Right now some of my readers are rolling their eyes and telling me that
of course the Cons
version is worse and always was going to be worse;
that's why everyone uses native Python lists. That is sort of the point
of this exercise, but that's another entry.
Sidebar: how to make the Cons
version go fast
The short answer is to use PyPy. Run under PyPy, conslst()
is clearly somewhat faster than native()
, even with larger
lists. Both versions run drastically faster than the plain Python
version (which is what you'd hope for, since both ought to be highly
optimizable). Unfortunately there are plenty of environments that are
not PyPy-enabled and probably won't be for some time (for example, all
embedded web server environments like mod_wsgi and UWSGI).