The Python marshal module versus the cPickle module

October 18, 2007

The marshal module looks interesting for persisting and retrieving lightweight data, but the big question for me has always been whether, in exchange for constraining your data to simple structures of primitive types, you get something that is actually faster than the cPickle module.
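As a rough illustration of that constraint (a sketch, not code from the tests described here; the sample values are made up), marshal round-trips plain dictionaries, lists, strings and numbers, but refuses richer objects that pickle handles without complaint. The import fallback is there because Python 3 has no separate cPickle module; its pickle module uses the C implementation automatically.

```python
import datetime
import marshal

try:
    import cPickle as pickle  # Python 2, as in the post
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically

# Made-up sample data: a simple structure of primitive types.
simple = {"title": "front page", "hits": 3, "tags": ["a", "b"], "score": 1.5}

# Both modules round-trip this kind of data.
assert marshal.loads(marshal.dumps(simple)) == simple
assert pickle.loads(pickle.dumps(simple)) == simple

# pickle also copes with richer objects, such as a date ...
day = datetime.date(2007, 10, 18)
restored = pickle.loads(pickle.dumps(day))

# ... while marshal raises ValueError for anything outside its supported types.
try:
    marshal.dumps(day)
    marshal_accepts_dates = True
except ValueError:
    marshal_accepts_dates = False

print("marshal accepts dates:", marshal_accepts_dates)
```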

So today I decided to finally answer the question by doing some timing tests. I won't claim that these are comprehensive or entirely scientific, but I do have some results:

  • the speed difference is mostly in dumping things; marshal and cPickle generally load things about equally fast (sometimes cPickle has the edge, sometimes marshal).
  • marshal is significantly faster on nested dictionaries.
  • marshal is generally a bit faster than cPickle.
  • however, marshal really suffers on long strings, and it gets worse the longer the string is; for example, it dumps 2048-byte strings ten times slower than cPickle does.
  • neither marshal nor cPickle is very good at Unicode strings. cPickle suffers the worse relative slowdown, especially when loading: it becomes 18 times slower on my sample string, although even then it is no slower than marshal.

    (This is one of the rare cases where cPickle dumps much faster than it loads.)

  • marshal is significantly worse on floating point numbers, especially lists of them, although not as bad as it is on long strings (it's only about twice as slow as cPickle for a single floating point number).
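A comparison along these lines can be sketched with the standard timeit module. This is not the code used for the results above: the sample values, the `time_call` and `compare` helpers, and the choice of the highest pickle protocol are all assumptions made for illustration, and the numbers from a current Python will not match the 2007 ones.

```python
import marshal
import timeit

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Hypothetical samples, one per kind of data discussed in the list above.
SAMPLES = {
    "nested dict": {"a": {"b": {"c": 1, "d": [1, 2, 3]}}, "e": {"f": "g"}},
    "long string": b"x" * 2048,
    "unicode string": u"\u00e9\u00e8 unicode text " * 20,
    "float list": [i * 0.5 for i in range(100)],
}


def pickle_dumps(value):
    # Assumption: the post does not say which pickle protocol was used.
    return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)


def time_call(func, arg, number=2000):
    """Return the best per-call time in seconds over three timeit runs."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=3, number=number)) / number


def compare(samples=SAMPLES, number=2000):
    """Time dumping and loading each sample with both modules."""
    results = {}
    for name, value in samples.items():
        marshal_blob = marshal.dumps(value)
        pickle_blob = pickle_dumps(value)
        results[name] = {
            "marshal dump": time_call(marshal.dumps, value, number),
            "pickle dump": time_call(pickle_dumps, value, number),
            "marshal load": time_call(marshal.loads, marshal_blob, number),
            "pickle load": time_call(pickle.loads, pickle_blob, number),
        }
    return results


if __name__ == "__main__":
    for name, timings in sorted(compare().items()):
        print(name)
        for label, seconds in sorted(timings.items()):
            print("  %-13s %8.2f us" % (label, seconds * 1e6))
```

Taking the minimum of several repeats is the usual way to reduce noise from other activity on the machine; it still says nothing about how representative the samples are.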

Since DWiki's cache layer spends a lot of time writing and reading long strings, it looks like I made the right decision way back when I chose cPickle for it. The combination of long strings and floating point numbers meant that marshal was significantly slower than cPickle for a simulated DWiki cache object.
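To try this kind of test on a cache-like object, something like the following sketch would do. The `entry` dictionary is a made-up stand-in that mixes a long string with floating point timestamps; it is not DWiki's real cache format, and `round_trip_time` is a hypothetical helper.

```python
import marshal
import timeit

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Made-up stand-in for a cache entry: long rendered text plus float timestamps.
entry = {
    "html": b"<p>rendered page text</p>" * 400,
    "generated": 1192763981.25,
    "validators": [1192763000.5, 1192763500.75],
}


def round_trip_time(dumps, loads, value, number=1000):
    """Best-of-three time in seconds for `number` dump-then-load round trips."""
    return min(timeit.repeat(lambda: loads(dumps(value)), repeat=3, number=number))


marshal_seconds = round_trip_time(marshal.dumps, marshal.loads, entry)
pickle_seconds = round_trip_time(pickle.dumps, pickle.loads, entry)
print("marshal: %.4fs  pickle: %.4fs" % (marshal_seconds, pickle_seconds))
```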

The general parity in loading time suggests that even for simple data structures, you might as well use cPickle for a caching layer; you lose little or nothing on loading, which is what a cache does most, and you get a bunch of (potential) benefits in return.
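A pickle-backed cache layer can be quite small. The sketch below is a hypothetical `PickleCache` class, not DWiki's actual implementation; it assumes keys are safe to use as file names and that the cache directory is only written by trusted code, since unpickling untrusted data is unsafe.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


class PickleCache(object):
    """Hypothetical disk cache that stores one pickled object per key."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, key):
        # Assumption: keys are already safe to use as file names.
        return os.path.join(self.directory, key + ".pickle")

    def store(self, key, value):
        # Write to a temporary file and rename it into place, so that
        # readers never see a half-written entry (on POSIX systems).
        fd, tmp_name = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as fp:
            pickle.dump(value, fp, pickle.HIGHEST_PROTOCOL)
        os.rename(tmp_name, self._path(key))

    def fetch(self, key, default=None):
        # A missing or damaged entry is treated as a cache miss.
        try:
            with open(self._path(key), "rb") as fp:
                return pickle.load(fp)
        except (IOError, OSError, EOFError, pickle.UnpicklingError):
            return default


cache = PickleCache(tempfile.mkdtemp())
cache.store("frontpage", {"html": "<p>hello</p>", "generated": 1192763981.25})
print(cache.fetch("frontpage"))
print(cache.fetch("no-such-page"))
```

One of the potential benefits shows up here: because pickle is not limited to primitive types, the cached value could later grow to include richer objects without changing the cache code.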


Comments on this page:

From 76.100.187.127 at 2007-10-19 09:27:21:

Have you posted the code? I'd be interested in comparing to simplejson.dumps() as well, just for fun.

By cks at 2007-10-20 00:11:39:

I've put the simple and unscientific code I used up here.

Written on 18 October 2007.
Last modified: Thu Oct 18 23:19:41 2007