The Python marshal module versus the cPickle module
The marshal module looks interesting for persisting and retrieving lightweight data, but the big question for me has always been whether, in exchange for restricting your data to simple structures of primitive types, you get something that is actually faster than the cPickle module.
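To make that constraint concrete, here is a small sketch (not from the original tests) of what marshal will and won't accept. It assumes it may run on a current Python 3, where cPickle no longer exists as a separate module and plain `pickle` uses its C implementation automatically; `datetime.date` simply stands in for any non-builtin type.

```python
import datetime
import marshal

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

# Simple structures of primitive types round-trip through either module.
simple = {"title": "Front page", "tags": ["python", "cache"], "size": 2048}
assert marshal.loads(marshal.dumps(simple)) == simple
assert pickle.loads(pickle.dumps(simple)) == simple

# A richer object; datetime.date stands in for any non-builtin type.
when = datetime.date(2000, 1, 1)

# marshal only knows a fixed set of built-in types and raises ValueError otherwise...
try:
    marshal.dumps(when)
    marshal_accepted = True
except ValueError:
    marshal_accepted = False

# ...whereas pickle serializes it without complaint.
pickled_back = pickle.loads(pickle.dumps(when))

print("marshal accepted date: %s, pickle round-trip ok: %s"
      % (marshal_accepted, pickled_back == when))
# prints: marshal accepted date: False, pickle round-trip ok: True
```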
So today I decided to finally answer the question by doing some timing tests. I won't claim that these are comprehensive or entirely scientific, but I do have some results:
- the speed difference is mostly in dumping things; marshal and cPickle generally load things at about the same speed (sometimes cPickle has the edge, sometimes marshal).
- marshal is significantly faster on nested dictionaries.
- marshal is generally a bit faster than cPickle.
- however, marshal really suffers on long strings, and it gets worse the longer the string is; for example, it dumps 2048-byte strings ten times slower than cPickle does.
- neither marshal nor cPickle is very good at Unicode strings. cPickle suffers the larger relative slowdown, especially for loading; it becomes 18 times slower on my sample string, although this is still no slower than marshal's time. (This is one of the rare cases where cPickle dumps much faster than it loads.)
- marshal is significantly worse on floating-point numbers, especially lists of them, although not as bad as on strings (it's only about twice as slow as cPickle for a single floating-point number).
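The timing code itself isn't shown here, so the following is only a sketch of how such a comparison might be set up with the standard `timeit` module. The payloads, their sizes, and the helper names `time_module` and `pickle_dumps` are assumptions for illustration, not the original tests, and absolute numbers will vary with Python version and machine.

```python
import marshal
import timeit

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Hypothetical payloads roughly matching the categories discussed above.
PAYLOADS = {
    "nested dict": {"a": {"b": {"c": list(range(20))}}, "d": {"e": "f"}},
    "long string": "x" * 2048,
    "unicode string": u"\u00e9t\u00e9 " * 100,
    "float": 3.14159,
    "list of floats": [i / 7.0 for i in range(1000)],
}


def time_module(dumps, loads, obj, number=10000):
    """Return (dump_seconds, load_seconds) for `number` dumps and loads of obj."""
    data = dumps(obj)
    assert loads(data) == obj  # sanity check: the round-trip is lossless
    dump_time = timeit.timeit(lambda: dumps(obj), number=number)
    load_time = timeit.timeit(lambda: loads(data), number=number)
    return dump_time, load_time


def pickle_dumps(obj):
    # Ask for the newest binary protocol; on Python 2 the default is text protocol 0.
    return pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)


if __name__ == "__main__":
    for name, obj in PAYLOADS.items():
        m_dump, m_load = time_module(marshal.dumps, marshal.loads, obj)
        p_dump, p_load = time_module(pickle_dumps, pickle.loads, obj)
        print("%-15s marshal dump %.4fs load %.4fs | pickle dump %.4fs load %.4fs"
              % (name, m_dump, m_load, p_dump, p_load))
```

Comparing the marshal/pickle ratio for each payload is more meaningful than the raw times, and results on a modern Python 3 may well differ from the figures above.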
Since DWiki's cache layer spends a lot of time writing and reading long strings, it looks like I made the right decision when I chose cPickle for it way back when. The combination of long strings and floating-point numbers meant that marshal was significantly slower than cPickle for a simulated DWiki cache object.
The general parity in loading time suggests that, even for simple data structures, you might as well use cPickle for a caching layer; you are not noticeably slower at the operation you'll be doing most often (loading), and you get a bunch of (potential) benefits in return.
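As a rough illustration, here is a minimal file-backed cache built on pickle. It is not DWiki's actual cache layer: the function names `cache_store` and `cache_fetch`, the one-file-per-key layout, and the write-to-temp-then-rename approach are all assumptions for the sake of the sketch. One concrete benefit over marshal is that cached values aren't limited to marshal's built-in types.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def cache_store(cache_dir, key, value):
    """Pickle value into cache_dir/key, writing to a temp file and renaming it into place."""
    path = os.path.join(cache_dir, key)
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)
        os.rename(tmp_path, path)  # atomically replaces an existing entry on POSIX
    except Exception:
        os.unlink(tmp_path)
        raise


def cache_fetch(cache_dir, key, default=None):
    """Return the unpickled value stored under key, or default if there is none."""
    path = os.path.join(cache_dir, key)
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except IOError:  # missing entry; IOError is an alias of OSError on Python 3
        return default
```

Because unpickling can run arbitrary code, a cache like this should only ever read files from a directory that untrusted users cannot write to.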