Simple versus complex marshalling in Python (and benchmarks)
If you have an external caching layer in your Python application, any caching layer, one of the important things that dictates its speed is how fast you can turn Python data structures into byte blobs, stuff them into the cache, and then get byte blobs back from the cache and turn them back into data structures. Many caches will store arbitrary blobs for you so your choice of marshalling protocols (and code) can make a meaningful difference. And there are a lot of potential options; marshal, cPickle, JSON, Google protobuf, msgpack, and so on.
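As a rough illustration of how much the choice can matter, here is a minimal timing sketch that round-trips the same payload through pickle and JSON. The payload is an arbitrary assumption of mine; to get numbers that mean anything you'd substitute something representative of your own cache entries.

```python
# Round-trip the same payload through two marshalling protocols and
# time encode and decode separately. The payload is an illustrative
# assumption, not representative of any particular cache.
import json
import pickle
import timeit

payload = {"user": "fred", "hits": 1234, "tags": ["a", "b", "c"]}

def bench(name, dumps, loads):
    blob = dumps(payload)
    enc = timeit.timeit(lambda: dumps(payload), number=10000)
    dec = timeit.timeit(lambda: loads(blob), number=10000)
    print(f"{name}: encode {enc:.4f}s decode {dec:.4f}s ({len(blob)} bytes)")

bench("pickle", pickle.dumps, pickle.loads)
bench("json", lambda o: json.dumps(o).encode(), lambda b: json.loads(b))
```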
One of the big divisions here is what I could call the JSON versus pickle split, namely whether you can encode and decode something close to full Python objects or whether you can only encode and decode primitive types. All else being equal it seems like you should use simple marshalling, since creating an actual Python class instance necessarily has some overhead over and above just decoding primitive types. But this leaves you with a question; put simply, how is your program going to manipulate the demarshalled entities?
In many Python programs these entities would normally be objects, partly because objects are the natural primitive of Python (among other reasons, classes provide convenient namespaces). This basically leaves you with two options. If you work with objects but convert them to and from simple types around the cache layer, you've really built your own two-stage complex marshalling system. If you work with simple entities throughout your code you're probably going to wind up with more awkward and un-Pythonic code. In many situations what I think you'll really wind up doing is converting those simple cache entities back to objects at some point (and converting from objects to simple cache entities when writing cache entries).
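The "two-stage complex marshalling system" can be sketched concretely. Here the `CacheEntry` class, its method names, and the dict layout are all my own illustrative assumptions, not a real API:

```python
# A sketch of two-stage marshalling: objects inside the program,
# primitive types at the cache boundary. Class and method names are
# illustrative assumptions.
import json

class CacheEntry:
    def __init__(self, key, value, hits):
        self.key = key
        self.value = value
        self.hits = hits

    def to_primitive(self):
        # First stage: object -> primitive types.
        return {"key": self.key, "value": self.value, "hits": self.hits}

    @classmethod
    def from_primitive(cls, d):
        # Second stage: primitive types -> object.
        return cls(d["key"], d["value"], d["hits"])

# Around the cache layer you marshal only the primitive form:
entry = CacheEntry("frontpage", "<html>...</html>", 42)
blob = json.dumps(entry.to_primitive()).encode()
restored = CacheEntry.from_primitive(json.loads(blob))
```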
Which brings me around to the subject of benchmarks. You can find a certain amount of marshalling benchmarks out there on the Internet, but what I've noticed is that they're basically all benchmarking the simple marshalling case. This is perfectly understandable (since many marshalling protocols can only do primitive types) but not quite as useful for me as it looks. As suggested above, what I really want to get into and out of the cache in the long run is some form of objects, whether the marshalling layer handles them for me or I have to do the conversion by hand. The benchmark that matters for me is the total time starting from or finishing with the object.
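The object-inclusive benchmark I'm describing might look something like this sketch, where the conversion to and from the object is counted as part of the "simple" protocol's cost (the `Record` class and the counts are assumptions for illustration):

```python
# A sketch of benchmarking from object to blob and back to object,
# rather than just primitive encode/decode. Names and iteration counts
# are arbitrary assumptions.
import json
import pickle
import timeit

class Record:
    def __init__(self, name, count):
        self.name = name
        self.count = count

rec = Record("example", 10)

def json_round_trip():
    # Total cost includes the hand conversion to and from the object.
    blob = json.dumps({"name": rec.name, "count": rec.count}).encode()
    d = json.loads(blob)
    return Record(d["name"], d["count"])

def pickle_round_trip():
    # pickle handles the object directly, in one stage.
    return pickle.loads(pickle.dumps(rec))

t_json = timeit.timeit(json_round_trip, number=10000)
t_pickle = timeit.timeit(pickle_round_trip, number=10000)
print(f"json+convert: {t_json:.4f}s  pickle: {t_pickle:.4f}s")
```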
With that said, if caches are going to be an important part of your system it likely pays to think about how you're going to get entries into and out of them efficiently. You may want to have deliberately simplified objects near the cache boundaries that are mostly thin wrappers around primitive types. Plus Python gives you a certain amount of brute force hacks, like playing games with
(I don't have any answers here, or benchmark results for that matter. And I'm sure there are situations where it makes sense to go with just primitive types and more awkward code instead of using Python objects.)
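To make the simplified-boundary-object idea a bit more concrete, here is one hedged sketch of my own: a `__slots__` class that is little more than a named wrapper around a tuple of primitives, so converting it for the cache stays cheap. This is an illustration, not the specific hack alluded to above.

```python
# A deliberately simplified boundary object: a __slots__ class that is
# essentially a named wrapper around a tuple of primitives. The class
# and its conversion methods are illustrative assumptions.
class HitCount:
    __slots__ = ("url", "count")

    def __init__(self, url, count):
        self.url = url
        self.count = count

    def as_tuple(self):
        # The primitive form you'd hand to the marshalling layer.
        return (self.url, self.count)

    @classmethod
    def from_tuple(cls, t):
        # Rebuild the object from the demarshalled primitive form.
        return cls(*t)

h = HitCount("/index", 7)
restored = HitCount.from_tuple(h.as_tuple())
```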
Sidebar: The other marshalling benchmark problem
Put simply, different primitive types generally encode and decode at different speeds (and the same is true for different sizes of primitive types, like strings). This means you need to pay attention to what people are encoding and decoding, not just what the speed results are; if they're not encoding something representative of what you want to, all bets may be off.
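A quick way to see this effect for yourself is to time encoding the same number of items with different types. The sizes and counts here are arbitrary assumptions; the point is only that the shape of the data changes the numbers:

```python
# Time encoding equally sized lists of different primitive types.
# Counts and string sizes are arbitrary illustrative assumptions.
import json
import timeit

ints = list(range(1000))
strs = [str(i) * 10 for i in range(1000)]

t_ints = timeit.timeit(lambda: json.dumps(ints), number=1000)
t_strs = timeit.timeit(lambda: json.dumps(strs), number=1000)
print(f"ints: {t_ints:.4f}s  strings: {t_strs:.4f}s")
```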
(My old tests of marshal versus cPickle showed some interesting type-based variations of this nature.)
You can also care more about decoding speed than encoding speed, or vice versa. My gut instinct is that you probably want to care more about decoding speed if your cache is doing much good, because getting things back from the cache (and the subsequent decodes) should be more frequent than putting things into it.