The accumulator mini-pattern and .setdefault()
What I'm going to call the accumulator (mini-)pattern is a common operation when summarizing a stream of data: you have some keys, which can repeat, and you want to accumulate some data each time each key comes up, to count it or sum it all up or keep a list of all of the data for each key.
In pretty much every language that has them, this pattern is done with dictionaries (or hashes or the language's equivalent). In Python, this creates the minor annoyance of initializing a key's entry the first time you see the key, so you wind up with annoying code that looks like this:
    store = {}
    def accum(k, v):
        if k not in store:
            store[k] = []
        store[k].append(v)
(In awk, one of the Unix progenitors of this pattern, unset elements default to zero so you can usually write just 'store[$1] = store[$1] + $2' or the like.)
There are a number of variations on the same basic idea; I have seen people write 'store[k] = store.get(k, 0) + v', for example. Which one you settle on depends partly on what operation you're doing (a default-value .get() is convenient for math, an 'if not in, add it' bit of code is convenient for data structures) and partly on which particular idiom feels natural to you.
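To make the two variations concrete, here is a small sketch that runs both over the same stream. The sample data and the names `events`, `totals`, and `groups` are made up for illustration; only the two idioms themselves come from the discussion above.

```python
# Hypothetical sample data: (key, value) pairs where keys repeat.
events = [("a", 1), ("b", 5), ("a", 2), ("b", 1), ("c", 7)]

# Summing: a default-value .get() supplies 0 the first time a key is seen.
totals = {}
for k, v in events:
    totals[k] = totals.get(k, 0) + v

# Collecting: 'if not in, add it' creates the list only on first sight.
groups = {}
for k, v in events:
    if k not in groups:
        groups[k] = []
    groups[k].append(v)

print(totals)  # {'a': 3, 'b': 6, 'c': 7}
print(groups)  # {'a': [1, 2], 'b': [5, 1], 'c': [7]}
```

The .get() form does not work well for the list case, because '.get(k, [])' hands back a fresh list that is never stored in the dictionary unless you assign it yourself.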
For the 'if not in, add it' case one can often use the dict .setdefault() method to shorten the code:
    def accum(k, v):
        store.setdefault(k, []).append(v)
(Opinions may be divided on whether this is uglier and more complicated in practice than the more verbose version.)
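The one-liner works because .setdefault() returns the value stored under the key: if the key is missing it stores the default and returns it, and otherwise it returns what is already there and ignores the default. A minimal sketch of that behaviour, with the names `first` and `second` invented for the example:

```python
store = {}

# Key missing: the [] is stored under "k" and returned.
first = store.setdefault("k", [])
first.append(1)

# Key present: the already-stored list is returned; the new [] is discarded.
second = store.setdefault("k", [])
second.append(2)

print(store)            # {'k': [1, 2]}
print(first is second)  # True
```

Since both calls hand back the same list object, appending through the return value updates the dictionary entry in place.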
As it happens, I have to remind myself of .setdefault() every so often, and I've seen other people miss it too. I'm not sure why .setdefault() keeps slipping out of my mind; it may partly be because it has such an odd name for the operation it does, although I have to admit that coming up with a better one would be a challenge.
There is at least one case where .setdefault() is clearly worse. Consider:
    def accum(k, v):
        if k not in store:
            store[k] = SomeClass()
        store[k].op(v)
If you wrote this with .setdefault(), you would be creating and then throwing away a SomeClass object every time the key had already been seen before, churning memory in the process. This happens because Python evaluates the default argument before it calls .setdefault(), whether or not the default winds up being used. The more verbose code avoids this by only creating SomeClass objects when you actually need them.
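You can see the difference by counting constructor calls. This is only a sketch: the body of SomeClass here (the `created` counter and the `items` list), the two function names, and the sample data are all invented for the demonstration.

```python
class SomeClass:
    created = 0  # hypothetical counter of constructor calls

    def __init__(self):
        SomeClass.created += 1
        self.items = []

    def op(self, v):
        self.items.append(v)

def accum_setdefault(store, k, v):
    # SomeClass() is evaluated on every call, even when k is already present.
    store.setdefault(k, SomeClass()).op(v)

def accum_verbose(store, k, v):
    # SomeClass() is only evaluated when k is new.
    if k not in store:
        store[k] = SomeClass()
    store[k].op(v)

# Four items, but only two distinct keys.
data = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]

SomeClass.created = 0
s1 = {}
for k, v in data:
    accum_setdefault(s1, k, v)
setdefault_count = SomeClass.created

SomeClass.created = 0
s2 = {}
for k, v in data:
    accum_verbose(s2, k, v)
verbose_count = SomeClass.created

print(setdefault_count, verbose_count)  # 4 2
```

Both versions end up with the same two entries in the dictionary; the .setdefault() version just constructs one object per item instead of one per distinct key.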