How much space a Python dictionary entry uses

June 6, 2011

There are a number of built in Python types that have variable sized instances, such as tuples, lists, and longs (or ints, in Python 3). For some of these, there is a simple answer to how much space a given instance will take up, based on how many items you put in an instance; you can determine this from using sys.getsizeof() on an empty instance to get the base size and then looking at <type>.__itemsize__ to see the per-item space usage.

Dictionaries are not this simple, because like lists they do not simply add their entries on the end of their basic C level blob of data (well, it's a C structure). As a result a number of things affect the size of a particular dictionary, and in turn affect when adding entries will actually make the dictionary use more memory.

In the abstract, a dictionary entry is three pointers; one for the key, one for the value, and a third to store the value of the key's hash. On a 32-bit machine this makes it 12 bytes and on a 64-bit machine, 24 bytes. However, dictionaries do not add entries one by one; instead there are two complications.

First, the basic dictionary structure comes with enough room for up to five entries. If you watch sys.getsizeof() as you add keys to a dictionary, you can see this clearly; an empty dictionary and one with five entries use the same modest amount of space, but once you add a sixth entry the space usage jumps.

Second, dictionaries enlarge (and shrink) in discontinuous jumps. Each dictionary has a table for entries, which is always a power of two in size and at most 2/3rds used; when you add an entry that would push the table over 2/3rds used, CPython enlarges the dictionary table instead. Dictionary tables that are not that large quadruple when they're resized; large tables merely double.

Let's put this in concrete terms. On a 64-bit machine with Python 2.7, sys.getsizeof() reports that a dictionary uses 280 bytes when it has from zero to five elements. When you add a sixth entry, the reported size jumps to 1048 bytes. We can actually work out exactly where this memory is going; the size of the dictionary's C structure is 280 bytes, and then we need a 32-entry table (four times the size of the 8-entry table we started with) where every entry is 24 bytes, for 768 bytes total. This adds up to exactly the 1048 bytes of memory usage that was reported.

Note that all of this is of course implementation dependent, although the CPython dictionary implementation has been refined enough that it doesn't seem to change much. Python 3.1.2 reports the same numbers, for example. The real lesson here is that sys.getsizeof() means that you can measure things for yourself if you think it matters in your program.

(Also note that size reported by sys.getsizeof() does not include the space required for the keys and the values themselves, for various reasons.)

Written on 06 June 2011.
« A subtle difference between tuples and lists in CPython
Databases as a compromise damage limiter in web applications »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Jun 6 00:46:19 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.