How much space a Python dictionary entry uses
There are a number of built in Python types that have variable sized
instances, such as tuples, lists, and longs (or ints, in Python 3).
For some of these, there is a simple answer to how much space a given
instance will take up, based on how many items you put in an instance;
you can determine this from using sys.getsizeof()
on an empty instance
to get the base size and then looking at <type>.__itemsize__
to see
the per-item space usage.
Dictionaries are not this simple, because like lists they do not simply add their entries on the end of their basic C level blob of data (well, it's a C structure). As a result a number of things affect the size of a particular dictionary, and in turn affect when adding entries will actually make the dictionary use more memory.
In the abstract, a dictionary entry is three pointers; one for the key, one for the value, and a third to store the value of the key's hash. On a 32-bit machine this makes it 12 bytes and on a 64-bit machine, 24 bytes. However, dictionaries do not add entries one by one; instead there are two complications.
First, the basic dictionary structure comes with enough room for up
to five entries. If you watch sys.getsizeof()
as you add keys to a
dictionary, you can see this clearly; an empty dictionary and one with
five entries use the same modest amount of space, but once you add a
sixth entry the space usage jumps.
Second, dictionaries enlarge (and shrink) in discontinuous jumps. Each dictionary has a table for entries, which is always a power of two in size and at most 2/3rds used; when you add an entry that would push the table over 2/3rds used, CPython enlarges the dictionary table instead. Dictionary tables that are not that large quadruple when they're resized; large tables merely double.
Let's put this in concrete terms. On a 64-bit machine with Python 2.7,
sys.getsizeof()
reports that a dictionary uses 280 bytes when it has
from zero to five elements. When you add a sixth entry, the reported
size jumps to 1048 bytes. We can actually work out exactly where this
memory is going; the size of the dictionary's C structure is 280 bytes,
and then we need a 32-entry table (four times the size of the 8-entry
table we started with) where every entry is 24 bytes, for 768 bytes
total. This adds up to exactly the 1048 bytes of memory usage that was
reported.
Note that all of this is of course implementation dependent, although
the CPython dictionary implementation has been refined enough that it
doesn't seem to change much. Python 3.1.2 reports the same numbers, for
example. The real lesson here is that sys.getsizeof()
means that you
can measure things for yourself if you think it matters in your program.
(Also note that size reported by sys.getsizeof()
does not include
the space required for the keys and the values themselves, for various
reasons.)
|
|