How CPython implements __slots__ (part 1): storage

April 21, 2011

At an abstract level, each instance of a conventional class has a __dict__ attribute that is an ordinary Python dictionary, and instance attributes are created and changed by manipulating this dictionary; the key is the attribute name and the value is the attribute's value. __slots__ eliminates this dictionary and instead gives the class a fixed list of attributes that its instances know about. All of this is in the documentation. What the documentation won't tell you is how the machine-level storage for all of this actually works. That's what today's entry is about.
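A minimal sketch of the visible difference, using two hypothetical classes (Plain and Slotted) under Python 3. The ordinary instance keeps its attributes in a per-instance dictionary, while the slotted one has no __dict__ at all and rejects attribute names that are not listed in __slots__.

```python
class Plain:
    pass

class Slotted:
    __slots__ = ("x", "y")

p = Plain()
p.x = 1                          # stored as p.__dict__["x"]
print(p.__dict__)                # {'x': 1}

s = Slotted()
s.x = 1                          # stored in the instance's slot storage, not a dict
print(hasattr(s, "__dict__"))    # False

try:
    s.z = 3                      # 'z' is not a slot and there is no __dict__ to fall back on
    z_rejected = False
except AttributeError:
    z_rejected = True
print(z_rejected)                # True
```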

In CPython, class instances start out as a more or less opaque C structure that is specific to the C-level type your class inherits from (we saw this before). However, the general CPython type infrastructure for new-style classes reserves the right to add extra space on the end of your type's opaque blob for its own purposes. If your class has a __slots__, this code adds space after the C structure to store what is effectively an array of pointers to Python objects, one entry per __slots__ attribute. Each entry points to the value of its attribute; if no value is set, the entry is NULL and CPython treats the attribute as missing (accessing it raises AttributeError).
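To see the "NULL means no value" behaviour from Python code, here is a small sketch with a hypothetical Point class. An unassigned slot and a deleted slot both show up as a missing attribute; the NULL pointer is the C-level explanation for this, not something the code can observe directly.

```python
class Point:
    __slots__ = ("x", "y")

pt = Point()
pt.x = 10          # the 'x' entry now points at the int object 10

# 'y' was never assigned, so its entry has no value and
# CPython reports the attribute as missing.
try:
    pt.y
    y_missing = False
except AttributeError:
    y_missing = True

del pt.x           # clears the 'x' entry back to "no value"
x_missing_after_del = not hasattr(pt, "x")
print(y_missing, x_missing_after_del)   # True True
```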

While somewhat complicated, this approach minimizes per-instance memory overhead. If the array of slot value pointers were allocated separately, each instance would need a second memory allocation plus an extra pointer in the base object structure to find that array. And because all instances of the class have exactly the same slots, all information about the slots' names and how to access them can live on the class instead of also being attached to each instance.
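One way to check the "one pointer per slot, stored inline" claim is to compare the types' __basicsize__, the fixed C-level instance size that CPython records on each type. This sketch assumes CPython; Empty and Three are hypothetical names, and other implementations such as PyPy may report different numbers.

```python
import struct

PTR_SIZE = struct.calcsize("P")   # size of a C pointer on this build

class Empty:
    __slots__ = ()

class Three:
    __slots__ = ("a", "b", "c")

# Each slot adds one pointer to the fixed per-instance size.
print(Three.__basicsize__ - object.__basicsize__, 3 * PTR_SIZE)
# No slots means nothing is appended at all.
print(Empty.__basicsize__ == object.__basicsize__)

# The per-slot access information lives on the class as descriptors;
# instances carry only the value pointers.
print(type(Three.__dict__["a"]).__name__)   # member_descriptor
```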

If you have a class that both has a non-empty __slots__ and inherits from certain built-in types, you will get the error:

nonempty __slots__ not supported for subtype of '<type>'

The Python documentation mentions this but does not explain the details of what is going on, which have to do with this storage approach.

Most C-level types have a fixed-size C structure; however, the type infrastructure also supports types that have a fixed-size header structure followed immediately by a variable number of fixed-size items (tuples are one example). Because the information on how to access slot values is attached to the class, not the instance, CPython requires every slot value pointer to sit at a constant offset from the start of the instance object. That in turn requires all instances of the type to have the same fixed size, which is not true for instances of 'base + items' C-level types. Hence the error message.
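A short sketch reproducing the error under Python 3 with tuple, which is a 'base + items' type (its __itemsize__ is non-zero, unlike object's). TaggedTuple is a hypothetical name; since the exact wording can vary between CPython versions, the check only looks for the 'nonempty __slots__' part of the message.

```python
print(tuple.__itemsize__ != 0)   # True: tuple is a 'base + items' type
print(object.__itemsize__)       # 0: fixed-size

try:
    class TaggedTuple(tuple):
        __slots__ = ("tag",)     # would need a pointer at a fixed offset
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```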

You can still have an empty __slots__ even for 'base + items' types, because this doesn't require allocating any slot value pointers; it just turns off the creation of the __dict__ dictionary.

(Well, usually.)
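The empty-__slots__ case can be sketched with a hypothetical tuple subclass called Pair (Python 3, CPython assumed for the __basicsize__ check). The class is accepted, its fixed size stays the same as tuple's, and its instances have no __dict__.

```python
class Pair(tuple):
    __slots__ = ()              # allowed: no slot pointers need to be appended

pair = Pair((1, 2))
print(Pair.__basicsize__ == tuple.__basicsize__)   # True: nothing was added
print(hasattr(pair, "__dict__"))                   # False: __dict__ is turned off

try:
    pair.label = "origin"
    label_rejected = False
except AttributeError:
    label_rejected = True
print(label_rejected)           # True
```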

Sidebar: how __dict__ itself is (usually) implemented

One might innocently think that __dict__ would be implemented with something like an ob_dict pointer in the basic Python C-level object structure. As it happens, CPython is both more clever and more sleazy than this. The storage for the pointer to the __dict__ dictionary is usually created by this same 'add things on the end of the type's blob' code, and the type's C structure has a field (tp_dictoffset) that records the offset at which this pointer is found. This saves a pointer per instance when __slots__ turns off __dict__, and probably has other implementation advantages that I don't know about.
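The offset field is visible from Python as __dictoffset__. A sketch, assuming Python 3 and hypothetical classes NoDict and WithDict: listing '__dict__' in __slots__ asks CPython to allocate the dict pointer after all, and the type's recorded offset goes from 0 to non-zero. I don't check for a specific non-zero value, because it differs between CPython versions.

```python
class NoDict:
    __slots__ = ("a",)

class WithDict:
    __slots__ = ("a", "__dict__")   # explicitly ask for a __dict__ back

print(NoDict.__dictoffset__)         # 0: no dict pointer allocated
print(WithDict.__dictoffset__ != 0)  # True: the type records where the pointer lives

w = WithDict()
w.a = 1        # goes into the 'a' slot
w.b = 2        # not a slot, so it lands in the instance __dict__
print(w.__dict__)                    # {'b': 2}
```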

You might wonder how this works for base + items types, where the end of the object is not at a fixed offset. That's where the sleaze comes in: CPython has special support for this case in the __dict__ offset. If I'm reading the code right, a negative offset means the pointer is found by counting from the end of the object instead of from the start.

(If you want the gory details, see _PyObject_GetDictPtr in Objects/object.c in the CPython source code.)
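The negative-offset rule can be mimicked with plain arithmetic. This is a hedged Python sketch of how I read _PyObject_GetDictPtr in the Python 2 era source; dict_ptr_offset and round_up are hypothetical helpers, not CPython APIs, and the sizes below are illustrative 64-bit numbers rather than values read from a real interpreter. The key step is that a negative offset is added to the size of this particular instance (header plus items, rounded up to pointer alignment), which turns it into an ordinary offset from the start.

```python
def round_up(n: int, align: int) -> int:
    """Round n up to a multiple of align."""
    return (n + align - 1) // align * align

def dict_ptr_offset(basicsize: int, itemsize: int, dictoffset: int,
                    ob_size: int, ptr_size: int = 8):
    """Hypothetical helper: byte offset of the __dict__ pointer in one object."""
    if dictoffset == 0:
        return None                      # the type has no __dict__ at all
    if dictoffset < 0:
        # ob_size can be negative (old-style longs store the sign there),
        # so only its magnitude counts toward the instance's size.
        size = round_up(basicsize + abs(ob_size) * itemsize, ptr_size)
        dictoffset += size               # now measured from the start
    return dictoffset

# Assumed numbers for a tuple subclass with a __dict__: a 24-byte tuple
# header plus one pointer for the dict gives basicsize 32, items are
# 8-byte pointers, and the type stores dictoffset = -8.
for n in (0, 1, 3):
    print(n, dict_ptr_offset(32, 8, -8, n))
```

With these assumed numbers, 0-, 1-, and 3-item instances put the dict pointer at byte 24, 32, and 48 respectively: always in the last pointer-sized word of the object, which is why the offset has to be measured from the end.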

If you want to see some of this sausage's insides, look at the __dictoffset__ attribute of any new-style class. For bonus points, create a class that inherits from, say, str and then look at its __dictoffset__. Note that almost all built-in types will show a 0 for this value for reasons that do not fit into this sidebar.
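A small sketch for poking at these fields under Python 3; Plain, Slotted, and MyTuple are hypothetical classes. One caveat if you try the str experiment on Python 3: str is no longer a 'base + items' type there (str.__itemsize__ is 0), so a tuple subclass is the better stand-in for the negative-offset case. The printed numbers depend on your CPython version and platform, so treat them as something to look at rather than as fixed values.

```python
class Plain:
    pass

class Slotted:
    __slots__ = ("x",)

class MyTuple(tuple):
    pass          # a 'base + items' subclass that does get a __dict__

for cls in (object, int, str, tuple, Plain, Slotted, MyTuple):
    print(f"{cls.__name__:8} dictoffset={cls.__dictoffset__:4} "
          f"basicsize={cls.__basicsize__:3} itemsize={cls.__itemsize__}")
```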

Written on 21 April 2011.

Last modified: Thu Apr 21 01:12:01 2011