Reference counting and multiple inheritance in (C)Python

July 31, 2011

I recently stumbled over this comment on a LWN article about object oriented design patterns in the Linux kernel. Quoting a bit from the original article, Auders asked:

Though it seems obvious when put this way, it is useful to remember that a single object cannot have two reference counters - at least not two lifetime reference counters [...]. This means that multiple inheritance in the "data inheritance" style is not possible.

The standard CPython implementation of Python uses reference counting, and yet it supports multiple inheritance. How is this possible?

There are two answers. (I'm going to try to write this from the perspective of a C programmer.)

What causes this inheritance problem in C is because of how most C code handles data inheritance. As covered in the LWN series, typically you do inheritance by direct structure embedding; to inherit from struct A, you put struct A in your own struct. This means that your struct's lifetime is tied to the lifetime of the embedded struct A; when A's reference count goes to zero, you will be told to delete your entire structure. If you embed both struct A and struct B, each of them separately reference counted, then A can have its reference count go to zero before B and you will be told to delete your entire structure even though the embedded B is still alive and cannot be deleted.

Python does not do data inheritance by directly embedding structures. Instead each object has a single storage for all fields, regardless of where they come from, and all classes that you inherit from write their fields into it. This is part of what enables Python objects to have a single reference count which is manipulated by everything that takes or releases a reference to the object, regardless of which class's code and data it is working through. This works at the C level because all CPython objects start with a common structure that can be manipulated generically without needing to know what sort of object you're dealing with.

(You could do a single object-wide reference count in C if you wanted to, but it requires extra overhead and only makes sense if your struct A and struct B are purely virtual and must always be subclassed. Instead of directly embedding a reference count in each structure, you'd embed a pointer to the overall object's reference count (and set this in each embedded struct when creating your overall object). You also need to think about whether it does damage to have a dangling struct A that is not referenced from anywhere, because this is what happens when you drop the last reference to A before the overall object can be deleted.)

The other answer is that CPython also has this limit on multiple inheritance but it's more carefully disguised. Because CPython cannot do (C) structure embedding, it simply refuses to let a Python class inherit from two different C-level classes, or in fact from two classes (Python or C-level) that have incompatible object layouts at the C level. Typical Python programs never notice because almost all (Python) classes only inherit from a single C-level class, that being object(). You can safely do multiple inheritance through several paths to the same C-level root class because of how all those instance fields from your Python classes get stuck into the object's generic field storage.

(C-level classes effectively do not inherit from each other.)

Attempting to break this constraint gets you a series of odd error messages depending on what exactly you're trying to do. I've written about various specific manifestations of this before, such as here and here.

(Both answers are true, but the first answer is incomplete.)

Written on 31 July 2011.
« One of my testing little dirty secrets
How to make yourself look bad: broken bounce addresses »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Jul 31 21:47:42 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.