Reference counting and multiple inheritance in (C)Python
I recently stumbled over this comment on a LWN article about object oriented design patterns in the Linux kernel. Quoting a bit from the original article, Auders asked:
Though it seems obvious when put this way, it is useful to remember that a single object cannot have two reference counters - at least not two lifetime reference counters [...]. This means that multiple inheritance in the "data inheritance" style is not possible.
The standard CPython implementation of Python uses reference counting, and yet it supports multiple inheritance. How is this possible?
There are two answers. (I'm going to try to write this from the perspective of a C programmer.)
This inheritance problem in C arises from how most C code handles data inheritance. As covered in the LWN series, you typically do inheritance by direct structure embedding: to inherit from struct A, you put a struct A inside your own struct. This ties your struct's lifetime to the lifetime of the embedded struct A; when A's reference count goes to zero, you will be told to delete your entire structure. If you embed both struct A and struct B, each separately reference counted, then A's reference count can reach zero before B's, and you will be told to delete your entire structure even though the embedded B is still alive and cannot be deleted.
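Python can't express C structure embedding directly, so here is a small toy model of the failure described above. It assumes, as the text does, that when either embedded part's own count reaches zero, the whole containing structure gets freed. The class and method names are made up for illustration.

```python
class EmbeddedParts:
    """Toy model of a C struct that embeds struct A and struct B,
    each with its own lifetime reference count (hypothetical names)."""

    def __init__(self):
        self.counts = {"A": 0, "B": 0}
        self.freed = False

    def incref(self, part):
        self.counts[part] += 1

    def decref(self, part):
        self.counts[part] -= 1
        if self.counts[part] == 0:
            # The embedded part's release path frees the whole
            # containing struct, whatever the other part's count is.
            self.freed = True


obj = EmbeddedParts()
obj.incref("A")  # someone holds a reference through A
obj.incref("B")  # someone else holds a reference through B
obj.decref("A")  # A's count hits zero first
print(obj.freed, obj.counts["B"])  # True 1: freed while B is still referenced
```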
Python does not do data inheritance by directly embedding structures. Instead, each object has a single store for all of its fields (for ordinary Python classes, the instance dictionary), regardless of where they come from, and all the classes that you inherit from write their fields into it. This is part of what enables Python objects to have a single reference count that is manipulated by everything that takes or releases a reference to the object, regardless of which class's code and data it is working through. This works at the C level because all CPython objects start with a common structure (the PyObject header, which holds the reference count and a pointer to the object's type) that can be manipulated generically without needing to know what sort of object you're dealing with.
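You can see this from the Python side with a quick sketch, assuming CPython and ordinary classes that use an instance __dict__ (classes with __slots__ store their fields differently). Fields set by two unrelated base classes land in one shared dictionary, and new references to the object bump one shared count no matter which base class you think of it as. The class names are hypothetical.

```python
import sys


class A:
    """Hypothetical base class that sets one field."""

    def __init__(self):
        self.a_field = "from A"


class B:
    """Hypothetical, unrelated base class that sets another field."""

    def __init__(self):
        self.b_field = "from B"


class AB(A, B):
    def __init__(self):
        # Both base classes write into the same per-instance storage.
        A.__init__(self)
        B.__init__(self)


ab = AB()
print(ab.__dict__)  # {'a_field': 'from A', 'b_field': 'from B'}

before = sys.getrefcount(ab)
via_a = ab  # a new reference, conceptually taken "as an A"
via_b = ab  # another one, conceptually taken "as a B"
after = sys.getrefcount(ab)
print(after - before)  # 2 in CPython: both land in the same single count
```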
(You could do a single object-wide reference count in C if you wanted to, but it requires extra overhead and only makes sense if your struct A and struct B are purely virtual and must always be subclassed. Instead of directly embedding a reference count in each structure, you'd embed a pointer to the overall object's reference count (and set this in each embedded struct when creating your overall object). You also need to think about whether it does damage to have a dangling struct A that is not referenced from anywhere, because this is what happens when you drop the last reference to A before the overall object can be deleted.)
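Here is a toy Python model of this pointer-to-a-shared-count scheme. Python has no pointers, so a one-element list stands in for the object-wide count, and each embedded part's "pointer" is a reference to that same list. The names are hypothetical. Note how dropping the last reference through A leaves A dangling without freeing the object.

```python
class SharedCountObject:
    """Toy model: embedded parts A and B share one object-wide count
    (hypothetical names; a one-element list plays the role of the
    int that each part would point at in C)."""

    def __init__(self):
        self.count = [0]
        self.freed = False
        # Set each part's "pointer" when the overall object is created.
        self.part_count = {"A": self.count, "B": self.count}

    def incref(self, part):
        self.part_count[part][0] += 1

    def decref(self, part):
        self.part_count[part][0] -= 1
        if self.part_count[part][0] == 0:
            self.freed = True


whole = SharedCountObject()
whole.incref("A")
whole.incref("B")
whole.decref("A")          # A is now dangling: nothing refers to it...
freed_early = whole.freed  # ...but the object is not freed yet
whole.decref("B")          # the last reference overall goes away
print(freed_early, whole.freed)  # False True
```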
The other answer is that CPython also has this limit on multiple inheritance, but it's more carefully disguised. Because CPython cannot do (C) structure embedding, it simply refuses to let a Python class inherit from two different C-level classes, or in fact from two classes (Python or C-level) that have incompatible object layouts at the C level. Typical Python programs never notice because almost all (Python) classes only inherit from a single C-level class, that being object. You can safely do multiple inheritance through several paths to the same C-level root class because all of the instance fields from your Python classes get stuck into the object's generic field storage. (C-level classes effectively do not inherit from each other.)
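The following sketch shows both sides of this in CPython. Diamond inheritance back to object works, while combining int and str, or two Python classes that each define a non-empty __slots__, raises TypeError. The code prints the error text rather than assuming its exact wording, and the try_to_combine helper is hypothetical.

```python
def try_to_combine(*bases):
    """Hypothetical helper: try to build a class from `bases`.

    Returns the TypeError message on failure, or None on success.
    """
    try:
        type("Combined", bases, {})
    except TypeError as exc:
        return str(exc)
    return None


# Several Python-level paths back to the same C-level root are fine.
class Base:
    pass


class Left(Base):
    pass


class Right(Base):
    pass


class Diamond(Left, Right):
    pass


print([cls.__name__ for cls in Diamond.__mro__])
# ['Diamond', 'Left', 'Right', 'Base', 'object']

# Two C-level classes with different instance layouts: refused.
print(try_to_combine(int, str))


# Python classes can have incompatible layouts too, because each
# non-empty __slots__ adds fixed C-level slots to the instance.
class HasX:
    __slots__ = ("x",)


class HasY:
    __slots__ = ("y",)


print(try_to_combine(HasX, HasY))

# One C-level class plus ordinary Python classes is fine.
print(try_to_combine(int, Left))  # None
```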
Attempting to break this constraint gets you a series of odd error messages depending on what exactly you're trying to do. I've written about various specific manifestations of this before, such as here and here.
(Both answers are true, but the first answer is incomplete.)
One of my little dirty testing secrets
I recently read yet another article on TDD, and one of the things this article talked about was the benefit of reading tests in order to understand what the code was doing.
When I thought about someone doing this to my tests, I laughed hollowly.
One of the little dirty secrets about the tests I write is that they are, well, slapped together. I almost invariably write tests in the most expedient and brute force way, and I don't particularly write comments about what the tests are testing and how. (Sometimes I will write a one sentence summary of what bit of the API a test is testing, mostly because Python's unittest module encourages this.)
This goes well beyond how I'd rather have clean code and dirty tests. A good part of it is that I still have the mindset that tests are overhead, and the less time I spend on them the more time I have to spend on writing the useful things. Writing clean, carefully commented test code would require a lot more work. A part of it is that most test code is about the most boring, straightforward code that you could imagine; 'repeatedly call this routine with certain inputs and verify that you get certain outputs' is essentially boilerplate, except you can't automate it.
(I have a tendency to make my tests somewhat exhaustive. I'm not content to test that a routine works with one set of inputs; I want to call anything important with all sorts of arguments to check basic functionality, boundary conditions, and so on.)
When I was essentially developing code for myself, this was sort of acceptable (although not great, since even I can forget what my tests were about if I've been away from the code for a while). But since I started developing code in a more shared environment, I've become increasingly conscious of how hard it would be for my co-workers to understand my tests as part of developing a change to my code. Although I'm not certain what the right answer is, I suspect that it is adding more comments and more careful code structure to my tests, even though this is sure to make them slower and more annoying to write (and to revise).
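As one sketch of what "more comments and more careful code structure" might look like, here is a table-driven unittest test where each group of cases is commented with what it checks. The clamp() function is a hypothetical stand-in for real code under test. subTest() (available since Python 3.4) makes each failing row report separately instead of stopping at the first failure.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: limit value to [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


class ClampTests(unittest.TestCase):
    """Tests for clamp(): normal values, both boundaries, bad ranges."""

    # (value, low, high, expected), grouped by what each case checks.
    CASES = [
        # Values already inside the range come back unchanged.
        (5, 0, 10, 5),
        # Exactly on either boundary still counts as inside the range.
        (0, 0, 10, 0),
        (10, 0, 10, 10),
        # Values outside the range are pulled to the nearer boundary.
        (-1, 0, 10, 0),
        (11, 0, 10, 10),
        # A degenerate range allows exactly one value.
        (7, 3, 3, 3),
    ]

    def test_cases(self):
        for value, low, high, expected in self.CASES:
            # Each row is reported on its own if it fails.
            with self.subTest(value=value, low=low, high=high):
                self.assertEqual(clamp(value, low, high), expected)

    def test_inverted_range_is_rejected(self):
        with self.assertRaises(ValueError):
            clamp(5, 10, 0)
```

Saved as a module, this runs under python -m unittest. The table keeps the habit of exhaustive inputs cheap, while the comments record why each row exists.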
(Probably I should look at how some real projects in the wild structure and document their tests, to see how people who really know TDD deal with this problem.)