2013-01-03
An alternate pattern for polymorphism in C
As I mentioned in yesterday's entry, CPython
(the C-based main implementation of Python) uses an interesting variant
on struct-at-start based polymorphism. To put it simply, it uses
#defines instead of a struct. This probably sounds odd, so let me
show you the slightly simplified CPython 2.7.x code:
    #define PyObject_HEAD \
        Py_ssize_t ob_refcnt; \
        struct _typeobject *ob_type;

    #define PyObject_VAR_HEAD \
        PyObject_HEAD \
        Py_ssize_t ob_size;

    typedef struct _object {
        PyObject_HEAD
    } PyObject;

    typedef struct {
        PyObject_VAR_HEAD
    } PyVarObject;

    /* A typical actual Python object */
    typedef struct {
        PyObject_VAR_HEAD
        int ob_exports;
        Py_ssize_t ob_alloc;
        char *ob_bytes;
    } PyByteArrayObject;
(This is taken from Include/object.h in the CPython source.)
The #defines are used to construct generic 'object' structs (the
typedef'd PyObject and PyVarObject) for use in appropriate code, but
in actual Python objects the #defines are used directly instead of
the object structs being embedded in them. Things are cast back
and forth as necessary; in practice (and I believe perhaps in ANSI C
theory) it's guaranteed that the start of a PyByteArrayObject has the
same memory layout as a PyVarObject.
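As a rough illustration of the casting described above, here is a minimal sketch. All of the names in it (OBJ_HEAD, VAROBJ_HEAD, Obj, VarObj, ByteArray and the helper functions) are hypothetical stand-ins, not CPython's own. It uses offsetof to check that the header fields sit at the same offsets in every struct, which is what the casts rely on. Whether the C standard's aliasing rules fully bless this kind of access is a separate question that the sketch does not settle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header macros, modelled on PyObject_HEAD above. */
#define OBJ_HEAD \
    long refcnt; \
    int type_id;

#define VAROBJ_HEAD \
    OBJ_HEAD \
    size_t size;

/* Generic views, built from the same macros. */
typedef struct { OBJ_HEAD } Obj;
typedef struct { VAROBJ_HEAD } VarObj;

/* A concrete object: the macro is expanded in place, not embedded. */
typedef struct {
    VAROBJ_HEAD
    char *bytes;
} ByteArray;

/* Generic code only knows about the common header. */
long generic_refcnt(const Obj *o)
{
    assert(o != NULL);
    return o->refcnt;
}

size_t generic_size(const VarObj *v)
{
    assert(v != NULL);
    return v->size;
}

/* Returns 1 if the header fields are at identical offsets everywhere. */
int headers_line_up(void)
{
    return offsetof(ByteArray, refcnt) == offsetof(Obj, refcnt)
        && offsetof(ByteArray, type_id) == offsetof(Obj, type_id)
        && offsetof(ByteArray, size) == offsetof(VarObj, size);
}
```

A caller would pass a concrete object to the generic code with an explicit cast, for example generic_refcnt((Obj *)&ba) for a ByteArray ba.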
There are a number of advantages of this #define-based approach. The
one that's visible here is that references to these polymorphic fields
in actual structs do not require levels and levels of indirection
through names that exist merely as containers. If p is a pointer
to a PyByteArrayObject, you can directly refer to p->ob_refcnt
instead of having to refer to p->b.a.ob_refcnt, where b and a
are arbitrary names assigned to the PyVarObject and PyObject structs
embedded in the PyByteArrayObject. This goes well with CPP macros to
manipulate the various fields (actual functions, even inline ones,
would require some actual casting). In particular it means that a CPP
macro to manipulate ob_refcnt doesn't have to care whether you're
dealing with an object built on PyObject or one built on PyVarObject;
with explicit structs, the former case would need p->a.ob_refcnt
while the latter would need p->b.a.ob_refcnt.
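To make the contrast concrete, here is a small sketch that puts the two layouts side by side. Everything in it is hypothetical (the OBJ_HEAD macros, the N-prefixed embedded-struct types, and the INCREF-style macros); it is not how CPython spells things. The point is only that the flat layout gets by with one macro, while the embedded layout needs a different member path at each nesting depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header macros, modelled on PyObject_HEAD. */
#define OBJ_HEAD     long refcnt; int type_id;
#define VAROBJ_HEAD  OBJ_HEAD size_t size;

/* #define-based: header fields are direct members at every level. */
typedef struct { OBJ_HEAD } Obj;
typedef struct { VAROBJ_HEAD } VarObj;
typedef struct { VAROBJ_HEAD char *bytes; } ByteArray;

/* One macro serves every object type, with no cast and no path. */
#define INCREF(p)  ((p)->refcnt++)

/* The same hierarchy with explicitly embedded structs, for contrast. */
typedef struct { long refcnt; int type_id; } NObj;
typedef struct { NObj a; size_t size; } NVarObj;
typedef struct { NVarObj b; char *bytes; } NByteArray;

/* Each nesting depth needs its own path (or a cast to NObj *). */
#define N_INCREF_OBJ(p)    ((p)->refcnt++)
#define N_INCREF_VAR(p)    ((p)->a.refcnt++)
#define N_INCREF_BYTES(p)  ((p)->b.a.refcnt++)

/* Bump the count on three flat object types; returns the sum. */
long incref_all_flat(void)
{
    Obj o = { 1, 0 };
    VarObj v = { 1, 0, 0 };
    ByteArray ba = { 1, 0, 0, NULL };

    INCREF(&o);
    INCREF(&v);
    INCREF(&ba);
    assert(ba.refcnt == 2);
    return o.refcnt + v.refcnt + ba.refcnt;
}

/* The same thing with embedded structs: three different macros. */
long incref_all_nested(void)
{
    NObj o = { 1, 0 };
    NVarObj v = { { 1, 0 }, 0 };
    NByteArray ba = { { { 1, 0 }, 0 }, NULL };

    N_INCREF_OBJ(&o);
    N_INCREF_VAR(&v);
    N_INCREF_BYTES(&ba);
    assert(ba.b.a.refcnt == 2);
    return o.refcnt + v.a.refcnt + ba.b.a.refcnt;
}
```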
(Some C compilers allow anonymous structs if the members are unique
and this is now standardized in C11.)
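For comparison, here is a minimal sketch of what the anonymous struct version might look like, assuming a C11 compiler. The type and function names are hypothetical. As far as I know, standard C11 only treats a member as an anonymous struct if its struct specifier has no tag, so the header fields are written out inline here rather than pulled in from a previously declared type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, not CPython's; assumes C11 anonymous structs. */
typedef struct {
    struct {                  /* anonymous: no tag, no member name */
        struct {              /* innermost 'object' header */
            long refcnt;
            int type_id;
        };
        size_t size;          /* 'variable-size object' header */
    };
    char *bytes;
} ByteArray;

/* Header fields are reachable directly despite the nesting. */
long decref(ByteArray *p)
{
    assert(p->refcnt > 0);
    return --p->refcnt;
}
```

With this layout p->refcnt and p->size work without any intermediate member names, much as they do with the #define approach.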