2012-01-27
Why metaclasses work in Python
I've covered what you can do with metaclasses (1, 2, 3, 4) and even, sort of, the low level details of how they work (1, 2, 3). But I've never covered the high level view of why metaclasses work, i.e. what overall Python features make them go (partly because I am so immersed in Python arcana that much of that stuff feels obvious to me, although I doubt it actually is).
To start with, in Python everything is an object and all objects are an instance of something (yes, there are spots where this gets recursive). This includes even things that you wouldn't normally think of as objects, such as functions. Crucially, this includes classes: classes are objects. Any time you have an object in Python, a lot of its behavior is usually provided by whatever it is an instance of (to avoid confusion, I'll call this the type of the object). Classes are no exception to this; a lot of how classes behave is handled by their type, even things like how a new object gets created when you call the class.
(For simplicity, I'm going to ignore old-style classes from
here onwards and assume that all classes are new-style Python 2 classes
that ultimately subclass object.)
To avoid a point of confusion: classes have ancestor ('base') classes
that they inherit from (or just object(), the root class). However,
classes are not instances of their base class; we can see why this
has to be when we note that a class can inherit from multiple base
classes. You can't be an instance of several different things at once.
So classes exist in a two-dimensional relationship; they inherit from
one or more base classes, and at the same time they are instances of
something that provides much of their 'class' behavior. The type
of classes (the thing that provides the 'class' behavior) is called
type().
(This two dimensional structure can get a bit weird.)
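As a small sketch of the two dimensions (the class names A, B and C here are made up for illustration), the base classes and the type of a class can be inspected separately:

```python
class A(object):
    pass

class B(object):
    pass

class C(A, B):
    pass

# Inheritance dimension: C inherits from both A and B.
print(C.__bases__)

# Instance dimension: C is an instance of type, not of A or B.
print(type(C) is type)      # True
print(isinstance(C, A))     # False
print(issubclass(C, A))     # True
```

Multiple inheritance shows up only in `__bases__`; `type(C)` is still a single thing.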
In some languages, the creation of classes is black magic that happens
deep in the interpreter and isn't something you can do inside the
language (even if the classes are visible as objects). Python has
instead chosen to expose the ability to create classes by hand; you
can do this by calling type() with the right arguments (and then
binding the class object to a name), just as you
create instances of normal classes by calling the class itself. As part
of creating classes yourself by hand, you can obviously manipulate
class creation; you can create a new class with whatever methods, base
classes, and so on you want.
(What's odd about type() is that despite it being a class, you can
call it with a single object to get the type of the object.)
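Here is a minimal sketch of creating a class by hand with the three-argument form of type(). The names Greeter, greet and volume are invented for the example:

```python
def greet(self):
    return "hello from " + type(self).__name__

# type(name, bases, namespace) builds the class object. Binding it to a
# name is a separate step, just as with any other object.
Greeter = type("Greeter", (object,), {"greet": greet, "volume": 11})

g = Greeter()
print(g.greet())            # hello from Greeter
print(Greeter.volume)       # 11

# The one-argument form just reports the type of an object.
print(type(g) is Greeter)   # True
print(type(Greeter) is type)  # True
```

This should behave the same as writing the equivalent 'class Greeter(object): ...' statement by hand.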
Python is also an unusual language in another way; in Python, things
like defining functions and classes are themselves executable
statements. Python doesn't parse your program,
create all the functions and classes, and then start running your code;
instead it starts running your code and things like def and class
execute on the fly (as do import and so on). So it's natural to have
your code running as classes are being created.
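To illustrate that class is an executable statement, here is a small sketch. The events list and the Noisy class are made up for the example. The class body is ordinary code that runs at the point where the class statement is reached:

```python
import sys

events = []

events.append("before class")

class Noisy(object):
    # This runs when the class statement executes, not at some
    # earlier 'compile everything' stage.
    events.append("inside class body")

    # Ordinary control flow works here too.
    if sys.version_info[0] >= 3:
        def flavour(self):
            return "py3"
    else:
        def flavour(self):
            return "py2"

events.append("after class")

print(events)
# ['before class', 'inside class body', 'after class']
```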
The combination of these two things means that Python can easily provide
a way to hook your own code into the process of creating the class
objects for classes that are written in straight Python, with 'class
X(object): ....'. Python is already running code in general when this
happens, and the mechanism for creating classes by hand means it's
relatively easy for Python to hand you the bits of the class-to-be so
you can modify it and then have everything continue onwards to create a
new class. This is why metaclasses can change classes as they are being
created.
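A rough sketch of such a hook follows. The metaclass Upper and the class Config are hypothetical, and the example uses the Python 3 spelling 'metaclass=...' (Python 2 instead set '__metaclass__ = Upper' inside the class body). The metaclass is handed the name, bases and namespace of the class-to-be and can change them before the class object exists:

```python
class Upper(type):
    # Hypothetical metaclass: upper-cases every non-dunder attribute
    # name before the class object is actually created.
    def __new__(meta, name, bases, namespace):
        changed = {}
        for key, value in namespace.items():
            if not key.startswith("__"):
                key = key.upper()
            changed[key] = value
        return super(Upper, meta).__new__(meta, name, bases, changed)

class Config(metaclass=Upper):
    debug = True

    def port(self):
        return 8080

print(Config.DEBUG)              # True
print(hasattr(Config, "debug"))  # False
print(Config().PORT())           # 8080
```

Renaming attributes this way is a toy example, but the same hook point is where a real metaclass would add, remove or wrap things.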
The other half of why metaclasses work is that Python allows classes to
be instances of something other than type(). Since classes get a lot
of their 'class' behavior through normal instance method inheritance
from type(), a class being an instance of something other than
type() lets the other thing intercept or change the normal as-a-class
behavior for that class (for example, what happens when you call the
class). This is why metaclasses can do things with a class after the
class has been created.
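As a sketch of changing as-a-class behavior after creation, the hypothetical metaclass below overrides what happens when you call the class, so that each class only ever produces one instance. Singleton and Registry are invented names, and the 'metaclass=' spelling is Python 3 syntax:

```python
class Singleton(type):
    # Calling a class looks up __call__ on the class's type, so a
    # metaclass can intercept instance creation.
    def __call__(cls, *args, **kwargs):
        if "_instance" not in cls.__dict__:
            cls._instance = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instance

class Registry(metaclass=Singleton):
    def __init__(self):
        self.items = []

a = Registry()
b = Registry()
print(a is b)                          # True
print(isinstance(Registry, Singleton))  # True
```

Nothing about Registry's own code changed; the difference comes entirely from what it is an instance of.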
2012-01-16
Understanding isinstance() on Python classes
Suppose that you have:
    class A(object): pass
    class B(A): pass
As previously mentioned, the type of classes is
type, which is to say that class objects are instances of type:
    >>> isinstance(A, type)
    True
    >>> isinstance(B, type)
    True
Both A and B are clearly subclasses of object; A is a direct subclass
and B is indirectly a subclass through A. In fact every new-style Python
class is a subclass of object, since object is the root of the
class inheritance tree. However, class type is not the same as class
inheritance:
    >>> issubclass(B, A)
    True
    >>> isinstance(B, A)
    False
Although B is a subclass of A, it is not an instance of A; it is a
direct instance of type (we can see this with 'type(B)'). Now,
given that A and B are instances of type, one might expect that they
would not be instances of object since they merely inherit from it, as
B inherits from A:
    >>> isinstance(A, object)
    True
Well, how about that. We're wrong (well, I'm wrong, you may already have known the correct answer). Here is why:
    >>> issubclass(type, object)
    True
A and B are instances of type and, like all other classes and types,
type is a subclass of object. So A and B are also instances of
object (at least in an abstract, Python level view of things), in
the same way that an instance of B would also be an instance of A.
I believe that this implies that 'isinstance(X, object)' is always
true for anything involved in the new-style Python object system. The
corollary is that this is an (almost) surefire test to see if the random
object you are dealing with is an old style class or an instance of one:
    class C: pass
    >>> issubclass(C, object)
    False
    >>> isinstance(C, object)
    False
(This goes away in Python 3, where there are only new-style classes
and there is much rejoicing, along with people no longer having to
explicitly inherit from object for everything.)
PS: as originally noted by Peter Donis in a comment here, object is also an instance of type because
object is itself a class. type is an instance of itself in addition
to being a subclass of object. Try not to think about the recursion
too much.
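The recursion can be checked directly. This short sketch only uses the builtins object and type:

```python
# object is a class, so it is an instance of type.
print(isinstance(object, type))   # True

# type is a class too, so it is an instance of itself...
print(isinstance(type, type))     # True
print(type(type) is type)         # True

# ...and since type subclasses object, it is also an instance of object.
print(issubclass(type, object))   # True
print(isinstance(type, object))   # True
```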
(This isinstance() surprise is an easy thing to get wrong, which is
why I'm writing it down; I almost made this mistake in another entry I'm
working on.)
Sidebar: isinstance() and metaclasses
If A (or B) has a metaclass, it is an instance of the metaclass instead of a direct instance of type. In any
sane Python program, 'isinstance(A, type)' will continue to be True
because A's metaclass will itself be a subclass of type.
(I'm not even sure it's possible to create a working metaclass
that doesn't directly or indirectly subclass type (cf), but I'm not going to bet against it.)
This implies that I was dead wrong when I said, back in ClassesAndTypes,
that 'type(type(obj))' would always be 'type' for any arbitrary
Python object, as Daniel Martin noted at the time and I never
acknowledged (my bad). In the presence of metaclasses, type(type(obj))
can be the metaclass instead of type itself. Since metaclasses can
themselves have metaclasses, there is no guarantee that any fixed
number of type() invocations will wind up at type.
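A sketch of this, with made-up names (MetaMeta, Meta and Thing) and the Python 3 'metaclass=' spelling, shows how each extra layer of metaclass pushes type one more type() call away:

```python
class MetaMeta(type):
    pass

class Meta(type, metaclass=MetaMeta):
    pass

class Thing(metaclass=Meta):
    pass

obj = Thing()

print(type(obj) is Thing)                    # True
print(type(type(obj)) is Meta)               # True, not type
print(type(type(type(obj))) is MetaMeta)     # True
print(type(type(type(type(obj)))) is type)   # True

# Because Meta subclasses type, this still holds:
print(isinstance(Thing, type))               # True
```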
2012-01-02
An example sort that needs a comparison function
In reaction to my entry on Python 3 dropping the comparison function for sorting, some people may feel that a sorting order that is neither simple field-based nor based on a computed 'distance' (the two cases easily handled by a key function) is unrealistic. As it happens I can give you a great example of a sort order that cannot be handled in any other way: software package versions on Linux systems.
For simplicity (and because I know RPM best), I'm going to talk about RPM-based version numbers. RPM version numbers have three components, an epoch, a version, and a release, and ordering is based on comparing each successive component in turn. The epoch is a simple numeric comparison (higher epochs are more recent), but both the version and release can have sub-components and each sub-component must be compared piecewise using a relatively complex comparison for each piece (they can be all digits, letters, or mixed letters and digits). Something with extra sub-components is more recent than something without them, so version 1.6.1 is more recent than version 1.6. A full package version can look like '1:2.4.6-4.fc16.cks.0'; '1:' denotes the epoch, the version is '2.4.6', and the release is '4.fc16.cks.0'.
(Most RPM packages have an epoch of '0', which is
conventionally omitted when reporting package versions.)
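As a minimal sketch of pulling the three components apart (split_evr is a hypothetical helper, not part of any RPM library), assuming the 'epoch:version-release' layout described above and treating a missing epoch as 0:

```python
def split_evr(full):
    """Split 'epoch:version-release' into (epoch, version, release)."""
    epoch, sep, rest = full.partition(":")
    if not sep:
        # No 'N:' prefix: treat the omitted epoch as 0.
        epoch, rest = "0", full
    version, _, release = rest.partition("-")
    return int(epoch), version, release

print(split_evr("1:2.4.6-4.fc16.cks.0"))
# (1, '2.4.6', '4.fc16.cks.0')
print(split_evr("2.4.6-4.fc16.cks.0"))
# (0, '2.4.6', '4.fc16.cks.0')
```

Splitting is the easy part; the version and release strings still need the piecewise comparison.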
In the presence of potential letter-based subcomponents and the complex comparison rules, you can't compare these version numbers using simple field-based rules, not even if you split sub-components up into tuples and then compare a tuple-of-tuples (it's possible if all sub-components are simple numbers). Nor can you compute some sort of single numerical 'distance' value for a particular version number, especially since version numbers are sort of like the rational numbers in that you can always add an essentially unlimited number of additional versions between any two apparently adjacent versions. The only real operation you have is a pure comparison, where you answer the question 'is X a higher version than Y', and this comparison requires relatively intricate code.
(Having said that, Daniel Martin showed a nice way to transform things so that a key-function based sort can stand in for a comparison function sort, in comments on the earlier entry.)
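To make the 'pure comparison' idea concrete, here is a deliberately simplified sketch. The rules in cmp_piece and cmp_version are assumptions made for illustration and are not RPM's actual algorithm. The standard library's functools.cmp_to_key is one general way to wrap a comparison function as a key function (it is not necessarily the approach from those comments):

```python
from functools import cmp_to_key

def cmp_piece(a, b):
    """Compare two sub-components; return -1, 0 or 1."""
    if a.isdigit() and b.isdigit():
        a, b = int(a), int(b)
    elif a.isdigit() != b.isdigit():
        # Assumption for this sketch: an all-digit piece is more
        # recent than one containing letters.
        return 1 if a.isdigit() else -1
    return (a > b) - (a < b)

def cmp_version(x, y):
    """Compare two dotted version strings piece by piece."""
    xs, ys = x.split("."), y.split(".")
    for a, b in zip(xs, ys):
        result = cmp_piece(a, b)
        if result:
            return result
    # All shared pieces are equal: extra sub-components are more recent.
    return (len(xs) > len(ys)) - (len(xs) < len(ys))

versions = ["1.6.1", "1.10", "1.6", "1.6.rc1", "1.9"]
ordered = sorted(versions, key=cmp_to_key(cmp_version))
print(ordered)
# ['1.6', '1.6.rc1', '1.6.1', '1.9', '1.10']
```

Even in this toy form, the ordering logic lives in a two-argument comparison rather than in any simple per-item key.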