Wandering Thoughts archives

2012-01-27

Why metaclasses work in Python

I've covered what you can do with metaclasses (1, 2, 3, 4) and even, sort of, the low-level details of how they work (1, 2, 3). But I've never covered the high-level view of why metaclasses work, i.e. what overall Python features make them go (partly because I am so immersed in Python arcana that much of that stuff feels obvious to me, although I doubt it actually is).

To start with, in Python everything is an object and every object is an instance of something (yes, there are spots where this gets recursive). This includes even things that you wouldn't normally think of as objects, such as functions. Crucially, this includes classes: classes are objects. Any time you have an object in Python, a lot of its behavior is usually provided by whatever it is an instance of (to avoid confusion, I'll call this the type of the object). Classes are no exception to this; a lot of how classes behave is handled by their type, even things like how a new object gets created when you call the class.
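A small sketch of the 'everything is an object' point may help here. The names greet, Widget and registry are made up for illustration, and the code assumes nothing beyond the built-in type() and isinstance():

```python
def greet():
    return "hello"

class Widget(object):
    pass

w = Widget()

# Every object is an instance of something; type() reports what.
print(type(w) is Widget)        # True
print(type(greet).__name__)     # function
print(type(Widget) is type)     # True: the class is itself an object

# Because classes are objects, they can be stored and passed around
# like any other value, then called later to make instances.
registry = {"widget": Widget}
made = registry["widget"]()
print(isinstance(made, Widget))  # True
```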

(For simplicity, I'm going to ignore old-style classes, the Python 1.x model that Python 2 still supports, from here onwards and assume that all classes are new-style Python 2 classes that ultimately subclass object.)

To avoid a point of confusion: classes have ancestor ('base') classes that they inherit from (if nothing else, object(), the root class). However, classes are not instances of their base classes. This has to be so, because a class can inherit from multiple base classes while an object has exactly one type. So classes exist in a two-dimensional relationship; they inherit from one or more base classes, and at the same time they are instances of something that provides much of their 'class' behavior. The type of classes (the thing that provides the 'class' behavior) is called type().
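The two dimensions can be inspected separately. In this minimal sketch (Base1, Base2 and Child are hypothetical names), __bases__ shows the inheritance dimension and type() shows the instance dimension:

```python
class Base1(object):
    pass

class Base2(object):
    pass

class Child(Base1, Base2):
    pass

# Inheritance dimension: the classes Child derives from.
print(Child.__bases__ == (Base1, Base2))   # True
print(issubclass(Child, Base2))            # True

# Instance dimension: the one thing Child itself is an instance of.
print(type(Child) is type)                 # True
print(isinstance(Child, Base1))            # False
```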

(This two dimensional structure can get a bit weird.)

In some languages, the creation of classes is black magic that happens deep in the interpreter and isn't something you can do inside the language (even if the classes are visible as objects). Python has instead chosen to expose the ability to create classes by hand; you can do this by calling type() with the right arguments (and then binding the class object to a name), just as you create instances of normal classes by calling the class itself. As part of creating classes yourself by hand, you can obviously manipulate class creation; you can create a new class with whatever methods, base classes, and so on you want.

(What's odd about type() is that despite it being a class, you can call it with a single object to get the type of the object.)
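As a rough illustration of creating a class by hand, the three-argument form of type() takes a name, a tuple of base classes, and a dictionary of attributes. Point and describe are invented names for this sketch:

```python
def describe(self):
    return "I am a %s" % type(self).__name__

# type(name, bases, namespace) builds the class object; binding it to
# a name is a separate, ordinary assignment.
Point = type("Point", (object,), {"x": 0, "y": 0, "describe": describe})

p = Point()
print(p.describe())       # I am a Point
print(p.x, p.y)           # 0 0

# The one-argument form is the odd one out: it reports an object's type.
print(type(p) is Point)   # True
```

This should behave the same as writing 'class Point(object): ...' with the same attributes in the class body.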

Python is also an unusual language in another way; in Python, things like defining functions and classes are themselves executable statements. Python doesn't parse your program, create all the functions and classes, and then start running your code; instead it starts running your code and things like def and class execute on the fly (as does import and so on). So it's natural to have your code running as classes are being created.
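One way to see that class and def are executed rather than merely declared is to record the order in which things happen. This is only a sketch; events, Demo and flavor are hypothetical names:

```python
import sys

events = []
events.append("before class")

class Demo(object):
    # The class body is ordinary code that runs when the class
    # statement itself is executed.
    events.append("inside class body")

    # Because def is a statement, which method gets defined can
    # depend on a run-time condition.
    if sys.version_info[0] >= 3:
        def flavor(self):
            return "py3"
    else:
        def flavor(self):
            return "py2"

events.append("after class")
print(events)
# ['before class', 'inside class body', 'after class']
```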

The combination of these two things means that Python can easily provide a way to hook your own code into the process of creating the class objects for classes that are written in straight Python, with 'class X(object): ...'. Python is already running code in general when this happens, and the mechanism for creating classes by hand means it's relatively easy for Python to hand you the bits of the class-to-be so you can modify them and then have everything continue onwards to create a new class. This is why metaclasses can change classes as they are being created.
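A minimal sketch of hooking class creation follows. It uses the Python 3 spelling, 'class X(metaclass=...)'; in the Python 2 this entry was written for, the same hook would be requested with a '__metaclass__ = AddGreeting' line in the class body. AddGreeting and Plain are made-up names:

```python
class AddGreeting(type):
    def __new__(mcls, name, bases, namespace):
        # The bits of the class-to-be arrive as plain arguments, so
        # they can be changed before the class object exists.
        namespace = dict(namespace)
        namespace.setdefault("greet", lambda self: "hello from " + name)
        # Then everything continues onwards as normal.
        return super().__new__(mcls, name, bases, namespace)

class Plain(metaclass=AddGreeting):
    pass

print(Plain().greet())              # hello from Plain
print(type(Plain) is AddGreeting)   # True
```

Plain never defines greet itself; the method is there because the metaclass added it while the class was being built.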

The other half of why metaclasses work is that Python allows classes to be instances of something other than type(). Since classes get a lot of their 'class' behavior through normal instance method inheritance from type(), a class being an instance of something other than type() lets the other thing intercept or change the normal as-a-class behavior for that class (for example, what happens when you call the class). This is why metaclasses can do things with a class after the class has been created.
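To illustrate changing as-a-class behavior after creation, here is a sketch of a metaclass that intercepts calls to the class. It is written with Python 3 syntax (Python 2 would use '__metaclass__' in the class body), and CountInstances, Tracked and created are hypothetical names:

```python
class CountInstances(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.created = 0

    def __call__(cls, *args, **kwargs):
        # Calling a class looks up __call__ on the class's type, so
        # this runs every time Tracked(...) is called.
        cls.created += 1
        return super().__call__(*args, **kwargs)

class Tracked(metaclass=CountInstances):
    pass

first = Tracked()
second = Tracked()
print(Tracked.created)              # 2
print(isinstance(first, Tracked))   # True
```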

WhyMetaclassesWork written at 00:39:39

2012-01-16

Understanding isinstance() on Python classes

Suppose that you have:

class A(object):
  pass

class B(A):
  pass

As previously mentioned, the type of classes is type, which is to say that class objects are instances of type:

>>> isinstance(A, type)
True
>>> isinstance(B, type)
True

Both A and B are clearly subclasses of object; A is a direct subclass and B is indirectly a subclass through A. In fact every new-style Python class is a subclass of object, since object is the root of the class inheritance tree. However, class type is not the same as class inheritance:

>>> issubclass(B, A)
True
>>> isinstance(B, A)
False

Although B is a subclass of A, it is not an instance of A; it is a direct instance of type (we can see this with 'type(B)'). Now, given that A and B are instances of type, one might expect that they would not be instances of object since they merely inherit from it, as B inherits from A:

>>> isinstance(A, object)
True

Well, how about that. We're wrong (well, I'm wrong, you may already have known the correct answer). Here is why:

>>> issubclass(type, object)
True

A and B are instances of type and, like all other classes and types, type is a subclass of object. So A and B are also instances of object (at least in an abstract, Python level view of things), in the same way that an instance of B would also be an instance of A.

I believe that this implies that 'isinstance(X, object)' is always true for anything involved in the new-style Python object system. The corollary is that this is an (almost) surefire test to see if the random object you are dealing with is an old style class or an instance of one:

class C:
  pass

>>> issubclass(C, object)
False
>>> isinstance(C, object)
False

(This goes away in Python 3, where there are only new-style classes and there is much rejoicing, along with people no longer having to explicitly inherit from object for everything.)

PS: as originally noted by Peter Donis in a comment here, object is also an instance of type because object is itself a class. type is an instance of itself in addition to being a subclass of object. Try not to think about the recursion too much.
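The recursion can be checked directly with nothing but built-ins. These checks should hold in both Python 2 and Python 3:

```python
# object is a class, so it is an instance of type.
print(isinstance(object, type))    # True

# type is a class too, and its type is itself.
print(type(type) is type)          # True
print(isinstance(type, type))      # True

# type inherits from object; object inherits from nothing.
print(issubclass(type, object))    # True
print(type.__bases__ == (object,)) # True
print(object.__bases__ == ())      # True
```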

(This isinstance() surprise is an easy thing to get wrong, which is why I'm writing it down; I almost made this mistake in another entry I'm working on.)

Sidebar: isinstance() and metaclasses

If A (or B) has a metaclass, it is an instance of the metaclass instead of a direct instance of type. In any sane Python program, 'isinstance(A, type)' will continue to be True because A's metaclass will itself be a subclass of type.

(I'm not even sure it's possible to create a working metaclass class that doesn't directly or indirectly subclass type (cf), but I'm not going to bet against it.)

This implies that I was dead wrong when I said, back in ClassesAndTypes, that 'type(type(obj))' would always be 'type' for any arbitrary Python object, as Daniel Martin noted at the time and I never acknowledged (my bad). In the presence of metaclasses, type(type(obj)) can be the metaclass instead of type itself. Since metaclasses can themselves have metaclasses, there is no guarantee that any fixed number of type() invocations will wind up at type.
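A sketch of such a chain, using Python 3 metaclass syntax (Python 2 would use '__metaclass__' in the class body). MetaMeta, Meta, Thing and type_chain are invented names, and type_chain assumes the chain does eventually reach type:

```python
class MetaMeta(type):
    pass

class Meta(type, metaclass=MetaMeta):
    pass

class Thing(metaclass=Meta):
    pass

def type_chain(obj):
    """Follow type() upward from obj until it reaches type itself."""
    chain = []
    t = type(obj)
    while t is not type:
        chain.append(t)
        t = type(t)
    chain.append(type)
    return chain

obj = Thing()
print([t.__name__ for t in type_chain(obj)])
# ['Thing', 'Meta', 'MetaMeta', 'type']

# Thing is still an instance of type, because Meta subclasses type.
print(isinstance(Thing, type))   # True
```

Here type(type(obj)) is Meta rather than type, and it takes four type() calls, not two, to arrive at type.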

ClassesAndIsinstance written at 22:32:55

2012-01-02

An example sort that needs a comparison function

In reaction to my entry on Python 3 dropping the comparison function for sorting, some people may feel that a sorting order that is neither simple field-based nor based on a computed 'distance' (the two cases easily handled by a key function) is unrealistic. As it happens I can give you a great example of a sort order that cannot be handled in any other way: software package versions on Linux systems.

For simplicity (and because I know RPM best), I'm going to talk about RPM-based version numbers. RPM version numbers have three components, an epoch, a version, and a release, and ordering is based on comparing each successive component in turn. The epoch is a simple numeric comparison (higher epochs are more recent), but both the version and release can have sub-components and each sub-component must be compared piecewise using a relatively complex comparison for each piece (they can be all digits, letters, or mixed letters and digits). Something with extra sub-components is more recent than something without them, so version 1.6.1 is more recent than version 1.6. A full package version can look like '1:2.4.6-4.fc16.cks.0'; '1:' denotes the epoch, the version is '2.4.6', and the release is '4.fc16.cks.0'.

(Most RPM packages have an epoch of '0', which is conventionally omitted when reporting package versions.)

In the presence of potential letter-based subcomponents and the complex comparison rules, you can't compare these version numbers using simple field-based rules, not even if you split sub-components up into tuples and then compare a tuple-of-tuples (it's possible if all sub-components are simple numbers). Nor can you compute some sort of single numerical 'distance' value for a particular version number, especially since version numbers are sort of like the rational numbers in that you can always add an essentially unlimited number of additional versions between any two apparently adjacent versions. The only real operation you have is a pure comparison, where you answer the question 'is X a higher version than Y', and this comparison requires relatively intricate code.
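To make the shape of such a comparison concrete, here is a deliberately simplified sketch. It is not RPM's real comparison algorithm: it only follows the rules described above, and it assumes that a numeric piece counts as newer than an alphabetic one. The helper names split_evr, compare_part and compare_evr are invented for this example. Python 3's sorted() no longer accepts a comparison function directly, so the sketch wraps it with functools.cmp_to_key:

```python
import re
from functools import cmp_to_key

def split_evr(full):
    """Split 'epoch:version-release' into (epoch, version, release)."""
    epoch, sep, rest = full.partition(":")
    if not sep:
        # No explicit epoch; treat it as 0.
        epoch, rest = "0", full
    version, _, release = rest.partition("-")
    return int(epoch), version, release

def compare_part(a, b):
    """Compare two version or release strings piece by piece."""
    pieces_a = re.findall(r"[0-9]+|[A-Za-z]+", a)
    pieces_b = re.findall(r"[0-9]+|[A-Za-z]+", b)
    for pa, pb in zip(pieces_a, pieces_b):
        if pa.isdigit() and pb.isdigit():
            na, nb = int(pa), int(pb)
            if na != nb:
                return -1 if na < nb else 1
        elif pa.isdigit() != pb.isdigit():
            # Assumption: a numeric piece beats an alphabetic one.
            return 1 if pa.isdigit() else -1
        elif pa != pb:
            return -1 if pa < pb else 1
    # All shared pieces are equal; extra pieces mean more recent.
    return (len(pieces_a) > len(pieces_b)) - (len(pieces_a) < len(pieces_b))

def compare_evr(x, y):
    """Return -1, 0 or 1 as x is older than, equal to, or newer than y."""
    ex, vx, rx = split_evr(x)
    ey, vy, ry = split_evr(y)
    if ex != ey:
        return -1 if ex < ey else 1
    return compare_part(vx, vy) or compare_part(rx, ry)

versions = [
    "2.4.6-4.fc16",
    "1:2.4.6-4.fc16.cks.0",
    "2.4.10-1.fc16",
    "2.4.6-4.fc16.cks.0",
    "2.4-9.fc16",
]
ordered = sorted(versions, key=cmp_to_key(compare_evr))
print(ordered)
```

With this sample list, the package with epoch 1 sorts last regardless of its version, and 2.4.10 sorts after 2.4.6 because the pieces are compared as numbers rather than as strings.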

(Having said that, Daniel Martin showed a nice way to transform things so that a key-function based sort can stand in for a comparison function sort, in comments on the earlier entry.)

ExampleSortComparison written at 01:49:41



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.