2011-09-23
Python's attribute lookup order
One of the things that surprised me recently is that the Python
language reference doesn't seem to
document the specifics of how Python searches for the attribute when you
try to access obj.attr. So here is a not completely formal description
of the process for Python 2.7 or so for new-style classes.
(Such a description really belongs in the language reference because it's an important part of Python's semantics. It's possible that you can reassemble a full description of the lookup semantics from the bits and pieces scattered throughout the language reference.)
First, any time Python looks for an attribute on an object it
doesn't just search the object's (nominal) attribute dictionary; it
also searches back through the object's base classes (in method
resolution order), ultimately terminating at
object. Ordinary instances of classes don't have bases (and can't be
given them), but classes do.
Let's say we're looking for attr on the object obj, and typ is
the object's type object (ie type(obj)). Then the lookup order is
more or less:
1. If typ.attr exists and has both a __get__ and either a __set__ or a __delete__ attribute, it's a data descriptor; we call its __get__ and return the result. (Data descriptors thus deliberately preempt attributes on the object itself.)
2. Search for obj.attr. On classes (but not on non-class instances) this may be a descriptor with a __get__ instead of a plain attribute; if it is, its __get__ is called. (Classes evaluating their own descriptors is necessary in order to make various descriptor and property things work when you directly access the class. Consider cls.classmethod(), for example.)
3. Search for typ.attr. This may be a plain attribute or a non-data descriptor with a __get__; if it's a descriptor, its __get__ is called.
(As you might guess, the code does not search for typ.attr twice;
it saves the result of the step one search for use in step three.)
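For the non-class case, the three steps can be approximated in pure Python. This is a minimal sketch, not CPython's code: generic_getattr, find_on_type, DataDesc and NonDataDesc are hypothetical names, and the sketch ignores __getattr__ and the class case. It only checks that it agrees with the real getattr().

```python
_MISSING = object()

def find_on_type(typ, name):
    # Search the type and its bases in method resolution order and return the
    # raw value from the first class dictionary that has it (no __get__ yet).
    for klass in typ.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    return _MISSING

def generic_getattr(obj, name):
    # Hypothetical sketch of obj.attr for a non-class instance.
    typ = type(obj)
    descr = find_on_type(typ, name)  # looked up once, reused in step 3
    dtype = type(descr)
    has_get = descr is not _MISSING and hasattr(dtype, "__get__")
    # Step 1: a data descriptor on the type preempts the instance dictionary.
    if has_get and (hasattr(dtype, "__set__") or hasattr(dtype, "__delete__")):
        return dtype.__get__(descr, obj, typ)
    # Step 2: the instance's own dictionary; its values are never called.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    # Step 3: the type's attribute, calling __get__ if it is a descriptor.
    if has_get:
        return dtype.__get__(descr, obj, typ)
    if descr is not _MISSING:
        return descr
    raise AttributeError(name)

class DataDesc(object):
    def __get__(self, obj, objtype=None):
        return "from data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc(object):
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class Example(object):
    data = DataDesc()
    nondata = NonDataDesc()
    plain = "class attribute"

e = Example()
# Write straight into the instance dictionary, bypassing DataDesc.__set__.
e.__dict__["data"] = "instance value"
e.__dict__["nondata"] = "instance value"
for name in ("data", "nondata", "plain"):
    assert generic_getattr(e, name) == getattr(e, name)
```

The data descriptor wins over the instance value, while the non-data descriptor loses to it.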
Note that Python never looks at the type of typ; one level into the
type hierarchy is as far as it goes. By contrast it will go as far along
the base class inheritance hierarchy as it needs to, all the way to
object if necessary.
(This explains how metaclasses can be used to add class-only
attributes on classes. instance.attr will search
only the instance and its class cls (plus cls's base classes), but will
not go the next step to type(cls), the metaclass. cls.attr will search
cls and its base classes (in step 2) and then the metaclass (in step 3).)
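Here is a small illustration of the class case, using hypothetical names (Meta, Base, Child). A classmethod object stored in the class dictionary goes through its __get__ when looked up on the class (step 2). A plain attribute on the metaclass is reachable from the class (step 3) but not from an instance. The class is built by calling the metaclass directly, which works the same way in Python 2 and 3.

```python
class Meta(type):
    meta_only = "on the metaclass"  # plain attribute, only reached in step 3

# Child inherits its metaclass (Meta) from Base.
Base = Meta("Base", (object,), {})

class Child(Base):
    @classmethod
    def make(cls):
        return cls()

raw = Child.__dict__["make"]   # the raw classmethod object, no __get__ applied
bound = Child.make             # step 2 runs the classmethod's __get__
c = Child()

assert isinstance(raw, classmethod)
assert isinstance(bound(), Child)
assert Child.meta_only == "on the metaclass"
assert not hasattr(c, "meta_only")  # instance lookup stops at Child's MRO
```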
It's important here to not confuse an object's base classes with its
type or its type's base classes. I tend to think of base classes and
types as creating a two dimensional structure, where inheritance goes
sideways through a class's base classes to end up at object while
its type goes up to end up at type. Although we often informally
talk about instances of an ordinary class inheriting from things and
attributes being looked up through them, this is not what is actually
happening. A plain instance has no base classes; it is its type (the
class it is an instance of) that has the base classes. Normally making
this distinction is unimportant, but here it's vital.
(The confusing case is a metaclass, which has type as both its base
class (since metaclasses subclass type) and its type (since all
classes, metaclasses included, are normally instances of type).)
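The two directions can be checked directly. __bases__ shows the sideways inheritance axis, and type() shows the upward axis. Meta, C and obj are illustrative names.

```python
class Meta(type):
    pass

C = Meta("C", (object,), {})
obj = C()

# Sideways (inheritance): classes have base classes; plain instances do not.
assert C.__bases__ == (object,)
assert not hasattr(obj, "__bases__")
# Upwards (type): obj's type is C, C's type is Meta, Meta's type is type.
assert type(obj) is C
assert type(C) is Meta
assert type(Meta) is type
# The confusing case: a metaclass has type as both its base and its type.
assert Meta.__bases__ == (type,)
# type and object are both instances of type.
assert type(type) is type and type(object) is type
```

obj has no __bases__ even though C does. __bases__ lives on type, and instance lookup never goes that far up.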
Sidebar: where this is done in CPython's code
In CPython, this is done in two separate pieces of code. Lookups
for classes are handled by type's type_getattro(), in
Objects/typeobject.c. Lookups for non-class objects are handled by
object's PyObject_GenericGetAttr(), which calls
_PyObject_GenericGetAttrWithDict() in Objects/object.c with no
dictionary argument.
2011-09-22
An operational explanation of Python types
In the world of new style classes, everything in Python has a type
(visible with 'type(thing)'); class instances have a type, classes
have a type, even type() has itself as a type.
Sure, fine, but what does it mean in Python to have a type? What do
types actually do?
Given part 4 of what you can do with metaclasses, we can now see the answer: an object's type is where Python finds special methods for it.
Python does not look for special methods through the
normal attribute lookup process (where it would effectively look up
obj.__str__ if you did str(obj)); instead it goes straight to the
type (doing the equivalent of type(obj).__str__).
Here, let's have an example:
class Example(object):
    def __str__(self):
        return "class"
ei = Example()
ei.__str__ = lambda x: "lambda"
print ei.__str__(ei), str(ei)
This will print 'lambda class', showing that str() is not using the
version of __str__ on the object itself.
Now we can see why everything in Python has the type that it does.
Instances of a class have a type() of their class so that special
methods are looked up on their class, which lets classes actually
implement all of those special methods. Classes have a type() of their
metaclass (or type if they have no metaclass) so that things like
__call__ get looked up on their metaclass (or on type itself,
which has the standard implementation of various things). Both type
and object have the type of type for the same reason (well, in
theory).
(My example above was contrived because people generally don't try to put special methods on instances. But it's easy to have a clash between a class special method and a metaclass special method, and then this does matter.)
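Here is a sketch of such a clash, assuming a metaclass and a class that both define __call__ (Meta, Callable and the log list are hypothetical). Calling the class finds __call__ on its type, the metaclass. Calling an instance finds it on the instance's type, the class.

```python
log = []

class Meta(type):
    def __call__(cls, *args, **kwargs):
        # Found on type(Callable), so this runs for Callable(...).
        log.append("Meta.__call__")
        return super(Meta, cls).__call__(*args, **kwargs)

# Calling the metaclass directly gives a base class in both Python 2 and 3.
Base = Meta("Base", (object,), {})

class Callable(Base):
    def __call__(self):
        # Found on type(inst), so this runs for inst(...).
        log.append("Callable.__call__")
        return "called instance"

inst = Callable()
result = inst()
assert log == ["Meta.__call__", "Callable.__call__"]
assert result == "called instance"
```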
One of the nice aspects of this is that it unifies how Python does
attribute lookup for instances of classes with how it does it with
classes. There is no special magic in the interpreter to treat them
differently; how they behave is just determined by how their type
acts. Much of what looks like fundamental behavior (such as using
'Example()' to create an instance of the class) is in fact simply due
to how object and type act.
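As a rough illustration, assuming a plain class with no custom metaclass (Example and its trace attribute are hypothetical): Example() is carried out by type.__call__. That runs Example.__new__ (which here defers to object.__new__) and then __init__. Calling type.__call__ by hand produces the same result.

```python
class Example(object):
    def __new__(cls, *args):
        instance = super(Example, cls).__new__(cls)  # object.__new__
        instance.trace = ["__new__"]
        return instance

    def __init__(self, value):
        self.trace.append("__init__")
        self.value = value

a = Example(1)
b = type.__call__(Example, 2)   # what Example(2) does under the hood
assert type(Example) is type
assert a.trace == b.trace == ["__new__", "__init__"]
assert b.value == 2
```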
Types also get involved in explicit attribute lookups, but that's a much more complicated topic and is not as core to what a type is in Python. (Well, in my opinion.)
2011-09-19
An operational explanation of Python metaclasses (part 4)
The metaclass use of __call__ that I covered in part 2 and the use of __getattribute__ et al that I
mentioned in part 3 are both specific instances of
a general metaclass power: a metaclass supplies the special methods
for its classes. However and as before, these special methods apply
only when you are looking at or manipulating the class itself, not when
you are dealing with instances of the class.
(Special methods for instances of the class have to be supplied by the class or an ancestor it inherits from, as usual.)
Most special methods aren't particularly useful for classes, since you rarely want to do things like treat a class object as a sequence; usually classes just sort of sit there being instantiated and subclassed. The useful special methods are the handful of methods that affect things that you use classes for, which means that part 2 and part 3 have already covered most of them.
(While you can define custom __str__ and __repr__ methods in
a metaclass, note that these methods are not used when printing the
class name portion of an instance of a class; you still get the familiar
'<file.whatever object at 0x...>' result unless you have appropriate
custom methods on the class itself.)
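A quick way to see this, with hypothetical names: a metaclass __repr__ changes repr() of the class. repr() of an instance still comes from object.__repr__, which reads the class name directly and does not call the metaclass method. Only part of the instance repr is checked, because the module prefix and address vary.

```python
class Meta(type):
    def __repr__(cls):
        return "<class %s, via Meta.__repr__>" % cls.__name__

Plain = Meta("Plain", (object,), {})
p = Plain()

assert repr(Plain) == "<class Plain, via Meta.__repr__>"
# object.__repr__ is found via type(p); it does not consult Meta.__repr__.
assert repr(p).startswith("<") and "Plain object at 0x" in repr(p)
```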
There are two additional sets of special methods that are worth
mention. First, you can control how subclassing and instance checks
work; this is covered in the Python documentation on Customizing
instance and subclass checks,
which explicitly mentions that these methods have to be defined in
a metaclass. Second, you can use bare classes in with statements
(instead of class instances) by implementing the context manager
special methods
on your metaclass. It's possible that there's some use for this.
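Here is a minimal sketch of both, with hypothetical names (Meta, Duck, Anything) and a made-up rule that any object with a quack attribute counts as an instance. Both the isinstance() check and the with statement use the metaclass's methods, because the class object itself is what gets checked or entered.

```python
class Meta(type):
    def __instancecheck__(cls, obj):
        # Hypothetical rule: anything with a 'quack' attribute is an instance.
        return hasattr(obj, "quack")

    def __enter__(cls):
        cls.active = True
        return cls

    def __exit__(cls, exc_type, exc_value, traceback):
        cls.active = False
        return False  # do not suppress exceptions

Duck = Meta("Duck", (object,), {"active": False})

class Anything(object):
    quack = "yes"

assert isinstance(Anything(), Duck)
assert not isinstance(object(), Duck)

with Duck as d:  # uses Meta.__enter__/__exit__, found on type(Duck)
    assert d is Duck and Duck.active
assert Duck.active is False
```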
Sidebar: why your metaclass must descend from type
Armed with this understanding about special methods I can now explain
the reason why metaclasses have to subclass type instead of object
and why you get strange error messages if you don't, as I mentioned in
passing in part 1. We can see the answer by asking
'what is the metaclass of a class without a metaclass?'
The answer is 'type'. Given that your metaclass supplies
implementations of special methods, this means that type supplies the
default versions of things like __call__. If you subclass type
in your metaclass, you inherit all of the necessary default versions
of various special methods. If you don't, well, they aren't going to
come from object; object doesn't supply them. This is also why
you need to call up to type (either via super() or directly) in
order to get various things done in a metaclass; type does all of the
magic necessary to actually create new classes, call them to create new
instances of them, and so on.
(Yes, technically a metaclass can be any callable not just a class. I'm looking only at the 'metaclass as a class' case right now.)
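To see the difference, here is a sketch with two hypothetical metaclasses. Only the exception type is checked, since the exact error message varies between Python versions.

```python
class GoodMeta(type):
    pass

class BadMeta(object):  # mistake: should subclass type
    def __new__(meta, cname, bases, cdict):
        return type.__new__(meta, cname, bases, cdict)

Good = GoodMeta("Good", (object,), {})
assert type(Good) is GoodMeta
assert isinstance(Good(), Good)  # __call__ inherited from type makes instances

try:
    BadMeta("Bad", (object,), {})
except TypeError as exc:
    # type.__new__ refuses a metaclass that is not a subtype of type.
    error = str(exc)
else:
    error = None
assert error is not None
```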
2011-09-18
An operational explanation of Python metaclasses (part 3)
Following on from part 1 and part 2, the third thing that we can do with a metaclass is to create 'class-only attributes', attributes that are only visible on the class and not on instances of the class. In fact we can go further than just adding attributes; we can control what attributes are directly visible on the class without affecting what attributes are visible on instances of the class.
Simply adding attributes to the class (and only to the class) is done by putting attributes on the metaclass; in fact all attributes on the metaclass are visible on the class but not on instances of the class. Since this has some subtle bits, here is an example to illustrate:
class MiniMeta(type):
    one = "meta"
    two = "meta"
class Alpha(object):
    two = "alpha"
class Beta(Alpha):
    __metaclass__ = MiniMeta
bi = Beta()
Beta.one is "meta", but bi.one is an AttributeError; the attribute
is visible only on the class and is not visible in instances. Beta.two
and bi.two are both "alpha"; the attribute on the parent class
overrides the attribute on the metaclass for both the class and an
instance of the class (the same thing happens if we define two on
Beta itself). Well, mostly.
The exception to parent classes taking priority is properties. A property set on the metaclass overrides anything else when you access it on the class, but is invisible to instances of the class. If you try hard this can be used to create attributes that have one value on the class and another value on instances of the class, which is sure to confuse everyone who reads your code unless you comment it heavily (and maybe even then).
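A sketch of the property case, reusing the names from the example above. Here two on MiniMeta is a property instead of a plain string. The class is built by calling MiniMeta directly, which has the same effect as __metaclass__ and also runs on Python 3.

```python
class MiniMeta(type):
    @property
    def two(cls):
        # A data descriptor on the metaclass: wins for Beta.two,
        # invisible to instances of Beta.
        return "meta property"

class Alpha(object):
    two = "alpha"

Beta = MiniMeta("Beta", (Alpha,), {})
bi = Beta()

assert Beta.two == "meta property"  # the metaclass property preempts Alpha.two
assert bi.two == "alpha"            # instance lookup never consults MiniMeta
```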
The advanced version of this is that you can get partial or (almost)
full control of attribute access to the class itself by setting
up __getattr__, __getattribute__, __setattr__, and/or
__delattr__ special methods on the metaclass. When people access
attributes on the class itself (eg, as Beta.attr), these work just
as if Beta was an ordinary instance of MiniMeta (because that's
actually exactly what it is). However they are ignored when accessing
attributes of Beta through instances of it, eg as bi.attr; the
metaclass __getattribute__ and so on are not even called.
(As with properties, this can be abused to create attributes which have a different value on the class than on instances of the class.)
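For example, here is a sketch with a metaclass __getattr__ that invents values for missing class attributes (the "synthesized" values are purely illustrative). It fires for Beta.anything but is never consulted for bi.anything.

```python
class MiniMeta(type):
    def __getattr__(cls, name):
        # Only called when normal lookup on the class itself fails.
        return "synthesized %s" % name

Beta = MiniMeta("Beta", (object,), {"real": 1})
bi = Beta()

assert Beta.real == 1                           # normal lookup succeeds first
assert Beta.anything == "synthesized anything"  # falls back to MiniMeta
assert not hasattr(bi, "anything")              # instances never reach it
```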
Note that this not-looking happens even when the lookup on the class
is implicit, such as when you do len(bi) and Python looks to see
if there's a Beta.__len__ method. In fact special method lookups
don't go through any __getattribute__ or __getattr__ at all
(this is covered in the official documentation).
(As with part 2, this is a specific example of a general metaclass power.)
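This can be observed with a hypothetical metaclass __getattribute__ that records every name looked up on the class. len(bi) finds Beta.__len__ without going through it. An explicit Beta.__len__ does go through it.

```python
seen = []

class LoggingMeta(type):
    def __getattribute__(cls, name):
        # Record explicit attribute lookups made on the class object itself.
        seen.append(name)
        return type.__getattribute__(cls, name)

def _len(self):
    return 3

Beta = LoggingMeta("Beta", (object,), {"__len__": _len})
bi = Beta()

before = list(seen)
assert len(bi) == 3        # implicit special method lookup on Beta...
assert seen == before      # ...bypasses LoggingMeta.__getattribute__
explicit = Beta.__len__    # an explicit lookup does go through it
assert "__len__" in seen
```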
Sidebar: method functions on the metaclass versus @classmethod
Both of these create 'class methods', method functions that take the
class as the 'self' argument instead of an instance of the class.
However they are not quite the same thing; in particular, @classmethod
methods are visible (and work) from instances of the class while
metaclass methods are not.
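Here is a side-by-side sketch with hypothetical names. Both methods get the class as their first argument, but only the @classmethod one is reachable through an instance.

```python
class Meta(type):
    def meta_describe(cls):
        # 'cls' is the class, because the class is an instance of Meta.
        return "meta: %s" % cls.__name__

Base = Meta("Base", (object,), {})

class Example(Base):
    @classmethod
    def describe(cls):
        return "classmethod: %s" % cls.__name__

e = Example()
assert Example.meta_describe() == "meta: Example"
assert Example.describe() == "classmethod: Example"
assert e.describe() == "classmethod: Example"  # works from instances
assert not hasattr(e, "meta_describe")         # metaclass methods do not
```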
My personal opinion is that most of the time you want @classmethod
because far more people are going to understand what you're doing. You
want this even if there's no common ancestor class you can put the
methods on and you have to invent some artificial mixin class to hold
them.
2011-09-17
An operational explanation of Python metaclasses (part 2)
After modifying a class as it's being created (covered in part 1), the next thing you can do with a metaclass is
get a chance to do things when instances of the class are created.
You do this by defining __call__ on your metaclass:
class MiniMeta(type):
    # Runs every time Example(...) is called to make an instance.
    def __call__(cls, *args, **kwargs):
        return super(MiniMeta, cls).__call__(*args, **kwargs)
class Example(object):
    __metaclass__ = MiniMeta
There are a number of things that you can do with this. One of them
(which I first saw in another metaclass tutorial) is handling deferred
initialization and setup of something related to the class. Rather than
doing this in your metaclass __new__ or __init__, you defer it
until the first time an instance of the class is created; this saves
you effort in situations where many classes are defined but only a few
classes will ever be used to create instances.
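Here is a minimal sketch of deferred setup. LazySetupMeta, the _one_time_setup hook, the _setup_done flag and setup_log are all hypothetical names, and setup_log stands in for whatever expensive per-class work you would actually defer.

```python
setup_log = []

class LazySetupMeta(type):
    def __call__(cls, *args, **kwargs):
        # Check only this class's own dictionary, so a subclass is not
        # treated as set up just because its parent already was.
        if not cls.__dict__.get("_setup_done", False):
            cls._one_time_setup()
            cls._setup_done = True
        return super(LazySetupMeta, cls).__call__(*args, **kwargs)

def _record_setup(cls):
    # Stand-in for expensive per-class work.
    setup_log.append(cls.__name__)

LazyBase = LazySetupMeta("LazyBase", (object,),
                         {"_one_time_setup": classmethod(_record_setup)})

class Widget(LazyBase):
    pass

class Gadget(LazyBase):  # defined but never instantiated
    pass

Widget()
Widget()
assert setup_log == ["Widget"]
```

Gadget never pays for its setup because it is never instantiated.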
Using a metaclass __call__ to customize instance creation raises
the obvious question of why you don't just do the same work in the
class's own __init__ method (or in extreme cases, its __new__
method). Probably the right answer is that a metaclass makes sense when
you have a chunk of common behavior across a bunch of classes and the
classes can't easily be put into an inheritance relationship where this
functionality could be pulled into a common ancestor or mixin class.
(This use of metaclasses is considered sufficiently interesting to
get mentioned in passing in the official documentation for
__metaclass__.)
PS: __call__ is actually a specific example of a general metaclass
power. I'll get to the general power later because it requires more
explanation.
Sidebar: a technicality
Strictly speaking using __call__ does not intercept all instance
creation, since a sufficiently creative person can still obtain new
instances of Example by calling object.__new__(Example) directly.
If you're doing something where you have to worry about this, Python
is probably the wrong language to write your code in.
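Here is the loophole in miniature, with a hypothetical counting metaclass. The second instance never passes through CountingMeta.__call__.

```python
calls = []

class CountingMeta(type):
    def __call__(cls, *args, **kwargs):
        calls.append(cls.__name__)
        return super(CountingMeta, cls).__call__(*args, **kwargs)

Example = CountingMeta("Example", (object,), {})

normal = Example()                # goes through CountingMeta.__call__
sneaky = object.__new__(Example)  # does not

assert calls == ["Example"]
assert type(sneaky) is Example
```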
2011-09-12
An operational explanation of Python metaclasses (part 1)
All of the explanations of metaclasses that I've read have started out by talking about the whole background and theory of operation of metaclasses. This approach doesn't work for me; by the time they get out of the background, I'm either asleep or my eyes have glazed over. So I'm going to tackle metaclasses from the other end, covering what you can do with them.
Part of the reason that metaclasses are complicated and confusing is that they can be used to do a number of mostly unrelated things. So to start out, let's talk about the classical and most common use of metaclasses: modifying a class as it's being created. This is more or less how things like Django's form and model definitions work, and it's what I did in my metaclass for namespaces.
(This is sort of like the kind of things that you can do with Lisp macros, although nowhere near as advanced.)
There are two spots where a metaclass can meddle in the creation of
a class. A metaclass's __new__ is called before the class type
object exists, is expected to return the newly created
class object, and normally works by manipulating the 'class dictionary'
of the class to be. A metaclass's __init__ is called after the class
exists but before it has been completely finalized, and pretty much can
only work by manipulating the new class object.
(This is just like __new__ versus __init__ on conventional classes
(cf),
except that the 'object' you are dealing with is a class definition and
the arguments to both functions come in a very specific form.)
Most metaclasses use __new__ instead of __init__. In general,
most sophisticated changes are easier to do in __new__ because you
don't have to worry about normal class magic getting in the way (for
example, a function getting automatically converted to an unbound method
when you try to retrieve it to modify it). In addition, because some
things about a class are frozen at the moment that its class object is
created, changing them can only be done in __new__; the obvious
example is creating, modifying, or removing
__slots__. You can add things to the class in __init__, and it
may be clearer to do so there because you can simply set attributes
directly.
(Properties do not have to be created in __new__ as far as I can
see.)
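Here is a sketch of that split, with hypothetical names. SlottedMeta fills in an empty __slots__ in __new__, where doing so still has an effect. It then tags the finished class in __init__ by setting an attribute directly.

```python
class SlottedMeta(type):
    def __new__(meta, cname, bases, cdict):
        # Must happen before the class exists; setting __slots__ on the
        # finished class would not remove the per-instance __dict__.
        cdict.setdefault("__slots__", ())
        return super(SlottedMeta, meta).__new__(meta, cname, bases, cdict)

    def __init__(cls, cname, bases, cdict):
        super(SlottedMeta, cls).__init__(cname, bases, cdict)
        # The class already exists here, so plain assignment works.
        cls.created_by = "SlottedMeta"

Point = SlottedMeta("Point", (object,), {"__slots__": ("x", "y")})
Empty = SlottedMeta("Empty", (object,), {})

p = Point()
p.x = 1
assert Point.created_by == "SlottedMeta"
assert not hasattr(Empty(), "__dict__")  # __slots__ = () suppressed __dict__
try:
    p.z = 3
    added = True
except AttributeError:
    added = False
assert added is False
```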
Also, __new__ is free to return an existing class object. In theory
you could use this to implement 'singleton classes'; in practice, I
can't think of much use of this outside of something like Django, where
the 'classes' are actually a little domain specific language to define
things and where you might want two definitions of the same thing to
result in the same actual class object (especially if you track state
through the class object in the background).
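As an illustration only, here is a hypothetical CachingMeta whose __new__ enforces a made-up 'one class object per class name' policy. A second definition of Thing hands back the first class object unchanged.

```python
class CachingMeta(type):
    _registry = {}

    def __new__(meta, cname, bases, cdict):
        # Hypothetical policy: reuse the existing class for a repeated name.
        if cname in meta._registry:
            return meta._registry[cname]
        cls = super(CachingMeta, meta).__new__(meta, cname, bases, cdict)
        meta._registry[cname] = cls
        return cls

Base = CachingMeta("Base", (object,), {})

class Thing(Base):
    x = 1
first = Thing

class Thing(Base):  # redefinition returns the cached class
    x = 2

assert Thing is first
assert Thing.x == 1
```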
The mechanics
__new__ and __init__ are called slightly differently; the
signatures are:
class MiniMeta(type):
    # Called before the class exists; normally works on cdict.
    def __new__(meta, cname, bases, cdict):
        return type.__new__(meta, cname, bases, cdict)
    # Called on the newly created class object; returns nothing.
    def __init__(cls, cname, bases, cdict):
        type.__init__(cls, cname, bases, cdict)
class Example(object):
    __metaclass__ = MiniMeta
(In real code you should use super() here.)
cname is the name of the class as in 'class Foo', bases is a tuple
of the class's base classes, and cdict is what will be the class
dictionary (or in the case of __init__, what has already been turned
into the class dictionary). In __new__, meta is your metaclass
itself; in __init__, cls is the class object for the new class.
__new__ should return a newly created class object. Normally your
__new__ function will manipulate cdict and then use super()
to continue creating the class, returning the result; if you're going
to create the class before manipulating it, you might as well use
__init__. The only thing __init__ can usefully manipulate is
cls, since the other arguments have already been used to construct it.
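Here is a sketch of the usual pattern, in the spirit of (but not copied from) Django-style declarative classes. The metaclass pulls hypothetical Field declarations out of cdict before the class exists and records their names. Field, FormMeta, Form and _fields are all illustrative names.

```python
class Field(object):
    # Hypothetical marker for a declared field.
    def __init__(self, default=None):
        self.default = default

class FormMeta(type):
    def __new__(meta, cname, bases, cdict):
        # Collect Field declarations, remove them from the class
        # dictionary, and record their names (sorted for a stable order).
        fields = dict((k, v) for k, v in cdict.items() if isinstance(v, Field))
        for name in fields:
            del cdict[name]
        cdict["_fields"] = sorted(fields)
        return super(FormMeta, meta).__new__(meta, cname, bases, cdict)

Form = FormMeta("Form", (object,), {})

class Login(Form):
    user = Field()
    remember = Field(default=False)

assert Login._fields == ["remember", "user"]
assert not hasattr(Login, "user")  # the Field objects never reach the class
```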
(Technically __new__ can return anything it wants to, including an
existing class or even a non-class object, but doing so is a great way
to confuse everyone who ever reads your code.)
For reasons beyond the scope of this margin, your metaclass really must
descend from type(). Subclassing object() instead by accident will
cause all sorts of interesting failures with obscure error messages,
like TypeError: 'MiniMeta' object is not callable.