Wandering Thoughts archives

2011-09-23

Python's attribute lookup order

One of the things that surprised me recently is that the Python language reference doesn't seem to document the specifics of how Python searches for the attribute when you try to access obj.attr. So here is a not completely formal description of the process for Python 2.7 or so for new-style classes.

(Such a description really belongs in the language reference because it's an important part of Python's semantics. It's possible that you can reassemble a full description of the lookup semantics from the bits and pieces scattered throughout the language reference.)

First, any time Python looks for an attribute on an object it doesn't just search the object's (nominal) attribute dictionary; it also searches back through the object's base classes (in method resolution order), ultimately terminating at object. Ordinary instances of classes don't have bases (and can't be given them), but classes do.

Let's say we're looking for attr on the object obj, and typ is the object's type object (ie type(obj)). Then the lookup order is more or less:

  1. if typ.attr exists and has both a __get__ and either a __set__ or a __delete__ attribute, it's a data descriptor; we call its __get__ and return the result.

    (Data descriptors thus deliberately preempt attributes on the object itself.)

  2. search for obj.attr. On classes (but not on non-class instances) this may be a descriptor with a __get__ instead of a plain attribute; if it is, it's called.

    (Classes evaluating their own descriptors is necessary in order to make various descriptor and property things work when you directly access the class. Consider cls.classmethod(), for example.)

  3. search for typ.attr. This may be a plain attribute or a non-data descriptor with a __get__; if it's a descriptor, it's called.

(As you might guess, the code does not search for typ.attr twice; it saves the result of the step one search for use in step three.)
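The three steps can be approximated in pure Python for the plain-instance case. This is only a rough sketch: the helper names find_on_type and instance_lookup are made up for illustration, it ignores __getattr__ and the class-side handling in step 2, and it is not how CPython structures the real code.

```python
_MISSING = object()

def find_on_type(typ, name):
    """Search typ and its base classes, in MRO order, for name."""
    for klass in typ.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING

def instance_lookup(obj, name):
    typ = type(obj)
    type_attr = find_on_type(typ, name)   # searched once, reused in step 3
    attr_type = type(type_attr)

    # Step 1: a data descriptor found via the type preempts the instance.
    if type_attr is not _MISSING and hasattr(attr_type, "__get__") and (
            hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")):
        return attr_type.__get__(type_attr, obj, typ)

    # Step 2: the instance's own attribute dictionary, returned as-is.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # Step 3: a plain attribute or non-data descriptor found via the type.
    if type_attr is not _MISSING:
        if hasattr(attr_type, "__get__"):
            return attr_type.__get__(type_attr, obj, typ)
        return type_attr

    raise AttributeError(name)


class Demo(object):
    plain = "class value"

    @property
    def prop(self):
        return "from property"

    def method(self):
        return "method result"

d = Demo()
d.__dict__["prop"] = "instance value"    # loses to the property (step 1)
d.__dict__["plain"] = "instance value"   # wins over the class (step 2)

print(instance_lookup(d, "prop"))        # from property
print(instance_lookup(d, "plain"))       # instance value
print(instance_lookup(d, "method")())    # method result
```

The results match what d.prop, d.plain and d.method() give through the real lookup.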

Note that Python never looks at the type of typ; one level into the type hierarchy is as far as it goes. By contrast it will go as far along the base class inheritance hierarchy as it needs to, all the way to object if necessary.

(This explains how metaclasses can be used to add class-only attributes on classes. instance.attr will search only the instance and cls, its class, but not go the next step to type(cls), the metaclass. cls.attr will search cls and its base classes (in step 2) and then the metaclass (in step 3).)

It's important here to not confuse an object's base classes with its type or its type's base classes. I tend to think of base classes and types as creating a two dimensional structure, where inheritance goes sideways through a class's base classes to end up at object while its type goes up to end up at type. Although we often informally talk about instances of an ordinary class inheriting from things and attributes being looked up through them, this is not what is actually happening. A plain instance has no base classes; it is its type (the class it is an instance of) that has the base classes. Normally making this distinction is unimportant, but here it's vital.

(The confusing case is a metaclass, which has type as both its base class (since metaclasses subclass type) and its type (since all classes, metaclasses included, are normally instances of type).)
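The two dimensions can be poked at directly. A small sketch with made-up class names, using the Python 3 metaclass= keyword where Python 2.7 would set __metaclass__:

```python
class Meta(type):
    pass

class Base(object):
    pass

class Child(Base, metaclass=Meta):
    pass

inst = Child()

# Sideways: inheritance runs through base classes and ends at object.
print(Child.__mro__ == (Child, Base, object))   # True

# Up: the chain of types ends at type.
print(type(inst) is Child)                      # True
print(type(Child) is Meta)                      # True
print(type(Meta) is type)                       # True

# A plain instance has no base classes of its own.
print(hasattr(inst, "__bases__"))               # False

# The confusing case: type is both the base class and the type of Meta.
print(Meta.__bases__ == (type,), type(Meta) is type)   # True True
```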

Sidebar: where this is done in CPython's code

In CPython, this is done in two separate pieces of code. Lookups for classes are handled by type's type_getattro(), in Objects/typeobject.c. Lookups for non-class objects are handled by object's PyObject_GenericGetAttrWithDict(), in Objects/object.c (which is called with no dictionary argument in this case).

AttributeLookupOrder written at 00:30:31

2011-09-22

An operational explanation of Python types

In the world of new style classes, everything in Python has a type (visible with 'type(thing)'); class instances have a type, classes have a type, even type has itself as its type. Sure, fine, but what does it mean in Python to have a type? What do types actually do?

Given part 4 of what you can do with metaclasses, we can now see the answer: an object's type is where Python finds special methods for it.

Python does not look for special methods through the normal attribute lookup process (where it would effectively look up obj.__str__ if you did str(obj)); instead it goes straight to the type (doing the equivalent of type(obj).__str__). Here, let's have an example:

class Example(object):
  def __str__(self):
    return "class"

ei = Example()
ei.__str__ = lambda x: "lambda"
print ei.__str__(ei), str(ei)

This will print 'lambda class', showing that str() is not using the version of __str__ on the object itself.

Now we can see why everything in Python has the type that it does. Instances of a class have a type() of their class so that special methods are looked up on their class, which lets classes actually implement all of those special methods. Classes have a type() of their metaclass (or type if they have no metaclass) so that things like __call__ get looked up on their metaclass (or on type itself, which has the standard implementation of various things). Both type and object have the type of type for the same reason (well, in theory).

(My example above was contrived because people generally don't try to put special methods on instances. But it's easy to have a clash between a class special method and a metaclass special method, and then this does matter.)
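Here is a sketch of such a clash, assuming a made-up metaclass called NamedMeta and written with the Python 3 metaclass= keyword (Python 2.7 would set __metaclass__ in the class body instead):

```python
class NamedMeta(type):
    def __str__(cls):
        return "class " + cls.__name__

class Example(metaclass=NamedMeta):
    def __str__(self):
        return "an Example instance"

# str() goes to the type of its argument in both cases.
print(str(Example))      # class Example
print(str(Example()))    # an Example instance

# An explicit attribute lookup on the class finds the class's own
# function instead, which wants an instance as its argument.
print(Example.__str__(Example()))   # an Example instance
```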

One of the nice aspects of this is that it unifies how Python does attribute lookup for instances of classes with how it does it with classes. There is no special magic in the interpreter to treat them differently; how they behave is just determined by how their type acts. Much of what looks like fundamental behavior (such as using 'Example()' to create an instance of the class) is in fact simply due to how object and type act.

Types also get involved in explicit attribute lookups, but that's a much more complicated topic and is not as core to what a type is in Python. (Well, in my opinion.)

UnderstandingTypes written at 00:51:04

2011-09-19

An operational explanation of Python metaclasses (part 4)

The metaclass use of __call__ that I covered in part 2 and the use of __getattribute__ et al that I mentioned in part 3 are both specific instances of a general metaclass power: a metaclass supplies the special methods for its classes. However and as before, these special methods apply only when you are looking at or manipulating the class itself, not when you are dealing with instances of the class.

(Special methods for instances of the class have to be supplied by the class or an ancestor it inherits from, as usual.)

Most special methods aren't particularly useful for classes, since you rarely want to do things like treat a class object as a sequence; usually classes just sort of sit there being instantiated and subclassed. The useful special methods are the handful of methods that affect things that you use classes for, which means that part 2 and part 3 have already covered most of them.

(While you can define custom __str__ and __repr__ methods in a metaclass, note that these methods are not used for the class name portion of what you get when you print an instance of the class; you still get the familiar '<file.whatever object at 0x...>' result unless you have appropriate custom methods on the class itself.)

There are two additional sets of special methods that are worth mentioning. First, you can control how subclassing and instance checks work; this is covered in the Python documentation on Customizing instance and subclass checks, which explicitly mentions that these methods have to be defined in a metaclass. Second, you can use bare classes in with statements (instead of class instances) by implementing the context manager special methods on your metaclass. It's possible that there's some use for this.
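A sketch of both, with a made-up DuckMeta metaclass and Python 3's metaclass= keyword (the Python 2.7 equivalent is a __metaclass__ attribute); the 'has a quack attribute' rule is just an arbitrary stand-in for a real check:

```python
class DuckMeta(type):
    # Consulted by isinstance(obj, cls) for classes using this metaclass.
    def __instancecheck__(cls, obj):
        return hasattr(obj, "quack")

    # Context manager methods, so the bare class works in a with statement.
    def __enter__(cls):
        cls.active = True
        return cls

    def __exit__(cls, exc_type, exc_value, traceback):
        cls.active = False
        return False

class Duck(metaclass=DuckMeta):
    active = False

class Robot(object):
    def quack(self):
        return "beep"

print(isinstance(Robot(), Duck))    # True
print(isinstance(object(), Duck))   # False

with Duck as d:
    inside = Duck.active
print(d is Duck, inside, Duck.active)   # True True False
```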

Sidebar: why your metaclass must descend from type

Armed with this understanding about special methods I can now explain the reason why metaclasses have to subclass type instead of object and why you get strange error messages if you don't, as I mentioned in passing in part 1. We can see the answer by asking 'what is the metaclass of a class without a metaclass?'

The answer is 'type'. Given that your metaclass supplies implementations of special methods, this means that type supplies the default versions of things like __call__. If you subclass type in your metaclass, you inherit all of the necessary default versions of various special methods. If you don't, well, they aren't going to come from object; object doesn't supply them. This is also why you need to call up to type (either via super() or directly) in order to get various things done in a metaclass; type does all of the magic necessary to actually create new classes, call them to create new instances of them, and so on.
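These claims are easy to check from the interpreter. A minimal sketch (Plain is a made-up class name):

```python
class Plain(object):
    pass

# A class without an explicit metaclass has type as its metaclass.
print(type(Plain) is type)           # True

# type itself defines __call__; object does not.
print("__call__" in vars(type))      # True
print("__call__" in vars(object))    # False

# Calling the class is type's __call__ applied to it.
made = type.__call__(Plain)
print(isinstance(made, Plain))       # True
```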

(Yes, technically a metaclass can be any callable, not just a class. I'm looking only at the 'metaclass as a class' case right now.)

UsingMetaclass04 written at 00:51:48

2011-09-18

An operational explanation of Python metaclasses (part 3)

Following on from part 1 and part 2, the third thing that we can do with a metaclass is to create 'class-only attributes', attributes that are only visible on the class and not on instances of the class. In fact we can go further than just adding attributes; we can control what attributes are directly visible on the class without affecting what attributes are visible on instances of the class.

Simply adding attributes to the class (and only to the class) is done by putting attributes on the metaclass; in fact all attributes on the metaclass are visible on the class but not on instances of the class. Since this has some subtle bits, here is an example to illustrate:

class MiniMeta(type):
  one = "meta"
  two = "meta"

class Alpha(object):
  two = "alpha"

class Beta(Alpha):
  __metaclass__ = MiniMeta

bi = Beta()

Beta.one is "meta", but bi.one is an AttributeError; the attribute is visible only on the class and is not visible in instances. Beta.two and bi.two are both "alpha"; the attribute on the parent class overrides the attribute on the metaclass for both the class and an instance of the class (the same thing happens if we define two on Beta itself). Well, mostly.
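For anyone trying this on Python 3, the same example needs the metaclass= keyword, because the __metaclass__ attribute is ignored there. A sketch of that spelling with the claims turned into checks:

```python
class MiniMeta(type):
    one = "meta"
    two = "meta"

class Alpha(object):
    two = "alpha"

class Beta(Alpha, metaclass=MiniMeta):
    pass

bi = Beta()

print(Beta.one)             # meta
print(hasattr(bi, "one"))   # False
print(Beta.two, bi.two)     # alpha alpha
```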

The exception to parent classes taking priority is properties. A property set on the metaclass overrides anything else when you access it on the class, but is invisible to instances of the class. If you try hard this can be used to create attributes that have one value on the class and another value on instances of the class, which is sure to confuse everyone who reads your code unless you comment it heavily (and maybe even then).
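A sketch of the property case, using made-up names and the Python 3 metaclass= keyword in place of Python 2.7's __metaclass__:

```python
class PropMeta(type):
    @property
    def label(cls):
        return "from metaclass property"

class Base(object):
    label = "from class"

class Child(Base, metaclass=PropMeta):
    pass

# On the class, the metaclass property is a data descriptor and wins.
print(Child.label)      # from metaclass property

# On an instance, the metaclass is never consulted.
print(Child().label)    # from class
```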

The advanced version of this is that you can get partial or (almost) full control of attribute access to the class itself by setting up __getattr__, __getattribute__, __setattr__, and/or __delattr__ special methods on the metaclass. When people access attributes on the class itself (eg, as Beta.attr), these work just as if Beta was an ordinary instance of MiniMeta (because that's actually exactly what it is). However they are ignored when accessing attributes of Beta through instances of it, eg as bi.attr; the metaclass __getattribute__ and so on are not even called.

(As with properties, this can be abused to create attributes which have a different value on the class than on instances of the class.)
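A small sketch of a metaclass __getattr__, with a made-up FallbackMeta and the Python 3 metaclass= keyword standing in for __metaclass__. It refuses underscore names so that internal probes for special attributes still fail normally; that guard is a design choice of this sketch, not a requirement.

```python
class FallbackMeta(type):
    def __getattr__(cls, name):
        # Only reached after the normal lookup on the class has failed.
        if name.startswith("_"):
            raise AttributeError(name)
        return "made up for %s.%s" % (cls.__name__, name)

class Beta(metaclass=FallbackMeta):
    real = "real"

bi = Beta()

print(Beta.real)                # real
print(Beta.missing)             # made up for Beta.missing
print(hasattr(bi, "missing"))   # False
```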

Note that this not-looking happens even when the lookup on the class is implicit, such as when you do len(bi) and Python looks to see if there's a Beta.__len__ method. In fact special method lookups don't go through any __getattribute__ or __getattr__ at all (this is covered in the official documentation).

(As with part 2, this is a specific example of a general metaclass power.)

Sidebar: method functions on the metaclass versus @classmethod

Both of these create 'class methods', method functions that take the class as the 'self' argument instead of an instance of the class. However they are not quite the same thing; in particular, @classmethod methods are visible (and work) from instances of the class while metaclass methods are not.
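A sketch of the difference, with made-up names and Python 3's metaclass= keyword (Python 2.7 would use __metaclass__):

```python
class Meta(type):
    def describe(cls):
        return "metaclass method on " + cls.__name__

class Thing(metaclass=Meta):
    @classmethod
    def build(cls):
        return "classmethod on " + cls.__name__

t = Thing()

print(Thing.describe())         # metaclass method on Thing
print(Thing.build())            # classmethod on Thing
print(t.build())                # classmethod on Thing
print(hasattr(t, "describe"))   # False
```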

My personal opinion is that most of the time you want @classmethod because far more people are going to understand what you're doing. You want this even if there's no common ancestor class you can put the methods on and you have to invent some artificial mixin class to hold them.

UsingMetaclass03 written at 00:56:45

2011-09-17

An operational explanation of Python metaclasses (part 2)

After modifying a class as it's being created (covered in part 1), the next thing you can do with a metaclass is get a chance to do things when instances of the class are created. You do this by defining __call__ on your metaclass:

class MiniMeta(type):
  def __call__(cls, *args, **kwargs):
    return super(MiniMeta, cls).\
           __call__(*args, **kwargs)

class Example(object):
  __metaclass__ = MiniMeta

There are a number of things that you can do with this. One of them (which I first saw in another metaclass tutorial) is handling deferred initialization and setup of something related to the class. Rather than doing this in your metaclass __new__ or __init__, you defer it until the first time an instance of the class is created; this saves you effort in situations where many classes are defined but only a few classes will ever be used to create instances.
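One way the deferred setup might look, as a sketch only: setup_log and the _setup_done marker are invented for illustration, and the code uses Python 3's metaclass= keyword and zero-argument super().

```python
setup_log = []   # stands in for whatever expensive setup would really happen

class LazySetupMeta(type):
    def __call__(cls, *args, **kwargs):
        # vars(cls) looks only at this class, so each subclass is set up
        # separately the first time it is instantiated.
        if "_setup_done" not in vars(cls):
            setup_log.append(cls.__name__)
            cls._setup_done = True
        return super().__call__(*args, **kwargs)

class Used(metaclass=LazySetupMeta):
    pass

class Unused(metaclass=LazySetupMeta):
    pass

Used()
Used()
print(setup_log)   # ['Used']
```

Unused never pays for its setup because nothing ever instantiates it.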

Using a metaclass __call__ to customize instance creation runs into the obvious question of why you don't just do the same work in the class's own __init__ method (or in extreme cases, its __new__ method). Probably the right answer is that a metaclass __call__ makes sense when you have a chunk of common behavior across a bunch of classes and the classes can't easily be put into an inheritance relationship where this functionality can be pulled into a common ancestor or mixin class.

(This use of metaclasses is considered sufficiently interesting to get mentioned in passing in the official documentation for __metaclass__.)

PS: __call__ is actually a specific example of a general metaclass power. I'll get to the general power later because it requires more explanation.

Sidebar: a technicality

Strictly speaking using __call__ does not intercept all instance creation, since a sufficiently creative person can still obtain new instances of Example by calling object.__new__() directly. If you're doing something where you have to worry about this, Python is probably the wrong language to write your code in.
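A sketch of the bypass, using a made-up counting metaclass and the Python 3 metaclass= keyword:

```python
class CountingMeta(type):
    def __call__(cls, *args, **kwargs):
        cls.created += 1
        return super().__call__(*args, **kwargs)

class Example(metaclass=CountingMeta):
    created = 0

normal = Example()                 # goes through CountingMeta.__call__
sneaky = object.__new__(Example)   # does not

print(isinstance(sneaky, Example))   # True
print(Example.created)               # 1
```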

UsingMetaclass02 written at 00:49:55

2011-09-12

An operational explanation of Python metaclasses (part 1)

All of the explanations of metaclasses that I've read have started out by talking about the whole background and theory of operation of metaclasses. This approach doesn't work for me; by the time they get out of the background, I'm either asleep or my eyes have glazed over. So I'm going to tackle metaclasses from the other end, covering what you can do with them.

Part of the reason that metaclasses are complicated and confusing is that they can be used to do a number of mostly unrelated things. So to start out, let's talk about the classical and most common use of metaclasses: modifying a class as it's being created. This is more or less how things like Django's form and model definitions work, and it's what I did in my metaclass for namespaces.

(This is sort of like the kind of things that you can do with Lisp macros, although nowhere near as advanced.)

There are two spots where a metaclass can meddle in the creation of a class. A metaclass's __new__ is called before the class type object exists, is expected to return the newly created class object, and normally works by manipulating the 'class dictionary' of the class to be. A metaclass's __init__ is called after the class exists but before it has been completely finalized, and pretty much can only work by manipulating the new class object.

(This is just like __new__ versus __init__ on conventional classes, except that the 'object' you are dealing with is a class definition and the arguments to both functions come in a very specific form.)

Most metaclasses use __new__ instead of __init__. In general, most sophisticated changes are easier to do in __new__ because you don't have to worry about normal class magic getting in the way (for example, a function getting automatically converted to an unbound method when you try to retrieve it to modify it). In addition, because some things about a class are frozen at the moment that its class object is created, changing them can only be done in __new__; the obvious example is creating, modifying, or removing __slots__. You can add things to the class in __init__, and it may be clearer to do so there because you can simply set attributes directly.
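As a sketch of the __slots__ case: the metaclass below turns a made-up 'fields' class attribute into __slots__ before the class object exists. The names are hypothetical and the code uses Python 3's metaclass= keyword and zero-argument super().

```python
class SlotsMeta(type):
    def __new__(meta, cname, bases, cdict):
        cdict = dict(cdict)
        # This has to happen before type.__new__ builds the class object.
        cdict["__slots__"] = tuple(cdict.get("fields", ()))
        return super().__new__(meta, cname, bases, cdict)

class Point(metaclass=SlotsMeta):
    fields = ("x", "y")

p = Point()
p.x = 1
try:
    p.z = 3
except AttributeError:
    print("no slot for z")

print(hasattr(p, "__dict__"))   # False
```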

(Properties do not have to be created in __new__ as far as I can see.)

Also, __new__ is free to return an existing class object. In theory you could use this to implement 'singleton classes'; in practice, I can't think of much use of this outside of something like Django, where the 'classes' are actually a little domain specific language to define things and where you might want two definitions of the same thing to result in the same actual class object (especially if you track state through the class object in the background).
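A sketch of returning an existing class, keyed on the class name alone. RegistryMeta and its _known dictionary are invented for illustration, and the code uses Python 3's metaclass= keyword.

```python
class RegistryMeta(type):
    _known = {}

    def __new__(meta, cname, bases, cdict):
        if cname in meta._known:
            # Hand back the class created by the first definition.
            return meta._known[cname]
        cls = super().__new__(meta, cname, bases, cdict)
        meta._known[cname] = cls
        return cls

class Model(metaclass=RegistryMeta):
    pass

first = Model

class Model(metaclass=RegistryMeta):
    extra = 1

print(Model is first)            # True
print(hasattr(Model, "extra"))   # False
```

The body of the second definition is silently thrown away, which shows how surprising this can be for a reader.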

The mechanics

__new__ and __init__ are called slightly differently; the signatures are:

class MiniMeta(type):
  def __new__(meta, cname, bases, cdict):
    return type.__new__(meta, cname, bases, cdict)

  def __init__(cls, cname, bases, cdict):
    return type.__init__(cls, cname, bases, cdict)

class Example(object):
  __metaclass__ = MiniMeta

(In real code you should use super() here.)

cname is the name of the class as in 'class Foo', bases is a tuple of the class's base classes, and cdict is what will be the class dictionary (or in the case of __init__, what has already been turned into the class dictionary). In __new__, meta is your metaclass itself; in __init__, cls is the class object for the new class.

__new__ should return a newly created class object. Normally your __new__ function will manipulate cdict and then use super() to continue creating the class, returning the result; if you're going to create the class before manipulating it, you might as well use __init__. The only thing __init__ can usefully manipulate is cls, since the other arguments have already been used to construct it.
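A sketch of the two styles side by side, with a made-up TagMeta: __new__ edits cdict before the class exists, and __init__ sets an attribute on the finished class. It uses Python 3's metaclass= keyword and zero-argument super().

```python
class TagMeta(type):
    def __new__(meta, cname, bases, cdict):
        cdict = dict(cdict)
        # Record the public names defined in the class body.
        cdict["field_names"] = sorted(
            key for key in cdict if not key.startswith("_"))
        return super().__new__(meta, cname, bases, cdict)

    def __init__(cls, cname, bases, cdict):
        super().__init__(cname, bases, cdict)
        # The class already exists here, so just set attributes on it.
        cls.created_by = "TagMeta.__init__"

class Form(metaclass=TagMeta):
    name = 1
    email = 2

print(Form.field_names)   # ['email', 'name']
print(Form.created_by)    # TagMeta.__init__
```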

(Technically __new__ can return anything it wants to, including an existing class or even a non-class object, but doing so is a great way to confuse everyone who ever reads your code.)

For reasons beyond the scope of this margin, your metaclass really must descend from type. Subclassing object instead by accident will cause all sorts of interesting failures with obscure error messages, like TypeError: 'MiniMeta' object is not callable.

UsingMetaclass01 written at 01:46:43


