Wandering Thoughts archives

2014-01-31

Why I now believe that duck typed metaclasses are impossible in CPython

As I mentioned in my entry on fake versus real metaclasses, I've wound up a bit obsessed with the question of whether it's possible to create a fully functional metaclass that doesn't inherit from type. Call this a 'duck typed metaclass' or, if you want to be cute, a 'duck typed type' (DTT). As a result of that earlier entry and some additional exploration, I now believe that it's impossible.

Let's go back to MetaclassFakeVsReal for a moment and look at the fake metaclass M2:

class M2(object):
   def __new__(self, name, bases, dct):
      print "M2", name
      return type(name, bases, dct)

class C2(object):
   __metaclass__ = M2

class C4(C2):
   pass

As we discovered, the problem is that C2 is not an instance of M2, and so (among other things) its subclass C4 will not invoke M2 when it is being created. The real metaclass M1 avoided this problem by instead using type.__new__() in its __new__ method. So why not work around the problem by making M2 do so too, like this:

class M2(object):
   def __new__(self, name, bases, dct):
      print "M2", name
      return type.__new__(self, name, bases, dct)

Here's why:

TypeError: Error when calling the metaclass bases
    type.__new__(M2): M2 is not a subtype of type

I believe that this is an old friend in a new guise. Instances of M2 would normally be based on the C-level structure for object (since it is a subclass of object), which is not compatible with the C-level type structure that instances of type and its subclasses need to use. So type says 'you cannot do this' and walks away.
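For contrast, here is a minimal sketch of the working 'real metaclass' approach in Python 3 syntax (where a metaclass= keyword argument replaces Python 2's __metaclass__; the class names C1 and C3 here are my own, echoing the M1 example):

```python
# A real metaclass inherits from type, so type.__new__ accepts it and the
# classes it creates are genuine instances of the metaclass.
class M1(type):
    created = []  # record of every class name this metaclass builds

    def __new__(cls, name, bases, dct):
        M1.created.append(name)
        return type.__new__(cls, name, bases, dct)

class C1(metaclass=M1):
    pass

# Because C1 really is an instance of M1, subclassing C1 goes back
# through M1 -- exactly what the fake metaclass fails to do.
class C3(C1):
    pass

assert isinstance(C1, M1) and isinstance(C3, M1)
assert M1.created == ["C1", "C3"]
```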

Given that we need C2 to be an instance of M2 so that things work right for subclasses of C2 and we can't use type, we can try brute force and fakery:

class M2(object):
   def __new__(self, name, bases, dct):
      print "M2", name
      r = super(M2, self).__new__(self)
      r.__dict__.update(dct)
      r.__bases__ = bases
      return r

This looks like it works, in that C4 will now get created by M2. However, this is an illusion, and I'll give you two examples of the ensuing problems, each equally fatal.

Our first problem is creating instances of C2, i.e. the actual objects that we will want to use in code. Instance creation is fundamentally done by calling C2(), which means that M2 needs a __call__ special method (so that C2, an instance of M2, becomes callable). We'll try a version that delegates all of the work to type:

  def __call__(self, *args, **kwargs):
     print "M2 call", self, args, kwargs
     return type.__call__(self, *args, **kwargs)

Unsurprisingly but unfortunately this doesn't work:

TypeError: descriptor '__call__' requires a 'type' object but received a 'M2'

Okay, fine, we'll try more or less the same trick as before (which is now very dodgy, but ignore that for now):

  def __call__(self, *args, **kwargs):
     print "M2 call", self, args, kwargs
     r = super(M2, self).__new__(self)
     r.__init__(*args, **kwargs)
     return r

You can probably guess what's coming:

TypeError: object.__new__(X): X is not a type object (M2)

We are now well and truly up the creek because classes are the only thing in CPython that can have instances. Classes are instances of type and as we've seen we can't create something that is both an instance of M2 (so that M2 is a real metaclass instead of a fake one) and an instance of type. Classes without instances are obviously not actually functional.
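You can reproduce the core refusal in isolation (a quick check of my own, in Python 3 syntax here):

```python
# object.__new__ refuses to build an instance of anything that is not
# itself a type; an ordinary instance of a plain class is not one.
class M2:
    pass

m = M2()               # m is an instance, not a class
try:
    object.__new__(m)
    msg = None
except TypeError as exc:
    msg = str(exc)

assert msg is not None and "not a type" in msg
```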

The other problem is that despite how it appears, C4 is not actually a subclass of C2, because of course classes are the only thing in CPython that can have subclasses. Specifically, attribute lookups even on C4 itself will not look at attributes on C2:

>>> C2.dog = 10
>>> C4.dog
AttributeError: 'M2' object has no attribute 'dog'

The __bases__ attribute that M2.__new__ glued onto C4 (and C2) is purely decorative. Again, looking up attributes through the chain of bases (and the entire method resolution order) is something that happens through code that is specific to instances of type. I believe that much of it lives under the C-level function that is type.__getattribute__, but some of it may be even more magically intertwined into the guts of the CPython interpreter than that. And as we've seen, we can't call type.__getattribute__ ourselves unless we have something that is an instance of type.

Note that there are literally no attributes we can set on non-type instances that will change this. On actual instances of type, things like __bases__ and __mro__ are not actual attributes but are instead essentially descriptors that look up and manipulate fields in the C-level type struct. The actual code that does things like attribute lookups uses the C-level struct fields directly, which is one reason it requires genuine type instances; only genuine instances even have those struct fields at the right places in memory.
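You can see this on real classes (my own illustration, in Python 3 syntax): __mro__ and __bases__ are served by descriptors on type itself, not stored as entries in each class's __dict__:

```python
class A: pass
class B(A): pass

# Neither attribute is stored per-class; both are descriptors on type
# that read fields of the underlying C-level type struct.
assert "__mro__" not in vars(B) and "__bases__" not in vars(B)
assert hasattr(type.__dict__["__mro__"], "__get__")
assert hasattr(type.__dict__["__bases__"], "__get__")
assert B.__mro__ == (B, A, object)
```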

(Note that attribute inheritance in subclasses is far from the only attribute lookup problem we have. Consider accessing C2.afunction and what you'd get back.)

Either problem is fatal, never mind both of them at once (and note that our M2.__call__ is nowhere near a complete emulation of what type.__call__ actually does). Thus as far as I can tell there is absolutely no way to create a fully functional duck typed metaclass in CPython. To create one you'd need access to the methods and other machinery of type, and type reserves that machinery for things that are instances of type (for good reason).
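For the record, the protocol that type.__call__ implements looks roughly like this (a simplified Python 3 sketch of the documented behaviour, not the actual C code; Point is a made-up example class):

```python
def call_like_type(cls, *args, **kwargs):
    # Roughly: allocate via __new__, then initialize via __init__,
    # but only if __new__ actually returned an instance of cls.
    obj = cls.__new__(cls, *args, **kwargs)
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = call_like_type(Point, 1, 2)
assert (p.x, p.y) == (1, 2)
```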

I don't think that there's anything in general Python semantics that requires this, so another Python implementation might allow or support enough to enable duck typed metaclasses. What blocks us in CPython is how CPython implements type, object, and various core functionality such as creating instances and doing attribute lookups.

(I tried this with PyPy and it failed with a different set of errors depending on which bits of type I was trying to use. I don't have convenient access to any other Python implementations.)

python/MetaclassDuckTypingImpossible written at 23:15:29

Linux has at least two ways that disks can die

We lost a disk on one of our iSCSI backends last night. Normally when an iSCSI data disk dies on a backend, what happens at the observable system level is that the disk vanishes. If it used to be, say, sdk, then there is no sdk any more. I'm not quite sure what happens at the kernel level as far as our iSCSI target software goes, but the reference that the iSCSI target kernel module holds doesn't work any more. This is basically just the same as what happens when you physically pull a live disk and I assume that the same kernel and udev mechanisms are at work.

(When you swap out the dead disk and put a new one in, the new one shows up as a new disk under some name. Even if it winds up with the same sdX name, it's sufficiently different a device that our iSCSI target software still won't automatically talk to it; we have to carefully poke the software by hand.)

This is not what happened this time around. Instead the kernel seems to have basically thrown up its hands and declared the disk dead but not gone. The disk was still there in /dev et al and you could open the disk device, but any attempt to do IO to it produced IO errors. Physically removing the dead disk and inserting a new one did nothing to change this; there doesn't seem to have been any hotplug activity triggered or anything. All we got was a long run of errors like:

kernel: sd 4:0:0:0: [sdm] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdm, sector 504081380

(Kernel log messages suggest that possibly this happened because the kernel was unable to successfully reset the channel, but that's reading tea leaves very closely.)

I was going to speculate about this sort of split making sense, but I don't actually know what level of the kernel this DID_BAD_TARGET error comes from. So this could be a general kernel feature to declare disks as 'present but bad' or this could be a low level driver reporting a hardware status up the stack (or it could be something in between, where a low-level driver knows the disk is not there but this news got lost at a higher level).

Regardless of what this error means and where it comes from, we were still left with a situation where the kernel thought a disk was present when we had already physically removed it. In the end we managed to fix it by forcing a rescan of that eSATA channel with:

echo - - - >/sys/class/scsi_host/hostN/scan

That woke the kernel up to the disk being gone, at which point a newly inserted replacement disk was also recognized and we could go on as we usually do when replacing dead disks.

I'm going to have to remember these two different failure modes in the future. We clearly can't assume that all disk failures will be nice enough to cause the disk to disappear from the system, and thus we can't assume that all visible disks are actually working (and thus 'the system is showing N drives present as we expect' is not a full test).
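As a concrete version of that check, here is a small sketch (my own illustration in Python, not something we actually run) that reads the first sector of each device instead of trusting its mere presence in /dev:

```python
import os

def disk_responds(path, nbytes=512):
    """Return True only if we can actually read data from the device."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return False          # not even present
    try:
        return len(os.read(fd, nbytes)) == nbytes
    except OSError:
        return False          # present in /dev but erroring on IO
    finally:
        os.close(fd)
```

A 'present but bad' disk falls into the second branch: the open succeeds but the read produces an IO error, which is exactly the failure mode that simply counting visible drives misses.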

(This particular backend has now been up for 632 days, and as a result of this glitch we are considering perhaps rebooting it. But reboots of production iSCSI backends are a big hassle, as you might imagine.)

linux/LinuxDifferentDiskDeaths written at 00:59:29

