2014-01-31
Why I now believe that duck typed metaclasses are impossible in CPython
As I mentioned in my entry on fake versus real metaclasses, I've wound
up a bit obsessed with the question of whether it's possible to create
a fully functional metaclass that doesn't inherit from type. Call this
a 'duck typed metaclass' or, if you want to be cute, a 'duck typed
type' (DTT). As a result of that earlier entry and some additional
exploration I now believe that it's impossible.
Let's go back to MetaclassFakeVsReal for a moment and look at the fake
metaclass M2:
class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        return type(name, bases, dct)

class C2(object):
    __metaclass__ = M2

class C4(C2):
    pass
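As a quick check of my own (not part of the original example), here is
what that block actually gave us:

# Only the C2 class statement printed "M2 C2" above; the C4 class
# statement printed nothing, because M2 was never consulted for it.
print type(C2)            # <type 'type'>, not M2
print isinstance(C2, M2)  # False: C2 is an ordinary class, not an M2 instance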
As we discovered, the problem is that C2 is not an instance of M2 and
so (among other things) its subclass C4 will not invoke M2 when it is
being created. The real metaclass M1 avoided this problem by instead
using type.__new__() in its __new__ method. So why not work around the
problem by making M2 do so too, like this:
class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        return type.__new__(self, name, bases, dct)
Here's why:
TypeError: Error when calling the metaclass bases
    type.__new__(M2): M2 is not a subtype of type
I believe that this is an old friend in a new guise. Instances of M2
would normally be based on the C-level structure for object (since M2
is a subclass of object), which is not compatible with the C-level
type structure that instances of type and its subclasses need to use.
So type says 'you cannot do this' and walks away.
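For contrast, the real metaclass approach works precisely because it
inherits from type. Here is my reconstruction of roughly what M1 from
the earlier entry looks like (the class names C1 and C3 are mine):

class M1(type):
    def __new__(self, name, bases, dct):
        print "M1", name
        # Since M1 is a subclass of type, type.__new__ happily builds a
        # real, type-compatible class object for it.
        return type.__new__(self, name, bases, dct)

class C1(object):
    __metaclass__ = M1    # prints "M1 C1"

class C3(C1):
    pass                  # prints "M1 C3", because C1 really is an M1 instance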
Given that we need C2 to be an instance of M2 so that things work
right for subclasses of C2, and we can't use type, we can try brute
force and fakery:
class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        # Build a bare M2 instance to act as the new 'class' (self here
        # is M2 itself) and glue the class-like bits onto it.
        r = super(M2, self).__new__(self)
        r.__dict__.update(dct)
        r.__bases__ = bases
        return r
This looks like it works, in that C4 will now get created by M2.
However this is an illusion and I'll give you two examples of the
ensuing problems, each equally fatal.
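Here's the surface appearance, as a demonstration of my own (not from
the original entry):

class C2(object):
    __metaclass__ = M2    # prints "M2 C2"

class C4(C2):
    pass                  # prints "M2 C4"

# C4 goes through M2 because CPython picks the metaclass of a new class
# from the type of its first base, and type(C2) is now genuinely M2.
print isinstance(C2, M2), isinstance(C4, M2)    # True True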
Our first problem is creating instances of C2, ie the actual objects
that we will want to use in code. Instance creation is fundamentally
done by calling C2(), which means that M2 needs a __call__ special
method (so that C2, an instance of M2, becomes callable). We'll try a
version that delegates all of the work to type:
def __call__(self, *args, **kwargs):
    print "M2 call", self, args, kwargs
    return type.__call__(self, *args, **kwargs)
Unsurprisingly but unfortunately this doesn't work:
TypeError: descriptor '__call__' requires a 'type' object but received a 'M2'
Okay, fine, we'll try more or less the same trick as before (which is now very dodgy, but ignore that for now):
def __call__(self, *args, **kwargs):
    print "M2 call", self, args, kwargs
    r = super(M2, self).__new__(self)
    r.__init__(*args, **kwargs)
    return r
You can probably guess what's coming:
TypeError: object.__new__(X): X is not a type object (M2)
We are now well and truly up the creek, because classes are the only
thing in CPython that can have instances. Classes are instances of
type and, as we've seen, we can't create something that is both an
instance of M2 (so that M2 is a real metaclass instead of a fake one)
and an instance of type. Classes without instances are obviously not
actually functional.
The other problem is that despite how it appears, C4 is not actually a
subclass of C2, because of course classes are the only thing in
CPython that can have subclasses. Specifically, attribute lookups on
even C4 itself will not look at attributes on C2:
>>> C2.dog = 10
>>> C4.dog
AttributeError: 'M2' object has no attribute 'dog'
The __bases__ attribute that M2.__new__ glued on C4 (and C2) is purely
decorative. Again, looking attributes up through the chain of bases
(and the entire method resolution order) is something that happens
through code that is specific to instances of type. I believe that
much of it lives under the C-level function that is
type.__getattribute__, but some of it may be even more magically
intertwined into the guts of the CPython interpreter than that. And as
we've seen, we can't call type.__getattribute__ ourselves unless we
have something that is an instance of type.
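As a quick illustration of that last point (mine, not from the
original entry), borrowing type's machinery directly gets rejected the
same way type.__call__ was:

try:
    type.__getattribute__(C4, 'dog')
except TypeError as e:
    # We get a complaint along the lines of "descriptor
    # '__getattribute__' requires a 'type' object but received a 'M2'".
    print "refused:", e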
Note that there are literally no attributes we can set on non-type
instances that will change this. On actual instances of type, things
like __bases__ and __mro__ are not actual attributes but are instead
essentially descriptors that look up and manipulate fields in the
C-level type struct. The actual code that does things like attribute
lookups uses the C-level struct fields directly, which is one reason
it requires genuine type instances; only genuine instances even have
those struct fields at the right places in memory.
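To illustrate the difference (my own demonstration; the class name
Real is arbitrary and mine):

class Real(object):
    pass

# On a real class, __bases__ is not an entry in the class's own __dict__;
# it comes from a getset descriptor on type that reads the C-level struct.
print '__bases__' in Real.__dict__         # False
print type(type.__dict__['__bases__'])     # <type 'getset_descriptor'>

# On our fake C4, __bases__ is merely an ordinary entry in an instance
# __dict__, which the interpreter's lookup machinery never consults.
print '__bases__' in C4.__dict__           # True, but purely decorative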
(Note that attribute inheritance in subclasses is far from the only
attribute lookup problem we have. Consider accessing C2.afunction
and what you'd get back.)
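To make that parenthetical concrete (this illustration is mine),
suppose C2 had been defined with a method:

class C2(object):
    __metaclass__ = M2
    def afunction(self):
        return "hello"

# C2 is really just an M2 instance, so afunction sits in C2's instance
# __dict__ and the descriptor protocol never runs for it; we get the
# bare function back instead of a Python 2 unbound method.
print type(C2.afunction)    # <type 'function'>, not <type 'instancemethod'>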
Either problem is fatal, never mind both of them at once (and note
that our M2.__call__ is nowhere near a complete emulation of what
type.__call__ actually does). Thus as far as I can tell there is
absolutely no way to create a fully functional duck typed metaclass in
CPython. To do one you'd need access to the methods and other
machinery of type, and type reserves that machinery for things that
are instances of type (for good reason).
I don't think that there's anything in general Python semantics that
requires this, so another Python implementation might allow or support
enough to enable duck typed metaclasses. What blocks us in CPython is
how CPython implements type, object, and various core functionality
such as creating instances and doing attribute lookups.
(I tried this with PyPy and it failed with a different set of errors
depending on which bits of type
I was trying to use. I don't have
convenient access to any other Python implementations.)
Linux has at least two ways that disks can die
We lost a disk on one of our iSCSI backends last night. Normally when
an iSCSI data disk dies on a backend, what happens at the observable
system level is that the disk vanishes. If it used to be, say, sdk,
then there is no sdk any more. I'm not quite sure what happens at the
kernel level as far as our iSCSI target software goes, but the
reference that the iSCSI target kernel module holds doesn't work any
more. This is basically just the same as what happens when you
physically pull a live disk and I assume that the same kernel and udev
mechanisms are at work.
(When you swap out the dead disk and put a new one in, the new one shows up as a new disk under some name. Even if it winds up with the same sdX name, it's a sufficiently different device that our iSCSI target software still won't automatically talk to it; we have to carefully poke the software by hand.)
This is not what happened this time around. Instead the kernel seems
to have basically thrown up its hands and declared the disk dead but
not gone. The disk was still there in /dev et al and you could open
the disk device, but any attempt to do IO to it produced IO errors.
Physically removing the dead disk and inserting a new one did nothing
to change this; there doesn't seem to have been any hotplug activity
triggered or anything. All we got was a long run of errors like:
kernel: sd 4:0:0:0: [sdm] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdm, sector 504081380
(Kernel log messages suggest that possibly this happened because the kernel was unable to successfully reset the channel, but that's reading tea leaves very closely.)
I was going to speculate about this sort of split making sense, but I
don't actually know what level of the kernel this DID_BAD_TARGET error
comes from. So this could be a general kernel feature to declare disks
as 'present but bad' or this could be a low-level driver reporting a
hardware status up the stack (or it could be something in between,
where a low-level driver knows the disk is not there but this news got
lost at a higher level).
Regardless of what this error means and where it comes from, we were still left with a situation where the kernel thought a disk was present when we had already physically removed it. In the end we managed to fix it by forcing a rescan of that eSATA channel with:
echo - - - >/sys/class/scsi_host/hostN/scan
That woke the kernel up to the disk being gone, at which point a newly inserted replacement disk was also recognized and we could go on as we usually do when replacing dead disks.
I'm going to have to remember these two different failure modes in the future. We clearly can't assume that all disk failures will be nice enough to cause the disk to disappear from the system, and thus we can't assume that all visible disks are actually working (so 'the system is showing N drives present as we expect' is not a full test).
(This particular backend has now been up for 632 days, and as a result of this glitch we are considering perhaps rebooting it. But reboots of production iSCSI backends are a big hassle, as you might imagine.)