2009-11-17
Finally understanding the appeal of 'Interfaces'
I spent a long time not really getting the need for coded, explicit implementations of 'Interfaces', by which I mean things like zope.interface. It didn't help that I generally encountered them as part of very large, complex systems like Zope and Twisted, and they tended to come with a lot of extra magic features, which made the whole idea seem like the sort of thing you only needed if you had to deal with such a beast.
Then, recently, the penny dropped and I finally saw the light. Shorn of complexity and extra features, what Interface implementations give you is an explicit and easily used way to assert and ask 'is-a' questions. Need to find out if this object is a compiled regular expression? Just ask if it supports the ICRegexp interface. Want to be accepted as a compiled regular expression? Assert that you support ICRegexp.
(Assuming the best and forging ahead is still the most Pythonic approach, but per my original problem you sometimes do need to know this sort of thing. And per yesterday's entry, requiring inheritance is not the answer, especially if you want to build decoupled systems.)
When I put it this way, it's easy to see why you'd like a basic interface implementation. If you have to test at all, simple 'is-a' tests beat both 'is-a-descendant-of' restrictions and probing for duck typing with its annoyances and ambiguities (cf an earlier entry).
In this view, the important thing is really to have a unique name
(really an object) for each interface, so that you avoid the duck
typing ambiguity. A basic implementation is almost
trivial; treat interfaces as opaque objects, and just register classes
as supporting interfaces and then have an 'isinterface()' function
that works by analogy to isinstance().
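As a concrete illustration, here is a minimal sketch of such a registry (all of the names in it, like register_interface() and isinterface(), are my own placeholders rather than any real library's API):

    class Interface(object):
        # An opaque marker object; its identity is all that matters.
        def __init__(self, name):
            self.name = name

    # maps a class to the set of interfaces it has declared support for
    _registry = {}

    def register_interface(cls, iface):
        _registry.setdefault(cls, set()).add(iface)

    def isinterface(obj, iface):
        # By analogy with isinstance(): walk the object's classes and
        # see if any of them was registered as supporting the interface.
        return any(iface in _registry.get(cls, ())
                   for cls in type(obj).__mro__)

    ICRegexp = Interface("ICRegexp")

    class MyRegexp(object):
        def match(self, s):
            ...

    register_interface(MyRegexp, ICRegexp)
    isinterface(MyRegexp(), ICRegexp)    # -> True
    isinterface("a string", ICRegexp)    # -> False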
(This demonstrates the old computer science aphorism that there's no
problem that can't be solved by an extra level of indirection, since
that is basically what this does: it adds a level of indirection to
isinstance(), so that instead of asking 'is this object an instance of
one of these classes', you ask 'is this object an instance of a class
that supports this interface'.)
More complex implementations are of course possible; you could give
the interface objects actual behavior and information, add checks for
basic duck typing compatibility with the interface, make it so that
isinterface() can optionally check to see if the object seems to
implement the interface without having declared it, and so on.
(Sooner or later you end up back at zope.interface.)
2009-11-16
'Is-a' versus 'is-a-descendant-of'
One of the things that my issue with the Python re module not exposing its types has firmly mashed my nose into is the difference between 'is-a-descendant-of' and 'is-a' in object-oriented languages. It's conventional to think of them as more or less the same thing, even in a loose duck typed language like Python; it just seems to make sense for all compiled regular expressions to descend from a single base class, just as it theoretically makes sense for both plain bytestrings and Unicode strings to descend from an abstract generic string class.
(Technically, some of the things that I am calling classes here are actually types. In Python this is a distinction that can usually be ignored.)
Of course, when I write it out like this it's evident that it doesn't
necessarily make sense. For example, the actual implementation of
Python's base string class has no code and no behavior; it exists only
for the convenience of programmers who want to isinstance() a single
class. Similarly, a hypothetical version of the C-based regular
expression module that used different classes for different sorts of
regular expressions (in order to have different matching engines) could
perfectly well have no common abstract class (especially since the re
module does not expose such a class today).
On the flipside, it would be nice to be able to write alternate regular expression engines and have their objects accepted as 'compiled regular expressions'. Right now, anything that does duck typing will accept them, but things that look at types won't, purely because they don't descend from the current implementation of the regexp class (and you can't fix that, partly for reasons that I covered yesterday).
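To make the mismatch concrete, here is a small sketch (the AltPattern class is hypothetical):

    import re

    class AltPattern(object):
        # a hypothetical alternate regexp engine's compiled pattern
        def match(self, s):
            ...
        def search(self, s):
            ...

    pat = AltPattern()
    hasattr(pat, "match")                    # True: duck typing accepts it
    isinstance(pat, type(re.compile(".")))   # False: type checks reject it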
What this gets down to is that 'is-a' is effectively a question of interface, not of inheritance. In fact, duck typing in a nutshell is that your object 'is-a' compiled regular expression if it satisfies the expected interface behavior for such objects. Even in Python, we almost always use 'is-a-descendant-of' tests only as a convenient proxy for answering this 'is-a' question, but they are not quite the same thing and the difference can trip you (or other people) up.
(I'm sure I've read about this before, but there is a certain vividness to things this time around because I've had my nose rubbed in this.)
2009-11-15
A limitation of Python types from C extension modules
It's recently struck me that there is an important difference between types (and classes) created in a Python module and types/classes that come from a C-level extension module.
Suppose that duck typing is not enough and so
you really want to make a class that inherits from an outside class
(one in another module), yet overrides all of its behavior. This lets
you create objects that work the way you need them to but will pass
isinstance() checks that are insisting on instances of the original
class. Specifically, you want to be able to create instances of your
new class without going through the normal object initialization
process of your parent class.
(Yes, you'll need to do your own initialization instead to make your version of the behavior all work out, since once you're not using the parent type's initialization you can't assume that any of the parent's other methods keep working.)
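Here is a sketch of the Python-class version of this, with made-up class names; the subclass skips its parent's initialization entirely but still passes isinstance():

    class Original(object):
        def __init__(self, source):
            # imagine some elaborate setup that we want to avoid repeating
            self.data = open(source).read()
        def contents(self):
            return self.data

    class Impostor(Original):
        def __init__(self, canned):
            # deliberately do not call Original.__init__(); we override
            # all of the behavior that depended on it
            self.canned = canned
        def contents(self):
            return self.canned

    obj = Impostor("fake data")
    isinstance(obj, Original)    # -> True, so type-checking callers accept it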
If the outside module is a Python module, you can always (or perhaps
almost always) do this. If the outside module is a C extension module,
there is no guarantee that you will be able to do this (and sometimes
you may not even be able to create your descendant class, much less
initialize new instances of it). Fundamentally, the reason for this is
the same reason that you can't use object.__new__ on everything; the
C module is the only thing that knows
how to set up the C-level structures for its own objects, so it has to
be involved in creating new instances.
This means that types created in C modules can be effectively sealed
against descent and impersonation; they simply can't be substituted for
in a way that will fool isinstance(). The corollary is that using
isinstance() can in some situations be a much stronger guard than you
might be expecting.
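For instance, the C-level type behind re.compile() is, as far as I know, one of these sealed types; on CPython, trying to subclass it fails outright:

    import re

    CRegexpType = type(re.compile("."))

    # This raises a TypeError along the lines of
    # "type '_sre.SRE_Pattern' is not an acceptable base type",
    # because the C type does not opt in to being subclassed.
    try:
        class MyPattern(CRegexpType):
            pass
    except TypeError:
        MyPattern = None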
(It's possible to make a C-level type inheritable; all of the core
Python types are C-level types, after all, and you can do things
like inherit from list and str and so on.)
2009-11-08
My problem with the obvious solution to my unexposed types problem
In my last entry about solving my unexposed types problem, I sort of cheated; I left out one obvious solution. My problem was that I wanted to have a function with 'polymorphic arguments', one that could take either strings or compiled regular expressions and then tell them apart.
Well, you know, one obvious solution is to not have such a crazy interface in the first place. A clearer, simpler approach would be to have two separate functions, one that takes compiled regular expressions and a second one that takes strings (and then compiles them and probably calls the first function to do the actual work). Instead of trying to wedge two interfaces into one function, we just have one function per interface.
In my view, the problem is that this doesn't scale; multiplying functions this way in order to multiply your interfaces breeds more functions, sometimes explosively, because you also need to multiply any higher-level interfaces to these functions. Let me illustrate what I mean by way of my specific case.
My case looks something like this:
    def parse_stream(fp, matchlist):
        ...

    def parse_cmd(cmd, matchlist):
        fp = os.popen(cmd, "r")
        return parse_stream(fp, matchlist)
(Both parse_stream and parse_cmd are official interfaces, but I
expect to use parse_cmd more often.)
If I split parse_stream into two functions, I must also create two
versions of parse_cmd, and if I had even higher level interfaces I'd
have to split those too, and so on.
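Here is a sketch of what the split would look like (all of these names are hypothetical):

    import os

    def parse_stream_re(fp, matchlist):
        # the version that expects compiled regular expressions
        ...

    def parse_stream_str(fp, matchlist):
        # the version that expects strings and compiles them itself
        ...

    # and now the higher-level interface has to split too:
    def parse_cmd_re(cmd, matchlist):
        fp = os.popen(cmd, "r")
        return parse_stream_re(fp, matchlist)

    def parse_cmd_str(cmd, matchlist):
        fp = os.popen(cmd, "r")
        return parse_stream_str(fp, matchlist)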
(The other way that this approach can compound is if you have functions that are polymorphic on more than one argument. But I'm not sure that that's at all a sensible interface to start with; at some point it has to get too confusing.)
2009-11-07
Solving unexposed types and the limits of duck typing
When I ran into the issue of the re module not exposing its types, I considered several solutions to my underlying
problem of distinguishing strings from compiled regular expressions.
For various reasons, I wound up picking the solution that was the least
annoying to code; I decided whether something was a compiled regular
expression by checking to see if it had a .match attribute.
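In rough outline (this is a sketch, not my exact code), the check looked like:

    import re

    def is_compiled_regexp(obj):
        return hasattr(obj, "match")

    is_compiled_regexp(re.compile("a+"))   # -> True
    is_compiled_regexp("a+")               # -> False; plain strings lack .match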
This is the traditional Python approach to the problem; don't check
types as such, just check to see if the object has the behavior that
you're looking for. However, there's a problem with this, which I can
summarize by noting that .match() is a plausible method name for a
method on a string-like object, too.
Checking for duck typing by looking for attribute names only works when you can be reasonably confident that the attributes you're looking for are relatively unique. Unfortunately, nicely generic method names are likely to be popular, because they are simple and so broadly applicable, which means that you risk inadvertent collisions.
(A casual scan of my workstation's Python packages turns up several
packages with classes with 'match()' methods. While I doubt that
any of them are string-like, it does show that the method name is
reasonably popular.)
You can improve the accuracy of these checks by testing for more than one attribute, but this rapidly gets both annoying and verbose.
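For instance (the particular attributes here are just an illustration):

    def is_compiled_regexp(obj):
        return (hasattr(obj, "match") and hasattr(obj, "pattern")
                and hasattr(obj, "groups"))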
(I'm sure that I'm not the first person to notice this potential drawback.)
Sidebar: the solutions that I can think of
Here are all of the other solutions that I can think of offhand (each is sketched in code after the list):
- extract the type from the re module by hand: CReType = type(re.compile("."))
- invert the check by testing to see if the argument is a string, using isinstance() and types.StringTypes, and assume that it is a compiled regexp if it isn't.
- just call re.compile() on everything, because it turns out it's smart enough to notice if you give it a compiled regular expression instead of a string.
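Sketched out in (Python 2 era) code, the three options look something like this:

    import re, types

    # 1. extract the type by hand and isinstance() against it
    CReType = type(re.compile("."))
    def is_compiled_regexp(obj):
        return isinstance(obj, CReType)

    # 2. invert the check: test for strings, assume everything else is
    #    a compiled regexp
    def is_string(obj):
        return isinstance(obj, types.StringTypes)

    # 3. call re.compile() on everything; it hands compiled regular
    #    expressions back unchanged
    def as_compiled_regexp(obj):
        return re.compile(obj)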
I didn't discover the last solution until I wrote this entry. It's now tempting to revise my code to use it instead of the attribute test, especially since it would make the code shorter.
(This behavior is not officially documented, which is a reason to avoid it.)