Wandering Thoughts archives

2009-11-17

Finally understanding the appeal of 'Interfaces'

I spent a long time not really getting the need for coded, explicit implementations of 'Interfaces', by which I mean things like zope.interface. It didn't help that I generally encountered them as part of very large, complex systems like Zope and Twisted, and they tended to come with a lot of extra magic features, which made the whole idea seem like the sort of thing you only needed if you had to deal with such a beast.

Then, recently, the penny dropped and I finally saw the light. Shorn of complexity and extra features, what Interface implementations give you is an explicit and easily used way to assert and ask 'is-a' questions. Need to find out if this object is a compiled regular expression? Just ask if it supports the ICRegexp interface. Want to be accepted as a compiled regular expression? Assert that you support ICRegexp.

(Assuming the best and forging ahead is still the most Pythonic approach, but per my original problem you sometimes do need to know this sort of thing. And per yesterday's entry, requiring inheritance is not the answer, especially if you want to build decoupled systems.)

When I put it this way, it's easy to see why you'd like a basic interface implementation. If you have to test at all, simple 'is-a' tests beat both 'is-a-descendant-of' restrictions and probing for duck typing with its annoyances and ambiguities (cf an earlier entry).

In this view, the important thing is really to have a unique name (really an object) for each interface, so that you avoid the duck typing ambiguity. A basic implementation is almost trivial; treat interfaces as opaque objects, and just register classes as supporting interfaces and then have an 'isinterface()' function that works by analogy to isinstance().

(This demonstrates the old computer science aphorism that there's no problem that can't be solved by an extra level of indirection, since that is basically what this does: it adds a level of indirection to isinstance(), so that instead of asking 'is this object an instance of one of these classes', you ask 'is this object an instance of a class that supports this interface'.)
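A minimal sketch of this basic implementation, with hypothetical names (Interface, register(), isinterface(), and ICRegexp are all made up for illustration):

```python
import re

# Interfaces are opaque, unique objects; classes get registered as
# supporting them; isinterface() works by analogy to isinstance().
class Interface(object):
    def __init__(self, name):
        self.name = name

_supports = {}  # class -> set of Interface objects

def register(cls, iface):
    _supports.setdefault(cls, set()).add(iface)

def isinterface(obj, iface):
    # The extra level of indirection over isinstance(): does the
    # object's class (or any ancestor) claim to support iface?
    return any(iface in _supports.get(cls, ())
               for cls in type(obj).__mro__)

ICRegexp = Interface("ICRegexp")
register(type(re.compile(".")), ICRegexp)

print(isinterface(re.compile("x+"), ICRegexp))  # True
print(isinterface("just a string", ICRegexp))   # False
```

Note that nothing here looks at the interface's contents or the object's behavior; the interface is purely a unique name, which is the whole point.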

More complex implementations are of course possible; you could give the interface objects actual behavior and information, add checks for basic duck typing compatibility with the interface, make it so that isinterface() can optionally check to see if the object seems to implement the interface without having declared it, and so on.

(Sooner or later you end up back at zope.interface.)

GettingInterfaces written at 00:19:21

2009-11-16

'Is-a' versus 'is-a-descendant-of'

One of the things that my issue with the Python re module not exposing its types has firmly mashed my nose into is the difference between 'is-a-descendant-of' and 'is-a' in object-oriented languages. It's conventional to think of them as more or less the same thing, even in a loose duck typed language like Python; it just seems to make sense for all compiled regular expressions to descend from a single base class, just as it theoretically makes sense for both plain bytestrings and Unicode strings to descend from an abstract generic string class.

(Technically, some of the things that I am calling classes here are actually types. In Python this is a distinction that can usually be ignored.)

Of course, when I write it out like this it's evident that it doesn't necessarily make sense. For example, the actual implementation of Python's base string class has no code and no behavior; it exists only for the convenience of programmers who want to isinstance() a single class. Similarly, a hypothetical version of the C-based regular expression module that used different classes for different sorts of regular expressions (in order to have different matching engines) could perfectly well have no common abstract class (especially since the re module does not expose such a class today).

On the flipside, it would be nice to be able to write alternate regular expression engines and have their objects accepted as 'compiled regular expressions'. Right now, anything that does duck typing will accept them, but things that look at types won't, purely because they don't descend from the current implementation of the regexp class (and you can't fix that, partly for reasons that I covered yesterday).

What this gets down to is that 'is-a' is effectively a question of interface, not of inheritance. In fact, duck typing in a nutshell is that your object 'is-a' compiled regular expression if it satisfies the expected interface behavior for such objects. Even in Python, we almost always use 'is-a-descendant-of' tests only as a convenient proxy for answering this 'is-a' question, but they are not quite the same thing and the difference can trip you (or other people) up.
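To make the gap concrete, here is a hypothetical alternate 'compiled regular expression' class: it satisfies the expected interface, so duck-typed code accepts it, but a descendant-of test rejects it (MyPattern is an invented name):

```python
import re

class MyPattern(object):
    # An alternate regexp engine would do its own matching; this
    # stand-in just delegates to re for illustration.
    def __init__(self, pat):
        self._pat = re.compile(pat)
    def match(self, s):
        return self._pat.match(s)
    def search(self, s):
        return self._pat.search(s)

obj = MyPattern("a+")
# 'is-a' by behavior: yes, it acts like a compiled regexp
print(obj.match("aaa") is not None)               # True
# 'is-a-descendant-of': no, so isinstance() checks reject it
print(isinstance(obj, type(re.compile("."))))     # False
```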

(I'm sure I've read about this before, but there is a certain vividness to things this time around because I've had my nose rubbed in this.)

InheritanceVsInterface written at 00:50:50

2009-11-15

A limitation of Python types from C extension modules

It's recently struck me that there is an important difference between types (and classes) created in a Python module and types/classes that come from a C-level extension module.

Suppose that duck typing is not enough and so you really want to make a class that inherits from an outside class (one in another module), yet overrides all of its behavior. This lets you create objects that work the way you need them to but will pass isinstance() checks that are insisting on instances of the original class. Specifically, you want to be able to create instances of your new class without going through the normal object initialization process of your parent class.

(Yes, you'll need to do your own initialization instead to make your version of the behavior all work out, since once you're not using the parent type's initialization you can't assume that any of the parent's other methods keep working.)

If the outside module is a Python module, you can always (or perhaps almost always) do this. If the outside module is a C extension module, there is no guarantee that you will be able to do this (and sometimes you may not even be able to create your descendant class, much less initialize new instances of it). Fundamentally, the reason is the same as the reason you can't use object.__new__ on everything: the C module is the only thing that knows how to set up the C-level structures for its own objects, so it has to be involved in creating new instances.

This means that types created in C modules can be effectively sealed against descent and impersonation; they simply can't be substituted for in a way that will fool isinstance(). The corollary is that using isinstance() can in some situations be a much stronger guard than you might be expecting.
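You can see both halves of this in action. A pure-Python class can always be impersonated by a subclass that skips its initialization entirely, while the C-level compiled regular expression type refuses even to be subclassed (the class names here are made up):

```python
import re

# The Python-module case: skip the parent's initialization but still
# pass isinstance() checks.
class Base(object):
    def __init__(self, x):
        self.x = x

class Impersonator(Base):
    def __init__(self):
        pass  # deliberately skip Base.__init__

print(isinstance(Impersonator(), Base))  # True

# The C-extension case: the compiled regexp type is sealed, so you
# can't even create the descendant class.
try:
    class MyPattern(type(re.compile("."))):
        pass
except TypeError as exc:
    print("sealed:", exc)
```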

(It's possible to make a C-level type inheritable; all of the core Python types are C-level types, after all, and you can do things like inherit from list and str and so on.)

CModuleTypeLimitation written at 01:27:50

2009-11-08

My problem with the obvious solution to my unexposed types problem

In my last entry about solving my unexposed types problem, I sort of cheated; I left out one obvious solution. My problem was that I wanted to have a function with 'polymorphic arguments', one that could take either strings or compiled regular expressions and then tell them apart.

Well, you know, one obvious solution is to not have such a crazy interface in the first place. A clearer, simpler approach would be to have two separate functions, one that takes compiled regular expressions and a second one that takes strings (and then compiles them and probably calls the first function to do the actual work). Instead of trying to wedge two interfaces into one function, we just have one function per interface.

In my view, the problem is that this doesn't scale; multiplying functions in order to multiply your interfaces compounds, sometimes explosively, because you also need to multiply any higher-level interfaces built on top of these functions. Let me illustrate what I mean with my specific case.

My case looks something like this:

import os

def parse_stream(fp, matchlist):
  ...

def parse_cmd(cmd, matchlist):
  fp = os.popen(cmd, "r")
  return parse_stream(fp, matchlist)

(Both parse_stream and parse_cmd are official interfaces, but I expect to use parse_cmd more often.)

If I split parse_stream into two functions, I must also create two versions of parse_cmd, and if I had even higher level interfaces I'd have to split those too, and so on.
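Here is what the split would look like in practice, with hypothetical names and a toy body for parse_stream (the original's body is elided above, so this matching-lines implementation is my own invention):

```python
import os
import re

# One function per argument type...
def parse_stream_rx(fp, rxlist):
    # keep the lines that match any of the compiled regexps
    return [ln for ln in fp if any(rx.search(ln) for rx in rxlist)]

def parse_stream_str(fp, strlist):
    return parse_stream_rx(fp, [re.compile(s) for s in strlist])

# ...and the doubling immediately propagates upward: parse_cmd must
# now also come in two versions.
def parse_cmd_rx(cmd, rxlist):
    return parse_stream_rx(os.popen(cmd, "r"), rxlist)

def parse_cmd_str(cmd, strlist):
    return parse_stream_str(os.popen(cmd, "r"), strlist)
```

Two interfaces at the bottom have become four functions in total, and a third layer would double things again.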

(The other way that this approach can compound is if you have functions that are polymorphic on more than one argument. But I'm not sure that that's at all a sensible interface to start with; at some point it has to get too confusing.)

CompoundingInterfaces written at 23:26:12

2009-11-07

Solving unexposed types and the limits of duck typing

When I ran into the issue of the re module not exposing its types, I considered several solutions to my underlying problem of distinguishing strings from compiled regular expressions. For various reasons, I wound up picking the solution that was the least annoying to code; I decided whether something was a compiled regular expression by checking to see if it had a .match attribute.

This is the traditional Python approach to the problem; don't check types as such, just check to see if the object has the behavior that you're looking for. However, there's a problem with this, which I can summarize by noting that .match() is a plausible method name for a method on a string-like object, too.

Checking duck typing by checking for attribute names only works when you can be reasonably confident that the attributes you're looking for are relatively unique. Unfortunately, nicely generic method names are likely to be popular, because they are simple and so broadly applicable, which means that you risk inadvertent collisions.

(A casual scan of my workstation's Python packages turns up several packages with classes with 'match()' methods. While I doubt that any of them are string-like, it does show that the method name is reasonably popular.)

You can improve the accuracy of these checks by testing for more than one attribute, but this rapidly gets both annoying and verbose.
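A sketch of what the more-thorough version looks like (the function name is invented; the probed method names are all real methods of compiled regexp objects):

```python
import re

def looks_like_regexp(obj):
    # Probing several methods lowers the odds of an accidental name
    # collision, at the cost of verbosity.
    return all(hasattr(obj, name)
               for name in ("match", "search", "sub", "findall"))

print(looks_like_regexp(re.compile(".")))  # True
print(looks_like_regexp("some string"))    # False
```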

(I'm sure that I'm not the first person to notice this potential drawback.)

Sidebar: the solutions that I can think of

Here's all of the other solutions that I can think of offhand:

  • extract the type from the re module by hand:
    CReType = type(re.compile("."))

  • invert the check by testing to see if the argument is a string, using isinstance() and types.StringTypes, and assume that it is a compiled regexp if it isn't.

  • just call re.compile() on everything, because it turns out it's smart enough to notice if you give it a compiled regular expression instead of a string.

I didn't discover the last solution until I wrote this entry. It's now tempting to revise my code to use it instead of the attribute test, especially since it would make the code shorter.

(This behavior is not officially documented, which is a reason to avoid it.)
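For completeness, a sketch of that last solution (with the caveat just noted: the pass-through behavior was undocumented, so this leans on an implementation detail):

```python
import re

def as_regexp(arg):
    # re.compile() hands back an already-compiled pattern unchanged,
    # so strings and compiled regexps can both go through it.
    return re.compile(arg)

pat = re.compile("a+")
print(as_regexp(pat) is pat)                     # True: unchanged
print(as_regexp("a+").match("aa") is not None)   # True: compiled
```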

DuckTypingLimits written at 23:40:56

