Solving unexposed types and the limits of duck typing

November 7, 2009

When I ran into the issue of the re module not exposing its types, I considered several solutions to my underlying problem of distinguishing strings from compiled regular expressions. For various reasons, I wound up picking the solution that was the least annoying to code: I decided whether something was a compiled regular expression by checking to see if it had a .match attribute.

This is the traditional Python approach to the problem; don't check types as such, just check to see if the object has the behavior that you're looking for. However, there's a problem with this, which I can summarize by noting that .match() is a plausible method name for a method on a string-like object, too.
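To make the collision concrete, here is a minimal sketch of the attribute check along with a string-like class that fools it. The names looks_compiled and GlobString are invented for illustration; they are not from my actual code or from any real package.

```python
import re

def looks_compiled(obj):
    """Duck-typing check: anything with a .match attribute counts as a compiled regexp."""
    return hasattr(obj, "match")

class GlobString(str):
    """Hypothetical string-like class that happens to have its own match() method."""
    def match(self, other):
        return self == other

pattern = re.compile(r"ab+c")
plain = "ab+c"
fancy = GlobString("ab+c")

print(looks_compiled(pattern))  # a real compiled regexp passes the check
print(looks_compiled(plain))    # an ordinary string has no .match, so it fails
print(looks_compiled(fancy))    # still a string, but it passes the check anyway
```

The last case is the failure mode: the object is a string in every way that matters to the caller, but the check classifies it as a compiled regular expression.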

Duck typing by checking for attribute names only works when you can be reasonably confident that the attributes you're looking for are relatively unique. Unfortunately, nicely generic method names are likely to be popular, because they are simple and so broadly applicable, which means that you risk inadvertent collisions.

(A casual scan of my workstation's Python packages turns up several packages with classes with 'match()' methods. While I doubt that any of them are string-like, it does show that the method name is reasonably popular.)

You can improve the accuracy of these checks by testing for more than one attribute, but this rapidly gets both annoying and verbose.
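A rough sketch of what the multi-attribute version might look like follows. The particular attribute list is my own choice for illustration (all of these exist on compiled patterns), and OnlyMatch is a made-up class standing in for an unrelated object that merely has a match() method.

```python
import re

# Illustrative set of attributes that compiled regexps have; the exact
# selection is an assumption, not a canonical list.
REGEXP_ATTRS = ("match", "search", "pattern", "flags", "groupindex")

def looks_compiled(obj, attrs=REGEXP_ATTRS):
    """Return True only if obj has every attribute named in attrs."""
    return all(hasattr(obj, name) for name in attrs)

class OnlyMatch(object):
    """Hypothetical unrelated class that only shares the match() name."""
    def match(self, other):
        return False

print(looks_compiled(re.compile("x")))  # has all five attributes
print(looks_compiled(OnlyMatch()))      # has match() but not the rest
```

This rejects the simple collision, but the list of names is arbitrary and has to be kept in step with whatever the code actually uses, which is where the annoyance comes from.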

(I'm sure that I'm not the first person to notice this potential drawback.)

Sidebar: the solutions that I can think of

Here are all of the other solutions that I can think of offhand:

  • extract the type from the re module by hand:
    CReType = type(re.compile("."))

  • invert the check by testing to see if the argument is a string, using isinstance() and types.StringTypes, and assume that it is a compiled regexp if it isn't.

  • just call re.compile() on everything, because it turns out it's smart enough to notice if you give it a compiled regular expression instead of a string.
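The three options above can be sketched roughly as follows. The function names are invented for illustration. The sketch is written so that it also runs on Python 3, where types.StringTypes does not exist, so the tuple (str, bytes) stands in for it as an assumption; on Python 2 you would use types.StringTypes as described above.

```python
import re

# Option 1: extract the compiled regexp type by hand.
CReType = type(re.compile("."))

def is_compiled_by_type(obj):
    return isinstance(obj, CReType)

# Option 2: invert the check. Anything that is not a string is assumed to
# be a compiled regexp. (str, bytes) is a stand-in for types.StringTypes.
STRING_TYPES = (str, bytes)

def is_compiled_by_inversion(obj):
    return not isinstance(obj, STRING_TYPES)

# Option 3: hand everything to re.compile() and let it sort things out.
# This relies on re.compile() returning an already compiled pattern
# unchanged, which is observed behavior rather than documented behavior.
def ensure_compiled(obj):
    return re.compile(obj)

compiled = re.compile("a+")
print(is_compiled_by_type(compiled), is_compiled_by_type("a+"))
print(is_compiled_by_inversion(compiled), is_compiled_by_inversion("a+"))
print(ensure_compiled(compiled) is compiled)
print(ensure_compiled("a+").pattern)
```

Note that the inverted check will happily call any non-string object a compiled regexp, and that re.compile() objects if you give it both a compiled pattern and a non-zero flags argument.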

I didn't discover the last solution until I wrote this entry. It's now tempting to revise my code to use it instead of the attribute test, especially since it would make the code shorter.

(This behavior is not officially documented, which is a reason to avoid it.)


Comments on this page:

From 77.22.207.176 at 2009-11-08 03:36:56:

Actually, your last solution seems to be just a spin of your first...

re.py uses pattern_type = type( sre_compile.compile("", 0) ) in an isinstance() check internally, so you more or less just shift the check by one level.

The advantage of course still is, you don't have to maintain it, if internal types change at any time.

By cks at 2009-11-08 12:39:54:

How the re module implements this feature in re.compile() is, well, an implementation detail; from an outside perspective, one just cares that it works (and that the re module, not you, is responsible for keeping it working if things change inside the re module, or at least it would be this way if it was actually documented).

I'm not surprised that it's implemented in the way it is, because that's about the only truly reliable way to do a check like this. (It's amusing that not even the re module has access to the type, but that's an artifact of how regular expressions are implemented under the hood.)

Written on 07 November 2009.

Last modified: Sat Nov 7 23:40:56 2009