Python modules should make visible their (fundamental) types

October 28, 2009

These days, many Python modules make a habit of being selective about what names they allow to be visible to the outside world; you only expose the official, externally visible interfaces and so on.

This leads me to a thesis: all modules should make visible their fundamental types, in the same way that the types module makes visible names for built-in types. Note that this doesn't reveal any previously inaccessible things; outside people could already just use your regular interfaces to instantiate an instance of the type they're interested in, and then use type() on it. You're just giving them a name for it, one that they can conveniently use in places like isinstance() checks.

This all sounds abstract, so let me give a concrete example. Today, I wanted to create an interface that accepted a list of regular expressions in either uncompiled string form (for caller convenience in most cases) or as compiled regexps (for theoretical efficiency gains in some obscure situations). The best way to do this is to test whether list elements were already compiled regexps and if not, compile them. But you cannot do this easily, because the re module does not export the type of compiled regular expressions as a standalone name.

(Having two different interfaces doesn't make me happy in this situation for reasons that don't fit into this entry.)

In theory the workaround is as simple as type(re.compile("a")), but there is a hidden theoretical bear trap that might break this. Because it does not export the type, the re module effectively makes no guarantee that re.compile() will always return objects of the same type. There are plausible implementations that could give you different types for different regexp patterns; for example, you might get a faster Boyer-Moore matcher for constant strings, one regexp matcher for simple patterns, and another, slower one for the full complexity, and these objects could use quite different internal data structures.

(Exposing the nominal type of compiled regexps wouldn't prevent the re module from doing such tricks. It would just have an abstract base class (possibly without much behavior) that all of the various different implementations descended from, and this would keep isinstance() working and so on. Or you could take the StringType approach and expose a tuple of types.)


Comments on this page:

From 84.97.237.112 at 2009-10-28 16:39:35:

Weighing in! That's something I had to do with my own uncertainties.py module.

Written on 28 October 2009.
« A personal experience of web browsers making bad text editors
Why per-process (or per-user) memory resource limits are hard »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Oct 28 01:30:29 2009
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.