Wandering Thoughts archives

2009-10-28

Python modules should make visible their (fundamental) types

These days, many Python modules make a habit of being selective about what names they allow to be visible to the outside world; you only expose the official, externally visible interfaces and so on.

This leads me to a thesis: all modules should make visible their fundamental types, in the same way that the types module makes visible names for built-in types. Note that this doesn't reveal any previously inaccessible things; outside people could already just use your regular interfaces to instantiate an instance of the type they're interested in, and then use type() on it. You're just giving them a name for it, one that they can conveniently use in places like isinstance() checks.

This all sounds abstract, so let me give a concrete example. Today, I wanted to create an interface that accepted a list of regular expressions in either uncompiled string form (for caller convenience in most cases) or as compiled regexps (for theoretical efficiency gains in some obscure situations). The best way to do this is to test whether list elements were already compiled regexps and if not, compile them. But you cannot do this easily, because the re module does not export the type of compiled regular expressions as a standalone name.

(Having two different interfaces doesn't make me happy in this situation for reasons that don't fit into this entry.)

In theory the workaround is as simple as type(re.compile("a")), but there is a hidden theoretical bear trap that might break this. Because it does not export the type, the re module effectively makes no guarantee that re.compile() will always return objects of the same type. There are plausible implementations that could give you different types for different regexp patterns; for example, you might get a faster Boyer-Moore matcher for constant strings, one regexp matcher for simple patterns, and another, slower one for the full complexity, and these objects could use quite different internal data structures.

(Exposing the nominal type of compiled regexps wouldn't prevent the re module from doing such tricks. It would just have an abstract base class (possibly without much behavior) that all of the various different implementations descended from, and this would keep isinstance() working and so on. Or you could take the StringType approach and expose a tuple of types.)

ExposeYourTypes written at 01:30:29; Add Comment

2009-10-22

How to waste lots of CPU time checking for module updates

I understand that Python does not have automatic reloading of changed modules and Python source files, and that some people would like to have this feature for their long-running programs and systems. However, there is a right way and a wrong way to do this, especially for things like web frameworks.

The right way is to check for updates (and reload any affected modules) immediately before you process the next request (specifically, when you wake up knowing that you have a request, you immediately check for pending reloads, carry them out, and then go on to process the new request). The wrong way is to check every so often, even if you haven't received any requests; the especially wrong way is to check frequently, say once a second. Things like stat() system calls are cheap, but they are not free; doing fifty to a hundred of them every second adds up over the lifetime of a long running, otherwise generally inactive process.

(This is especially true in our environment, because those stat() system calls periodically require the server in question to talk to the fileserver over NFS. They might be a lot cheaper on a local filesystem, but your code shouldn't be assuming a local filesystem or, for that matter, an otherwise idle system.)

The other reason that frequent wakeups to check for this sort of stuff should be avoided these days is virtualization. If you are running inside a virtualized machine, waking up frequently forces the entire virtual machine to wake up frequently, which is very definitely not free. People making heavy use of virtualization are much happier if virtual machines do not wake up when they are otherwise idle.

(In a desktop application, frequent wakeups may stop a laptop from going to low-power sleep modes and even force undesirable disk IO, but this is not usually an issue for a web application.)

(I can't conclusively identify what Python framework this was, so I'm not going to name any names. Besides, it might have been some configuration option or optional module that really did the damage.)

Sidebar: the case for doing preemptive, periodic reloads

There are two potential justifications for periodic reloads instead of 'just before processing a request' ones. First, you don't delay handling requests by checking for and processing reloads (hopefully they will have been done beforehand), and second, you find out if there is an error before you're trying to process a real request. I'm not convinced that these are strong reasons, especially the second one; I'm not sure that Python exposes very many useful controls for handling module reload failures, although I haven't looked into this very hard (yes, I know this is a dangerous thing to write in a blog entry).

(There are at least two options for how you want to behave on module reload failures; you might retain the last good version of the module, or you might stop further processing until you can successfully import the module. The extreme version of the latter is just to exit.)

WrongWayUpdateChecks written at 01:53:53; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.