2006-03-25
A little gotcha with os.path.join
For my sins, I am one of those people who doesn't always read Python's
fine documentation carefully enough. For example, I wrote a large chunk
of code using the helpful os.path.join() before I noticed a little
gotcha: it's very helpful. In particular:
Joins one or more path components intelligently. If any component is an absolute path, all previous components are thrown away, and joining continues.
(Right there in black and white in the fine documentation.)
This means that you cannot do:
fpath = os.path.join(root, untrusted)
At least, you can't do this safely and have it do what you probably think it does.
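To make the gotcha concrete, here's a Unix interpreter session with made-up paths:

>>> import os.path
>>> os.path.join("/my/root", "some/file")
'/my/root/some/file'
>>> os.path.join("/my/root", "/etc/passwd")
'/etc/passwd'

If untrusted can be absolute (or contain '..'), the joined result can escape root entirely; you have to validate it yourself, for example by checking that the normalized result still starts with root.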
The other gotcha to remember for os.path.join is that no matter
how convenient it looks on a Unix machine, urls are not paths. If
you use os.path.join() on URLs and ever run on a Windows machine, the
results are not likely to be pleasant. You probably want the urlparse
module's
urljoin() function, but note that it behaves much like os.path.join
when handed an absolute second part, so:
>>> from urlparse import urljoin
>>> urljoin("http://host/a/b", "/d/e/f")
'http://host/d/e/f'
This is convenient if you are expecting it.
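You can see the Windows hazard for yourself without a Windows machine, because the ntpath module implements os.path's Windows behavior:

>>> import ntpath
>>> ntpath.join("http://host/a/b", "c")
'http://host/a/b\\c'

A backslash is not something you want in the middle of a URL.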
2006-03-23
We apologize for the disruption in syndication feeds
In the process of enabling caching in DWiki, I managed to illustrate the importance of thorough testing by failing to properly cache the real titles of entries and then not noticing, because whether the Atom feeds stayed the same was the one thing I didn't test. (I carefully and somewhat obsessively tested that the rendering of all the other pages stayed the same. I even have that automated.)
This caused the syndication feeds for WanderingThoughts to revert to plain filename based titles for eight hours or so, until I noticed (yet another example of ReadYourFeed; I did eventually, but not fast enough). So everyone just saw a gratuitous feed change or two, which I apologize for.
The effects on the LiveJournal feed were a little bit more dramatic. Despite the only change to the feed being the titles, LiveJournal decided to revive two entries from March 7th and 8th and arbitrarily give them today's date, effectively regurgitating stale content on the readers of the LiveJournal feed. This irritates me, but probably not as much as it irritates you.
(I call it arbitrarily because the entries have proper (old) timestamps in the Atom feed, and they're not at the top of the Atom feed either. LiveJournal's syndication stuff mystifies me on a regular basis. Hopefully it is not doing something as crazy as using the entry title as its internal identifier.)
The incident has also pointed out some problems in my caching implementation. In particular, I need to version the cached data so that a new DWiki version using a cache with old data silently ignores it instead of exploding spectacularly. (Alternately I need a more robust way of storing the data, because sooner or later I may forget to bump the data version number during an upgrade and then have the kabooms anyways.)
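As a sketch of the sort of versioning I have in mind (invented names and on-disk format, not DWiki's actual cache code):

import cPickle

CACHE_VERSION = 1    # bump whenever the cached data's format changes

def save_cached(fp, data):
    cPickle.dump((CACHE_VERSION, data), fp, 2)

def load_cached(fp):
    try:
        version, data = cPickle.load(fp)
    except Exception:
        return None     # unreadable or mangled: treat as a cache miss
    if version != CACHE_VERSION:
        return None     # data from an older DWiki: silently ignore it
    return data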
Caches are hard. Let's go shopping for faster CPUs.
2006-03-16
A tricky memoization decorator
This is the kind of hack where I present the Python code first and explain it later:
def memo1(func):
    def _wrap(self):
        r = func(self)
        # Rewrite this instance's method to a lambda that simply
        # returns the already-computed value from now on.
        setattr(self, func.func_name, lambda: r)
        return r
    return _wrap

class Foo(object):
    ...
    def invar(self):
        return complex(self.field1) or \
               complex2(self.field2)
    invar = memo1(invar)
Here, invar is a method that computes some function of the
object, with a value that never changes over the life of a given
object. The computation is more expensive than we like, so we don't
want to call it more than once, but it may never be called so it's not
worth precomputing the result in __init__. Clearly, memoization to the rescue.
Ordinary memoization would store the result in a variable somewhere, and
check it, and so on. We don't bother; instead, memo1 stores the result
by rewriting the object's invar method to be a lambda expression that
just returns the value, which is about as efficient as you can get. (It
introspects the function it's wrapping to find out what name to rewrite,
which rules out certain method tricks.)
One subtlety here is that the lambda takes no arguments, not a
self argument that it ignores. A function that is added to an object
instead of a class is not subject to the normal process of method
lookup, so does not get called with any implicit self.
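A quick interactive illustration of the mechanism (an invented example, not from memo1 itself):

>>> class Foo(object):
...     def method(self):
...         return 1
...
>>> f = Foo()
>>> f.method = lambda: 2
>>> f.method()
2
>>> Foo().method()
1

The instance attribute shadows the class's method for that one object, and it gets called as a plain function, with no implicit self.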
I would rather do the memoization in something like memo1 than in
invar itself, because doing it in a decorator keeps the invar
function itself clean and obvious. While the lambda rewrite version
does perform somewhat better than the traditional memoization scheme,
the bigger attraction of it to me is that it's just simpler; the
code doesn't have to play games to find a storage spot for the result.
(Tastes are quite likely to differ here.)
Unfortunately there is no way to pull this trick with properties; once an attribute of a class is a property, you cannot turn it into an ordinary attribute for a single object. In some situations doing this would be nice, because calling a lambda function is a bit over twice as slow as accessing an object attribute on my machine.
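For example (a made-up class, just to show the failure):

>>> class Foo(object):
...     def invar(self):
...         return 42
...     invar = property(invar)
...
>>> f = Foo()
>>> f.invar
42
>>> f.invar = 13
Traceback (most recent call last):
  ...
AttributeError: can't set attribute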
Sidebar: a more traditional memo1
Because I wrote it to do timing tests, here's a version of memo1
that doesn't use lambda rewriting:
def memo1(func):
    # Note that s is shared by every instance the decorated method is
    # ever called on, so this version is only correct for a single
    # object; that was all the timing test needed.
    s = []
    def _wrap(self):
        if s:
            return s[0]
        r = func(self)
        s.append(r)
        return r
    return _wrap
There is probably a cleverer place to store the return value.
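One candidate for that cleverer place is the wrapper function itself (my speculation, with the same single-object caveat as the list version):

def memo1(func):
    def _wrap(self):
        try:
            # Functions have writable attributes, so stash the
            # result on _wrap itself.
            return _wrap._result
        except AttributeError:
            _wrap._result = func(self)
            return _wrap._result
    return _wrap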
2006-03-09
Closures versus classes for capturing state
Possibly like a lot of other people, I have a 'thread pool' Python module for doing decoupled asynchronous queries of things. The basic interface looks like this:
wp = WorkPool(HOWMANY)
for w in what:
    wp.enqueue(wrap_up(w))
wp.waitDone()
res = wp.getanswers()
The important operation is .enqueue(<THING>), which queues up
<THING> to be called in a worker thread; its return value is saved, to
be disgorged later by .getanswers().
To keep things simple, <THING> is called with no arguments. But
usually what I really want to do is to call the same function with
a bunch of different arguments, for example calling is_in_sbl for a bunch of IP addresses. To make this
work, I need to capture the argument (or arguments) and create a simple
callable object that remembers them; in this example, that's done by
wrap_up.
There are two ways of wrapping up this sort of state in Python: classes
(okay, objects) and closures. In the class-based approach, wrap_up
is actually a class; its __init__ method saves w, and its
__call__ method uses the saved w to do the actual function call.
By contrast, in a closure-based approach wrap_up is a plain function
that creates and returns closures, like this:
def wrap_up(w):
    return lambda: real_func(w)
The class approach is the more obvious and natural one, because the state saving is explicit. But when I wrote two structurally identical programs, the first one using a class and the second one with closures, I found that I plain liked closures better.
For me, the drawback of the class based approach is that it's too
verbose. The same wrap_up would be:
class wrap_up:
    def __init__(self, w):
        self.w = w
    def __call__(self):
        return real_func(self.w)
(and it gets longer if you have more than one argument.)
I have a visceral reaction to verbosity; in cases like this it feels like make-work. The retort is that closures are far more magic than classes for most people and it's better to avoid magic, even if the result is more verbose.
For now, my view is that programming should be pleasant, not annoying, and that the verbosity of the class-based approach to state capture annoys me. So closures it is.
(Perhaps the real answer is that WorkPool should have an
.enqueue_call(func, *args, **kwargs) method, as well as the plain
.enqueue() one. This strays towards the Humane Interfaces
arguments, though, so I'm a bit nervous.)
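If I ever do add it, .enqueue_call() could be a thin shim over the existing interface, with a closure still doing the actual state capture (a sketch, not code WorkPool actually has):

class WorkPool:
    ...
    def enqueue_call(self, func, *args, **kwargs):
        # Capture func and its arguments in a closure and hand the
        # resulting no-argument callable to the existing .enqueue().
        self.enqueue(lambda: func(*args, **kwargs))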
2006-03-04
os.walk can be surprisingly slow
Yesterday's tracing made one particular bit of DWiki's page rendering pop out at me; almost 40% of the time for the page was going just to create the '(Previous | Next)' links at the bottom.
Because DWiki is entirely filesystem based, its 'metadata' is the state of the filesystem. Thus, creating the previous and next links requires walking WanderingThoughts' directory hierarchy to find out the order of the entries. Once I saw the times, I immediately started wondering if I was doing this traversal as fast as I could be.
The natural way to do directory traversal is with the os module's os.walk generator. DWiki's
traversal needed to generate both filenames and their modification
times, so the core of the code was roughly:
for rt, ds, fs in os.walk(dpath):
    for f in fs:
        fn = os.path.join(rt, f)
        yield os.path.getmtime(fn), fn
I benchmarked this against three alternate versions: a recursive list
version, an iterative list version, and an iterative generator version.
All of them are a bit less than twice as fast as the os.walk version,
with the generator version slightly faster than the others. (To my
surprise.)
(Generator-ness or lack thereof isn't important for DWiki, and some other requirements make the iterative version simpler. In my test program the generator versions are converted to lists as part of the timing.)
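For reference, the iterative generator approach looks something like this (a reconstruction of the general idea, not the actual test code):

import os
import stat

def walkmtimes(dpath):
    # Iterative traversal that stats each entry exactly once, using
    # the result both to find subdirectories and to yield file mtimes.
    pending = [dpath]
    while pending:
        dirpath = pending.pop()
        for name in os.listdir(dirpath):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISDIR(st.st_mode):
                pending.append(full)
            elif stat.S_ISREG(st.st_mode):
                yield st.st_mtime, full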
Further experimentation showed that the os.path.getmtime call accounted for about 30% of the time of the os.walk version. Apparently in Python, stat() isn't as speedy as I might have hoped. Thus the major speedup in my alternate versions is that they stat() each entry only once; the os.walk version has to stat each entry twice, once in os.walk to see if it's a directory and once in my code to get the modification time.
Doing os.walk without returning the modification time still wasn't
as fast as my alternate versions, although it was much closer. I suspect
that the remaining slowdown is because os.walk is a recursive
generator, which means it has to trickle results up through several
levels, and I add another level of yield on top.
Then I spent the rest of today's hacking time coding a neat scheme for pre-parsing DWiki's templates to speed them up, only to find out that it was slower than the current brute force template processing. That's sometimes how it goes with performance tuning; your best improvements come from ten-minute hacks, and your hour-long nice work gets discarded in the end.
2006-03-03
Visualizing dynamic program flow
Someone I know has been talking to me recently about bravely becoming the first outside user of DWiki. As part of this, I realized he might like to have practical documentation about how DWiki templates get used in practice for things like producing this blog. Which made me realize that I'd forgotten how the template flow goes.
I could have re-read the templates and traced the execution path, but then I had a bright idea: why not make the program do this for me? So I instrumented template processing to dump all the templates used, properly nested to show the expansion structure.
Once I had this visualization, more questions started popping out: what work was being done by each template? How long was it taking? Augmenting the tracing to answer them wasn't too hard, and somewhat to my surprise it started bringing out useful information; the performance information has been particularly useful. It's one thing to profile an application, but another thing to see where in the flow of execution the time is going (instead of where in the code).
Visualizing program flow this way is sort of a variant of debugging
via print; instead of trying
to figure out how the program got somewhere, you're interested in seeing
where it went in general.
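A minimal sketch of the tracing idea (invented names; DWiki's real instrumentation hooks into its template engine):

import time

class TraceRun:
    def __init__(self):
        self.depth = 0
    def render(self, name, func):
        # Print the template's name indented by nesting depth, run
        # the real rendering function, and report the elapsed time
        # if it is a millisecond or more.
        print "%s- %s:" % ("  " * self.depth, name)
        self.depth += 1
        start = time.time()
        res = func()
        elapsed = (time.time() - start) * 1000
        self.depth -= 1
        if elapsed >= 1:
            print "%s  (%.1f msec)" % ("  " * self.depth, elapsed)
        return res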
To illustrate what I'm talking about, here's an example. This is the flow of templates for the web page for a single blog entry (this one in specific):
- dwiki.tmpl:
- html/dtd
- struct/head.tmpl: $var $var
- html/css: $var
- syndication/discovery.tmpl: atom::autodisc
- struct/header.tmpl:
- struct/breadcrumbs: breadcrumbs
- struct/loginstate.tmpl: $var
- dwiki/overrides.tmpl:
- Overrides/blog/all.tmpl:
- dwiki/standard.tmpl:
- views/normal-file.tmpl:
- wikiauth.tmpl: wikitext[4.1ms]
- comment/tools.tmpl: comment::countlink Empty
- wikiauth.tmpl: wikitext[4.1ms]
- views/normal-file.tmpl:
- Overrides/blog/sidebar.tmpl:
- struct/readme-dir.tmpl: inject::upreadme[9.4ms] (9.5 msec)
- dwiki/standard.tmpl:
- Overrides/blog/all.tmpl:
- struct/datecrumbs.tmpl: blog::datecrumbs blog::prevnext[18ms] (18 msec)
- struct/bottom.tmpl:
- bits/pagetools.tmpl: pagetools[1.6ms] (1.7 msec)
- hist/locker.tmpl: hist::lockedby Empty
- struct/searchbox.tmpl: search::enter
- struct/loginbox.tmpl: auth::loginbox
- syndication/feeds.tmpl: atom::feeds[1ms] (1.1 msec)
- bits/lastmod.tmpl: lastmodified
- site-sig.tmpl
(Reading this and this will be helpful.)
The rough format is '- <template file>: <renderer> ...', interspersed with sub-templates; $var marks variable expansion, and Empty means the template ended up generating nothing (often because it wasn't applicable to the page). Times are omitted if they're less than a millisecond.
At 25 templates and 45 milliseconds, this is a fairly typical DWiki page. The sidebar means that WanderingThoughts has a more complex template cascade than ordinary pages, but frankly I like small modular templates. (A lot of them are only a line or two long.)
(And yes, I know that there are a number of inefficient bits in here.
Figuring out how to speed DWiki up is on the to-do list, although I had
not realized quite how time consuming the blog::prevnext renderer was
before now.)