os.walk can be surprisingly slow
Because DWiki is entirely filesystem based, its 'metadata' is the state of the filesystem. Thus, creating the previous and next links requires walking WanderingThoughts' directory hierarchy to find out the order of the entries. Once I saw the times, I immediately started wondering if I was doing this traversal as fast as I could be.
The natural way to do directory traversal is with the os module
os.walk generator. DWiki's
traversal needed to generate both filenames and their modification
times, so the core of the code was roughly:
for rt, ds, fs in os.walk(dpath): for f in fs: fn = os.path.join(rt, f) yield os.getmtime(fn), fn
I benchmarked this against three alternate versions: a recursive list
version, an iterative list version, and an iterative generator version.
All of them are a bit less than twice as fast as the
with the generator version slightly faster than the others. (To my
(Generator-ness or lack thereof isn't important for DWiki, and some other requirements make the iterative version simpler. In my test program the generator versions are converted to lists as part of the timing.)
Further experimentation showed that the
os.getmtime call accounted for
about 30% of the time of the
os.walk version. Apparently in Python,
stat() isn't as speedy as I might have hoped. Thus the major speedup
in my alternate versions is that they can
stat() each entry only once;
os.walk version has to stat each twice, once in
os.walk to see if
it's a directory and once in my code to get the modification time.
os.walk without returning the modification time still wasn't
as fast as my alternate versions, although it was much closer. I suspect
that the remaining slowdown is because
os.walk is a recursive
generator, which means it has to trickle results up through several
levels, and I add another level of
yield on top.
Then I spent the rest of today's hacking time coding a neat scheme for pre-parsing DWiki's templates to speed them up, only to find out that it was slower than the current brute force template processing. That's sometimes how it goes with performance tuning; your best improvements are from ten minute hacks, and your hour long nice work gets discarded in the end.