2006-03-04
os.walk can be surprisingly slow
Yesterday's tracing made one particular bit of DWiki's page rendering pop out at me; almost 40% of the time for the page was going just to create the '(Previous | Next)' links at the bottom.
Because DWiki is entirely filesystem based, its 'metadata' is the state of the filesystem. Thus, creating the previous and next links requires walking WanderingThoughts' directory hierarchy to find out the order of the entries. Once I saw the times, I immediately started wondering if I was doing this traversal as fast as I could be.
The natural way to do directory traversal is with the os module's os.walk generator. DWiki's traversal needed to generate both filenames and their modification times, so the core of the code was roughly:
	for rt, ds, fs in os.walk(dpath):
		for f in fs:
			fn = os.path.join(rt, f)
			yield os.path.getmtime(fn), fn
I benchmarked this against three alternate versions: a recursive list version, an iterative list version, and an iterative generator version. All of them are a bit less than twice as fast as the os.walk version, with the generator version slightly faster than the others. (To my surprise.)
(Generator-ness or lack thereof isn't important for DWiki, and some other requirements make the iterative version simpler. In my test program the generator versions are converted to lists as part of the timing.)
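For illustration, an iterative generator version might look something like the following. This is my own sketch, not DWiki's actual code; the function name and structure are assumptions. The key property is that it calls lstat() exactly once per entry and reuses that result for both the directory test and the modification time:

```python
import os
import stat

def walktimes(dpath):
    """Yield (mtime, filename) for every file under dpath,
    doing only one lstat() call per directory entry."""
    dirs = [dpath]                      # explicit stack instead of recursion
    while dirs:
        cur = dirs.pop()
        for name in os.listdir(cur):
            full = os.path.join(cur, name)
            st = os.lstat(full)         # the single stat() for this entry
            if stat.S_ISDIR(st.st_mode):
                dirs.append(full)       # descend into it later
            else:
                yield st.st_mtime, full # mtime comes from the same stat()
```

Because it is a flat loop over an explicit stack rather than a recursive generator, results are yielded directly to the caller instead of trickling up through nested generators.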
Further experimentation showed that the os.path.getmtime call accounted for about 30% of the time of the os.walk version. Apparently in Python, stat() isn't as speedy as I might have hoped. Thus the major speedup in my alternate versions is that they can stat() each entry only once; the os.walk version has to stat() each entry twice, once in os.walk to see if it's a directory and once in my code to get the modification time.
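One rough way to see how much of the os.walk version's time goes to those extra getmtime() calls is to time the same walk with and without them. This is a sketch of such a measurement, not the benchmark I actually ran; the directory argument and repetition count are placeholders:

```python
import os
import timeit

def walk_with_mtimes(dpath):
    """os.walk plus a getmtime() per file: two stat()s per entry."""
    res = []
    for rt, ds, fs in os.walk(dpath):
        for f in fs:
            fn = os.path.join(rt, f)
            res.append((os.path.getmtime(fn), fn))  # the extra stat()
    return res

def walk_names_only(dpath):
    """os.walk alone: only the stat()s that os.walk itself does."""
    res = []
    for rt, ds, fs in os.walk(dpath):
        for f in fs:
            res.append(os.path.join(rt, f))         # no extra stat()
    return res

# Timing both over the same tree isolates the per-file stat() cost:
#   timeit.timeit(lambda: walk_with_mtimes("some/tree"), number=10)
#   timeit.timeit(lambda: walk_names_only("some/tree"), number=10)
```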
Doing os.walk without returning the modification time still wasn't as fast as my alternate versions, although it was much closer. I suspect that the remaining slowdown is because os.walk is a recursive generator, which means it has to trickle results up through several levels, and I add another level of yield on top.
Then I spent the rest of today's hacking time coding a neat scheme for pre-parsing DWiki's templates to speed them up, only to find out that it was slower than the current brute-force template processing. That's sometimes how it goes with performance tuning; your best improvements come from ten-minute hacks, and your hour-long careful work gets discarded in the end.