os.walk can be surprisingly slow

March 4, 2006

Yesterday's tracing made one particular bit of DWiki's page rendering pop out at me; almost 40% of the time for the page was going just to create the '(Previous | Next)' links at the bottom.

Because DWiki is entirely filesystem based, its 'metadata' is the state of the filesystem. Thus, creating the previous and next links requires walking WanderingThoughts' directory hierarchy to find out the order of the entries. Once I saw the times, I immediately started wondering if I was doing this traversal as fast as I could be.

The natural way to do directory traversal is with the os module os.walk generator. DWiki's traversal needed to generate both filenames and their modification times, so the core of the code was roughly:

for rt, ds, fs in os.walk(dpath):
  for f in fs:
    fn = os.path.join(rt, f)
    yield os.getmtime(fn), fn

I benchmarked this against three alternate versions: a recursive list version, an iterative list version, and an iterative generator version. All of them are a bit less than twice as fast as the os.walk version, with the generator version slightly faster than the others. (To my surprise.)

(Generator-ness or lack thereof isn't important for DWiki, and some other requirements make the iterative version simpler. In my test program the generator versions are converted to lists as part of the timing.)

Further experimentation showed that the os.getmtime call accounted for about 30% of the time of the os.walk version. Apparently in Python, stat() isn't as speedy as I might have hoped. Thus the major speedup in my alternate versions is that they can stat() each entry only once; the os.walk version has to stat each twice, once in os.walk to see if it's a directory and once in my code to get the modification time.

Doing os.walk without returning the modification time still wasn't as fast as my alternate versions, although it was much closer. I suspect that the remaining slowdown is because os.walk is a recursive generator, which means it has to trickle results up through several levels, and I add another level of yield on top.

Then I spent the rest of today's hacking time coding a neat scheme for pre-parsing DWiki's templates to speed them up, only to find out that it was slower than the current brute force template processing. That's sometimes how it goes with performance tuning; your best improvements are from ten minute hacks, and your hour long nice work gets discarded in the end.

Written on 04 March 2006.
« Visualizing dynamic program flow
Weekly spam summary on March 4th, 2006 »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Mar 4 03:09:22 2006
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.