== _os.walk_ can be surprisingly slow [[Yesterday's tracing VisualizingProgramFlow]] made one particular bit of DWiki's page rendering pop out at me; almost 40% of the time for the page was going just to create the '(Previous | Next)' links at the bottom. Because DWiki is entirely filesystem based, its 'metadata' is the state of the filesystem. Thus, creating the previous and next links requires walking WanderingThoughts' directory hierarchy to find out the order of the entries. Once I saw the times, I immediately started wondering if I was doing this traversal as fast as I could be. The natural way to do directory traversal is with the [[os module http://docs.python.org/lib/module-os.html]] _os.walk_ generator. DWiki's traversal needed to generate both filenames and their modification times, so the core of the code was roughly: > for rt, ds, fs in os.walk(dpath): > for f in fs: > fn = os.path.join(rt, f) > yield os.getmtime(fn), fn I benchmarked this against three alternate versions: a recursive list version, an iterative list version, and an iterative generator version. All of them are a bit less than twice as fast as the _os.walk_ version, with the generator version slightly faster than the others. (To my surprise.) (Generator-ness or lack thereof isn't important for DWiki, and some other requirements make the iterative version simpler. In my test program the generator versions are converted to lists as part of the timing.) Further experimentation showed that the _os.getmtime_ call accounted for about 30% of the time of the _os.walk_ version. Apparently in Python, _stat()_ isn't as speedy as I might have hoped. Thus the major speedup in my alternate versions is that they can _stat()_ each entry only once; the _os.walk_ version has to stat each twice, once in _os.walk_ to see if it's a directory and once in my code to get the modification time. Doing _os.walk_ without returning the modification time *still* wasn't as fast as my alternate versions, although it was much closer. I suspect that the remaining slowdown is because _os.walk_ is a *recursive generator*, which means it has to [[trickle results up through several levels GeneratorGotchas]], and I add another level of _yield_ on top. Then I spent the rest of today's hacking time coding a neat scheme for pre-parsing DWiki's templates to speed them up, only to find out that it was slower than the current brute force template processing. That's sometimes how it goes with performance tuning; your best improvements are from ten minute hacks, and your hour long nice work gets discarded in the end.