I've been hit by the startup overhead of small programs in Python
I've written before about how I care about the resource usage and speed of short running programs. However, that care has basically been theoretical. I knew this was an issue in general and it worried me because we have short running Python programs, but it didn't impact me directly and our systems didn't seem to be suffering as a result of it. Even DWiki running as a CGI was merely kind of embarrassing.
Today, I turned a hacky personal shell script into a better done,
production ready version, rewritten in Python. This worked fine
and everything was great right up to the point where I discovered
that I had made this script a critical path in invoking
dmenu on my office workstation, which is something
that I do a lot (partly because I have a very convenient key
binding for it). The new Python version
is not slow as such, but it is slower, and it turns out that I am
very sensitive to even moderate startup delays with dmenu (because
I type ahead, expecting dmenu to appear essentially
instantly). With the old shell script version, this part of
startup took around one to two hundredths of a second; with the new
Python version, it now takes around a quarter of a second, which
is enough lag to be perceptible and for my type-ahead to go awry.
(This assumes that my machine is unloaded, which is not always the
case. Active CPU load, such as installing Ubuntu in a test VM, can
make this worse. My
dmenu setup actually runs this program five
times to extract various information, so each individual run is
taking about five hundredths of a second.)
Profiling and measuring short running Python programs is a bit
challenging and I've wound up resorting to fairly crude tricks (such
as just exiting from the program at strategic points). These tricks
strongly suggest that almost all of the extra time is going simply
to starting Python, with a significant amount of it spent importing
the standard library modules I use (and all of the things that they
import in turn). Simply getting to the quite early point where I
call the parse_args method of ArgumentParser consumes almost all of the
time on my desktop. My own code contributes relatively little to
the slower execution (although not nothing), which unfortunately
means that there's basically no point in trying to optimize it.
(On the one hand, this saves me time. On the other hand, optimizing Python code can be interesting.)
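A somewhat less crude way to see how much of the time is pure startup and imports is to time a bare interpreter against one that only imports modules. This is a minimal sketch, not what I actually did. It assumes Python 3, the time_startup helper is a made-up name, and the module list is only a guess at a typical set (argparse is the only one I've mentioned here).

```python
import subprocess
import sys
import time


def time_startup(code, runs=5):
    """Return the best wall-clock time in seconds to run `python -c code`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Start a fresh interpreter each time so we measure real startup cost.
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    bare = time_startup("pass")
    # Hypothetical import list; substitute whatever your program imports.
    with_imports = time_startup("import argparse, subprocess, json")
    print(f"bare interpreter: {bare * 1000:.1f} ms")
    print(f"with imports:     {with_imports * 1000:.1f} ms")
    print(f"import overhead:  {(with_imports - bare) * 1000:.1f} ms")
```

Taking the best of several runs reduces noise from other load on the machine. On Python 3.7 and later, running the real program with `python -X importtime` prints a per-module breakdown of import times to standard error, which gives a finer picture than this.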
My inelegant workaround for now is to cache the information my program is producing, so I only have to run the program (and take the quarter second delay) when its configuration file changes; this seems to work okay and it's at least as fast as the old shell script version. I'm hopeful that I won't run into any other places where I'm using this program in a latency sensitive situation (and anyway, such situations are likely to have less latency since I'm probably only running it once).
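The general shape of this sort of cache is to compare the modification time of the configuration file with that of the cached output and only rerun the slow program when the cache is older. The following is a sketch of that logic under assumptions, not my real setup: cached_output and its parameters are hypothetical names, and it needs Python 3.7 or later. To actually avoid the startup delay, the freshness check has to live in something that starts quickly (such as the shell script that invokes things); Python is used here only to show the logic.

```python
import os
import subprocess
import tempfile


def cached_output(config_path, cache_path, command):
    """Return command's output, rerunning it only when the cache is stale.

    The cache counts as fresh only if it is strictly newer than the
    configuration file (hypothetical policy; equal timestamps regenerate).
    """
    try:
        fresh = os.path.getmtime(cache_path) > os.path.getmtime(config_path)
    except OSError:
        fresh = False  # cache or config is missing: regenerate
    if fresh:
        with open(cache_path) as f:
            return f.read()

    # Slow path: run the real program and capture what it prints.
    result = subprocess.run(command, check=True, capture_output=True, text=True)

    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a partially written cache.
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    with tempfile.NamedTemporaryFile("w", dir=cache_dir, delete=False) as tmp:
        tmp.write(result.stdout)
    os.replace(tmp.name, cache_path)
    return result.stdout
```

Treating equal timestamps as stale costs an occasional unnecessary rerun, but it avoids serving old output when the configuration file is edited within the filesystem's timestamp granularity of the last cache write.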
In the longer run it would be nice to have some relatively general solution to pre-translate Python programs into some faster to start form. For my purposes with short running programs it's okay if the translated result has somewhat less efficient code, as long as it starts very fast and thus finishes fast for programs that only run briefly. The sort of obvious candidate is Google's grumpy project; unfortunately, I can't figure out how to make it convert and build programs instead of Python modules, although it's clearly possible somehow.
PS: The new version of the program is written in Python instead of shell because a non-hacky version of the job it's doing is more complicated than is sensible to implement in a shell script (it involves reading a configuration file, among other issues). It's written in Python instead of Go for multiple reasons, including that we've decided to standardize on only using a few languages for our tools and Go currently isn't one of them (I've mentioned this in a comment on this entry).