2017-08-11
Some notes from my brief experience with the Grumpy transpiler for Python
I've been keeping an eye on Google's Grumpy Python to Go transpiler more or less since it was introduced because it's always been my great white hope for speeding up my Python code more or less effortlessly (and I like Go). However, until recently I had never actually tried to do anything much with it because I didn't really have a problem that it looked like a good fit for. What changed is that I finally got hit by the startup overhead of small programs.
As mentioned in that entry, my initial attempts to use Grumpy weren't successful, because how to actually use Grumpy for anything beyond toys is basically not documented today. Because sometimes I'm stubborn, I kept banging my head against the wall for long enough until I hacked together how to bring up my program, which gave me the chance to get some real world results. Basically the process went like this:
- build Grumpy from source following their 'method 2' process (using the Fedora 25 system version of Go, not my own build, because Grumpy very much didn't work with the latter).
- have Grumpy translate my Python program into a module, which was possible because I'd kept it importable.
- hack grumprun to not delete the Go source file it creates on the fly based on your input. grumprun is in Python, which makes this reasonably easy.
- feed grumprun a Python program that was 'import mymodule; mymodule.main()' and grab the Go source code it generated (now that it wasn't deleting said source code afterward).
This gave me a Go program that I could build into a binary that I could keep and then run with command line arguments.
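The module translation step and the 'import mymodule; mymodule.main()' stub both rely on the program being importable. As a minimal sketch of what that layout looks like (the file name mymodule.py and the body of main() are placeholders, not my actual program):

```python
# mymodule.py: hypothetical stand-in for the real program.
import sys


def main(argv=None):
    """Entry point usable both from the command line and after an import."""
    if argv is None:
        argv = sys.argv[1:]
    # Placeholder for the real work; it only echoes its arguments.
    print("arguments: %s" % " ".join(argv))
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly, not on 'import mymodule',
    # so a separate stub can import the module and call main() itself.
    main()
```

Because nothing happens at import time except defining main(), a one-line stub program can drive the whole thing.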
Unfortunately it turns out that this didn't do me any good. First, the
compiled binary of my Grumpy-transpiled Python code also took about
the same 0.05 of a second to start and run as my real Python code.
Second, my code immediately failed because Grumpy has not fully
implemented Python set()s; in particular, it doesn't have the
.difference() method. This is not listed in their Missing features
wiki page, but Grumpy is underdocumented in general.
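For illustration, here is a sketch of the kind of fallback one could write to avoid calling .difference() at all. The helper name set_difference is made up, and I have not checked whether Grumpy handles even this much; it only shows that the method's behaviour can be rebuilt from membership tests and .add():

```python
def set_difference(first, *others):
    """Return the items of 'first' that appear in none of 'others'."""
    result = set()
    for item in first:
        found = False
        for other in others:
            if item in other:
                found = True
                break
        if not found:
            result.add(item)
    return result


old = set(["a", "b", "c"])
new = set(["b", "c", "d"])
removed = set_difference(old, new)  # same result as old.difference(new)
```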
(As a general note, Grumpy appears to be in a state of significant churn in how it operates and how you use it, which I suppose is not particularly surprising. You can find older articles on how to use Grumpy that clearly worked at the time but don't work any more.)
This whole experience has unfortunately left me much less interested in Grumpy. As it is today, Grumpy's clearly not ready for outside people to do anything with it, and even in the future it may well never be good at the kind of things I want it for. Building fast-starting and fast-running programs may not ever be a Grumpy priority. Grumpy is an interesting experiment and I wish Google the best of luck with it, but it clearly can't be my great hope for faster, lighter-weight Python programs.
My meta-view of Grumpy is that right now it feels like an internal Google (or YouTube) tool that Google just happens to be developing in a public repository for us to watch.
(In this particular case my fix was to hand-write a second version of the program in Go, which has been part irritating and part interesting. The Go version runs in essentially no time, as I wanted and hoped, so the slow startup of the Grumpy version is not intrinsic to either Go or the problem. My Go version will not be the canonical version of this program for local reasons, so I'll have to maintain it myself in sync with the official Python version for as long as I care enough to.)
Sidebar: Part of why Grumpy is probably slow (and awkward)
It's an interesting exercise to look at the Go code that grumpc
generates. It's not anything like Go code as you'd conventionally
write it; instead, it's much closer to CPython bytecode that has been turned into Go code. This
faithfully implements the semantics of (C)Python, which is explicitly
one of Grumpy's goals, but it means that Grumpy has a significant
amount of overhead over a true Go solution in many situations.
(The transpiler may lower some Python types and expressions to more pure Go code under some circumstances, but scanning the generated output for my Python program suggests that this is uncommon to rare in the kind of code I write.)
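To get a feel for what 'CPython bytecode' looks like for ordinary code, the standard dis module will print it. This shows only CPython's own bytecode for a made-up function (add_one is purely illustrative); it is not Grumpy output, and the exact instructions vary between Python versions:

```python
import dis


def add_one(values):
    """Sum the values, adding one to each along the way."""
    total = 0
    for value in values:
        total += value + 1
    return total


# Prints one line per bytecode instruction; even this tiny loop expands
# into a sequence of stack loads, stores, jumps and an iterator protocol.
dis.dis(add_one)
```

Code that mirrors these operations step by step carries the interpreter's generality with it, which is the overhead described above.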
Grumpy codes various Python types in pure Go
code, but as I found with set, some of their implementations are
incomplete. In fact, now that I look I can see that the only Go
code in the entire project appears to be in those types, which
generally correspond to things that are implemented in C in CPython.
Everything else is generated by the transpiling process.
2017-08-05
I've been hit by the startup overhead of small programs in Python
I've written before about how I care about the resource usage and speed of short running programs. However, that care has basically been theoretical. I knew this was an issue in general and it worried me because we have short running Python programs, but it didn't impact me directly and our systems didn't seem to be suffering as a result of it. Even DWiki running as a CGI was merely kind of embarrassing.
Today, I turned a hacky personal shell script into a better done
production ready version that I rewrote in Python. This worked fine
and everything was great right up to the point where I discovered
that I had made this script a critical path in invoking dmenu
on my office workstation, which is something
that I do a lot (partly because I have a very convenient key
binding for it). The new Python version
is not slow as such, but it is slower, and it turns out that I am
very sensitive to even moderate startup delays with dmenu (partly
because I type ahead, expecting dmenu to appear essentially
instantly). With the old shell script version, this part of dmenu
startup took around one to two hundredths of a second; with the new
Python version, it now takes around a quarter of a second, which
is enough lag to be perceptible and for my type-ahead to go awry.
(This assumes that my machine is unloaded, which is not always the
case. Active CPU load, such as installing Ubuntu in a test VM, can
make this worse. My dmenu
setup actually runs this program five
times to extract various information, so each individual run is
taking about five hundredths of a second.)
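Per-run numbers like these can be approximated from outside the program by timing whole process runs. The following is a rough sketch rather than what I actually did: time_startup is a made-up helper, and the two commands are stand-ins for the real program (a bare interpreter, and one that imports argparse).

```python
import subprocess
import sys
import time


def time_startup(args, runs=5):
    """Return the wall-clock seconds taken by each run of the command 'args'."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


# Baseline: an interpreter that starts up and exits immediately.
bare = time_startup([sys.executable, "-c", "pass"])
# The same, but importing one standard library module first.
with_argparse = time_startup([sys.executable, "-c", "import argparse"])

print("bare interpreter: %.3fs best of %d" % (min(bare), len(bare)))
print("with argparse:    %.3fs best of %d" % (min(with_argparse), len(with_argparse)))
```

Taking the minimum of several runs reduces the noise from other load on the machine, which matters at the scale of hundredths of a second.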
Profiling and measuring short running Python programs is a bit
challenging and I've wound up resorting to fairly crude tricks (such
as just exiting from the program at strategic points). These tricks
strongly suggest that almost all of the extra time is going simply
to starting Python, with a significant amount of it spent importing
the standard library modules I use (and all of the things that they
import in turn). Simply getting to the quite early point where I
call the parse_args method of argparse's ArgumentParser consumes
almost all of the
time on my desktop. My own code contributes relatively little to
the slower execution (although not nothing), which unfortunately
means that there's basically no point in trying to optimize it.
(On the one hand, this saves me time. On the other hand, optimizing Python code can be interesting.)
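A slightly less crude way to see where import time goes is to time individual imports inside a fresh interpreter. This is a sketch under assumptions: time_import is a made-up helper, and apart from argparse the module names are just examples, not necessarily what my program uses.

```python
import importlib
import sys
import time


def time_import(name):
    """Return the seconds taken to import 'name', or None if already loaded."""
    if name in sys.modules:
        # A module that is already cached would look misleadingly cheap.
        return None
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


for module in ("argparse", "json", "subprocess"):
    elapsed = time_import(module)
    if elapsed is None:
        print("%-12s already imported" % module)
    else:
        print("%-12s %.4fs" % (module, elapsed))
```

Each figure includes whatever that module imports in turn, and order matters because later modules get shared dependencies for free. Python 3.7 and later can produce a per-module breakdown directly with 'python -X importtime'.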
My inelegant workaround for now is to cache the information my program is producing, so I only have to run the program (and take the quarter second delay) when its configuration file changes; this seems to work okay and it's at least as fast as the old shell script version. I'm hopeful that I won't run into any other places where I'm using this program in a latency sensitive situation (and anyway, such situations are likely to have less latency since I'm probably only running it once).
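The caching logic amounts to comparing modification times. Here it is sketched in Python purely to show the idea; cached_output and its parameters are invented for this example, and in real use the check has to live in whatever invokes the program (a shell wrapper, say), since doing it in Python would pay the startup cost all over again.

```python
import os


def cached_output(config_path, cache_path, generate):
    """Return the cached text unless config_path is newer than cache_path."""
    try:
        fresh = os.path.getmtime(cache_path) >= os.path.getmtime(config_path)
    except OSError:
        # No cache yet (or no config file): fall through and regenerate.
        fresh = False
    if fresh:
        with open(cache_path) as f:
            return f.read()
    text = generate()  # the slow part: actually running the program
    with open(cache_path, "w") as f:
        f.write(text)
    return text
```

Comparing timestamps is cheap but coarse: an edit that lands within the filesystem's timestamp resolution of the last cache write could be missed.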
In the longer run it would be nice to have some relatively general solution to pre-translate Python programs into some faster to start form. For my purposes with short running programs it's okay if the translated result has somewhat less efficient code, as long as it starts very fast and thus finishes fast for programs that only run briefly. The sort of obvious candidate is Google's grumpy project; unfortunately, I can't figure out how to make it convert and build programs instead of Python modules, although it's clearly possible somehow.
(My impression is that both grumpy and Cython have wound up focused on converting modules, not programs. Like PyPy, they may also be focusing on longer running CPU-intensive code.)
PS: The new version of the program is written in Python instead of shell because a non-hacky version of the job it's doing is more complicated than is sensible to implement in a shell script (it involves reading a configuration file, among other issues). It's written in Python instead of Go for multiple reasons, including that we've decided to standardize on only using a few languages for our tools and Go currently isn't one of them (I've mentioned this in a comment on this entry).