If you're going to use PyPy, I think you need servers

July 25, 2017

I have a long-standing interest in PyPy for the straightforward reason that it would be nice to get a performance increase for my Python code basically for free, and I do have some code that is at least somewhat CPU-intensive. Also, to be honest, the idea and technology of PyPy are really neat, so I would like it to work out.

A few years ago when I did some experiments, one of the drawbacks of PyPy for my sort of interests was that it took a substantial amount of execution time to warm up and start performing better than CPython. I just gave the latest PyPy release a quick spin (using this standalone package for Linux (via)), and while it's faster than previous versions it still has that warm-up requirement, which is not surprising (and in fact the PyPy FAQ explicitly talks about this). But this raises a question: if I want to use PyPy to speed up my Python code, what would it take?
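One rough way to see the warm-up effect for yourself is to time the same CPU-bound function in successive batches and compare the early batches with the later ones. This is only a sketch, assuming Python 3 (or PyPy3); `cpu_work` and `time_batches` are hypothetical names for illustration, not anything from my real code, and the batch sizes are arbitrary.

```python
import time


def cpu_work(n):
    # Hypothetical stand-in for CPU-intensive application code.
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def time_batches(func, arg, batches=5, calls_per_batch=20):
    # Time each batch of calls separately, so that any change in speed
    # between the first batch and later ones is visible.
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(calls_per_batch):
            func(arg)
        timings.append(time.perf_counter() - start)
    return timings
```

Running something like `print(time_batches(cpu_work, 200000))` under both CPython and PyPy should show roughly flat per-batch times under CPython, and I would expect the first batch under PyPy to be the slow one. How large the difference is depends entirely on the code and the PyPy version.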

If PyPy only helps on long-running code, then that means I need to run things as servers instead of one-shot programs. This is doable; almost anything can be recast as a server if you try hard enough (and perhaps write the client in another, lighter-weight language). However it's not enough to just have, say, a preforking server where the actual worker processes only do a bit of work and then die off, because that doesn't get you the long-running code that PyPy needs. Instead you need either long-running worker processes or threads within a single server process, and given Python's GIL you probably want the former.

(And yes, PyPy still has a GIL.)

A straightforward preforking server is going to duplicate a lot of warm-up work in each worker process, because the main server process doesn't do very much work on its own before it starts worker processes. I can imagine hacks to deal with this, such as having the server go through a bunch of synthetic requests before it starts forking off workers to handle real ones. This might have the useful side effect of reducing the overall memory overhead of PyPy by sharing more JIT data between worker processes. It does require you to generate synthetic requests, which is easy for me in one environment but not so easy in another.
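The warm-up-before-fork hack could look something like the following sketch. Everything here is hypothetical: `handle_request` stands in for real application code, the synthetic requests are made up, and the number of rounds is a guess. It also simply assumes that JIT state built up in the parent carries over into forked children, which is the hope rather than something I have verified.

```python
import os


def handle_request(request):
    # Hypothetical CPU-bound handler standing in for the real application.
    total = 0
    for ch in request:
        total = (total * 31 + ord(ch)) % 1000003
    return total


def warm_up(handler, synthetic_requests, rounds=200):
    # Run synthetic requests in the parent, before any fork, so the hot
    # paths have been executed many times. Returns how many calls were made.
    count = 0
    for _ in range(rounds):
        for req in synthetic_requests:
            handler(req)
            count += 1
    return count


def prefork(n_workers, worker_main):
    # Fork n_workers children; each runs worker_main() and then exits
    # without returning into the parent's code. Returns the child PIDs.
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                worker_main()
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    return pids


def wait_for(pids):
    # Reap the children and collect their exit statuses.
    statuses = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        statuses.append(os.WEXITSTATUS(status))
    return statuses
```

In a real server, `worker_main` would be a long-running accept loop rather than something that returns quickly, and the parent would call `warm_up()` once and then `prefork()`.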

There is one obvious server environment that's an entirely natural fit for running Python code, and would in fact easily handle DWiki (the code behind this blog). That is Apache with mod_wsgi, which transparently runs your Python WSGI app in some server processes. Unfortunately, as far as I know mod_wsgi doesn't support PyPy and I don't think there are any plans to change that.

(There are other ways to run WSGI apps using PyPy, but none of them are as easy and seamless as Apache with mod_wsgi and thus all of them are less interesting to me.)
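As an illustration of one of those other ways, here is a minimal sketch of a WSGI app served by a single long-running process using the standard library's `wsgiref`. This is not how DWiki is run; `app` and `serve` are hypothetical names, the host and port are arbitrary, and `wsgiref`'s simple server is a basic single-threaded server that I'm using only because it needs nothing beyond the standard library.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    # A trivial WSGI application; a real one would do CPU-intensive
    # rendering work here, which is where a warmed-up JIT would matter.
    body = ("Hello from " + environ.get("PATH_INFO", "/") + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    # One long-running process handles every request, so warm-up work is
    # paid for once instead of once per request.
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` from a script run under PyPy instead of CPython would keep the same process alive across requests, which is the property that matters here. What it doesn't give you is the process management that Apache with mod_wsgi does for you.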

Written on 25 July 2017.
