The practical cost of forking in Python
I spent part of the other day working to speed up an SCGI-based program, and wound up hitting a vivid illustration of the practical cost of forking in Python. I'll start with the numbers:
- 5.3 milliseconds per request when the program forked a child to handle each request.
- 1.1 milliseconds per request when the forking was stubbed out so each request ran in the main process.
Benchmarking was done with Apache's ab, running on the same machine (and with only one request at a time, since the non-forking version obviously can't handle concurrent requests).
These numbers are pure SCGI overhead; the program had its usual response handler replaced with a special null handler that just returned a short hard-coded response, and it was directly connected to lighttpd. (Some work suggests that most of the remaining 1.1 milliseconds is spent decoding the request's initial headers; I'm not sure how to speed this up.)
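For a sense of what decoding the initial headers involves, here is a minimal sketch of an SCGI header parser. It is not the program's actual code; the function name parse_scgi_headers is made up for illustration. It assumes the whole request is already in memory, and relies on the SCGI format: the headers arrive as a netstring ("<length>:<payload>,") whose payload is a run of NUL-terminated names and values.

```python
def parse_scgi_headers(data: bytes):
    """Split an SCGI request into (headers dict, body bytes).

    Hypothetical helper for illustration; assumes the full request
    has already been read into `data`.
    """
    # The netstring starts with a decimal length and a colon.
    length_text, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing netstring length")
    length = int(length_text)

    # The payload must be exactly `length` bytes and end with a comma.
    payload = rest[:length]
    if len(payload) != length or rest[length:length + 1] != b",":
        raise ValueError("malformed netstring")

    # Every name and value is NUL-terminated, so splitting on NUL
    # leaves one empty trailing piece, which we drop.
    fields = payload.split(b"\0")[:-1]
    if len(fields) % 2:
        raise ValueError("odd number of header fields")

    headers = dict(zip(fields[0::2], fields[1::2]))
    return headers, rest[length + 1:]
```

A real server has to do this incrementally as bytes arrive on the socket, which is likely where some of the per-request time goes.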
Since I have a thread pool package lying around, I hacked the SCGI server up to use it instead of forking; the performance stayed around 1.1 milliseconds per request, somewhat to my surprise.
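The three dispatch styles compared above can be sketched roughly as follows. This is an illustration under assumptions, not the server's real code: concurrent.futures.ThreadPoolExecutor stands in for the thread pool package mentioned, and null_handler, handle, and the serve_* names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def null_handler(request: bytes) -> bytes:
    # Stand-in for the stubbed-out handler: a short hard-coded response.
    return b"Status: 200 OK\r\n\r\nhello\n"

def handle(request, respond):
    # `respond` is any callable that sends bytes back to the client.
    respond(null_handler(request))

def serve_inline(request, respond):
    # Non-forking version: the request runs in the main process.
    handle(request, respond)

def serve_forking(request, respond):
    # Forking version: one child per request.
    pid = os.fork()
    if pid == 0:
        try:
            handle(request, respond)
        finally:
            os._exit(0)  # never fall back into the parent's code
    # Reap the child right away; a real server would track children
    # and reap them without blocking the accept loop.
    os.waitpid(pid, 0)

# Assumed pool size; worker threads are started lazily on first submit.
pool = ThreadPoolExecutor(max_workers=4)

def serve_threaded(request, respond):
    # Thread pool version: hand the request to a worker thread.
    return pool.submit(handle, request, respond)
```

The forking variant is the only one that pays for creating and reaping a process on every request.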
I don't have any explanation for why Python takes 4.2 milliseconds more per request when I fork for each one. The direct cost of fork() with all of the program's modules imported is about 1.3 milliseconds (the fork tax varies with how many dynamic libraries the Python process has loaded, so it's important to measure with your program's actual set of imports). Forking does require extra management code to do things like track and reap dead children, but the remaining 2.9 milliseconds seems a bit high for that.
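One way to measure the fork tax for a given set of imports is a small timing loop like the sketch below. The function name time_fork and the iteration count are arbitrary choices for illustration. Note that it times fork() plus the child's immediate exit plus the parent's waitpid(), so it slightly overstates the cost of fork() alone.

```python
import os
import time

def time_fork(iterations: int = 200) -> float:
    """Return the average seconds per fork-and-reap cycle."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)      # child: exit at once, skipping cleanup handlers
        os.waitpid(pid, 0)   # parent: reap the child so no zombies pile up
    return (time.perf_counter() - start) / iterations

if __name__ == "__main__":
    # Import your program's real modules before this point, since the
    # cost depends on what the process has loaded.
    print(f"{time_fork() * 1000:.3f} ms per fork+wait")
```

Running it once in a bare interpreter and once after importing the program's modules shows how much of the cost comes from the loaded libraries.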