== The practical cost of forking in Python

I spent part of the other day working to speed up an [[SCGI ../programming/SCGIvsFastCGI]]-based program, and wound up hitting a vivid illustration of the practical cost of forking in Python. I'll start with the numbers:

* 5.3 milliseconds per request when the program forked a child to handle each request.
* ~~1.1 milliseconds per request~~ when the forking was stubbed out so each request ran in the main process.

Benchmarking was done with Apache's _ab_, running on the same machine (and with only one request at a time, since the non-forking version obviously can't handle concurrent requests). These numbers are pure SCGI overhead; the program had its usual response handler stubbed out to a special null handler that just returned a short hard-coded response, and it was directly connected to [[lighttpd http://www.lighttpd.net/]].

(Some work suggests that most of the remaining 1.1 milliseconds is spent decoding the request's initial headers; I'm not sure how to speed this up.)

Since I have a thread pool package lying around, I hacked the SCGI server up to use it instead of forking; the performance stayed around 1.1 milliseconds per request, somewhat to my surprise.

I don't have any explanation of why Python takes 4.2 milliseconds more when I fork for each request. The direct cost of _fork()_ with all of the program's modules imported is about 1.3 milliseconds (the [[fork tax ../programming/DynamicLinkingTax]] varies with how many dynamic libraries the Python process has loaded, so it's important to measure with your program's actual set of imports). Forking does require extra management code to do things like track and reap dead children, but the remaining 2.9 milliseconds seems a bit high for it.
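
For anyone who wants to measure the fork tax for their own program, here is a minimal sketch of one way to do it. It is not the code behind the numbers above; the function name _time_fork_ and the iteration count are made up for illustration. It also times a full fork, immediate child exit, and _waitpid()_ cycle, so it will read somewhat higher than the cost of _fork()_ alone.

```python
import os
import time


def time_fork(iterations=200):
    """Return average wall-clock seconds for one fork + child exit + wait cycle.

    Hypothetical helper for illustration; Unix only, since it uses os.fork().
    """
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, skipping atexit handlers and buffer flushes.
            os._exit(0)
        # Parent: reap the child so zombies don't pile up.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    # Import your program's real modules above this point, so the process
    # has the same dynamic libraries loaded as it does in production.
    print("%.3f ms per fork+wait" % (time_fork() * 1000.0))
```

Running this once in a bare interpreter and once after importing the program's actual modules should show how much the loaded libraries add.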