Implementing a preforking network server in Python

August 6, 2007

I recently read a great explanation of the general Apache 2 preforking server model, and found it so lucid and simple that I was immediately inspired to implement a generic preforking server myself. The approach has a bit of overhead in parent/child communication, but it means that you do not have to pass file descriptors between processes; all you need is some way for the parent to talk with its children.

(Since Python's standard library doesn't have support for file descriptor passing, the pragmatic advantages of this approach are immense.)

To summarize, the core of this model is that the parent is the only thing that notices when there is a new connection, but it does not do the accept() itself; instead it picks an idle child and commands the child to handle the connection. In implementing my version I ran into some subtleties, so I figure I might as well write them down here for posterity:

  • the server socket must be non-blocking and the entire scheme must be able to cope with accept()s that fail, because this can happen (for example, the client can drop the connection between the parent noticing it and the child calling accept()). If the kids ever block in accept(), things start going horribly wrong; at a minimum you descend to a thundering herd situation.

    (Remember to set the newly accept()'d socket back to blocking; there are some platforms where it will inherit the server socket's non-blocking status.)

  • the 'accept a new connection' step has to be synchronous between the parent and the child, and the child must reply with its status only after it has done the accept(). Otherwise you can have a race between the parent returning to its select() and the child pulling the new connection off; if the parent wins the race it will order more than one child to handle the same connection.

    This means that there are only two times kids send asynchronous status messages to the parent: their initial 'I am idle' message on startup, and the messages they send after they've completed processing a connection.

  • the parent should deal with status reports from kids before trying to dispatch a new pending connection; this maximizes the chance of knowing that you have idle kids.

  • I found that I needed a status code from the child to the parent to signal 'the processing function has asked the server to shut down'. This is because the simplest place to put checks for signals to shut down is in the application's 'new connection handler' function, which is only called in kids.

    (An alternate approach is to have a second function that's called every time through the parent's main loop.)

  • because I was concerned about resilience, I added timeouts for synchronous communication (for example, commanding a child to accept a new connection and getting a status reply from it). If the timeout expires without a good answer, the parent assumes that something horrible has gone wrong and kills the child.

  • asynchronously starting kids (for example to maintain a minimum pool of idle workers) raises the issue of dealing with kids that don't start properly for some reason. My approach is to put such kids on a list until they report their initial 'idle' status. If the parent needs to dispatch a new connection and there are no idle kids, it picks a pending kid and synchronously waits for its status report; if this times out it kills the kid and tries again.

  • there are two ways of handling too many connections. The graceful way is that if the parent cannot dispatch a new connection to a kid, it stops checking the server socket until it has at least one idle kid; otherwise the parent will just continually spin between select() and failing to dispatch the new connection.

    The abrupt way to handle overload is to have the parent accept() and then immediately close the new connection if it cannot dispatch the new connection to a kid.
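
To make the points above concrete, here is a minimal sketch of the parent/child protocol in modern Python 3. It is not the code from prefork.py: the one-byte status codes, the dict used to track each kid, and every function name are assumptions invented for this example. It shows the non-blocking server socket, the synchronous 'accept' command with a reply sent only after accept(), status reports handled before dispatch, the reply timeout, and the graceful overload approach. It leaves out the shutdown status code, the list of pending kids, and noticing kids that die on their own.

```python
import os
import select
import signal
import socket

# Hypothetical one-byte protocol codes, invented for this sketch.
ACCEPT, IDLE, TOOK, FAILED = b"A", b"I", b"T", b"F"
REPLY_TIMEOUT = 5.0  # seconds the parent waits for a synchronous reply

def make_listener(host="127.0.0.1", port=0):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(16)
    listener.setblocking(False)  # kids must never block in accept()
    return listener

def child_loop(listener, cmd_r, status_w, handler):
    os.write(status_w, IDLE)  # asynchronous: initial 'I am idle'
    while os.read(cmd_r, 1) == ACCEPT:
        try:
            conn, _addr = listener.accept()
        except OSError:
            os.write(status_w, FAILED)  # synchronous reply: accept() failed
            continue
        os.write(status_w, TOOK)  # synchronous reply, sent only after accept()
        conn.setblocking(True)  # may have inherited non-blocking mode
        try:
            handler(conn)
        finally:
            conn.close()
        os.write(status_w, IDLE)  # asynchronous: done, idle again

def spawn(listener, handler):
    cmd_r, cmd_w = os.pipe()  # parent -> child commands
    status_r, status_w = os.pipe()  # child -> parent status
    pid = os.fork()
    if pid == 0:  # child: keep only its own ends of the pipes
        os.close(cmd_w)
        os.close(status_r)
        try:
            child_loop(listener, cmd_r, status_w, handler)
        finally:
            os._exit(0)
    os.close(cmd_r)
    os.close(status_w)
    return {"pid": pid, "cmd": cmd_w, "status": status_r, "idle": False}

def kill_kid(kids, kid):
    os.kill(kid["pid"], signal.SIGKILL)
    os.waitpid(kid["pid"], 0)
    os.close(kid["cmd"])
    os.close(kid["status"])
    kids.remove(kid)

def parent_step(listener, kids, timeout=None):
    """Run one pass of the parent loop; return True if a kid took a connection."""
    watch = [kid["status"] for kid in kids]
    if any(kid["idle"] for kid in kids):
        watch.append(listener)  # graceful overload: skip the listener if no kid is idle
    ready, _, _ = select.select(watch, [], [], timeout)
    for kid in kids:  # status reports first, then new connections
        if kid["status"] in ready and os.read(kid["status"], 1) == IDLE:
            kid["idle"] = True
    if listener not in ready:
        return False
    kid = next(k for k in kids if k["idle"])
    kid["idle"] = False
    os.write(kid["cmd"], ACCEPT)
    got, _, _ = select.select([kid["status"]], [], [], REPLY_TIMEOUT)
    reply = os.read(kid["status"], 1) if got else b""
    if reply == TOOK:
        return True
    if reply == FAILED:
        kid["idle"] = True
        return False
    kill_kid(kids, kid)  # no good answer in time: assume the worst
    return False
```

A server built on this would create the listener, call spawn() a few times with its connection handler, and then call parent_step() in a loop. Because an idle kid never has an unread status message outstanding, the byte the parent reads after sending the accept command can only be the reply to that command.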

While the original description uses socketpairs to communicate between the parent and the kids, my implementation uses a pair of pipes because Python 2.3.3 doesn't have socket.socketpair() (it first appeared in Python 2.4).
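
As a rough illustration of the pipe-pair idea (again a sketch with made-up names, not the code from prefork.py), two pipes give each side a read end and a write end, which is what a single socketpair would otherwise provide. The timed read uses select() so that a synchronous wait for a reply can give up, as described in the timeout point above.

```python
import os
import select

def make_channel():
    """Return (parent_end, child_end); each end is a (read_fd, write_fd) tuple."""
    p2c_r, p2c_w = os.pipe()  # carries parent -> child commands
    c2p_r, c2p_w = os.pipe()  # carries child -> parent status
    return (c2p_r, p2c_w), (p2c_r, c2p_w)

def send_byte(end, code):
    os.write(end[1], code)

def recv_byte(end, timeout=None):
    """Read one byte; return None on timeout or if the peer closed its end."""
    ready, _, _ = select.select([end[0]], [], [], timeout)
    if not ready:
        return None
    return os.read(end[0], 1) or None
```

After a fork(), each process would close the end that belongs to the other side. Treating a timeout and a closed pipe the same way is a simplification: in both cases the parent has no good answer from the kid.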

I'm quite happy with how relatively easy it was to write an implementation of this approach and the performance of my code. It is now the core of DWiki's SCGI server, and has handily dealt with the performance issues of the previous approach without having any of the drawbacks of the other solutions I've considered.

(And it stood up fine to a recent pounding.)

If anyone is interested, the code is available as prefork.py, unfortunately without unit tests but hopefully with enough documentation to be usable (the comments probably need some revision). It is currently GPL2 because I know the verbiage to slap on a file for that license; I would be happy to redo it under the Python license once I know the right incantation.

Written on 06 August 2007.
