The problem with preforking Python network servers

May 12, 2006

I've been thinking about ways around the practical cost of forking in Python. There are two common alternatives: preforking servers and threads in general. However, both have issues that make me unhappy.

The best setup for a preforking SCGI server is a central dispatcher that parcels new connections out to a pool of worker processes; this requires the ability to pass file descriptors to other processes. While Unix can do this (with SCM_RIGHTS messages over Unix domain sockets), Python doesn't support this part of the Unix sockets API.
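For comparison, later Python versions did grow this API. Python 3.3 added `socket.sendmsg()` and `socket.recvmsg()`, and Python 3.9 added the `socket.send_fds()` and `socket.recv_fds()` helpers, which wrap SCM_RIGHTS. The following is a minimal sketch of a dispatcher handing one descriptor to a forked worker over a Unix domain socket pair. It assumes Python 3.9+ on Unix. The helper names `send_connection`, `receive_connection`, and `demo` are hypothetical, and a pipe stands in for an accepted SCGI connection.

```python
import os
import socket


def send_connection(channel: socket.socket, conn_fd: int) -> None:
    """Dispatcher side: pass one open descriptor to a worker (SCM_RIGHTS)."""
    # Ancillary data rides along with ordinary data, so send one placeholder byte.
    socket.send_fds(channel, [b"C"], [conn_fd])


def receive_connection(channel: socket.socket) -> int:
    """Worker side: receive one descriptor; the kernel installs it as a new fd."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if not fds:
        raise RuntimeError("dispatcher sent no descriptor")
    return fds[0]


def demo() -> bytes:
    dispatcher_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:  # worker process
        status = 1
        try:
            dispatcher_end.close()
            fd = receive_connection(worker_end)
            os.write(fd, b"hello from worker\n")
            os.close(fd)
            status = 0
        finally:
            os._exit(status)  # never fall back into the dispatcher's code
    # Dispatcher process. The pipe's write end stands in for an accepted connection.
    worker_end.close()
    read_fd, write_fd = os.pipe()
    send_connection(dispatcher_end, write_fd)
    os.close(write_fd)  # safe: the in-flight message keeps the descriptor alive
    os.waitpid(pid, 0)
    data = os.read(read_fd, 100)
    os.close(read_fd)
    dispatcher_end.close()
    return data


if __name__ == "__main__":
    print(demo())
```

With this, the dispatcher could keep sole ownership of the listening socket and decide which worker (or how many workers) to use for each connection.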

This leaves you with the preforked workers all sitting around waiting in select() for a new SCGI connection or instructions from the master process (such as 'please exit now'). When a new SCGI connection comes in, all of them wake up in a thundering herd; one of them wins the race to accept() the new connection and everyone else goes back to select() to wait. The more worker processes, the bigger the herd.
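A worker's wait-and-race loop might look roughly like the following minimal sketch. It assumes the master created the listening socket before forking and gave each worker the read end of a private control pipe; `worker_loop`, `control_fd`, and `handle` are hypothetical names, and treating anything readable on the pipe as 'please exit now' is an assumed protocol. The listener is made non-blocking so that a worker that loses the accept() race gets `BlockingIOError` and goes back to select() instead of blocking inside accept().

```python
import select
import socket
from typing import Callable


def worker_loop(listener: socket.socket, control_fd: int,
                handle: Callable[[socket.socket], None]) -> None:
    # Every preforked worker shares this listener. Non-blocking mode means that
    # losing the race for a connection raises instead of blocking in accept().
    listener.setblocking(False)
    while True:
        # Wake up on a new connection or on a message (or EOF) from the master.
        readable, _, _ = select.select([listener, control_fd], [], [])
        if control_fd in readable:
            return  # assumed protocol: anything on the control pipe means 'exit now'
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            continue  # another worker in the herd won the race
        with conn:
            conn.setblocking(True)  # handle the connection with ordinary blocking I/O
            handle(conn)
```

Every idle worker runs the same select() call, which is exactly why a single new connection wakes all of them.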

Pragmatically, the thundering herd issue is unlikely to be noticed on a modern computer, partly because you don't want to run that many worker processes anyway. But its mere existence annoys me, and the lack of a central dispatcher means that you have to pre-start all the workers and can't start and stop them as the connection load rises and falls. (This has a silver lining: just starting a fixed number of workers and keeping them running is less code.)
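The fixed-pool approach really is short. The following is a rough sketch, not the author's code: the master forks `count` workers up front, each blocking in accept() on the shared listener, and stops them with SIGTERM. Using a signal instead of a control pipe is a simplification chosen here, and `start_fixed_pool`, `stop_fixed_pool`, and `handle` are hypothetical names.

```python
import os
import signal
import socket
from typing import Callable, List


def start_fixed_pool(listener: socket.socket, count: int,
                     handle: Callable[[socket.socket], None]) -> List[int]:
    """Fork `count` long-lived workers that all accept() on one blocking listener."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:  # worker process
            try:
                # Restore the default action so SIGTERM simply ends the worker,
                # even if the master had installed its own handler.
                signal.signal(signal.SIGTERM, signal.SIG_DFL)
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        handle(conn)
            finally:
                os._exit(1)  # only reached if something raised
        pids.append(pid)
    return pids


def stop_fixed_pool(pids: List[int]) -> None:
    """Ask every worker to exit, then reap them all."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)
```

Note that the pool size is fixed when `start_fixed_pool` runs; without a dispatcher there is no natural place to grow or shrink it based on load.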

I may still code a preforking version of the SCGI server just to see how it goes and for the experience, but I suspect I'm not going to run it in production. Systems speed up, but unappetizing code is forever.

The problems with threads

There are several annoyances with threads:

  • I'd lose process isolation, so a code bug could rapidly contaminate the entire SCGI server.
  • My SCGI server is mostly CPU-bound, which is a poor match for Python threads: CPython's global interpreter lock lets only one thread execute Python code at a time.
  • Due to the Linux NPTL thread issue, the process would use up a lot of virtual memory, and it just makes me twitchy to see my SCGI server sitting around using many megabytes of virtual memory.

I could write a threaded or thread-pool based SCGI server, but I'd be left with the feeling that it was a big hack. It would barely be a step up from a single-threaded server that only handled one connection at a time. (There is some disk and network IO that multiple threads might be able to take advantage of, but probably not much. Unfortunately, measuring true parallelism opportunities is a bit tricky.)
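For comparison, a thread-pool version needs little code. This minimal sketch (not the author's server) accepts in the main thread and hands each connection to a `concurrent.futures.ThreadPoolExecutor`. The names `serve_thread_pool` and `handle`, the default of four threads, and the `limit` parameter (added only so the loop can end in a test) are hypothetical. Under CPython's GIL, the extra threads mostly help while handlers are waiting on disk or network IO.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def _serve_one(conn: socket.socket, handle: Callable[[socket.socket], None]) -> None:
    # Each pool thread owns one connection and closes it when finished.
    with conn:
        handle(conn)


def serve_thread_pool(listener: socket.socket,
                      handle: Callable[[socket.socket], None],
                      threads: int = 4,
                      limit: Optional[int] = None) -> None:
    """Accept connections and run `handle` on them in a fixed pool of threads.

    `limit` stops after that many connections (None means serve forever).
    """
    accepted = 0
    # Leaving the with-block waits for all submitted connections to finish.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while limit is None or accepted < limit:
            conn, _addr = listener.accept()
            # An exception raised by handle() is stored in the returned Future
            # and silently dropped here, since nothing inspects it.
            pool.submit(_serve_one, conn, handle)
            accepted += 1
```

All handlers share one process here, so a bug in one of them can corrupt state seen by every other connection, which is the lost isolation mentioned above.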

Written on 12 May 2006.
Last modified: Fri May 12 02:05:59 2006