How my CGI to CGI/SCGI frontend works
In ExploitingPolymorphicWSGI, I talked about how I use the flexibility of WSGI to run DWiki as either a CGI or a SCGI server by using a small frontend CGI program. Because there are some subtle bits to how this works, I thought I would write down how the CGI frontend works.
The overall logic is:
- try to talk to an existing SCGI daemon.
- otherwise, check the load and try to start a daemon if the load is between a minimum and a maximum, and then talk to it.
- otherwise, if the load is too high, send out an error message about the system being overloaded.
- otherwise, exec() the CGI version of DWiki.
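As a rough sketch of that decision order (in Python purely for brevity; the names `choose_action`, `min_load`, and `max_load` are made up for illustration, and the real frontend's thresholds and structure are not shown here):

```python
import os


def choose_action(daemon_reachable, load, min_load, max_load):
    """Pick what the frontend does next, following the list above.

    All names and return values here are hypothetical labels.
    """
    if daemon_reachable:
        return "proxy"            # talk to the existing SCGI daemon
    if min_load <= load <= max_load:
        return "start-daemon"     # try to start a daemon, then talk to it
    if load > max_load:
        return "overload-error"   # too busy: send the overload message
    return "exec-cgi"             # quiet system: just exec() the CGI version


def current_load():
    # One-minute load average; assumes a Unix system.
    return os.getloadavg()[0]
```

With example thresholds of 1.0 and 8.0, an unreachable daemon at a load of 0.2 falls through to the plain CGI, while a load of 20 gets the overload message.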
There are two tricky bits: starting the daemon and handling errors during conversations with the daemon.
If the CGI gets no response back from the daemon during the SCGI conversation, it decides that something has gone badly wrong and sends out the overload error message. It can do this because the CGI and the daemon communicate over Unix domain sockets, which lets the daemon get around the socket listen problem; the daemon doesn't abruptly drop connections just because it's shutting down, so any communication issues are serious problems.
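A minimal sketch of the "no reply means overload" rule, again in Python for brevity. The function names and the exact error page are assumptions for illustration (the 503 status in particular is my guess at a sensible choice, not something taken from the real frontend), and `request` is assumed to be an already encoded SCGI request:

```python
import socket

# Hypothetical overload page; the real message and status are not shown here.
OVERLOAD_PAGE = (b"Status: 503 Service Unavailable\r\n"
                 b"Content-Type: text/plain\r\n\r\n"
                 b"The system is overloaded. Please try again later.\n")


def converse(sock, request, bufsize=65536):
    """Send the request and collect the reply; None means nothing came back."""
    chunks = []
    try:
        sock.sendall(request)
        while True:
            data = sock.recv(bufsize)
            if not data:          # EOF: the daemon closed its end
                break
            chunks.append(data)
    except OSError:
        pass                      # broken pipe, reset, and so on
    return b"".join(chunks) or None


def reply_or_overload(sock, request):
    # There is no retry here: part of the request may already be consumed.
    reply = converse(sock, request)
    return reply if reply is not None else OVERLOAD_PAGE
```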
(There is no general way to recover from a communication failure with
the SCGI daemon, because the CGI may have already consumed part of a
POST body and sent it to the daemon. I ran into this exact issue in an
earlier version of the CGI and SCGI daemon, where I did not have a clean
daemon shutdown and the CGI frontend reacted to communication failures
by going on to exec() the CGI version of DWiki.)
The complicated part of starting the daemon is that under load, several
CGI processes may all decide that they should start a daemon. This would
be bad. To avoid it, CGI processes must obtain a lock (a flock() on a
synchronization file) before they try to start the daemon, so that only
one can be doing it at once. The full logic is:
- try to get the lock, which may time out
- try to get a connection, because another process might have just finished starting the daemon and released its lock. If you get a connection, you're done.
- if you have the lock but not a connection, this process won the race to be the daemon starter; it forks and execs the SCGI daemon.
- whether or not you have the lock, loop, sleeping between connection attempts, until the SCGI daemon actually starts accepting connections; this too may time out.
After all of this, you release the lock if you have it (whether or not you successfully got a connection).
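The locking dance above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the real frontend's code: `connect_or_start`, `try_lock`, `try_connect`, the `spawn` callable, and all the timeout values are hypothetical, and the lock timeout is implemented by polling a non-blocking flock().

```python
import fcntl
import os
import socket
import time


def try_lock(fd, timeout, poll=0.05):
    """Poll for an exclusive flock() on fd; True if we got it in time."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)


def try_connect(sock_path):
    """Return a connected Unix domain socket, or None if nobody listens."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        return s
    except OSError:
        s.close()
        return None


def connect_or_start(sock_path, lock_path, spawn,
                     lock_timeout=5.0, wait_timeout=10.0):
    lock_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        have_lock = try_lock(lock_fd, lock_timeout)
        # Someone else may have just finished starting the daemon.
        conn = try_connect(sock_path)
        if conn is not None:
            return conn
        if have_lock:
            spawn()               # we won the race: fork and exec the daemon
        # With or without the lock, wait for the daemon to start accepting.
        deadline = time.monotonic() + wait_timeout
        while time.monotonic() < deadline:
            conn = try_connect(sock_path)
            if conn is not None:
                return conn
            time.sleep(0.1)
        return None
    finally:
        os.close(lock_fd)         # closing the descriptor drops the flock
```

Here `spawn` stands in for whatever forks and execs the SCGI daemon. Like the logic described above, this sketch does not kill a daemon that was started but never became reachable within the timeout.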
Since starting a daemon on a heavily loaded system may take some time, the CGI has to do at least some waiting. It has timeouts just in case, because at some point it is better for things to go down in flames than keep hammering the system.
(I should really track the child PID and kill it if we started the SCGI daemon but failed to get a connection within the timeout interval, since the attempted invariant is that when you release the lock, either the daemon has been started successfully or it is safe for another process to try.)
Although there are other locking methods than flock(), flock() has
the useful property that the lock is guaranteed to evaporate if the
process goes away. While I could put locking into the SCGI daemon
itself, it's better to put it into the lightweight CGI that is already
running than a relatively heavyweight Python program that would have to
be started.
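The "evaporates when the process goes away" property is easy to demonstrate. The following is a small Python experiment, not part of the frontend; `can_lock` and `lock_vanishes_with_holder` are names invented for this sketch. A child process takes the lock and is then killed outright, without ever unlocking:

```python
import fcntl
import os
import signal


def can_lock(path):
    """Try a non-blocking exclusive flock() on a fresh open of path."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)              # also drops the lock if we got it


def lock_vanishes_with_holder(path):
    """Return (held_while_child_alive, free_after_child_killed)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: take the lock on its own open of the file, then stall.
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            fcntl.flock(fd, fcntl.LOCK_EX)
            os.write(w, b"x")     # tell the parent the lock is held
            signal.pause()
        finally:
            os._exit(1)
    os.close(w)
    os.read(r, 1)                 # wait until the child holds the lock
    os.close(r)
    held = not can_lock(path)
    os.kill(pid, signal.SIGKILL)  # the child never gets to unlock
    os.waitpid(pid, 0)
    return held, can_lock(path)
```

On a system with normal flock() semantics, the lock is unavailable while the child lives and available again once the child has been reaped, with no cleanup code involved.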
(Looking at the code, I see that the SCGI daemon is inheriting the
flock() file descriptor. I should probably fix that.)
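One common way to keep a descriptor from leaking into an exec()'d program is to mark it close-on-exec before starting the daemon. This is only a sketch of that general technique in Python, with a made-up helper name, and not necessarily the fix the real frontend would use:

```python
import fcntl
import os


def set_close_on_exec(fd):
    """Mark fd so that it is closed automatically across exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```

Python 3.4 and later already create descriptors non-inheritable by default, so the explicit call matters mostly in languages such as C, where the equivalent is the same fcntl() call or opening the file with O_CLOEXEC.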