Our central web server, Apache, and slow downloads

July 13, 2016

Recently, our monitoring system alerted us that our central web server wasn't responding. This has happened before and so I was able to immediately use our mod_status URL to see that yep, all of the workers were busy. This time around it wasn't from a popular but slow CGI or a single IP address; instead, a bunch of people were downloading PDFs of slides for a course here. Slowly, apparently (although the PDFs aren't all that big).

This time around I took the simple approach to deal with the problem; I increased the maximum number of workers by a bunch. This is obviously not an ideal solution, as we're using the prefork MPM and so more workers means more processes which means more memory used up (and potentially more thrashing in the kernel for various things). In our specific situation I figured that this would have relatively low impact, as worker processes that are just handling static file transfers to slow clients don't need many resources.

A high maximum workers setting is dangerous in general, though. With a different access pattern, the number of workers we have configured right now could easily kill the entire server (which would have a significant impact on a whole lot of people). Serving static files to a whole lot of slow clients is not a new problem (in fact it's a classic problem), but Apache has traditionally not had any clever way to handle it.

Modern versions of Apache have the event MPM, which its documentation claims is supposed to deal reasonably with this situation. We're using the prefork MPM, but I don't think we've carefully evaluated our choice here; it's just both the Ubuntu default and the historical behavior. We may want to reconsider this, as I don't think there's any reason we couldn't switch away from prefork (we don't enable PHP in the central web server).

(Per this serverfault question and answer, in the ever increasing world of HTTPS the event MPM is basically equivalent to the worker MPM. However, our central web server is still mostly HTTP, so could benefit here, and even the worker MPM is apparently better than prefork for avoiding memory load and so on.)

PS: There are potential interactions between NFS IO stalls and the choice of MPM here, but in practice in our environment any substantial NFS problem rapidly causes the web server to grind to a halt in general. If it grinds to a halt a little bit faster with the event or worker MPM than with the prefork one, this is not a big deal.

Written on 13 July 2016.
« How we do MIME attachment type logging with Exim
Your C compiler's optimizer can make your bad programs compile »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Jul 13 01:33:31 2016
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.