An unexpected performance stress test for DWiki
This morning, I got to see how well DWiki could stand up to high load. While getting attacked by something that has all the signs of a spambot gone berserk doesn't exactly make me happy, I was cheered to see that DWiki stood up to the situation.
(And there's a little bit of me that's dancing around in triumph that all of the work that I've done to prepare for this finally paid off and actually worked out for real.)
Over the course of the incident, DWiki averaged about 12 requests a second, with a peak that was probably around 31 requests a second. During this, the machine was not overwhelmed (the load average was somewhat over 4 and interactive response was good), no requests got overload errors, and DWiki had spare capacity to handle other requests. Admittedly, by large site standards this is not very impressive; even 31 requests a second gives you about 32 milliseconds per request, assuming you get no concurrency.
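To make the per-request arithmetic concrete, here is a small sketch. The function name and the `workers` parameter are illustrative assumptions, and the multi-worker figure assumes the workers really do run fully in parallel, which a real machine may not deliver.

```python
def ms_budget_per_request(requests_per_second, workers=1):
    """Average time (in ms) each request may take before the server falls behind.

    With one worker, requests are handled strictly one after another.
    With several workers, each request may take proportionally longer
    (assuming ideal parallelism, which is an optimistic simplification).
    """
    return workers * 1000.0 / requests_per_second


if __name__ == "__main__":
    print(round(ms_budget_per_request(31)))             # peak rate, no concurrency
    print(round(ms_budget_per_request(31, workers=4)))  # peak rate, 4 parallel workers
```

The first figure is the "no concurrency" budget at the peak rate; the second shows how much slack a handful of parallel workers could add under that optimistic assumption.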
One of the reasons DWiki stood up so well is that I recently rewrote my SCGI server to use preforking instead of forking for every new connection (which has performance problems in Python). The old SCGI server probably would have done worse, and significantly loaded down the machine in the process.
(Coincidentally, I just got around to applying some pending performance tweaks yesterday.)
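For readers who have not run into preforking, the following is a minimal sketch of the general idea, not DWiki's actual SCGI server: the worker processes are forked once at startup and then all accept connections from the same listening socket, so no fork happens per connection. The function names, the trivial `echo_pid` handler, and the fixed worker count are all assumptions made for illustration; it speaks no SCGI at all and needs a Unix-like system for `os.fork`.

```python
import os
import signal
import socket


def worker_loop(listener, handle):
    """Run in a child process: accept connections forever and answer each one."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            handle(conn)


def prefork(listener, handle, nworkers=4):
    """Fork nworkers children up front and return their PIDs to the parent."""
    pids = []
    for _ in range(nworkers):
        pid = os.fork()
        if pid == 0:
            # Child: serve connections and never return to the caller.
            try:
                worker_loop(listener, handle)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids


def stop_workers(pids):
    """Terminate the workers and reap them so no zombies are left behind."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)


def echo_pid(conn):
    """Hypothetical handler: report which worker took the connection."""
    conn.sendall(b"handled by %d\n" % os.getpid())


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    pids = prefork(listener, echo_pid)
    try:
        with socket.create_connection(listener.getsockname()) as client:
            print(client.makefile("rb").readline().decode().strip())
    finally:
        stop_workers(pids)
        listener.close()
```

A real prefork server would also grow and shrink the worker pool with load and restart workers that die; this sketch leaves all of that out.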
In a way we got lucky, because I happened to log in and notice what was going on shortly after the incident started. Since we usually average only 10,000 requests a day, the incident could have multiplied the size of our Apache logfiles significantly if it had run for very long.
Sidebar: more information about the incident
The basics: the IP address 18.104.22.168 made 6,896 requests between 08:24:41 and 08:34:22 (when I blocked it), giving it an average of 11.87 requests a second; in the most active second it issued 31 requests. Unfortunately I don't have a number for how many concurrent requests DWiki was handling; the basic Apache log format doesn't have enough information, since it only logs the start time of each request.
(My best guess is that the concurrency was relatively low by the end, because while I was watching, the prefork SCGI server was running only its minimum of 4 worker processes.)
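The interval above is 581 seconds, and 6,896 / 581 is about 11.87, which matches the quoted average. Numbers like these can be pulled out of an Apache access log with a few lines of Python. This is only a sketch of one way to do it: it assumes the standard common/combined log layout (client IP first, timestamp in square brackets), and the function name and the `192.0.2.1` example address are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Apache's default timestamp looks like [10/Oct/2000:13:55:36 -0700].
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")


def request_rates(lines, ip):
    """Return (total requests, average per second, busiest-second count) for ip."""
    per_second = Counter()
    for line in lines:
        if not line.startswith(ip + " "):
            continue
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        per_second[stamp] += 1
    if not per_second:
        return 0, 0.0, 0
    total = sum(per_second.values())
    # Elapsed time between the first and last request from this client.
    span = (max(per_second) - min(per_second)).total_seconds()
    average = total / span if span else float(total)
    return total, average, max(per_second.values())


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 10',
        '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /b HTTP/1.0" 200 10',
        '192.0.2.1 - - [10/Oct/2000:13:55:38 -0700] "GET /a HTTP/1.0" 200 10',
    ]
    print(request_rates(sample, "192.0.2.1"))
```

Since the log records only one timestamp per request, this gives request rates but says nothing about how many requests were in flight at once.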
We sent that IP address a total of 96 megabytes of output (most of which it probably ignored). It hit only 1,882 different URLs, hitting the most popular one 266 times (but the next most popular one only 134 times).
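Per-URL counts of this kind are easy to get from the same log. As a hedged sketch, again assuming the common/combined log layout and using a made-up function name and example address:

```python
import re
from collections import Counter

# The request line is quoted, e.g. "GET /some/page HTTP/1.0".
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)')


def url_counts(lines, ip):
    """Count how often the client at ip requested each URL."""
    counts = Counter()
    for line in lines:
        if not line.startswith(ip + " "):
            continue
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 10',
        '192.0.2.1 - - [10/Oct/2000:13:55:37 -0700] "GET /b HTTP/1.0" 200 10',
        '192.0.2.1 - - [10/Oct/2000:13:55:38 -0700] "GET /a HTTP/1.0" 200 10',
    ]
    counts = url_counts(sample, "192.0.2.1")
    print(len(counts), counts.most_common(2))
```

Here `len(counts)` is the number of distinct URLs and `most_common(2)` gives the two most requested ones with their hit counts.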
One of the reasons I am pretty sure it was up to no good is that it used a User-Agent string with "User-Agent:" in it, which is something that I've seen before.
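A check for this particular tell can be sketched as follows. It assumes Apache's "combined" log format, where the User-Agent is the last quoted field on the line; the function name and the sample log lines are invented for illustration, and a User-Agent containing escaped quotes would need more careful parsing than this.

```python
import re

# In the "combined" format the User-Agent is the final quoted field.
LAST_QUOTED_RE = re.compile(r'"([^"]*)"\s*$')


def has_doubled_user_agent(log_line):
    """True if the logged User-Agent value itself contains 'User-Agent:'."""
    match = LAST_QUOTED_RE.search(log_line)
    if match is None:
        return False
    return "user-agent:" in match.group(1).lower()


if __name__ == "__main__":
    suspicious = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
                  '"GET / HTTP/1.0" 200 10 "-" "User-Agent: Mozilla/4.0"')
    ordinary = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
                '"GET / HTTP/1.0" 200 10 "-" "Mozilla/4.0"')
    print(has_doubled_user_agent(suspicious), has_doubled_user_agent(ordinary))
```

A match is only a hint, not proof of bad intent; it most likely means the client was put together by someone who pasted the whole header line in where only the value belonged.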