Wandering Thoughts archives

2007-07-27

How big is the Slashdot effect?

Recent events have made me more curious than ever about just how big the Slashdot effect is. If you get linked to by one of the popular sites like Slashdot, Digg, or reddit, how much load are you actually going to get hit with?

(Correspondingly, if you are trying to make sure a web application can stand up to a Slashdotting, how much load do you need to be able to handle?)

Unfortunately, good current data is hard to come by; most of the reports on the details of the effect are old, and most of them don't include the numbers I'm most interested in. Most people only report their total hits over a relatively long time, but the numbers that matter for performance tuning are the peak concurrency and peak hits per minute or per second.

The best recent sources I've dug up so far are here (from 2006) and here (from early 2007). They suggest that the Slashdot effect is good for a few hundred pageviews a minute over your busiest hour or so, but don't say anything about the peak; other reports suggest between 25 and 50 requests per second (1500 to 3000 a minute) at peak, but no one seems to have a solid number.

(Remember that a single pageview may result in more than one HTTP request, since it needs to get all of the graphics, stylesheets, JavaScript, and so on.)
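
To make that concrete, here's a quick back-of-envelope sketch in Python; the pageview rate comes from the reports above, but the assets-per-page figure is purely my assumption:

    # Rough conversion from reported pageview rates to raw HTTP requests.
    # The assets-per-page number is an assumption for illustration.
    pageviews_per_min = 300   # 'a few hundred pageviews a minute'
    assets_per_page = 10      # guess: images, stylesheets, JavaScript

    requests_per_min = pageviews_per_min * (1 + assets_per_page)
    print("HTTP requests/min: %d" % requests_per_min)             # 3300
    print("HTTP requests/sec: %.1f" % (requests_per_min / 60.0))  # 55.0

With those (made-up) assumptions you land in the same ballpark as the reported peak of 25 to 50 requests a second.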

Another useful tidbit to know about the Slashdot effect would be how much extra traffic your other pages will see. (My guess is that almost all of the traffic will go to the page that got linked and only a few visitors will go look at the rest of your site.)

PS: if I ever get hit with the Slashdot effect, I promise a full report. Not that I expect it to happen. (Yesterday is the closest I've come to that sort of thing; although a couple of entries did appear briefly on programming.reddit.com, they had no serious effect on my traffic levels.)

SlashdotEffectSize written at 23:24:09

2007-07-26

An unexpected performance stress test for DWiki

This morning, I got to see how well DWiki could stand up to high load. While getting attacked by something that has all the signs of a spambot gone berserk doesn't exactly make me happy, I was cheered to see that DWiki stood up to the situation.

(And there's a little bit of me that's dancing around in triumph that all of the work that I've done to prepare for this finally paid off and actually worked out for real.)

Over the course of the incident DWiki averaged about 12 requests a second, with a peak that was probably around 31 requests a second. During this the machine was not overwhelmed (the load average was somewhat over 4 and interactive response was good), no requests got overload errors, and DWiki had spare capacity to handle other requests. Admittedly, by large-site standards this is not very impressive; even 31 requests a second still gives you 32 milliseconds per request, assuming no concurrency.

One of the reasons DWiki stood up so well is that I recently rewrote my SCGI server to use preforking instead of forking for every new connection (which has performance problems in Python). The old SCGI server probably would have done worse, and significantly loaded down the machine in the process.
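
To illustrate what I mean by preforking, here's a minimal sketch of the pattern; this is not my actual SCGI server, and the port number and worker count are just placeholder values:

    # A minimal sketch of a preforking server (not my actual SCGI server):
    # the parent binds the listening socket once and forks N workers, and
    # each worker accept()s from the shared socket, so nothing forks per
    # connection.
    import os
    import socket

    NUM_WORKERS = 4   # placeholder; happens to match my minimum (see below)

    def worker(lsock):
        while True:
            conn, addr = lsock.accept()
            conn.sendall(("hello from pid %d\n" % os.getpid()).encode("ascii"))
            conn.close()

    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind(("localhost", 4000))
    lsock.listen(64)

    for _ in range(NUM_WORKERS):
        if os.fork() == 0:    # child: serve connections until killed
            worker(lsock)
    for _ in range(NUM_WORKERS):
        os.wait()             # the parent just waits on the workers

The fork cost is paid once at startup instead of once per connection, which is the whole point.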

(Coincidentally, I just got around to applying some pending performance tweaks yesterday.)

In a way we got lucky, because I happened to log in and notice what was going on shortly after the incident started. Since we usually average only 10,000 requests a day, the incident could have multiplied the size of our Apache logfiles significantly if it had run for very long.

Sidebar: more information about the incident

The basics: the IP address 206.51.229.119 made 6,896 requests between 08:24:41 and 08:34:22 (when I blocked it), giving it an average of 11.87 requests a second; in the most active second it issued 31 requests. Unfortunately I don't have a number for how many concurrent requests DWiki was handling; the basic Apache log format doesn't have enough information, since it only logs the start time of each request.

(My best guess is that the concurrency was relatively low by the end, because while I was watching, the prefork SCGI server was only running its minimum of 4 worker processes.)

We sent 206.51.229.119 a total of 96 megabytes of output (most of which it probably ignored). It hit 1,882 different URLs, hitting the most popular one 266 times (but the next most popular only 134 times).
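
(Numbers like these fall out of straightforward log crunching. As a sketch of the sort of thing involved, assuming a normal Apache common/combined format logfile where the timestamp is the fourth whitespace-separated field:

    # Per-IP request counts bucketed by second, from an Apache logfile.
    # The filename and the position of the '[day:HH:MM:SS' timestamp
    # field are assumptions about a typical Apache log format.
    from collections import defaultdict

    TARGET_IP = "206.51.229.119"
    counts = defaultdict(int)
    for line in open("access_log"):
        fields = line.split()
        if fields and fields[0] == TARGET_IP:
            counts[fields[3]] += 1    # one bucket per second

    print("total requests: %d" % sum(counts.values()))
    print("busiest second: %d requests" % max(counts.values()))

This gets you totals and the busiest second, but as noted it can't get you concurrency.)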

One of the reasons I am pretty sure it was up to no good is that its User-Agent string literally contained 'User-Agent:' in it, which is something that I've seen before.

UnexpectedLoadTest written at 23:17:12

2007-07-16

Why SSL and name-based virtual hosts don't get along

Part of validating an SSL certificate is making sure that it is an SSL certificate for what you are actually connecting to, to avoid the possibility of a man in the middle attack. SSL certificates for websites have a field (the CN portion of the 'Distinguished Name') that names the host they are for, so target validation consists of checking that the certificate's CN is the same name as the host portion of the URL.

This checking is pretty literally a string compare; the web browser does not do anything like checking whether the CN host and the hostname in the URL map to the same IP address. (Okay, the string comparison does DNS case folding. I don't know if it does IDNA folding so that CN names can be in native character sets, but I suspect not.)
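
In spirit the check is about this simple (a sketch of the idea, not any browser's actual code):

    # The basic shape of the check: fold case, then compare strings.
    # Real browsers do more than this, but this is the essential idea.
    def cn_matches_host(cert_cn, url_host):
        return cert_cn.lower() == url_host.lower()

    print(cn_matches_host("www.example.org", "WWW.Example.ORG"))  # True
    print(cn_matches_host("www.example.org", "example.org"))      # False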

The problem for name-based virtual hosting is that the SSL certificate exchange happens immediately after the https connection is made, before the client sends any HTTP headers, including the Host: header that would tell the server which virtual host it is trying to reach and thus which SSL certificate the server should use. The only things the server knows when it's picking a certificate are the IP address and port the client connected to, so if you want a single web server to hand out multiple SSL certificates you have to use multiple IP addresses (or multiple ports).
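
Put another way, everything the server has to go on at certificate-selection time looks like this (a sketch with hypothetical addresses and file names):

    # At the point where the server must pick a certificate, all it has
    # is the local address and port the client connected to; no HTTP
    # headers exist yet. The addresses and paths here are hypothetical.
    CERT_FOR = {
        ("192.0.2.10", 443): "/etc/ssl/www.example.org.pem",
        ("192.0.2.11", 443): "/etc/ssl/wiki.example.org.pem",
    }

    def pick_certificate(conn):
        addr, port = conn.getsockname()[:2]
        return CERT_FOR[(addr, port)]

Hence the one-certificate-per-(address, port) restriction.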

The one exception to this is wildcarded SSL certificates and subdomains. You can get an SSL certificate for *.example.org, and then you can use name-based virtual hosting for multiple <something>.example.org websites, because all of their hostnames will match the certificate's CN. The drawback is that such certificates can be hard to get and generally cost a bunch more money than regular certificates (because the certificate authority is losing out on all that juicy cash they would get by making you buy a certificate per virtual host).
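
Extending the earlier sketch, the wildcard case is still just string matching (the exact rules vary a bit between implementations; this shows the common single-label interpretation):

    # Wildcard matching is still string work: '*' stands for exactly one
    # leading DNS label in the common interpretation.
    def wildcard_cn_matches(cert_cn, url_host):
        cn, host = cert_cn.lower(), url_host.lower()
        if cn.startswith("*."):
            return (host.count(".") == cn.count(".") and
                    host.endswith(cn[1:]))
        return cn == host

    print(wildcard_cn_matches("*.example.org", "wiki.example.org"))  # True
    print(wildcard_cn_matches("*.example.org", "example.org"))       # False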

SSLNameProblem written at 21:31:53

