2006-12-10
An irony of web serving
One of the small paradoxes of the web is that it is often the connections with the least bandwidth that put the largest load on your web server.
This is because each connection consumes a certain amount of server resources, ranging from kernel data structures for socket buffers up to an entire thread or process on a dynamic website. The slower someone's connection, the longer they tie up this stuff on your end as you slowly feed them data. Conversely, people on fast connections get in, get their data, and get out fast, letting you release those resources.
Among other little effects, this means that for load testing it's not enough to pick a rate of new connections per second that you should be able to deal with. Ten new connections a second where each client takes a tenth of a second to download its content is rather easier to deal with than ten connections a second where each client takes ten seconds. (And if you test across a local LAN you are far more likely to get the former than the latter.)
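The arithmetic behind this is Little's law: the average number of connections open at once is the arrival rate multiplied by how long each connection lasts. A minimal Python sketch, assuming steady arrivals and a fixed download time per client (the function name is just for illustration):

```python
def concurrent_connections(new_per_second: float, seconds_per_client: float) -> float:
    """Average number of simultaneously open connections (Little's law: L = lambda * W)."""
    return new_per_second * seconds_per_client


# Both cases see ten new connections a second, but they hold very different
# numbers of connections open at any given moment.
fast_clients = concurrent_connections(10, 0.1)   # about 1 connection open
slow_clients = concurrent_connections(10, 10.0)  # about 100 connections open
print(f"fast: {fast_clients:g}, slow: {slow_clients:g}")
```

So a load test that reproduces the connection rate but not the slow clients can underestimate the number of simultaneous connections by a factor of a hundred in this example.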
There are some ways around parts of this effect:
- web servers based around asynchronous IO generally have far lower per-connection overhead, which is one reason they're so popular.
- reverse proxy web servers (including, in a way, Apache running CGI programs) offer you a way of rapidly sucking the content out of your high-overhead dynamic website system and parking it in a low-overhead frontend web server while it trickles out to the slow clients, instead of having the slow clients hold down an expensive connection directly with the dynamic website bits. (This only works well if your generated content is small enough to get sucked completely into the frontend, but this is the usual case.)
- some websites just disconnect clients after a certain amount of time, whether or not they are still transferring data. This is most popular for bulk downloads, where it's cheap for the server to start again if (or when) the client reconnects to resume the transfer.
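The reverse proxy point can be made concrete with a rough back-of-the-envelope model in Python. Everything here is a hypothetical illustration: the Response class, the function names, and all the sizes and speeds are assumptions, and the model ignores TCP details entirely. It only counts how long the expensive backend stays tied up.

```python
from dataclasses import dataclass


@dataclass
class Response:
    generate_seconds: float  # time the dynamic backend spends building the page
    size_bytes: int          # size of the generated page in bytes


def backend_hold_direct(resp: Response, client_bytes_per_sec: float) -> float:
    """Seconds an expensive backend worker stays busy when it feeds the client itself."""
    return resp.generate_seconds + resp.size_bytes / client_bytes_per_sec


def backend_hold_proxied(resp: Response, client_bytes_per_sec: float,
                         lan_bytes_per_sec: float, proxy_buffer_bytes: int) -> float:
    """Seconds the backend stays busy behind a buffering reverse proxy.

    The proxy soaks up to proxy_buffer_bytes at LAN speed. Any excess can only
    enter the buffer as space frees up, i.e. at the slow client's speed.
    """
    buffered = min(resp.size_bytes, proxy_buffer_bytes)
    overflow = resp.size_bytes - buffered
    return (resp.generate_seconds
            + buffered / lan_bytes_per_sec
            + overflow / client_bytes_per_sec)


# Illustrative numbers (assumptions): a 56 kbit/s modem client, a 100 Mbit/s LAN,
# and a 64 KB proxy buffer.
MODEM = 56_000 / 8
LAN = 100_000_000 / 8
BUFFER = 64 * 1024

page = Response(generate_seconds=0.05, size_bytes=40_000)
download = Response(generate_seconds=0.05, size_bytes=10_000_000)

for name, resp in [("page", page), ("download", download)]:
    direct = backend_hold_direct(resp, MODEM)
    proxied = backend_hold_proxied(resp, MODEM, LAN, BUFFER)
    print(f"{name}: backend busy {direct:.2f}s direct, {proxied:.2f}s proxied")
```

With these made-up figures the backend's hold time for the small page drops from several seconds to about a twentieth of a second, while the 10 MB download barely improves, which is the "small enough to get sucked completely into the frontend" caveat in action.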
Weekly spam summary on December 9th, 2006
Our SMTP frontend crashed and restarted three times this week, twice on Wednesday around 6pm and the third time today at 3:16pm, so some of our stats are really fragmentary. Still, this week we:
- got 15,036 messages from 272 different IP addresses.
- handled 20,984 sessions from 1,243 different IP addresses.
- received 114,833 connections from at least 33,061 different IP addresses up to Wednesday at 4am, received 92,925 connections from at least 27,804 different IP addresses from Wednesday at 6pm until Saturday at 4am, and received 11,972 connections from at least 5,107 different IP addresses since 3:16pm today.
This appears to make connection volume around the same as last week. For the days that we have decent per-day stats, connections are running around 38,000 to 40,000 connections a day, with around 10,000 to 11,000 different IP addresses added per day.
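As a rough cross-check, the uninterrupted middle stretch of the week gives a similar daily rate. This assumes the window runs exactly from Wednesday 6pm to Saturday 4am, which is 58 hours; the restart times are only approximate, so treat the result as a ballpark figure.

```python
# Connections logged between the Wednesday 6pm restart and Saturday 4am.
connections = 92_925
window_hours = 58  # Wed 18:00 to Sat 04:00 is 2 days + 10 hours (assumed exact)

per_day = connections / window_hours * 24
print(f"about {per_day:,.0f} connections a day")
```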
Kernel level packet filtering top ten:
Host/Mask | Packets | Bytes |
213.29.7.0/24 | 28527 | 1712K |
213.4.149.12 | 16263 | 846K |
69.64.75.166 | 15633 | 938K |
64.166.14.222 | 12648 | 607K |
194.105.128.205 | 6435 | 386K |
81.92.112.2 | 4551 | 273K |
202.175.95.171 | 4357 | 261K |
63.162.158.16 | 4176 | 200K |
202.44.165.9 | 3892 | 187K |
193.252.22.158 | 3480 | 209K |
Things are up from last week overall.
- 213.29.7.0/24 is the centrum.cz mail servers, justifying their new permanent block.
- 213.4.149.12 is terra.es, returning from October and many, many previous appearances.
- 69.64.75.166 and 63.162.158.16 kept trying bad HELO greetings.
- 64.166.14.222 is still a PacBell DSL line.
- 194.105.128.205 and 81.92.112.2 both kept trying to send us stuff that had already tripped spamtraps.
- 202.175.95.171 is in the CBL.
- 202.44.165.9 has invalid reverse DNS and is in APNIC space; we require APNIC IP addresses to have valid reverse DNS.
- 193.252.22.158 is a wanadoo.co.uk machine, which has wound up being in SPEWS again and has appeared here before.
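The APNIC reverse DNS rule above can be sketched in Python, but the post doesn't say how "valid" is judged. This sketch assumes it means forward-confirmed reverse DNS (the PTR name must resolve back to the same address) and uses a deliberately tiny, illustrative sample of APNIC address space rather than the real list. The function names and the prefix list are hypothetical; the resolvers can be swapped out, which also lets the check run without live DNS.

```python
import ipaddress
import socket
from typing import Callable, List, Optional, Sequence

# ASSUMPTION: an illustrative sample of APNIC-administered IPv4 space, not the
# full list; a real check would load APNIC's published delegation data.
SAMPLE_APNIC_NETS = [ipaddress.ip_network("202.0.0.0/8"),
                     ipaddress.ip_network("203.0.0.0/8")]


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup; None if the address has no reverse DNS."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def forward_lookup(name: str) -> List[str]:
    """All addresses the name resolves to; empty if it does not resolve."""
    try:
        return [info[4][0] for info in socket.getaddrinfo(name, None)]
    except OSError:
        return []


def has_valid_rdns(ip: str,
                   rev: Callable[[str], Optional[str]] = reverse_lookup,
                   fwd: Callable[[str], List[str]] = forward_lookup) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must resolve back to ip."""
    name = rev(ip)
    return name is not None and ip in fwd(name)


def reject_at_connect(ip: str,
                      nets: Sequence = SAMPLE_APNIC_NETS,
                      rev: Callable[[str], Optional[str]] = reverse_lookup,
                      fwd: Callable[[str], List[str]] = forward_lookup) -> bool:
    """Reject addresses in the (sample) APNIC space that lack valid reverse DNS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets) and not has_valid_rdns(ip, rev, fwd)
```

For example, `reject_at_connect("202.44.165.9", rev=lambda ip: None)` returns True, since that address falls inside the sample 202.0.0.0/8 block and has no PTR record, while an address outside the sample blocks is never rejected by this rule.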
Connection time rejection stats:
Rejections | Reason |
58045 | total |
33557 | dynamic IP |
19569 | bad or no reverse DNS |
3319 | class bl-cbl |
210 | class bl-sdul |
190 | class bl-dsbl |
104 | class bl-spews |
89 | class bl-njabl |
76 | cuttingedgemedia.com |
42 | class bl-ordb |
27 | class bl-sbl |
This week saw some really prolific connection time rejection sources. 13 of the top 30 most rejected IP addresses were rejected 100 times or more, with the champion being 125.246.18.130 (1,124 times, all in a few minutes around 6pm on December 3rd, with enough activity that it triggered our per IP address maximum connection limits). After that we drop to 64.166.14.222 (201 times), 63.138.101.140 (172 times), and so on.
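The post doesn't give the actual per-IP limit or how it is enforced. A minimal Python sketch of a cap on simultaneous connections from each IP address, with a hypothetical class name and a made-up limit of five:

```python
from collections import defaultdict


class PerIPConnectionLimiter:
    """Hypothetical sketch: cap the simultaneous connections from each IP address."""

    def __init__(self, max_per_ip: int) -> None:
        self.max_per_ip = max_per_ip
        self.active = defaultdict(int)  # ip -> number of currently open connections

    def try_open(self, ip: str) -> bool:
        """Count a new connection from ip; refuse it if ip is already at the limit."""
        if self.active[ip] >= self.max_per_ip:
            return False
        self.active[ip] += 1
        return True

    def close(self, ip: str) -> None:
        """Record that one of ip's connections has finished."""
        self.active[ip] -= 1
        if self.active[ip] <= 0:
            del self.active[ip]


limiter = PerIPConnectionLimiter(max_per_ip=5)  # the limit of 5 is made up
burst = [limiter.try_open("125.246.18.130") for _ in range(7)]
print(burst)  # first five accepted, the rest refused
limiter.close("125.246.18.130")
reopened = limiter.try_open("125.246.18.130")  # a slot has been freed
print(reopened)
```

An SMTP frontend using something like this would call `try_open` when it accepts a connection and `close` when the session ends, so a single address that opens connections faster than they finish gets refused once it reaches the cap.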
In other stats, 22 of the top 30 are currently in the CBL, and 5 are currently in bl.spamcop.net.
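DNS blocklists such as bl.spamcop.net are queried through ordinary DNS: reverse the IPv4 address's octets, append the list's zone, and look up an A record; an answer means the address is listed, while a nonexistent name means it isn't. A minimal Python sketch of that convention; the function names are hypothetical, and `is_listed` needs live DNS and treats any answer as a listing, ignoring the specific 127.0.0.x return codes some lists use to distinguish listing types.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets prepended to the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed addresses
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an A record for the address (requires live DNS)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # includes socket.gaierror, e.g. for a nonexistent name
        return False
```

For example, `dnsbl_query_name("64.166.14.222", "bl.spamcop.net")` gives `222.14.166.64.bl.spamcop.net`.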
Here's a table of the SBL hits:
Connections | SBL listing | comments |
9 | SBL45324 | a /24 ROKSO listing for Expedite Media |
5 | SBL39631 | a spam source in .cz (listed March 29th) |
4 | SBL47687 | a spam source (listed October 27th) |
2 | SBL48728 | reasonably long-term spam source |
2 | SBL48020 | a /27 ROKSO listing for Howard Minsky (listed November 3rd) |
1 | SBL48694 | a /24 of spammers |
1 | SBL48348 | a /24 ROKSO listing for 'livemercial.com' (listed November 17th) |
1 | SBL46756 | a ROKSO listing for William Stanley; apparently it is an open Squid proxy being used by this spammer (listed September 18th) |
1 | SBL41737 | a spammer's mail sending machine (listed May 10th) |
1 | SBL41344 | a /21 listing for a spammer's web hosting (listed May 17th) |
It's interesting that this time around there's not a single advance fee fraud spam source on the list. If I'm really lucky, this means that (SBL-listed) advance fee fraud spam sources are cleaning up their act, but I suspect that it is more likely that spammers are learning to not bother using SBL-listed free webmail systems.
This week, Hotmail had:
- 1 message accepted.
- No messages rejected because they came from non-Hotmail email addresses.
- 27 messages sent to our spamtraps.
- 1 message refused because its sender address had already hit our spamtraps.
- 3 messages refused due to their origin IP address (one from Nigeria, one from Côte d'Ivoire, and one from Burkina Faso).
And the final numbers:
what | # this week | (distinct IPs) | # last week | (distinct IPs) |
Bad HELOs | 785 | 146 | 1059 | 155 |
Bad bounces | 109 | 95 | 109 | 101 |
The first_last female Slavic names staged a huge comeback this week, although they don't quite make up a majority of the bad bounces; it looks like that honor goes to login names that are random alphanumeric jumbles.
(One amusement in all of these stats is watching a single first name be associated with a run of last names, for example 'alisa petrova', then polkyakova, then osipova. I suspect that this is just brute force table merging of first names and last names.)
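Brute-force table merging of this sort would amount to taking the cross product of a first-name table and a last-name table. A minimal Python sketch, assuming exactly that; the tables are hypothetical, seeded with the names mentioned above. `itertools.product` holds the first name fixed while cycling through the last names, which produces exactly the kind of run described.

```python
from itertools import product

# Hypothetical sample tables, seeded with the names seen in this week's bounces.
first_names = ["alisa"]
last_names = ["petrova", "polkyakova", "osipova"]

# Brute-force merge: pair every first name with every last name, in order,
# using the first_last login format seen in the bad bounces.
generated = [f"{first}_{last}" for first, last in product(first_names, last_names)]
print(generated)
```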