Wandering Thoughts archives

2006-12-10

An irony of web serving

One of the small paradoxes of the web is that it is often the connections with the least bandwidth that put the largest load on your web server.

This is because each connection consumes a certain amount of server resources, ranging from kernel data structures for socket buffers up to an entire thread or process on a dynamic website. The slower someone's connection, the longer they tie up this stuff on your end as you slowly feed them data. Conversely, people on fast connections get in, get their data, and get out fast, letting you release those resources.

Among other little effects, this means that it's not enough for load testing to pick a connections-per-second rate that you should be able to deal with. Ten new connections a second where each client takes a tenth of a second to download its content is rather easier to deal with than ten connections a second where each client takes ten seconds. (And if you test across a local LAN you are far more likely to get the former than the latter.)
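
To put rough numbers on that comparison: by Little's law, the average number of connections open at once is the arrival rate multiplied by how long each connection lasts. Here is a minimal sketch of the arithmetic for the two cases above (the function name is made up for illustration):

```python
def concurrent_connections(arrivals_per_second: float,
                           seconds_per_connection: float) -> float:
    """Little's law: average connections open at once is the
    arrival rate times the average connection lifetime."""
    return arrivals_per_second * seconds_per_connection


# The two cases from the text: same arrival rate, very different lifetimes.
fast_clients = concurrent_connections(10, 0.1)   # LAN-style load test
slow_clients = concurrent_connections(10, 10)    # slow real-world clients

print(f"fast clients: about {fast_clients:.0f} connection(s) open at once")
print(f"slow clients: about {slow_clients:.0f} connection(s) open at once")
```

The same ten connections a second means holding about one connection open at a time in the fast case and about a hundred in the slow case, which is the number that actually determines how many threads, processes, or socket buffers you need.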

There are some ways around parts of this effect:

  • web servers based around asynchronous IO generally have far lower per-connection overhead, which is one reason they're so popular.

  • reverse proxy web servers (including, in a way, Apache running CGI programs) let you rapidly suck the content out of your high-overhead dynamic website system and park it in a low-overhead frontend web server while it trickles out to the slow clients, instead of having the slow clients hold down an expensive connection directly with the dynamic website bits.

    (This only works well if your generated content is small enough to get sucked completely into the frontend, but this is the usual case.)

  • some websites just disconnect clients after a certain amount of time, whether or not they are still transferring data. This is most popular for bulk downloads, where it's cheap for the server to start again if (or when) the client reconnects to resume the transfer.
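
As a back-of-the-envelope illustration of the reverse proxy case, here is a toy model of how long an expensive backend worker is held per request, with and without a buffering frontend. All of the numbers (page size, generation time, link speeds) are made-up assumptions, and the model ignores kernel socket buffers and everything else a real system does.

```python
def backend_hold_seconds(page_bytes: int,
                         generate_seconds: float,
                         drain_bytes_per_second: float) -> float:
    """Time one backend worker is busy: generating the page, then
    writing it to whatever reads it (a client or a frontend proxy)."""
    return generate_seconds + page_bytes / drain_bytes_per_second


# Hypothetical numbers, for illustration only.
PAGE_BYTES = 50_000            # a 50 KB generated page
GENERATE_SECONDS = 0.05        # time to produce the page
SLOW_CLIENT_BPS = 5_000        # roughly a dialup-speed client
LOCAL_PROXY_BPS = 50_000_000   # backend to frontend over a local link

# Backend talks to the slow client directly.
direct = backend_hold_seconds(PAGE_BYTES, GENERATE_SECONDS, SLOW_CLIENT_BPS)
# Backend hands the whole page to a buffering frontend at local speed.
proxied = backend_hold_seconds(PAGE_BYTES, GENERATE_SECONDS, LOCAL_PROXY_BPS)

print(f"backend held, slow client direct: {direct:.3f}s")
print(f"backend held, frontend buffers:   {proxied:.3f}s")
```

The slow client still takes about ten seconds to get its page either way. The difference is that in the proxied case those seconds are spent holding a cheap frontend connection instead of a backend worker.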

web/ConnectionSpeedLoad written at 22:20:00;

Weekly spam summary on December 9th, 2006

Our SMTP frontend crashed and restarted three times this week, twice on Wednesday around 6pm and the third time today at 3:16pm, so some of our stats are really fragmentary. Still, this week we:

  • got 15,036 messages from 272 different IP addresses.
  • handled 20,984 sessions from 1,243 different IP addresses.
  • received 114,833 connections from at least 33,061 different IP addresses up to Wednesday at 4am, received 92,925 connections from at least 27,804 different IP addresses from Wednesday at 6pm until Saturday at 4am, and received 11,972 connections from at least 5,107 different IP addresses since 3:16pm today.

This appears to make connection volume around the same as last week. For the days that we have decent per-day stats, connections are running around 38,000 to 40,000 connections a day, with around 10,000 to 11,000 different IP addresses added per day.
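
As a rough cross-check on the per-day figure, the middle window above (Wednesday at 6pm until Saturday at 4am) is the only one with both ends stated, so its connection count can be scaled to a 24-hour rate. The calendar dates in this sketch are an assumption about which Wednesday and Saturday are meant, and this is only a sanity check, not how the per-day stats are actually generated.

```python
from datetime import datetime

# Assumed calendar dates for "Wednesday at 6pm" and "Saturday at 4am".
window_start = datetime(2006, 12, 6, 18, 0)
window_end = datetime(2006, 12, 9, 4, 0)
connections = 92_925  # connections received in that window, from the list above

# Length of the window in days, then the average daily connection rate.
days = (window_end - window_start).total_seconds() / 86_400
per_day = connections / days

print(f"{days:.2f} days, about {per_day:,.0f} connections a day")
```

This works out to a bit under 38,500 connections a day, which sits inside the 38,000 to 40,000 range.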

Kernel level packet filtering top ten:

Host/Mask           Packets   Bytes
213.29.7.0/24         28527   1712K
213.4.149.12          16263    846K
69.64.75.166          15633    938K
64.166.14.222         12648    607K
194.105.128.205        6435    386K
81.92.112.2            4551    273K
202.175.95.171         4357    261K
63.162.158.16          4176    200K
202.44.165.9           3892    187K
193.252.22.158         3480    209K

Things are up from last week overall.

  • 213.29.7.0/24 is the centrum.cz mail servers, justifying their new permanent block.
  • 213.4.149.12 is terra.es, returning from October and many, many previous appearances.
  • 69.64.75.166 and 63.162.158.16 kept trying bad HELO greetings.
  • 64.166.14.222 is still a PacBell DSL line.
  • 194.105.128.205 and 81.92.112.2 both kept trying to send us stuff that had already tripped spamtraps.
  • 202.175.95.171 is in the CBL.
  • 202.44.165.9 has invalid reverse DNS and is in APNIC space; we require APNIC IP addresses to have valid reverse DNS.
  • 193.252.22.158 is a wanadoo.co.uk machine, which has wound up being in SPEWS again and has appeared here before.

Connection time rejection stats:

  58045 total
  33557 dynamic IP
  19569 bad or no reverse DNS
   3319 class bl-cbl
    210 class bl-sdul
    190 class bl-dsbl
    104 class bl-spews
     89 class bl-njabl
     76 cuttingedgemedia.com
     42 class bl-ordb
     27 class bl-sbl

This week saw some really prolific connection time rejection sources. 13 of the top 30 most rejected IP addresses were rejected 100 times or more, with the champion being 125.246.18.130 (1,124 times, all in a few minutes around 6pm on December 3rd, with enough activity that it triggered our per IP address maximum connection limits). After that we drop to 64.166.14.222 (201 times), 63.138.101.140 (172 times), and so on.

In other stats, 22 of the top 30 are currently in the CBL, and 5 are currently in bl.spamcop.net.

Here's a table of the SBL hits:

Connections  SBL listing  Comments
          9  SBL45324     a /24 ROKSO listing for Expedite Media
          5  SBL39631     a spam source in .cz (listed March 29th)
          4  SBL47687     a spam source (listed October 27th)
          2  SBL48728     reasonably long-term spam source
          2  SBL48020     a /27 ROKSO listing for Howard Minsky (listed November 3rd)
          1  SBL48694     a /24 of spammers
          1  SBL48348     a /24 ROKSO listing for 'livemercial.com' (listed November 17th)
          1  SBL46756     a ROKSO listing for William Stanley; apparently it is an open squid proxy being used by this spammer (listed September 18th)
          1  SBL41737     a spammer's mail sending machine (listed May 10th)
          1  SBL41344     a /21 listing for a spammer's web hosting (listed May 17th)

It's interesting that this time around there's not a single advance fee fraud spam source on the list. If I'm really lucky, this means that (SBL-listed) advance fee fraud spam sources are cleaning up their act, but I suspect that it is more likely that spammers are learning to not bother using SBL-listed free webmail systems.

This week, Hotmail had:

  • 1 message accepted.
  • No messages rejected because they came from non-Hotmail email addresses.
  • 27 messages sent to our spamtraps.
  • 1 message refused because its sender address had already hit our spamtraps.
  • 3 messages refused due to their origin IP address (one from Nigeria, one from the Cote d'Ivoire, and one from Burkina Faso).

And the final numbers:

what         # this week  (distinct IPs)  # last week  (distinct IPs)
Bad HELOs            785             146         1059             155
Bad bounces          109              95          109             101

The first_last female Slavic names staged a huge comeback this week, although they don't quite make up a majority of the bad bounces; it looks like that honor is captured by the random alphanumeric jumble login names.

(One amusement in all of these stats is watching a single first name be associated with a run of last names, for example 'alisa petrova', then polkyakova, then osipova. I suspect that this is just brute force table merging of first names and last names.)

spam/SpamSummary-2006-12-09 written at 00:31:00;

