My mistake with the Host: HTTP header
One of the nice things about writing a blog is getting to say 'oh, oops, I was a dumbass, let me fix that'. Today I have to own up to a big example of this.
In theory the absolute URL should include the port (unless it's the default). In practice, every program I've tried gleefully adds the port itself if it is a non-standard port and you're referring to the same hostname.
I was a moron.
The Host: header in HTTP requests includes the port when the port is
a non-standard one (and some programs throw it in even when you're on
port 80, as I found out later). My code looked more or less like:
newuri = "http://%s:%d" % (HostHeader, MyPort) + relUrl
When programs gave me real Host: headers, where the value included
both hostname and port, I effectively doubled the port and things
naturally exploded. Had I printed the actual Host: header that
programs were handing DWiki I would have seen my mistake immediately,
but instead I was too confident that I knew what was going on and didn't
bother; I trusted my testing with hand-crafted HTTP requests, where I'd
gotten the Host: header wrong and so the result looked right.
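To make the failure concrete, here is a minimal sketch of the doubled port and one way to avoid it. The function names (naive_base_url, split_host_header, build_base_url) are hypothetical and are not DWiki's actual code; the sketch assumes plain HTTP with a default port of 80 and only does a simple split on the last colon.

```python
def naive_base_url(host_header, my_port):
    # The mistake: assumes the Host: value is a bare hostname and
    # always appends the port, even if the header already carries one.
    return "http://%s:%d" % (host_header, my_port)


def split_host_header(host_header, default_port=80):
    """Split a Host: header value into (hostname, port).

    Assumption: the value is either 'name', 'name:port', or a bracketed
    IPv6 literal with an optional ':port' suffix.
    """
    value = host_header.strip()
    name, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return name, int(port)
    # No ':port' suffix (or the last colon was inside an IPv6 literal).
    return value, default_port


def build_base_url(host_header, default_port=80):
    # Only mention the port when it is not the default one.
    name, port = split_host_header(host_header, default_port)
    if port == default_port:
        return "http://%s" % name
    return "http://%s:%d" % (name, port)


if __name__ == "__main__":
    print(naive_base_url("example.org:8080", 8080))
    print(build_base_url("example.org:8080"))
    print(build_base_url("example.org:80"))
```

With a hand-crafted header of just "example.org" the naive version looks fine, which is how testing can hide the problem; it only goes wrong once a client sends the ':port' suffix.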
I only found all this out months later when I was doing something else
with the Host: header that blew up because I didn't know to expect the
':port' on the end; that time I dumped debugging information, partly
because the failure was more mysterious.
My mistake is all the more embarrassing because, contrary to what I wrote in the original entry, the proper behavior is described in black and white in the HTTP 1.1 RFC's section on the Host header. I am not sure what RFCs I read at the time of the original entry, but evidently I didn't read the important one.
Weekly spam summary on May 20th, 2006
This week we:
- got 12,292 messages from 221 different IP addresses.
- handled 16,875 sessions from 807 different IP addresses.
- received 125,999 connections from at least 41,642 different IP addresses.
- hit a highwater of 11 connections being checked at once.
Nothing went wrong this week, thank goodness; no reboots, no SMTP frontend restarts, nothing. Weekly volume seems to be back to the normal level when things are quiet; there's no sign of last week's Sunday spike. The per-day statistics are sufficiently boring and flat (peaking at 20,000 connections on Wednesday) that I'm not going to put them in.
Kernel level packet filtering top ten:
Host/Mask            Packets   Bytes
184.108.40.206         11876    570K
220.127.116.11          4672    224K
18.104.22.168/24        4390    219K
22.214.171.124/10       3781    190K
126.96.36.199           2925    149K
188.8.131.52/11         2583    131K
184.108.40.206/11       2449    122K
220.127.116.11/12       2069    104K
18.104.22.168           2027   94761
22.214.171.124/13       1909   94116
This is very similar to last week's numbers, down to the first place finisher.
- 126.96.36.199 returns from last week.
- 188.8.131.52 is on the DSBL.
- 184.108.40.206 and 220.127.116.11 are both 'dialup' machines as far as we can tell from their generic DNS names.
Connection time rejection stats:
35861 total
17407 dynamic IP
14992 bad or no reverse DNS
 2390 class bl-cbl
  278 class bl-dsbl
  135 class bl-sdul
   81 class bl-njabl
   69 class bl-sbl
   63 class bl-ordb
Out of curiosity, I took a look at the SBL rejections; the results are kind of depressing. The 69 rejections were of 13 different IP addresses; only two IP addresses (5 rejections total) were not listed for being advance fee fraud sources.
Twelve out of the top 30 most rejected IP addresses were rejected more
than 100 times; the top rejection source was our friend 18.104.22.168
(497 times before it was re-blocked at the kernel level). 26 of the top
30 most rejected IP addresses are currently in the CBL; six of them are
Hotmail is backsliding; perhaps I should be surprised. This week's stats:
- 1 message accepted, which was spam (I know, because I got it).
- 1 message rejected because it came from a non-Hotmail email address.
- 10 messages sent to our spamtraps.
- no messages refused because their sender addresses had already hit our spamtraps.
- 1 message refused due to its origin IP address being in the CBL.
The last set of numbers:
|what||# this week||(distinct IPs)||# last week||(distinct IPs)|
Oh well, so much for not getting very many bounces. (I suppose
this still qualifies by other people's standards.) As with last
week, (just) over half the bad
HELOs came from 22.214.171.124/24,
btconnect.com's outgoing SMTP server pool. The odds of this changing
any time soon seem low.