2006-05-14
Absolute versus relative URLs in syndication feeds
I just changed DWiki to generate absolute URLs for the '(N comments)' links at the bottom of entries in my Atom syndication feeds, instead of the absolute path URLs it used to generate (URLs without the http://host/ portion). This scrubs out the last non-absolute URLs in my syndication feed; URLs in the text of entries in the feed have always been absolute, because I'm cautious and cynical.
In theory absolute URLs are unnecessary in Atom entries, because Atom has rules for how to handle relative URLs. And if you believe that all feed readers properly implement those rules, I have a pony for you. In theory the programmers of a feed reader that gets this wrong are bozos, because the Atom spec is clear; in practice, even with Atom, people creating syndication feeds have a choice between purity and having their feeds widely read. Using only absolute links is one of the aspects of that choice.
(The difference between Atom and RSS in this is that who is the bozo is very clear.)
Back when I added syndication feeds to DWiki I made a choice to be pessimistic about feed readers getting relative URLs right all the time, and modified the DWikiText to HTML converter to generate absolute links for syndication feeds. The '(N comments)' link is generated separately, so I missed it; a problem report today validated my cynicism and pushed me to make this change.
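The rewriting described here can be sketched in Python with the standard library's `urllib.parse.urljoin`. This is only an illustration of the idea, not DWiki's actual code; the host name, paths, and the `absolutize` helper are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical feed location; not the real DWiki configuration.
FEED_BASE = "http://example.org/blog/"

def absolutize(url, base=FEED_BASE):
    """Return an absolute URL for url, resolved against base."""
    return urljoin(base, url)

# An absolute path URL (no scheme or host) gains the http://host part.
print(absolutize("/blog/entry?showcomments"))
# A URL that is already absolute comes back unchanged.
print(absolutize("http://example.com/other"))
```

Running every generated link through one helper like this, including links that are built outside the main text converter, is one way to avoid missing a stray link.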
(Depressingly, I believe the feed reader that had a problem was NetNewsWire 2.1; I had expected a bit better of it since it's well regarded.)
Other people feel differently, and deliberately stick to their guns in order to push the technology forward and so on. For example, Tim Bray uses fully relative URLs for the images in his feeds combined with XHTML and xml:base declarations (themselves relative to his feed's URL); the result is a nice test of proper XHTML and XML handling in feed readers. (Some fail, Liferea included, but this encourages people to get them fixed.)
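To see what a feed reader has to do with such a feed, here is a rough Python sketch of resolving a relative URL through nested xml:base values, each one relative to the base in effect outside it. The feed URL, the xml:base values, and the `resolve` helper are invented for illustration and are not taken from any real feed.

```python
from urllib.parse import urljoin

def resolve(feed_url, xml_bases, relative_url):
    """Resolve relative_url by applying each xml:base in turn, outermost first."""
    base = feed_url
    for xml_base in xml_bases:
        # Each xml:base may itself be relative to the base in effect so far.
        base = urljoin(base, xml_base)
    return urljoin(base, relative_url)

# Hypothetical values for the feed URL and the xml:base attributes.
feed_url = "http://example.org/feeds/atom.xml"
print(resolve(feed_url, ["../articles/", "2006/05/"], "photo.png"))
```

A reader that ignores xml:base, or resolves it against the wrong starting point, ends up with a different (and broken) image URL.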
Link: an engineering management hack
Engineering Management Hacks: The BigBook Technique is an amusing story of how a group of engineers got their management to pay attention to Brooks's Law ("Adding manpower to a late software project makes it later"). I won't spoil the punchline; read it yourself.
Around here we don't have problems with Brooks's Law, perhaps because we don't have the extra manpower to add to late projects to start with.
(From Daring Fireball.)
A small user interface suggestion
Your button for 'mark all items as read' should not be right next to the button for 'advance to next unread item'. Liferea, I'm looking at you.
(Fortunately it was not a feed with a lot of unread or updated items. I think.)
Weekly spam summary on May 13th, 2006
Unfortunately, the SMTP frontend died shortly after midnight on Tuesday morning, so some of the connection statistics are missing about 2.6 days of data. Given that, this week we:
- got 11,652 messages from 229 different IP addresses.
- handled 16,296 sessions from 808 different IP addresses.
- received 110,313 connections from at least 35,408 different IP addresses since early Tuesday morning.
- hit a highwater of 11 connections being checked at once since early Tuesday morning.
At the Monday morning volume timestamp, we had received 210,731 connections from at least 7,733 different IP addresses; from this I suspect that the spam storm from Saturday of last week continued full-bore last Sunday.
Kernel level packet filtering top ten:
Host/Mask           Packets   Bytes
218.254.83.47         13422    644K
212.216.176.0/24       6112    305K
209.91.186.139         4554    273K
61.128.0.0/10          3484    173K
68.147.8.249           3397    163K
221.216.0.0/13         3047    151K
218.0.0.0/11           2692    137K
220.160.0.0/11         2358    118K
74.0.215.4             2309    117K
68.167.80.52           2132   99671
Overall, this is a bit more active than last week, but it's mostly driven up by a few people; there seems to have been no overall volume surge.
- 218.254.83.47 is a Hong Kong cablemodem, and was mentioned in passing last week.
- 209.91.186.139 is in the CBL. (And Canadian, alas.)
- 68.147.8.249 is in a Shaw Cable SPEWS listing. I've actually seen it in log summaries for previous weeks (although never high enough to get in this report), and it has a good looking DNS name, and it's not listed anywhere else, so I am going to whitelist it and see what happens.
- 74.0.215.4 is a covad.net 'dialup' machine.
- 68.167.80.52 returns from this April; we consider it a dialup machine, and it's also in bl.spamcop.net and the DSBL.
Connection time rejection stats:
40201 total
19942 dynamic IP
16960 bad or no reverse DNS
2033 class bl-cbl
233 class bl-spews
119 class bl-sdul
118 class bl-dsbl
83 class bl-sbl
49 class bl-ordb
19 class bl-njabl
3 class bl-opm
Although this looks down from last week, the details make Sunday's
spam storm pop out. All 30 of the top 30 most rejected IP addresses
were rejected more than 100 times; the most active one was our friend
218.254.83.47, with 619. 27 of the top 30 are currently in the CBL, 4
are currently in bl.spamcop.net, and 222.252.50.91 (123 rejections) is
in SBL39408.
SBL39408 is one of those depressing SBL listings; it is for 222.252.0.0/15, which belongs to Vietnam Posts and Telecommunications Corp (VNN.VN). Created April 10th 2006, the two /16 halves of it are apparently the current worst and second worst /16 spam source networks on the Internet. Somehow I suspect that they are going to retain that status for a while.
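The network arithmetic here can be checked with Python's standard `ipaddress` module. This is just a quick sketch using the addresses quoted above; the variable names are mine.

```python
import ipaddress

# The SBL39408 listing and the rejected address mentioned above.
listing = ipaddress.ip_network("222.252.0.0/15")
rejected = ipaddress.ip_address("222.252.50.91")

# A /15 splits into exactly two /16 halves.
halves = list(listing.subnets(new_prefix=16))
for half in halves:
    print(half, rejected in half)

# The rejected address falls inside the listed /15.
print(rejected in listing)
```

The two halves come out as 222.252.0.0/16 and 222.253.0.0/16, with 222.252.50.91 in the first one.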
Hotmail is doing much better this week:
- one message accepted.
- 4 messages rejected because they came from non-Hotmail email addresses (all from various non-US Hotmail domains; I really have to improve that check).
- no messages sent to spamtraps, refused because the sender had already hit spamtraps, or rejected because of their originating IP address.
I'm willing to tentatively declare that Hotmail has fixed their problem. Besides, as far as I can tell the free webmail provider with a problem is now Yahoo; I am getting significant advance fee fraud spam through Yahoo from a spam gang that they haven't stopped. (The situation is bad enough that I have started blocking non-US Yahoo operations as they spam us.)
The final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 448 | 49 | 405 | 46 |
| Bad bounces | 10 | 10 | 8 | 7 |
More than half (244 out of 448) of the bad HELOs came from
btconnect.com's pool of SMTP senders in 213.123.26.0/24, which HELO
with names like 'hesa05uker.he.local' (sometimes capitalized). The
pattern for usernames in the bad bounces is fairly similar to last
week, including another bounce to that 38-character hex sequence (but
from a different domain).