Spammers are quite dedicated in their address scraping
This is one of those entries that require some apparently irrelevant background.
The Atom syndication feed format requires that each entry have a unique
identifier assigned to it (the atom:id element, to use XML jargon).
This identifier is a valid URI, formed using any number of schemes
(see here).
DWiki (the software behind WanderingThoughts) initially used the full
URL of entries as the Atom ID, because this required no additional
configuration or per-entry metadata. However, this causes serious
problems if you ever want to move your blog, so not too long ago I
switched new entries to using tag: URIs.
While you can read all the gory details here, the simple version of tag:
URIs is that they look like this (without spaces and quotes):
"tag:" authorityName "," date ":" path
The authorityName is normally a domain; however, the spec says that you can use
'<id>@<domain>' as well. For reasons beyond the scope of this entry,
I decided to use the second format for the tag: URIs here, with the
authorityName being cspace@<domain>.
(In brief: the advantage of this format is that you don't have to invent a new subdomain for everything you host; you use one domain and have a unique identifier as the <id> bit.)
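As a rough illustration of the two authorityName forms, here is a minimal Python sketch that assembles tag: URIs from their parts. The function name make_tag_uri, the example.org domain, the date, and the path are all made up for the example; they are not what DWiki actually uses.

```python
def make_tag_uri(authority_name: str, date: str, path: str) -> str:
    """Assemble a tag: URI as 'tag:' authorityName ',' date ':' path."""
    return f"tag:{authority_name},{date}:{path}"


# Hypothetical values: example.org stands in for the real domain.
# First form: the authorityName is a plain domain.
domain_form = make_tag_uri("example.org", "2006-01-01", "/blog/entry")

# Second form: '<id>@<domain>', which happens to look like an email address.
id_form = make_tag_uri("cspace@example.org", "2006-01-01", "/blog/entry")

print(domain_form)
print(id_form)
```

The second form is what ends up in the feed's atom:id element, and the authorityName part of it is indistinguishable from an email address unless you know what a tag: URI is.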
You can see where this is going. A bit over a month after I started using this format for Atom IDs, I started seeing attempts to send email to 'cspace@<domain>' (which were rejected; there is no requirement that such authorityNames actually be email addresses, and the domain I used doesn't even accept email to start with).
After talking about this with some people, the general speculation is
not that spammers are scraping Atom feeds for tag: URIs with email
addresses (which would show true dedication and craziness), but that they
are mining syndication feeds for anything that looks even vaguely like
an email address. This sort of makes sense, especially if you assume
that they're using brute force regexp-based scanners instead of making
any attempt to understand syndication formats. But it makes a good
illustration of how spammers will scrape anything in sight that might
somewhere have an email address.
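To show why a brute force scanner would trip over these IDs, here is a small Python sketch of the kind of regexp-based scraping speculated about above. The pattern, the scrape_addresses name, and the feed fragment (with example.org as a placeholder domain) are my own assumptions; I have no idea what actual spammer tools look like inside.

```python
import re

# A deliberately naive "looks like an email address" pattern (an assumption,
# not taken from any real scraper). It knows nothing about XML or tag: URIs.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrape_addresses(text: str) -> list:
    """Return every substring of text that looks vaguely like an email address."""
    return EMAIL_LIKE.findall(text)


# Hypothetical fragment of an Atom feed using the '<id>@<domain>' form.
feed_fragment = (
    "<entry><id>tag:cspace@example.org,2006-01-01:/blog/entry</id></entry>"
)

print(scrape_addresses(feed_fragment))
```

The ':' before the id and the ',' after the domain are not valid in the pattern's character classes, so the match neatly trims the tag: URI down to just the authorityName, which is exactly the address that started getting mail.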