== Spammers are quite dedicated in their address scraping

This is one of those entries that require some apparently irrelevant background.

The Atom syndication feed format requires that each entry have a unique identifier (the _atom:id_ element, to use XML jargon). This identifier must be a valid URI, and it can be formed using any of a number of URI schemes (see [[here http://diveintomark.org/archives/2004/05/28/howto-atom-id]]).

DWiki (the software behind WanderingThoughts) initially used the full URL of each entry as its Atom ID, because this required no additional configuration or per-entry metadata. However, this causes serious problems if you ever want to move your blog, so not too long ago I switched new entries to using _tag:_ URIs.

While you can read all the gory details [[here http://taguri.org/]], the simple version is that _tag:_ URIs look like this (without spaces and quotes):

> "_tag:_" ~~authorityName~~ "_,_" ~~date~~ "_:_" ~~path~~

The authorityName is normally a domain name; however, [[the spec http://www.faqs.org/rfcs/rfc4151.html]] says that it can also be an email address, with an '@' in it. For reasons beyond the scope of this entry, I decided to use this second format for the _tag:_ URIs here, with the authorityName being _cspace@_. (In brief: the advantage of this format is that you don't have to invent a new subdomain for everything you host; you use one domain and put a unique identifier in the part before the '@'.)

You can see where this is going. A bit over a month after I started using this format for Atom IDs, I started getting attempts to send email to 'cspace@'. They were rejected; nothing requires such authorityNames to be real email addresses, and the domain I used doesn't even accept email in the first place.
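To make the _tag:_ format concrete, here is a minimal Python sketch that assembles one in the email-style form. It is not DWiki's actual code: the function name make_tag_uri and the example.org domain (standing in for the real one, which isn't shown here) are hypothetical. The only check it makes is on the date, which RFC 4151 allows to be YYYY, YYYY-MM, or YYYY-MM-DD.

```python
import re

# Allowed tag: URI date forms per RFC 4151: YYYY, YYYY-MM, or YYYY-MM-DD.
_DATE_RE = re.compile(r"\d{4}(?:-\d{2}(?:-\d{2})?)?")


def make_tag_uri(authority: str, date: str, specific: str) -> str:
    """Hypothetical helper: build "tag:" authority "," date ":" specific.

    The authority may be a plain DNS name or an email-style name
    such as "cspace@example.org" (example.org is a placeholder).
    """
    if not _DATE_RE.fullmatch(date):
        raise ValueError(f"date must be YYYY, YYYY-MM or YYYY-MM-DD: {date!r}")
    return f"tag:{authority},{date}:{specific}"


if __name__ == "__main__":
    print(make_tag_uri("cspace@example.org", "2006-03-01", "/blog/SomeEntry"))
```

Run as a script, this prints tag:cspace@example.org,2006-03-01:/blog/SomeEntry. Note that the authorityName sits verbatim inside the ID, which is exactly what makes it visible to anyone reading the feed.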
After I talked this over with some people, the general speculation is not that spammers are specifically scraping Atom feeds for _tag:_ URIs containing email addresses (which would show true dedication and craziness), but that they are mining syndication feeds for anything that looks even vaguely like an email address. This makes some sense, especially if you assume that they're using brute-force regexp-based scanners instead of making any attempt to understand syndication formats. Either way, it makes a good illustration of how spammers will scrape anything in sight that might contain an email address somewhere.
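As a rough illustration of that speculation, here is a sketch of the kind of brute-force scanner that would catch a _tag:_ URI, assuming the spammer simply runs a loose email-shaped regexp over the raw feed text and ignores the XML entirely. The regexp, the scrape_addresses name, and the sample entry (with example.org as a placeholder domain) are all hypothetical; nobody knows what the spammers actually run.

```python
import re

# A deliberately crude pattern: some word-ish characters, an '@', then
# a dotted name. It knows nothing about XML, Atom, or tag: URIs.
NAIVE_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrape_addresses(text: str) -> list[str]:
    """Return every substring that looks vaguely like an email address."""
    return NAIVE_EMAIL_RE.findall(text)


# A hypothetical Atom entry whose only '@' is inside its tag: URI ID.
SAMPLE_ENTRY = """<entry>
  <id>tag:cspace@example.org,2006-03-01:/blog/SomeEntry</id>
  <title>An entry</title>
</entry>"""

if __name__ == "__main__":
    print(scrape_addresses(SAMPLE_ENTRY))
```

This prints ['cspace@example.org']: the ':' after "tag" and the ',' before the date simply act as word boundaries, so the authorityName falls out as if it were an ordinary address.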