Some weird and dubious syndication feed fetching from SBL-listed IPs
For reasons beyond the scope of this entry (partly 'because I could'), I've recently been checking to see if any of the IPs that visit Wandering Thoughts are on the Spamhaus SBL. As a preemptive note, using the SBL to block web access is not necessarily a good idea, as I've found out in the past; the SBL is specifically focused on email spam, not on other sorts of abuse. However, perhaps you don't want to accept web traffic from networks that Spamhaus has identified as belonging to spammers. Spamhaus also has the Don't Route Or Peer (DROP) list of outright extremely bad networks, which is included in the SBL.
When I started looking, I wasn't particularly surprised to find a fair number of IPs on Spamhaus CSS; in practice, the CSS seems to include a fair number of compromised IPs and doesn't necessarily expire them rapidly. However, I also found a surprising number of IPs listed in other Spamhaus records, almost always for network blocks; from today (so far), I had IPs from SBL443160 (a /22), SBL287739 (a /20 for a ROKSO-listed spammer), and especially SBL201196, which is a /19 on an extended version of Spamhaus's DROP list. These are all pretty much dedicated spam operations, not things that have been compromised or neglected, and as such I feel that they're worth blacklisting entirely.
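For readers who haven't done this sort of check, here is a minimal sketch of how an IP is looked up in a DNS blocklist like the SBL and how the answer can be sorted into 'CSS' versus 'other SBL listing'. This is not the code I actually use. The zone name and the return code meanings are my understanding of Spamhaus's published conventions and should be treated as assumptions to verify, and the function names are made up for illustration. Also, Spamhaus generally doesn't answer queries that come through large public DNS resolvers, so the lookup function here only works from a resolver they accept.

```python
import ipaddress
import socket

# Assumptions: the public SBL zone name and the meaning of its return
# codes, as I understand Spamhaus's documentation. Verify before use.
SBL_ZONE = "sbl.spamhaus.org"
RETURN_CODES = {
    "127.0.0.2": "SBL",
    "127.0.0.3": "SBL CSS",
    "127.0.0.9": "DROP",
}


def dnsbl_query_name(ip, zone=SBL_ZONE):
    """Build the DNSBL query name: reversed IPv4 octets, then the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def classify(answers):
    """Map the A record answers from the DNSBL to listing labels."""
    return sorted({RETURN_CODES.get(answer, "other") for answer in answers})


def lookup(ip, zone=SBL_ZONE):
    """Query the DNSBL for one IP; an empty list means 'not listed'.

    Any resolver error is treated as 'not listed' here, which lumps
    timeouts in with real negative answers. A careful checker would
    tell the two apart.
    """
    try:
        _, _, answers = socket.gethostbyname_ex(dnsbl_query_name(ip, zone))
    except OSError:
        return []
    return classify(answers)
```

Separating the CSS answers from everything else matters because, as noted above, a CSS listing often just means a compromised or inherited IP, while the network block listings are much more deliberate.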
Then I looked at what the particular IPs from these SBL listings were doing here on Wandering Thoughts, and something really peculiar started emerging. Almost all of the IPs were just fetching my syndication feed, using something that claims to be "rss2email/3.9 (https://github.com/wking/rss2email)" in its User-Agent. Most of them make at most a single fetch request a day (often only one in several days), and on top of that I noticed that they often got an HTTP 304 'Not Modified' reply. Further investigation has shown that this is a real and proper 'Not Modified', based on these requests having an If-None-Match header with the syndication feed's current ETag value (since this value is a cryptographic hash, it can't be guessed and must have come from an earlier fetch of the feed). Given that these IPs are each only requesting my feed once every several days (at most), their having the correct ETag value means that the people behind this are fetching my feed from multiple IPs across multiple networks and merging the results.
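To make the 'real and proper Not Modified' part concrete, here is a rough sketch of the check a server does when a request arrives with an If-None-Match header. It illustrates the general HTTP conditional GET rule, not the actual code behind Wandering Thoughts, and the function names are invented. It splits the header on commas, which is a simplification that would break on an ETag that itself contains a comma.

```python
def parse_if_none_match(header):
    """Split an If-None-Match header value into individual entity tags."""
    return [tag.strip() for tag in header.split(",") if tag.strip()]


def strip_weak(tag):
    """Drop the W/ weak-validator prefix; If-None-Match compares weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(if_none_match, current_etag):
    """Return True if the request should get a 304 instead of the feed.

    if_none_match is the raw header value, or None if the header is absent.
    current_etag is the quoted ETag of the feed as it is right now.
    """
    if not if_none_match:
        return False
    tags = parse_if_none_match(if_none_match)
    if "*" in tags:
        # '*' matches any current representation of the resource.
        return True
    current = strip_weak(current_etag)
    return any(strip_weak(tag) == current for tag in tags)
```

The important property is that a client only gets a 304 if it presents the ETag of the feed as it is right now. An IP that shows up once every several days with the current value has to have gotten that value from a fetch made somewhere else.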
(I haven't looked deeply at the activity of the much more numerous SBL CSS listed IPs, but in spot checks some IPs appear to be entirely legitimate real browsers from real people, people who just have the misfortune to have or have inherited a CSS-listed IP.)
Before I started looking, I would have expected the activity from these bad network blocks to be comment spam attempts (which is part of what has attracted my attention to SBL-listed networks in the past). However, I can't see any real traces of that; in fact, in the past ten days only one SBL-listed IP has come close to trying to leave a comment here, and that was a CSS listing. Instead they seem to be harvesting my syndication feed, for an unknown purpose, and this harvesting appears to be done by some group that is active across multiple and otherwise unrelated bad network blocks.
(Since SBL listings are about email spammers, the obvious speculation here is that these people are scanning syndication feeds to find email addresses for spam purposes. This is definitely a thing in general, so it's possible.)
As a side note, this rss2email User-Agent is actually pretty common here (and right now it's the latest release of the actual project). Only a small fraction of the IPs using it are on the SBL; most of them are real, legitimate feed fetchers. However, I do have a surprisingly large number of IPs using rss2email that only fetched my syndication feed once today and still got a 304 Not Modified (which, in some cases, definitely means that they fetched it earlier from some other IP). Some of those one-time fetchers turn out to have been doing this sporadically for some time. It's possible that these SBL-listed fetchers are actually using rss2email, and now that I think about it I can see a reason why. If you already have an infrastructure for harvesting email addresses from email messages and want to extend it to syndication feeds, turning syndication feeds into email is one obvious and simple approach.
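If you want to look for the same 'one fetch today, but still a 304' pattern in your own logs, the check is simple once the log has been parsed. This is only a sketch under the assumption that you've already reduced the day's feed requests to (IP, User-Agent, HTTP status) tuples; the record shape and the function name are hypothetical, not something from my actual tooling.

```python
from collections import defaultdict


def single_fetch_304_ips(records, agent_prefix="rss2email/"):
    """Find IPs whose only feed request in the sample got a 304.

    records is an iterable of (ip, user_agent, status) tuples for
    requests to the feed URL. Only User-Agents that start with
    agent_prefix are considered.
    """
    statuses = defaultdict(list)
    for ip, user_agent, status in records:
        if user_agent.startswith(agent_prefix):
            statuses[ip].append(status)
    # Exactly one request, and that request was answered with a 304.
    return sorted(ip for ip, seen in statuses.items() if seen == [304])
```

An IP turning up in this list isn't automatically suspicious, since a legitimate fetcher that last polled yesterday will look the same if the feed hasn't changed since then. It only gets interesting when the feed did change since that IP's last visit, or when the IP has never been seen before.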
(I think the real moral here is to not turn over rocks because, as usual, disturbing things can be found there.)