These are the recent reports from our Usenet despamming work on our newsreader machine. There are (or should be) two reports for each day; one on the volume of rejections and the other on the sources of rejected articles. Reports are plaintext ASCII.
Not everything we reject is spam under the standard definition. This can dramatically skew the reports of the sources of rejected articles (especially since we reject excessively crossposted articles using a narrow criterion in the alt.sex and alt.binaries hierarchies).
Recent reports (last updated Fri Sep 9 09:20:01 EDT 2011):
We keep reports around for some indeterminate number of days. Each report covers a 24-hour period up to roughly 7:20am on the day that it is generated. System crashes and other outages may cause a report not to be generated on a particular day.
Here is some information to help you decipher the more cryptic entries in the reports. In general a familiarity with Usenet spam and the common Usenet spammers will be a great help; you may find Ed Falk's glossary to be a useful reference.
This report describes the apparent origins of the non-spam-cancel articles that we rejected (not necessarily spam per se). The Path origin of an article is the second-to-last entry in its Path: header (or in a few cases the entry before that), which is normally the news machine that the article was posted to. Some spammers forge this or make up completely bogus news server names, especially Netzilla. The SBI is a measure of both the number of articles and the number of newsgroups they were crossposted to; the formula used to compute it is in the spam FAQ.
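As a rough illustration of the Path origin rule described above, here is a minimal Python sketch. It is not our actual despamming code; the function name path_origin and the example host names are made up for illustration, and it handles only the plain second-to-last-entry case, not the "entry before that" exceptions.

```python
from typing import Optional


def path_origin(path_header: str) -> Optional[str]:
    """Return the second-to-last entry of a Path: header value.

    Path entries are separated by '!'; the last entry is usually a
    placeholder such as 'not-for-mail', so the entry before it is
    normally the machine the article was posted to.
    Hypothetical helper; the special cases where the report uses the
    entry before that are NOT handled here.
    """
    entries = [e for e in path_header.strip().split("!") if e]
    if len(entries) < 2:
        return None  # too short to have a second-to-last entry
    return entries[-2]


# Example with made-up host names.
example = "relay1.example!relay2.example!news.example.com!not-for-mail"
print(path_origin(example))
```

Remember that this value is only as trustworthy as the header itself; a forged Path: line yields a forged origin.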
The (Rank) column shows how the entry ranks on the alternate ordering (which is the column immediately to the left). Crossposting widely is very popular with some spammers, which can give them a low rank by article count but an inordinate impact on people who read only a few of the groups they spam.
This breaks down our various reasons for rejecting articles and shows some overall statistics about article volume. Cyberspam articles are spam cancels, or at least articles whose message IDs follow the $alz convention of starting with "cancel." (the reasons behind this are in the spam FAQ). Authentic articles are articles that are not spam cancels.
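The message ID check described above can be sketched in a few lines of Python. This is an illustration only, under the assumption that the check is a simple prefix match on the message ID; the function name is_alz_cancel and the sample IDs are hypothetical.

```python
def is_alz_cancel(message_id: str) -> bool:
    """True if the message ID starts with 'cancel.' ($alz convention).

    Message IDs are normally written inside angle brackets, so a
    leading '<' is skipped before the prefix is checked.
    Hypothetical helper, not the real despamming code.
    """
    mid = message_id.strip()
    if mid.startswith("<"):
        mid = mid[1:]
    return mid.startswith("cancel.")


# Made-up message IDs for illustration.
print(is_alz_cancel("<cancel.12345@news.example.com>"))
print(is_alz_cancel("<12345@news.example.com>"))
```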
The reasons for article rejection are potentially cryptic; if you are deeply curious about one, send mail to ask. Because tests are applied in a particular order, spam is often rejected for an apparent non-spam reason (such as having too many crossposts; this often happens to Netzilla spam).
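To see why test order affects the reported reason, consider this small Python sketch. It assumes that the first test to fire determines the reason that gets reported; the test names, the crosspost limit, and the article fields are all invented for illustration and are not taken from our software.

```python
MAX_GROUPS = 10  # hypothetical crosspost limit


def too_many_crossposts(article):
    return len(article["newsgroups"]) > MAX_GROUPS


def known_spam_source(article):
    return article.get("known_spam_source", False)


# Tests run in list order; the first one that fires wins.
TESTS = [
    ("too many crossposts", too_many_crossposts),
    ("spam", known_spam_source),
]


def rejection_reason(article, tests=TESTS):
    """Return the reason from the first test that fires, or None."""
    for reason, test in tests:
        if test(article):
            return reason
    return None


# A spam article that is also widely crossposted is reported under
# the crosspost reason, because that test runs first.
spam = {"newsgroups": ["alt.test.%d" % i for i in range(20)],
        "known_spam_source": True}
print(rejection_reason(spam))
```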
The second table (well, two tables joined together) is summed both across and down, with 100% of the volume being in the lower-right corner. It thus can be used to get a three-way breakdown of our traffic volume, between cyberspam cancels (which are useless), rejected non-cyberspam articles (which are also useless), and the useful authentic accepted articles.
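The way such a table sums across and down can be sketched in Python as follows. The row and column labels and the counts here are invented for illustration and do not reflect the layout or numbers of any actual report; the function name percent_table is likewise hypothetical.

```python
def percent_table(counts):
    """Turn {row: {col: count}} into percentages of the grand total.

    Adds a 'total' column to each row and a 'total' row at the
    bottom, so the lower-right cell is always 100%.
    """
    grand = sum(sum(row.values()) for row in counts.values())
    if grand == 0:
        raise ValueError("no articles to summarize")
    cols = list(next(iter(counts.values())).keys())
    table = {}
    for label, row in counts.items():
        table[label] = {c: 100.0 * row[c] / grand for c in cols}
        table[label]["total"] = 100.0 * sum(row.values()) / grand
    table["total"] = {
        c: 100.0 * sum(row[c] for row in counts.values()) / grand
        for c in cols
    }
    table["total"]["total"] = 100.0
    return table


# Hypothetical counts: 1000 articles in all.
counts = {
    "cancels": {"rejected": 0, "accepted": 300},
    "authentic": {"rejected": 200, "accepted": 500},
}
for label, row in percent_table(counts).items():
    print(label, row)
```

With these made-up numbers the three-way breakdown is 30% cancels, 20% rejected authentic articles, and 50% accepted authentic articles.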
Our despamming software operates on news batches and does not bother keeping a full message ID history to suppress duplicates (it leaves that up to CNews, which is doing it for real and with much better code). While it drops duplicates within its scanning range (and almost never finds any), and our NNTP receivers suppress duplicates on their own, there is always the possibility that we are receiving a very slow feed of duplicates that makes our article volume look higher than its true value.
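The limitation described above can be illustrated with a short Python sketch of per-batch duplicate dropping. This is only a model of the idea, not our code; it assumes a batch is a list of (message ID, article text) pairs, and the function name is hypothetical.

```python
def drop_batch_duplicates(batch):
    """Keep the first copy of each message ID within one batch.

    The 'seen' set lives only for the duration of this call, so a
    duplicate that arrives in a later batch is not caught here.
    """
    seen = set()
    unique = []
    for message_id, text in batch:
        if message_id in seen:
            continue  # duplicate within this batch
        seen.add(message_id)
        unique.append((message_id, text))
    return unique


# Made-up batches: the repeat inside batch1 is dropped, but the
# copy of <a@x.example> in batch2 slips through.
batch1 = [("<a@x.example>", "one"), ("<a@x.example>", "one"),
          ("<b@x.example>", "two")]
batch2 = [("<a@x.example>", "one")]
print(len(drop_batch_duplicates(batch1)), len(drop_batch_duplicates(batch2)))
```

Because nothing is remembered between batches, a slow trickle of duplicates spread over many batches would be counted every time it arrives.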
It is possible that you will find yourself listed as a source of spam. If this is the case, please do not bother mailing us to tell us that what you originate is not really spam; whatever you call it, we've decided that we don't want it. If on the other hand you were formerly an open news server and have closed yourself, please let us know and we'll remove you.
If you are offended at being high in a rejection sources listing, relax. It may be because your users crosspost widely and not because they spam. On the other hand, have you really looked at what they post recently (or read your abuse and postmaster mailboxes)? Maybe you should.
These Usenet despamming reports are only part of our general information on our Usenet work, our stance against all sorts of network spam and abuse, and our policy of making much of our software and configuration data (such as our Usenet despamming code) available to the Internet community.