GPU Usenet spam reports

These are the recent reports from our Usenet despamming work on our newsreader machine. There are (or should be) two reports for each day: one on the volume of rejections and one on the sources of rejected articles. Reports are plain ASCII text.

Reports

Not everything we reject is spam under the standard definition. This can dramatically skew the reports on the sources of rejected articles (especially since we reject excessively crossposted articles in the alt.sex and alt.binaries hierarchies using a narrow criterion).

Recent reports (last updated Fri Sep 9 09:20:01 EDT 2011):

Sep 9, 2011: rejection volume, rejection sources.
Sep 7, 2011: rejection volume, rejection sources.
Sep 6, 2011: rejection volume, rejection sources.
Sep 5, 2011: rejection volume, rejection sources.
Sep 4, 2011: rejection volume, rejection sources.
Sep 3, 2011: rejection volume, rejection sources.
Sep 2, 2011: rejection volume, rejection sources.
Sep 1, 2011: rejection volume, rejection sources.
Aug 31, 2011: rejection volume, rejection sources.
Aug 30, 2011: rejection volume, rejection sources.
Aug 29, 2011: rejection volume, rejection sources.
Aug 28, 2011: rejection volume, rejection sources.
Aug 27, 2011: rejection volume, rejection sources.
Aug 26, 2011: rejection volume, rejection sources.
Aug 25, 2011: rejection volume, rejection sources.
Aug 24, 2011: rejection volume, rejection sources.
Aug 23, 2011: rejection volume, rejection sources.
Aug 22, 2011: rejection volume, rejection sources.
Aug 21, 2011: rejection volume, rejection sources.
Aug 20, 2011: rejection volume, rejection sources.
Aug 19, 2011: rejection volume, rejection sources.
Aug 18, 2011: rejection volume, rejection sources.
Aug 17, 2011: rejection volume, rejection sources.
Aug 16, 2011: rejection volume, rejection sources.
Aug 15, 2011: rejection volume, rejection sources.
Aug 14, 2011: rejection volume, rejection sources.
Aug 13, 2011: rejection volume, rejection sources.
Aug 12, 2011: rejection volume, rejection sources.
Aug 11, 2011: rejection volume, rejection sources.
Aug 10, 2011: rejection volume, rejection sources.
Aug 9, 2011: rejection volume, rejection sources.
Aug 8, 2011: rejection volume, rejection sources.
Aug 7, 2011: rejection volume, rejection sources.
Aug 6, 2011: rejection volume, rejection sources.
Aug 5, 2011: rejection volume, rejection sources.

We keep reports around for an indeterminate number of days. Each report covers the 24-hour period ending at roughly 7:20am on the day it is generated. System crashes and other outages may prevent a report from being generated on a particular day.

Reading the reports

Here is some information to help you decipher the more cryptic entries in the reports. In general, familiarity with Usenet spam and the common Usenet spammers will be a great help; you may find Ed Falk's glossary to be a useful reference.

The rejection sources report

This report describes the apparent origins of the non-spam-cancel articles that we rejected (which are not necessarily spam per se). The Path origin of an article is the second-to-last entry in its Path: header (or, in a few cases, the entry before that), which is normally the news machine that the article was posted to. Some spammers, especially Netzilla, forge this entry or make up completely bogus news server names. The SBI is a measure of both the number of articles and the number of newsgroups they were crossposted to; the formula used to compute it is in the spam FAQ.
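
As a rough illustration of these two measures, here is a minimal Python sketch. It assumes the Path: header value is available as a plain string and simply takes the second-to-last "!"-separated entry, ignoring the few special cases mentioned above. Because the exact SBI formula lives in the spam FAQ, the sketch substitutes the classic Breidbart Index (the sum, over articles, of the square root of the number of newsgroups each was posted to); the real SBI may weight things differently. The function names are hypothetical.

```python
import math
from typing import List, Optional

def path_origin(path_header: str) -> Optional[str]:
    """Return the second-to-last '!'-separated entry of a Path: header value."""
    # The last entry is usually the poster's login name or a placeholder such
    # as "not-for-mail"; the entry before it is normally the posting server.
    # Special cases where the origin is one entry further back are ignored.
    entries = [e for e in path_header.strip().split("!") if e]
    if len(entries) < 2:
        return None
    return entries[-2]

def breidbart_index(group_counts: List[int]) -> float:
    """Classic Breidbart Index, used here as a stand-in for SBI: the sum of
    sqrt(n) over articles, where n is the number of newsgroups per article."""
    return sum(math.sqrt(n) for n in group_counts)
```

For example, a Path: of "news.example.net!feed.example.org!posting.example.com!not-for-mail" yields "posting.example.com" (all host names are made up).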

The (Rank) column shows how the entry ranks in the alternate ordering (the column immediately to its left). Wide crossposting is very popular with some spammers; it can put them far down the list by article count while giving them an inordinate impact on people who read only a few of the groups they spam.
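
To show how such a (Rank) column can be derived, here is a small sketch. It assumes, hypothetically, that rows are ordered by article count and that the alternate ordering is by SBI; the real report may use the opposite arrangement. Ties keep their input order.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # hypothetical layout: (path origin, articles, SBI)

def with_alternate_rank(rows: List[Row]) -> List[Tuple[str, int, float, int]]:
    """Order rows by article count and append each row's rank by SBI."""
    # Rank 1 is the largest SBI; sorted() is stable, so ties keep input order.
    by_sbi = sorted(rows, key=lambda r: r[2], reverse=True)
    sbi_rank: Dict[str, int] = {r[0]: i + 1 for i, r in enumerate(by_sbi)}
    # Primary ordering: most articles first; the SBI column sits just left of its rank.
    by_articles = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(origin, n, sbi, sbi_rank[origin]) for origin, n, sbi in by_articles]

# Made-up sample data for illustration.
rows = [
    ("a.example", 100, 100.0),
    ("b.example", 10, 300.0),  # few articles, very widely crossposted
    ("c.example", 50, 60.0),
]
```

In this sample, b.example comes last by article count but first by SBI, which is the crossposting pattern described above.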

The rejection volume report

This report breaks down our various reasons for rejecting articles and shows some overall statistics about article volume. Cyberspam articles are spam cancels, or at least articles whose message IDs follow the $alz convention of starting with "cancel." (the reasons behind this are in the spam FAQ). Authentic articles are articles that are not spam cancels.
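
A check for the $alz convention might look like the following sketch. It assumes the Message-ID arrives with its angle brackets and that the match is a literal, case-sensitive "cancel." prefix; how our actual software performs the match is not described here, and the function name is hypothetical.

```python
def is_cyberspam_cancel(message_id: str) -> bool:
    """True if a Message-ID follows the $alz convention of starting with 'cancel.'."""
    # Message-IDs are normally written in angle brackets, e.g. "<cancel.123@host>";
    # strip them before checking the prefix.
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    # Assumption: an exact, case-sensitive prefix match.
    return mid.startswith("cancel.")
```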

The reasons for article rejection are potentially cryptic; if you are deeply curious about one, send mail to ask. Because our tests are applied in a particular order, spam is often rejected for an apparently non-spam reason (such as having too many crossposts; this often happens to Netzilla spam).
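
The effect of test ordering can be sketched as a first-match-wins list of checks. Both checks, the crosspost limit of 10, and the "MAKE MONEY FAST" subject test are hypothetical stand-ins, not our real rules; the point is only that an article matching several tests is reported under whichever test comes first.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical article representation: header name -> header value.
Article = Dict[str, str]

def too_many_crossposts(art: Article) -> bool:
    # Hypothetical limit; the real threshold is not documented here.
    groups = [g for g in art.get("Newsgroups", "").split(",") if g.strip()]
    return len(groups) > 10

def looks_like_spam(art: Article) -> bool:
    # Hypothetical stand-in for a content-based spam check.
    return "MAKE MONEY FAST" in art.get("Subject", "").upper()

# Tests run in this fixed order; the first match supplies the reason.
TESTS: List[Tuple[str, Callable[[Article], bool]]] = [
    ("too many crossposts", too_many_crossposts),
    ("spam content", looks_like_spam),
]

def rejection_reason(art: Article) -> Optional[str]:
    """Return the first matching rejection reason, or None if accepted."""
    for reason, test in TESTS:
        if test(art):
            return reason
    return None

# A spam article that is also widely crossposted is reported under the
# crossposting reason, because that test runs first.
spam = {
    "Newsgroups": ",".join(f"alt.test{i}" for i in range(20)),
    "Subject": "Make money fast",
}
```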

The second table (well, two tables joined together) is summed both across and down, with 100% of the volume in the lower right corner. It can thus be used to get a three-way breakdown of our traffic volume between cyberspam cancels (which are useless), rejected non-cyberspam articles (which are also useless), and the useful authentic accepted articles.
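
Here is a minimal sketch of how such a summed table can be computed. It assumes, hypothetically, a layout with rows for cyberspam cancels and authentic articles and columns for rejected and accepted articles; the real report's layout and labels may differ, and the counts are made up. Every cell is expressed as a percentage of the grand total, so the lower-right cell comes out at 100%.

```python
from typing import Dict, List

def percent_table(counts: Dict[str, Dict[str, int]], cols: List[str]) -> Dict[str, Dict[str, float]]:
    """Express every cell as a percentage of the grand total, adding a
    'total' column summed across each row and a 'total' row summed down."""
    grand = sum(row[c] for row in counts.values() for c in cols)
    table: Dict[str, Dict[str, float]] = {}
    for name, row in counts.items():
        pct = {c: 100.0 * row[c] / grand for c in cols}
        pct["total"] = sum(pct[c] for c in cols)  # summed across
        table[name] = pct
    # Summed down, including the row-total column; its last cell is 100%.
    table["total"] = {c: sum(table[r][c] for r in counts) for c in cols + ["total"]}
    return table

# Hypothetical counts, for illustration only.
counts = {
    "cyberspam": {"rejected": 300, "accepted": 0},
    "authentic": {"rejected": 200, "accepted": 500},
}
table = percent_table(counts, ["rejected", "accepted"])
```

With these made-up counts, the three-way breakdown reads off as 30% cyberspam cancels (the cyberspam row total), 20% rejected authentic articles, and 50% accepted authentic articles.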

Technical details

Our despamming software operates on news batches and does not bother keeping a full message-ID history to suppress duplicates (it leaves that up to CNews, which does it for real and with much better code). While it drops duplicates within its scanning range (and almost never finds any), and our NNTP receivers suppress duplicates on their own, there is always the possibility that we are receiving a very slow feed of duplicates that makes our article volume look higher than it really is.
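
The batch-local duplicate suppression described above amounts to something like this sketch, where articles are hypothetically represented as (Message-ID, body) pairs. Because the set of seen IDs is discarded after each batch, a duplicate that arrives in a later batch passes straight through, which is exactly the slow-duplicate-feed case.

```python
from typing import Iterable, Iterator, Set, Tuple

def drop_batch_duplicates(articles: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (message_id, body) pairs, skipping repeats of a Message-ID
    within this one batch. No history is kept across batches."""
    seen: Set[str] = set()
    for message_id, body in articles:
        if message_id in seen:
            continue  # duplicate within the current scanning range
        seen.add(message_id)
        yield message_id, body
```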

For gored oxes:

It is possible that you will find yourself listed as a source of spam. If this is the case, please do not bother mailing us to tell us that what you originate is not really spam; whatever you call it, we've decided that we don't want it. If on the other hand you were formerly an open news server and have closed yourself, please let us know and we'll remove you.

If you are offended at being high in a rejection sources listing, relax. It may be because your users crosspost widely and not because they spam. On the other hand, have you really looked at what they have been posting recently (or read your abuse and postmaster mailboxes)? Maybe you should.

Further information

These Usenet despamming reports are only one part of our general information on our Usenet work, our stance against all sorts of network spam and abuse, and our policy of making much of our software and configuration data (such as our Usenet despamming code) available to the Internet community.

This page and many of our precautions are maintained by Chris Siebenmann, who hates junk email and other spam.