Spam as a tax on public participation in open source projects

December 29, 2010

One of the things that has struck me lately is that spam has become an implicit tax on publicly participating in various open source projects. The mechanism is fairly simple: if you have a sufficiently popular open source project, spammers are mining the project mailing lists and their web spinoffs for email addresses and then spamming the heck out of them. There are probably also spammers mining web-based bug trackers, where the bug trackers expose this information.

(Actually, this is probably a simplification. Based on my personal experience, it seems more likely that there are people harvesting fresh 'hot' addresses from these mailing lists and then selling them to spammers, primarily advance fee fraud spammers.)

Open source projects are particularly susceptible to this because they still make heavy use of public mailing lists and so reveal your email address when you take part in them. 'Taking part' can be quite minor; just briefly dipping into the conversation at the wrong time can be enough to get your address harvested.

This is a tax because spam degrades the usefulness of an email address. The more spam an address gets, the more of it will slip past whatever automated anti-spam defenses the address has and thus the more you'll have to deal with personally; the end stage is that the address becomes effectively dead.

The obvious workaround is to use a revocable email address whenever you need to participate in public (and expect to revoke it periodically and switch to a new one). One of the problems with this is that it damages the long-term usefulness of old mailing list messages (and email addresses in VCS commit logs and so on), since many of the addresses in them will no longer be valid.
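
As a rough illustration of what a revocable address scheme could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `user+tag@domain` form (which only works if your mail system delivers plus-addressed mail), the names `SECRET`, `DOMAIN`, `make_address` and `is_accepted`, and the idea of a per-project 'generation' number that you bump to revoke the old address.

```python
import hashlib
import hmac

SECRET = b"change-me"     # hypothetical private key, known only to you
DOMAIN = "example.org"    # hypothetical domain whose mail you control


def make_address(user, project, generation):
    """Derive a per-project address; bump `generation` to rotate it."""
    msg = f"{project}:{generation}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]
    return f"{user}+{project}-{generation}-{tag}@{DOMAIN}"


def is_accepted(address, user, current):
    """Accept an address only if its generation is the current one for
    that project and its tag verifies. `current` maps project -> generation."""
    local, _, domain = address.partition("@")
    if domain != DOMAIN or not local.startswith(user + "+"):
        return False
    try:
        # Split from the right so project names may contain hyphens.
        project, gen_text, _tag = local[len(user) + 1:].rsplit("-", 2)
        generation = int(gen_text)
    except ValueError:
        return False
    if current.get(project) != generation:
        return False  # an old (revoked) or unknown generation
    expected = make_address(user, project, generation)
    return hmac.compare_digest(address.encode(), expected.encode())
```

Revoking an address is then just a matter of moving the project to a new generation, for example going from `{"someproject": 1}` to `{"someproject": 2}`. This also shows the cost mentioned above: every generation 1 address sitting in old list archives and commit logs stops working at that point.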

PS: possibly I am over-generalizing from my experience, but I don't really think so. Alternately, perhaps regular participants develop a thick skin for spam (or you have to have a thick skin for spam in order to be a regular participant).

Comments on this page:

From at 2010-12-29 03:06:41: has been my email address for 15 years and googling for it will show that I have been very active on many mailing lists. It is far from dead. Basic spam filtering keeps it very useable. I think you are indeed overstating the case.

By gsauthof at 2011-01-18 18:17:07:

Well, I have had the opposite experience.

I use a few e-mail addresses, which are published on some websites (without obfuscation) and are displayed in various mailing list archives, bug trackers etc.

For years, spam has not been a problem. At address A I used to get ~100 spam mails per day, but my Bayes classifier worked very well, such that I could not care less. There was perhaps 1 false positive per month, and that was just borderline spam. The Bayes classifier works so well that I automatically forward mails with really high scores to /dev/null. The mail admins switched at some point to multiple blacklists, which is quite effective too; my Bayes classifier is not out of work, but it has a lot less to do than before.

At another address the Mail-admin uses greylisting, which is effective as well. I don't need to do any further SPAM filtering there - ok, you can only use greylisting for mail which is not time critical.
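
For readers who have not met greylisting, here is a minimal Python sketch of the decision it makes. This is an illustration under assumptions, not how any particular mail server implements it: the `Greylist` class, its `check` method, and the 300 second default delay are all made up for the example.

```python
import time


class Greylist:
    """Defer mail from an unseen (client IP, sender, recipient) triplet
    until the same triplet retries after a minimum delay."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay   # seconds a new triplet must wait
        self.clock = clock           # injectable so the logic can be tested
        self.first_seen = {}         # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return 'defer' (a temporary SMTP failure) or 'accept'."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Record the first attempt if this triplet is new.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "defer"
```

A legitimate mail server retries after a temporary failure and gets through on a later attempt, while a lot of spam software never retries. The wait before that retry is exactly why this is only suitable for mail that is not time critical.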

I have administered an open mailing list for years, where again the list's email address is available unobfuscated on several web pages. Before I was in charge, the list was hit by ~100 spam mails a day. Setting up a list-specific Bayes classifier that filters spam at the source, instead of multiplexing it to all subscribers, is not rocket science, and it is fast to train with a train-on-errors strategy. Sure, interfaces for convenient spam management in common mailing-list software could be improved.
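
To make the train-on-errors idea concrete, here is a small Python sketch of a naive Bayes filter that only learns from messages it currently gets wrong. It is a toy under stated assumptions, not the commenter's actual setup: the `BayesFilter` class, its crude tokenizer, and the add-one smoothing are all choices made for the example.

```python
import math
import re
from collections import Counter


class BayesFilter:
    def __init__(self):
        # Per-class count of messages containing each word.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Crude tokenizer: the set of lowercase word-like strings.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.words[label].update(self.tokens(text))

    def spam_log_odds(self, text):
        # Smoothed log prior odds plus one smoothed log likelihood
        # ratio for each distinct word in the message.
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in self.tokens(text):
            p_spam = (self.words["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.words["ham"][word] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score

    def classify(self, text):
        return "spam" if self.spam_log_odds(text) > 0 else "ham"

    def train_on_error(self, text, true_label):
        """Train only if the current guess is wrong; return True if trained."""
        if self.classify(text) == true_label:
            return False
        self.train(text, true_label)
        return True
```

In use, the list admin would call `train_on_error(message, "spam")` for spam that got through and `train_on_error(message, "ham")` for real mail that was held. Messages the filter already classifies correctly are skipped, which is why training goes quickly.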

Written on 29 December 2010.

Last modified: Wed Dec 29 00:59:53 2010