Some problems in common definitions of 'spam email'

October 1, 2005

The most common attempts to define 'spam email' are as either 'UBE' (Unsolicited Bulk Email) or 'UCE' (Unsolicited Commercial Email); for example, the spamhaus.org definition here. I tend to think that this sort of definition of spam has some problems.

Let's start with a provocative question: is advance fee fraud (so-called '419' email) spam email? (You know this type of spam; the classic version has Mrs. Mariam Abacha, wife of the late Nigerian dictator Sani Abacha, asking you to help get her husband's fortune to safety.)

A peculiarity of advance fee fraud email is that the messages are often composed by hand (sometimes by people sitting in an Internet cafe in Nigeria) and sent to relatively few people. So it isn't necessarily UBE (or at least not straightforwardly).

One can say that this is UCE because it is 'commercial' in the sense of 'having profit as a chief aim' (cf this definition of 'commercial'), but I think that this is stretching the term. The sender hopes to profit not through a business transaction with you, but by defrauding you out of some money.

But let's go one step further. Take a message that was dumped into my mailbox in August 2005, that started with:

The forgotten facts in all religions are explained by Allah through Imam Iskender Ali MIHR.

This is clearly not UCE; there's no attempt to profit, just to proselytize (for www.mihr.com). This particular example was probably UBE (although I don't know for sure), but sooner or later similar messages may be composed by earnest people in Internet cafes and sent out just to you. Does that make them not spam? I'm pretty sure most people would disagree and call such email spam.

Clearly people's practical, gut definition of email spam is wider than just UCE or UBE.

Spamhaus has a technical definition of spam that would include the 'Iskender' email above, because it had no personalization for each recipient. But what if the earnest young men start personalizing their proselytization, perhaps using information from your web page; is their email transmuted to 'not spam' just because they are doing research and typing things by hand?

Was it spam when a fire-and-forget Microsoft recruiter sent Eric S. Raymond (a well-known open-source booster and no fan of Microsoft) a recruitment pitch? (It was probably sent by hand.)

This matters because there are a number of ISPs and other organizations that find it convenient to define spam as only UCE (or UBE, depending on the organization). If their customers are doing things that fall outside of UCE or UBE, you are generally out of luck. (And I'm sure that Microsoft would assure us that the email to ESR is definitely not spam.)

Perhaps this is why brinkster.com has yet to do anything about www.mihr.com (IP address 65.182.104.58), despite the August 2005 spam being sent from mihrfoundation.com (IP address 65.182.104.57 at the time, right next door). After all, it wasn't UCE.


Comments on this page:

From 67.190.163.211 at 2005-10-01 23:05:04:

Your provocative questions about whether typing in a message by hand or doing some personalization are akin to 'how many angels can dance on the head of a pin'. As such, they miss the mark of utility.

"Personalization" of a bulk message doesn't make it not bulk. Similarly, "research" to "personalize" it before sending it doesn't make it not bulk. In none of the messages you used as examples is the recipient's identity really part of the reason for sending (even the MS recruiter was broadcasting, not trying to converse with an individual). They thus are spam according to spamhaus.org's definition. Furthermore, the messages are all substantively identical, so they're bulk according to most other definitions.

In the end, it's immaterial if someone types each one of their bulk messages by hand, or uses a less expensive piece of software to do the job. The only difference is the rapidity of the damage they cause (and that's a matter of degree, not kind, which can furthermore be trumped by throwing more headcount at the keyboards). It's still spam if it's unsolicited and substantively identical regardless of "research" or "personalization".

By cks at 2005-10-02 00:45:20:

I may not have been clear enough on my position, so let me try to boil it down more:

I'm against using just the technical definitions of 'spam email', and somewhat against trying to shoehorn things into them, because I don't think they match how people think about the problem and because they invite ISPs and other parties to try to duck the issue by claiming 'this isn't spam because ...'. I'd rather see people agree that UBE is just the clear spam, and that there will be email that doesn't go very easily into the UBE definition but is still spam email. (Much like how Usenet has 'spam' and 'cancellable spam'.)

I'm pretty sure it will never be possible to create a mechanical definition of spam email. (Note that Usenet spam, which has a far more cut-and-dried definition, has serious problems at the margins.)

(I suspect we both agree that the UCE definition is not workable at all.)

Written on 01 October 2005.

Last modified: Sat Oct 1 18:18:12 2005