Wandering Thoughts archives

2009-04-10

Why 'sender stores message' is easier for spammers than real mail servers

In my first entry on 'sender stores message' schemes for email, I asserted that spammers would have no problem keeping their servers from being crushed under the load of everyone trying to read their spam, yet in SenderStorageProblems I said that real mail senders would have exactly this sort of problem. The difference is that the two groups face very different workloads, and the spammers can cheat. Let's look at each in turn.

Real mail servers of any significant size will be dealing with tens of thousands of messages (or more) amounting to tens to hundreds of gigabytes in total size. Many of those messages will be read only a few times each, but they may be called for at random times, possibly quite a long time after they're sent. And the mail server must reliably store and serve these messages when asked, never confusing one for another.

(This does point out the tricky issue of when a 'sender stores message' mail server gets to delete the stored message, if ever.)

The mail servers of anyone sending bulk email have a very different usage pattern. They will likely be dealing with only a few hundred or a few thousand unique messages, quite possibly ones that can be synthesized on the fly from relatively compact information; the total unique storage needed is likely to be modest, and will probably fit in RAM on a decent server. These messages do not have to live very long, probably a few weeks at most, and it's not a tragedy if things are less than perfectly reliable.

Even if spammers serve static messages instead of creating them on the fly, the problem of serving a hundred or a thousand relatively small files to people is quite different from, and rather easier than, the problem of serving tens or hundreds of thousands of files, some of them quite big, to the same number of people. One hard bound is that reading random things from disk has an intrinsic speed limit; you get around 100 random IOs per second for each disk that you have, and this is not increasing very fast (or cheaply).
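To make the gap concrete, here is a rough back-of-envelope sketch. The figure of about 100 random IOs per second per disk comes from the paragraph above; the fetch rate and cache hit percentages are made-up illustrative numbers, and the model assumes (simplistically) that every cache miss costs exactly one random IO.

```python
import math

RANDOM_IOPS_PER_DISK = 100  # rough per-disk figure quoted above


def disks_needed(fetches_per_second: int, cache_hit_percent: int) -> int:
    """Disks needed to absorb the fetches that cannot be served from RAM.

    Assumption: one random IO per cache miss, nothing else competing for IO.
    """
    misses_per_second = fetches_per_second * (100 - cache_hit_percent) / 100
    return math.ceil(misses_per_second / RANDOM_IOPS_PER_DISK)


# Hypothetical bulk sender: the whole set of messages fits in RAM.
bulk_disks = disks_needed(fetches_per_second=2000, cache_hit_percent=100)

# Hypothetical real mail server: a big, rarely re-read store, so most
# fetches miss the cache and go to disk.
real_disks = disks_needed(fetches_per_second=2000, cache_hit_percent=10)

print(bulk_disks, real_disks)
```

The exact numbers do not matter much; what matters is that the bulk sender's disk requirement stays flat as the fetch rate climbs, while the real mail server's grows in proportion to it.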

SenderStorageHelpsSpammers written at 02:52:29

2009-04-07

The technical problems with 'sender stores messages' schemes

For some reason, people have an enduring fondness for new schemes for email where the sender stores the message until the recipient wants to read it (the best known is D. J. Bernstein's Internet Mail 2000). Such schemes tend to handwave the social problems involved in a transition, but let's set that aside (along with why they won't stop spam) and talk about the practical technical problems, because from my perspective they are pretty bad.

If the sender stores the messages until they are retrieved by the recipient, both sides face a daunting series of problems:

  • in a straightforward implementation, the user experience is going to be terrible. Think of a version of IMAP where message retrieval has random delays (more so than currently) and sometimes fails entirely.

  • with SMTP, a sending machine can control its load when it is sending out a lot of email; it only sends so many messages at once and so on. With 'sender stores', the sending machine's load is controlled by all of the receivers; send out a popular message that everyone wants to read at once and, well, you have problems. And in general your load will be much less predictable and controllable.

  • the sender has a much harder time load-balancing their mail across multiple machines; either you need sophisticated reverse proxying load balancers, or you have to decide, at the time that you send the message, which server each recipient will pull it from. Even this is not perfect load balancing, as a single receiver may forward on their 'copy' of the message, bound to a specific server, to more people than you expect or want.

  • as a receiver, you lose spam filtering information; you get only very minimal information until someone actually retrieves the message. I think that this makes malicious content quite dangerous, because the sending machine can generate the content on the fly at the last moment, for up-to-the-moment customized malware.

    (And you really, really don't want the access protocol to carry any information about what client the receiver is using.)

  • the sender gains significant information about the habits of the receivers. At a minimum the sender learns when they read email; at worst, the sender can determine their network location (and in the process 'see through' forwarding). Applications to phishing spam are left as an exercise, especially in an environment where the sender can generate the message on the fly.

  • in general, both receivers and senders expose a much greater attack surface to each other than they do today. The receiver is now talking directly to the sender in more or less real time (tunneling through your firewall in the process), and the sender is now running the rough equivalent of an IMAP or POP3 server that is necessarily exposed to the entire Internet.
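The load-control point in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions: every transfer takes the same fixed time, times are whole milliseconds, and the function names and all of the numbers are invented for the illustration. It compares the peak number of simultaneous transfers when the sender picks the start times (SMTP-style push with a concurrency cap) with the peak when the receivers pick them (everyone fetching a popular message within a short window).

```python
def peak_concurrency(start_times_ms, duration_ms):
    """Return the largest number of transfers in flight at any moment."""
    events = []
    for start in start_times_ms:
        events.append((start, 1))                 # transfer begins
        events.append((start + duration_ms, -1))  # transfer ends
    # Tuples sort by time, then by delta, so at equal times the -1 (end)
    # events are processed before the +1 (start) events.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def smtp_push_schedule(n_messages, max_parallel, duration_ms):
    """Sender-chosen start times: batches of at most max_parallel."""
    return [(i // max_parallel) * duration_ms for i in range(n_messages)]


def receiver_pull_schedule(n_messages, burst_window_ms):
    """Receiver-chosen start times: fetches spread evenly over a short burst."""
    return [i * burst_window_ms // n_messages for i in range(n_messages)]


push_peak = peak_concurrency(
    smtp_push_schedule(n_messages=1000, max_parallel=20, duration_ms=2000),
    duration_ms=2000,
)
pull_peak = peak_concurrency(
    receiver_pull_schedule(n_messages=1000, burst_window_ms=10_000),
    duration_ms=2000,
)
print(push_peak, pull_peak)
```

In the push case the peak is whatever cap the sender chose. In the pull case it is set by how tightly the receivers happen to cluster, which the sender does not control.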

Really, in a sender storage world the most sensible thing to do as a receiver is to have your mail server immediately retrieve a copy of everything that you're sent unless it fails basic checks, which is basically equivalent to checking the SMTP envelope information that mail servers have available today. (How equivalent it is depends on how much additional information, if any, is sent in the initial notification message.)

Another way to put this is that if you want to do anything with a message beyond discarding it based on very minimal information, you must retrieve the message. Thus, at best all that 'sender stores message' can do is defer the message's transfer, and in the process it makes a bunch of things worse and more complicated.
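As a minimal sketch of that receiver policy: the notification fields, the checks, and the fetch callback below are all hypothetical, since no real 'sender stores message' protocol is being described here. The only point is that the decision has nothing more than envelope-style information to work with, and everything that passes is fetched right away.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class Notification:
    """Hypothetical 'you have mail' notice; roughly SMTP envelope data."""
    sender: str       # envelope-style sender address
    recipient: str    # local address the message is for
    message_ref: str  # opaque reference used to fetch the message later


def passes_basic_checks(note: Notification,
                        local_recipients: Set[str],
                        blocked_domains: Set[str]) -> bool:
    """Checks that need only the notification, not the message itself."""
    if note.recipient.lower() not in local_recipients:
        return False
    sender_domain = note.sender.rpartition("@")[2].lower()
    return sender_domain not in blocked_domains


def handle_notification(note: Notification,
                        local_recipients: Set[str],
                        blocked_domains: Set[str],
                        fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Fetch immediately unless the basic checks fail.

    `fetch` stands in for whatever retrieval protocol would exist.
    """
    if not passes_basic_checks(note, local_recipients, blocked_domains):
        return None  # discard based on minimal information only
    return fetch(note.message_ref)
```

Once the message has been fetched like this, the receiver is back to holding its own copy, which is more or less where SMTP delivery leaves it today.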

SenderStorageProblems written at 00:34:43
