The technical problems with 'sender stores messages' schemes

April 7, 2009

For some reason, people have an enduring fondness for new email schemes where the sender stores the message until the recipient wants to read it (the best known is D. J. Bernstein's Internet Mail 2000). Such schemes tend to handwave away the social problems involved in a transition, but let's set those aside (along with why they won't stop spam) and talk about the practical technical problems, because from my perspective they are pretty bad.

If the sender stores the messages until they are retrieved by the recipient, both sides face a daunting series of problems:

  • in a straightforward implementation, the user experience is going to be terrible, because each message is fetched from the sender's server only when the user tries to read it. Think of a version of IMAP where message retrieval has random delays (even more so than today) and sometimes fails entirely.

  • with SMTP, a sending machine can control its load when it is sending out a lot of email; it only sends so many messages at once and so on. With 'sender stores', the sending machine's load is controlled by all of the receivers; send out a popular message that everyone wants to read at once and, well, you have problems. And in general your load will be much less predictable and controllable.

  • the sender has a much harder time load-balancing their mail across multiple machines; either you need sophisticated reverse-proxying load balancers, or you have to commit, at the time you send the message, to which server specific recipients will pull it from. Even this is not perfect load balancing, as a single receiver may forward their 'copy' of the message, still bound to that specific server, to more people than you expect or want.

  • as a receiver, you lose spam filtering information; you get only very minimal information until someone actually retrieves the message. I think that this makes malicious content quite dangerous, because the sending machine can generate the content on the fly at the last moment, producing up-to-the-moment customized malware.

    (And you really, really don't want the access protocol to carry any information about what client the receiver is using.)

  • the sender gains significant information about the habits of the receivers. At a minimum the sender learns when they read email; at worst, the sender can determine their network location (and in the process 'see through' forwarding). Applications to phishing spam are left as an exercise, especially in an environment where the sender can generate the message on the fly.

  • in general, both receivers and senders expose a much greater attack surface to each other than they do today. The receiver is now talking directly to the sender in more or less real time (tunneling through your firewall in the process), and the sender is now running the rough equivalent of an IMAP or POP3 server that is necessarily exposed to the entire Internet.
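
As a rough illustration of the load-control point, here is a toy model (my own sketch, not part of any real mail system). It assumes every transfer takes a fixed `duration`, that an SMTP-style pushing sender caps itself at a hypothetical `max_concurrent` transfers, and that in the 'sender stores' case each recipient fetches the message at their own read time. The function names are made up for illustration.

```python
from typing import Sequence


def push_peak_load(num_recipients: int, max_concurrent: int) -> int:
    # Push (SMTP-style): the sender decides how many transfers run at once,
    # so its peak load is capped by its own concurrency limit.
    return min(num_recipients, max_concurrent)


def pull_peak_load(read_times: Sequence[float], duration: float) -> int:
    # Pull ('sender stores'): each recipient fetches when they read, so the
    # sender's load is however many fetches happen to overlap in time.
    events = []
    for t in read_times:
        events.append((t, 1))              # a fetch starts
        events.append((t + duration, -1))  # that fetch finishes
    # Sorting by (time, delta) puts ends (-1) before starts (+1) at the same
    # instant, so back-to-back fetches are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# A popular message: 1000 recipients who all open it within the first second.
burst = [i / 1000 for i in range(1000)]
print(push_peak_load(1000, max_concurrent=20))  # 20
print(pull_peak_load(burst, duration=2.0))      # 1000
```

The pushing sender stays at the limit it chose no matter how popular the message is, while the storing sender's peak is set entirely by when the receivers happen to read their mail.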

Really, in a sender-storage world the most sensible thing to do as a receiver is to have your mail server immediately retrieve a copy of everything that you're sent unless the notification fails basic checks. Deciding what to fetch this way is basically equivalent to checking the SMTP envelope information that mail servers have available today. (How equivalent it is depends on how much additional information, if any, is sent in the initial notification message.)
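
A minimal sketch of that receiver policy, assuming a hypothetical notification that carries roughly what an SMTP envelope does today (sender domain, recipient, and a retrieval URL). `Notification`, `passes_basic_checks`, and `handle_notification` are invented names, and `fetch` stands in for whatever retrieval protocol such a scheme would actually define.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Notification:
    # Hypothetical 'you have mail' notice. Its fields roughly match what
    # the SMTP envelope (MAIL FROM / RCPT TO) gives a receiver today.
    sender_domain: str
    recipient: str
    retrieval_url: str


def passes_basic_checks(note: Notification, local_users: Set[str],
                        blocked_domains: Set[str]) -> bool:
    # Only envelope-level checks are possible; the message body isn't here.
    return (note.recipient in local_users
            and note.sender_domain not in blocked_domains)


def handle_notification(note: Notification, local_users: Set[str],
                        blocked_domains: Set[str],
                        fetch: Callable[[str], bytes]) -> Optional[bytes]:
    if not passes_basic_checks(note, local_users, blocked_domains):
        # Discard on minimal information, much like an SMTP rejection.
        return None
    # Spam filtering, virus scanning and so on all need the content, so
    # retrieve it now instead of waiting until the user reads it.
    return fetch(note.retrieval_url)
```

The only decision made before retrieval uses envelope-level data; everything else has to wait for the content, which is why the fetch happens immediately.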

Another way to put this is that if you want to do anything with a message beyond discarding it based on very minimal information, you must retrieve the message. Thus, at best all that 'sender stores message' can do is defer the message's transfer, and in the process it makes a bunch of things worse and more complicated.

Written on 07 April 2009.

Last modified: Tue Apr 7 00:34:43 2009