Why I don't like SMTP command parameters

June 4, 2014

Modern versions of SMTP have added something called 'command parameters'. These extend the MAIL FROM and RCPT TO commands to add optional parameters to communicate, for example, the rough size of a message that is about to be sent (that's RFC 1870). On the surface these appear perfectly sensible and innocent:

MAIL FROM:<some@address.dom> SIZE=99999

That is, the parameters are tacked on as 'NAME=VALUE' pairs after the address in the MAIL FROM or RCPT TO. Unfortunately this innocent picture starts falling apart once you look at it closely because RFC 5321 addresses are crawling horrors of complexity.

From the example I gave you might think that parsing your MAIL FROM line is simple; just look for the first space and everything after it is parameters. Except that the local part of an address can be quoted, and when quoted it can contain spaces:

MAIL FROM:<"some person"@a.dom> SIZE=99999

Fine, you say, we'll look for '> '. Guess what quoted parts can also contain?

MAIL FROM:<"some> person"@a.dom> SIZE=99999

Okay, you say, we'll look for the rightmost '> ' in the line. Surely that will do the trick?

MAIL FROM:<person@a.dom> SIZE=99999> BODY=8BITMIME

This is a MAIL FROM line with a perfectly valid address and then a (maliciously) mangled SIZE parameter. You're probably going to reject this client command, but are you going to reject it for the right reason?
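To make the three failures concrete, here is a rough Python sketch of the heuristics walked through above. The function names are made up for illustration, and each function takes only the text after 'MAIL FROM:'; none of this comes from a real MTA.

```python
def split_first_space(args):
    """Heuristic 1: everything after the first space is parameters."""
    addr, _, params = args.partition(" ")
    return addr, params


def split_first_close(args):
    """Heuristic 2: the address ends at the first '> '."""
    i = args.find("> ")
    if i < 0:
        return args, ""
    return args[:i + 1], args[i + 2:]


def split_last_close(args):
    """Heuristic 3: the address ends at the rightmost '> '."""
    i = args.rfind("> ")
    if i < 0:
        return args, ""
    return args[:i + 1], args[i + 2:]


# Each heuristic cuts one of the examples in the wrong place.
print(split_first_space('<"some person"@a.dom> SIZE=99999'))
print(split_first_close('<"some> person"@a.dom> SIZE=99999'))
print(split_last_close('<person@a.dom> SIZE=99999> BODY=8BITMIME'))
```

The first two calls chop the quoted local part in half, and the third hands back '<person@a.dom> SIZE=99999>' as the 'address', so whatever error the client eventually gets will be about a bad address instead of a bad SIZE.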

What the authors of RFC 5321 have created is a situation where you must do at least a basic parsing of the internal structure of the address just to find out where it ends. Especially in the face of potentially mangled input there is no simple way of determining the end of the address and the start of parameters, despite appearances. Yet the situation looks deceptively simple and a naive parser will work almost all of the time (even quoted local parts are rare, much less ones with wacky characters in them, and my final example is extremely perverse).
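As a minimal sketch of what 'basic parsing' means here, the following Python function walks the address one character at a time, tracking only quoted strings and backslash escapes inside them, and stops at the first '>' that is outside quotes. The name split_path is my own invention, it takes the text after 'MAIL FROM:', and it makes no attempt to validate the address itself; treat it as an illustration under those assumptions and not as a complete RFC 5321 parser.

```python
def split_path(args):
    """Split the text after 'MAIL FROM:' into (path, parameter string).

    Only quoted strings and backslash escapes inside them are tracked;
    the address is not otherwise validated.
    """
    if not args.startswith("<"):
        raise ValueError("path must start with '<'")
    in_quotes = False
    i = 1
    while i < len(args):
        c = args[i]
        if in_quotes:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == '"':
                in_quotes = False
        elif c == '"':
            in_quotes = True
        elif c == ">":
            rest = args[i + 1:]
            if rest and not rest.startswith(" "):
                raise ValueError("junk after closing '>'")
            return args[:i + 1], rest.strip()
        i += 1
    raise ValueError("unterminated path")


print(split_path('<"some> person"@a.dom> SIZE=99999'))
print(split_path('<person@a.dom> SIZE=99999> BODY=8BITMIME'))
```

With this approach the perverse example splits into '<person@a.dom>' and 'SIZE=99999> BODY=8BITMIME', so it is the later SIZE check (which expects digits) that gets to complain, not the address parser.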

I'm sure this was not exactly deliberate on the part of the RFC authors, because after all they're dealing with decades of complex history involving all sorts of baroque possible addressing. From its beginning SMTP was complicated by backwards compatibility requirements and could not, eg, dictate that local mailboxes had to fit into certain restrictions. I'm sure that current RFC authors would like to have thrown all of this away and gone for simple addresses with no quoted local parts and so on. They just couldn't get away with it.

There is a moral in here somewhere but right now I'm too grumpy to come up with one.

(For more background on the various SMTP extensions, see eg the Wikipedia entry.)

PS: note that a semi-naive algorithm may also misinterpret 'MAIL FROM:<a@b> SIZE=999>'. After all, it has a '>' right there as the last character.
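One guess at what such a semi-naive algorithm might look like, again in Python and again with a made-up name: it assumes that a line ending in '>' has no parameters at all and only goes hunting for '> ' otherwise. The argument is the text after the command verb and colon.

```python
def semi_naive_split(args):
    """Hypothetical shortcut: a trailing '>' means there are no parameters."""
    if args.endswith(">"):
        return args, ""
    i = args.rfind("> ")
    if i < 0:
        raise ValueError("no closing '>' found")
    return args[:i + 1], args[i + 2:]


print(semi_naive_split("<a@b> SIZE=999"))
print(semi_naive_split("<a@b> SIZE=999>"))
```

The second call returns the whole of '<a@b> SIZE=999>' as the address with no parameters, which is exactly the misreading in question.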


Last modified: Wed Jun 4 02:21:12 2014