Why I don't like SMTP command parameters
Modern versions of SMTP have added something called 'command parameters'. These extend the MAIL FROM and RCPT TO commands to add optional parameters to communicate, for example, the rough size of a message that is about to be sent (that's RFC 1870). On the surface these appear perfectly sensible and innocent:
MAIL FROM:<email@example.com> SIZE=99999
That is, the parameters are tacked on as 'NAME=VALUE' pairs after the address in the MAIL FROM or RCPT TO. Unfortunately this innocent picture starts falling apart once you look at it closely, because RFC 5321 addresses are crawling horrors of complexity.
From the example I gave you might think that parsing your MAIL FROM line is simple; just look for the first space and everything after it is parameters. Except that the local part of addresses can be quoted, and when quoted it can contain spaces:
MAIL FROM:<"some person"@a.dom> SIZE=99999
Fine, you say, we'll look for '> '. Guess what quoted parts can contain?
MAIL FROM:<"some> person"@a.dom> SIZE=99999
Okay, you say, we'll look for the rightmost '> ' in the line. Surely that will do the trick?
MAIL FROM:<firstname.lastname@example.org> SIZE=99999> BODY=8BITMIME
This is a MAIL FROM line with a perfectly valid address and then a (maliciously) mangled SIZE parameter. You're probably going to reject this client command, but are you going to reject it for the right reason?
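To make the failures concrete, here is a rough Python sketch of the three naive strategies from the examples so far. The function names are made up for illustration, and each function takes only the text after 'MAIL FROM:'.

```python
def split_first_space(arg):
    # Strategy 1: everything after the first space is parameters.
    addr, _, params = arg.partition(" ")
    return addr, params


def split_first_close(arg):
    # Strategy 2: the address ends at the first '> '.
    head, sep, params = arg.partition("> ")
    return (head + ">", params) if sep else (arg, "")


def split_last_close(arg):
    # Strategy 3: the address ends at the rightmost '> '.
    head, sep, params = arg.rpartition("> ")
    return (head + ">", params) if sep else (arg, "")


# Each call shows a strategy mis-splitting the example that defeats it.
print(split_first_space('<"some person"@a.dom> SIZE=99999'))
print(split_first_close('<"some> person"@a.dom> SIZE=99999'))
print(split_last_close('<firstname.lastname@example.org> SIZE=99999> BODY=8BITMIME'))
```

In the last case the 'address' that comes back has swallowed the SIZE parameter, so whatever error you report is going to be about a bad address instead of about a bad SIZE.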
What the authors of RFC 5321 have created is a situation where you must do at least a basic parsing of the internal structure of the address just to find out where it ends. Especially in the face of potentially mangled input there is no simple way of determining the end of the address and the start of parameters, despite appearances. Yet the situation looks deceptively simple and a naive parser will work almost all of the time (even quoted local parts are rare, much less ones with wacky characters in them, and my final example is extremely perverse).
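As a minimal sketch of what 'basic parsing' might mean here, the following Python scans the address one character at a time, tracking whether it is inside a quoted local part and skipping backslash-escaped characters, and stops at the first unquoted '>'. The name split_path_and_params is hypothetical, and this is deliberately not a full RFC 5321 parser: it assumes the only place a stray '>' can hide is a quoted string, and it ignores things like address literals and source routes.

```python
def split_path_and_params(arg):
    """Split '<address> NAME=VALUE ...' into (path, [(name, value), ...]).

    `arg` is the text after 'MAIL FROM:' or 'RCPT TO:'. Raises ValueError
    if the address never ends or is followed by junk.
    """
    if not arg.startswith("<"):
        raise ValueError("address must start with '<'")
    in_quotes = False
    i = 1
    while i < len(arg):
        ch = arg[i]
        if in_quotes:
            if ch == "\\":
                i += 1  # skip the escaped character after a backslash
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == ">":
            break  # first '>' outside quotes ends the address
        i += 1
    else:
        # Ran off the end without seeing an unquoted '>'.
        raise ValueError("unterminated address")

    path, rest = arg[: i + 1], arg[i + 1 :]
    if rest and not rest.startswith(" "):
        raise ValueError("junk after address")
    # Parameters are only split apart here, not validated.
    params = []
    for item in rest.split():
        name, _, value = item.partition("=")
        params.append((name, value))
    return path, params


path, params = split_path_and_params(
    "<firstname.lastname@example.org> SIZE=99999> BODY=8BITMIME"
)
# The address comes back intact and SIZE carries the mangled value '99999>',
# so the caller can reject the parameter rather than the address.
print(path, params)
```

Even this much is already a small state machine instead of a one-line split, which is more or less my complaint.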
I'm sure this was not exactly deliberate on the part of the RFC authors, because after all they're dealing with decades of complex history involving all sorts of baroque possible addressing. From its beginning SMTP was complicated by backwards compatibility requirements and could not, eg, dictate that local mailboxes had to fit into certain restrictions. I'm sure that current RFC authors would like to have thrown all of this away and gone for simple addresses with no quoted local parts and so on. They just couldn't get away with it.
There is a moral in here somewhere but right now I'm too grumpy to come up with one.
(For more background on the various SMTP extensions, see eg the Wikipedia entry.)
PS: note that a semi-naive algorithm may also misinterpret '>' right there as the