#pragma search blog/tech
== Why I don't like SMTP command parameters

Modern versions of SMTP have added something called 'command parameters'. These extend the _MAIL FROM_ and _RCPT TO_ commands with optional parameters that communicate, for example, the rough size of a message that is about to be sent (that's [[RFC 1870 http://tools.ietf.org/html/rfc1870]]). On the surface these appear perfectly sensible and innocent:

.pn prewrap on

> MAIL FROM:<user@a.dom> SIZE=99999

That is, the parameters are tacked on as space-separated 'NAME=VALUE' pairs after the address in the _MAIL FROM_ or _RCPT TO_ command.

Unfortunately this innocent picture starts falling apart once you look at it closely, because [[RFC 5321 http://tools.ietf.org/html/rfc5321]] addresses are crawling horrors of complexity. From the example I gave you might think that parsing your _MAIL FROM_ line is simple; just look for the first space after _MAIL FROM:_ and everything after it is parameters. Except that the local part of an address can be quoted, and when quoted it can contain spaces:

> MAIL FROM:<"some person"@a.dom> SIZE=99999

Fine, you say, we'll look for '_> _'. Guess what quoted parts can also contain?

> MAIL FROM:<"some> person"@a.dom> SIZE=99999

Okay, you say, we'll look for the rightmost '_> _' in the line. Surely that will do the trick?

> MAIL FROM:<user@a.dom> SIZE=99999> BODY=8BITMIME

This is a _MAIL FROM_ line with a perfectly valid address followed by a (maliciously) mangled _SIZE_ parameter. You're probably going to reject this client command, but are you going to reject it for the right reason? The rightmost '_> _' rule decides that the address is '_<user@a.dom> SIZE=99999>_', so you'll complain about a bad address instead of a bad _SIZE_ parameter.

What the authors of [[RFC 5321]] have created is a situation where you must do at least a basic parse of the internal structure of the address just to find out where it ends. Especially in the face of potentially mangled input, there is no simple way of determining where the address ends and the parameters start, despite appearances.
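Here is a minimal Python sketch of what 'a basic parse of the internal structure of the address' might look like. It is my own illustration, not any real mailer's code, and the function names and the user@a.dom address are hypothetical. It scans the reverse-path while tracking the places where a '>' can legally hide (quoted local parts, including backslash quoted-pairs, and bracketed address literals), then splits the rest into parameters using the keyword and value rules from RFC 5321 section 4.1.2. It deliberately skips a lot, such as tolerating a space after the colon or validating the address itself.

```python
import re

# Token rules paraphrased from RFC 5321 section 4.1.2:
#   esmtp-keyword = (ALPHA / DIGIT) *(ALPHA / DIGIT / "-")
#   esmtp-value   = 1*(%d33-60 / %d62-126)   ; printable ASCII, no "=" or space
KEYWORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")
VALUE_RE = re.compile(r"[\x21-\x3c\x3e-\x7e]+")


def find_path_end(rest: str) -> int:
    """Return the index in rest of the '>' that closes the reverse-path."""
    if not rest.startswith("<"):
        raise ValueError("reverse-path must start with '<'")
    in_quote = in_literal = False
    i = 1
    while i < len(rest):
        c = rest[i]
        if in_quote:
            if c == "\\":
                i += 1  # quoted-pair: skip the escaped character
            elif c == '"':
                in_quote = False
        elif in_literal:
            if c == "]":
                in_literal = False
        elif c == '"':
            in_quote = True  # start of a quoted local part
        elif c == "[" and rest[i - 1] == "@":
            in_literal = True  # address literal, e.g. [192.0.2.1]
        elif c == ">":
            return i  # a '>' outside quotes and literals ends the path
        i += 1
    raise ValueError("address is not terminated by '>'")


def parse_mail_from(line: str):
    """Split a MAIL FROM line into (path, [(KEYWORD, value or None), ...])."""
    prefix = "MAIL FROM:"
    if line[:len(prefix)].upper() != prefix:
        raise ValueError("not a MAIL FROM command")
    rest = line[len(prefix):]
    end = find_path_end(rest)
    path, tail = rest[:end + 1], rest[end + 1:]
    params = []
    if tail:
        # The grammar puts a single SP between the path and each parameter.
        if not tail.startswith(" "):
            raise ValueError("junk immediately after the address")
        for item in tail[1:].split(" "):
            keyword, sep, value = item.partition("=")
            if not KEYWORD_RE.fullmatch(keyword):
                raise ValueError(f"bad parameter keyword {keyword!r}")
            if sep and not VALUE_RE.fullmatch(value):
                raise ValueError(f"bad value for parameter {keyword}")
            params.append((keyword.upper(), value if sep else None))
    return path, params


def check_size(params):
    """Semantic check: the RFC 1870 SIZE value must be all digits."""
    for keyword, value in params:
        if keyword == "SIZE" and not (value and value.isdigit()):
            raise ValueError(f"bad SIZE value {value!r}")


if __name__ == "__main__":
    for line in [
        'MAIL FROM:<"some> person"@a.dom> SIZE=99999',
        "MAIL FROM:<user@a.dom> SIZE=99999> BODY=8BITMIME",
    ]:
        try:
            path, params = parse_mail_from(line)
            check_size(params)
            print("ok:", path, params)
        except ValueError as err:
            print("rejected:", err)
```

With this approach the quoted '>' example parses correctly, and the mangled example gets the address '<user@a.dom>' and is then rejected with "bad SIZE value '99999>'", which is the right reason. Note that '99999>' is a syntactically legal esmtp-value; only the SIZE extension's own rules make it wrong.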
Yet the situation looks deceptively simple, and a naive parser will work almost all of the time (even quoted local parts are rare, much less ones with wacky characters in them, and my final example is extremely perverse).

I'm sure this was not exactly deliberate on the part of the RFC authors; after all, they're dealing with decades of complex history involving all sorts of baroque addressing possibilities. From its beginning SMTP was complicated by backwards compatibility requirements and could not, eg, dictate that local mailboxes had to fit into certain restrictions. I'm sure that current RFC authors would have liked to throw all of this away and go for simple addresses with no quoted local parts and so on. They just couldn't get away with it.

There is a moral in here somewhere but right now I'm too grumpy to come up with one.

(For more background on the various SMTP extensions, see eg [[the Wikipedia entry http://en.wikipedia.org/wiki/Extended_SMTP#Extensions]].)

PS: note that a semi-naive algorithm may also misinterpret '_MAIL FROM SIZE=999>_'. After all, it has a '_>_' right there as the last character.
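To make the semi-naive failure concrete, here is a small Python sketch of one hypothetical shortcut parser. Its two rules ('a trailing > means no parameters', otherwise 'the address ends at the rightmost "> "') and the function name are my own illustration, not any particular mailer's code, and user@a.dom is a made-up address. It works on the text after _MAIL FROM:_.

```python
def semi_naive_split(args: str):
    """HYPOTHETICAL shortcut splitter for the text after 'MAIL FROM:'.

    Returns (address, parameter_text).
    """
    # Rule 1: a line that ends in '>' is assumed to be all address.
    if args.endswith(">"):
        return args, ""
    # Rule 2: otherwise the address ends at the rightmost "> ".
    i = args.rfind("> ")
    if i == -1:
        return args, ""  # no "> " at all: treat everything as the address
    return args[:i + 1], args[i + 2:]


if __name__ == "__main__":
    for args in [
        '<"some> person"@a.dom> SIZE=99999',       # split correctly
        "<user@a.dom> SIZE=99999> BODY=8BITMIME",  # SIZE swallowed by address
        "<user@a.dom> SIZE=999>",                  # trailing '>' fools rule 1
    ]:
        print(semi_naive_split(args))
```

The quoted-'>' case comes out right, but in both mangled lines the bad _SIZE_ parameter ends up inside the 'address', so any error you report will blame the address rather than the parameter.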