2023-12-20
The (historical) background of 'SMTP Smuggling'
The recent email news is SEC Consult's SMTP Smuggling - Spoofing E-Mails Worldwide (via), which I had a reaction to. I found the article's explanation of SMTP Smuggling a little hard to follow, so for reasons that don't fit within the scope of today's entry, I'm going to re-explain the central issue in my own way.
SMTP is a very old Internet protocol, and like a variety of old Internet protocols it has what is now an odd and unusual core model. Without extensions, everything in SMTP is line based, with the sender and receiver exchanging a series of 7-bit ASCII lines for commands, command responses, and the actual email messages (which are sent as a block of text in the 'DATA' phase, i.e. after the sender has sent a 'DATA' SMTP command and the receiver has accepted it). Since SMTP is line based, email messages are also considered to be a series of lines, although the contents of those lines are (mostly) not interpreted. SMTP needs to signal the end of the email text being transmitted, and as a line based protocol it does this with a special marker line; a '.' on a line by itself marks the end of the message.
(In theory there's a defined quoting and de-quoting process if an actual line of the message starts with a '.'; see RFC 821 section 4.5.2, which is still there basically intact in RFC 5321 section 4.5.2. In practice, actual mailer behavior has historically varied.)
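As a rough illustration of that quoting and de-quoting process (often called dot-stuffing), here is a minimal Python sketch. The function names are my own invention, and it assumes the message has already been split into lines with their terminators removed; it is a reading of RFC 5321 section 4.5.2, not the code of any real mailer.

```python
def dot_stuff(lines):
    """Sender side: prepend one extra '.' to every line that starts with '.'."""
    return [b"." + line if line.startswith(b".") else line for line in lines]


def dot_unstuff(lines):
    """Receiver side: drop the first '.' from every line that starts with '.'.

    This must only be applied to message lines, after the lone '.' end of
    message marker has already been recognized and removed.
    """
    return [line[1:] if line.startswith(b".") else line for line in lines]


original = [b"hello", b".", b"..two dots"]
on_the_wire = dot_stuff(original)
round_tripped = dot_unstuff(on_the_wire)
```

After stuffing, a message line that is just '.' goes over the wire as '..', so it can no longer be mistaken for the end of message marker.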
When you have a line based protocol you must decide how the ends of lines are marked (the line terminator). In SMTP, the official line terminator is the two byte (two octet) sequence 'CR LF', because this was the fashion at the time. This includes the lines that are part of the email message that is sent in the DATA phase, and so the last five octets sent at the end of a standards-compliant SMTP message are 'CR LF . CR LF'. The first 'CR LF' is the end of the last line of the actual message, and then '. CR LF' makes up the '.' on a line by itself.
(This means that all lines of the message itself are supposed to be terminated with 'CR LF', regardless of whatever the native line terminator is for the systems involved. If you're doing SMTP properly, you can't just blast out or read in the raw bytes of the message, even apart from RFC 5321 section 4.5.2 concerns. There are various ESMTP extensions that can change this.)
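To make the wire format concrete, here is a small Python sketch of what a sender might do to turn a Unix-format message into the bytes of the DATA phase. The name encode_data is hypothetical, and the sketch assumes the input uses only '\n' line endings and contains no stray '\r' characters, which sidesteps exactly the awkward cases a real mailer has to decide about.

```python
def encode_data(message: bytes) -> bytes:
    """Turn a '\\n'-terminated message into SMTP DATA phase bytes.

    Assumption: the input has no '\\r' in it at all.
    """
    lines = message.split(b"\n")
    # A trailing newline leaves an empty final element; drop it so we
    # do not invent an extra blank line at the end of the message.
    if lines and lines[-1] == b"":
        lines.pop()
    out = bytearray()
    for line in lines:
        if line.startswith(b"."):
            # Quote a leading '.' as described in RFC 5321 section 4.5.2.
            line = b"." + line
        out += line + b"\r\n"
    # The '.' on a line by itself that ends the message.
    out += b".\r\n"
    return bytes(out)


wire = encode_data(b"Subject: hi\n\n.hidden\n")
```

Whatever the message is, as long as it has at least one line the result ends with the five octets 'CR LF . CR LF'.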
Unfortunately, SMTP's definition makes life quite inconvenient for systems that don't use CR LF as their native line ending, such as Unix (which uses just LF, \n). Because SMTP considers the email message itself to be a sequence of lines (and there's a line length limit), a Unix SMTP mailer has to keep translating all of the lines in every email message it sends or receives back and forth between lines ending in \n (the native format) and \r\n (the SMTP wire format). Doing this translation raises various questions about what you should send if you encounter a \r (or a \r\n) in a message as you send it, or encounter a bare \n (or \r) in a message as you receive it. It also invites shortcuts, such as turning \r\n into \n as you read data and then dealing with everything as Unix lines.
Partly for this reason and partly because CR LF line endings make various people grumpy, there has been somewhat of a tradition of mailers accepting other things as line endings in SMTP, not just CR LF. Historically a variety of Unix mailers accepted just LF, and I believe that some mailers have accepted just CR. Even today, finding SMTP listeners that absolutely require 'CR LF' as the line ending on SMTP commands isn't entirely common (GMail's SMTP listener doesn't, for example, although possibly this will cause it to be unhappy with your email, and I haven't tested its behavior for message bodies). As a result, such mailers can accept things other than 'CR LF . CR LF' as the SMTP DATA phase message terminator. Exactly what a mailer accepts can vary depending on how it implemented things.
(For instance, a mailer might turn '\r\n' into '\n' and accept '\n' as a line terminator, but only after checking for a line that was an explicit '. CR LF'. Then you could end messages with 'LF . CR LF', without the initial 'CR'; the bare LF would be taken as the line terminator for the last data line, then you have the '. CR LF' of the official terminator sequence. But if you sent 'LF . LF', that wouldn't be recognized as the message terminator.)
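A minimal Python sketch of the hypothetical mailer in that example may make the behavior easier to see. The function name and structure are mine, not taken from any real mailer, and it leaves out de-quoting of leading dots to keep the focus on line endings.

```python
def split_at_terminator(data: bytes):
    """Return (message_lines, leftover_bytes), or None if no terminator is seen.

    Hypothetical receiver: it first checks for an explicit '.<CR><LF>' line,
    and only then strips either '<CR><LF>' or a bare '<LF>' as the line ending.
    """
    lines = []
    pos = 0
    while True:
        nl = data.find(b"\n", pos)
        if nl == -1:
            return None  # ran out of data without seeing the terminator
        raw = data[pos:nl + 1]  # one line, including its trailing LF
        pos = nl + 1
        if raw == b".\r\n":
            return lines, data[pos:]
        if raw.endswith(b"\r\n"):
            lines.append(raw[:-2])
        else:
            lines.append(raw[:-1])


accepted = split_at_terminator(b"hello\n.\r\n")
not_yet = split_at_terminator(b"hello\n.\n")
later = split_at_terminator(b"hello\n.\nmore\r\n.\r\n")
```

Here 'LF . CR LF' ends the message, while 'LF . LF' is just read as a message line containing a single '.', and the receiver keeps going until it sees a real '. CR LF'.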
This leads to the core of SMTP Smuggling, which is embedding an improper SMTP message termination in an email message (for example, 'LF . LF'), then adding SMTP commands and message data after it to submit another message (the smuggled message). To make this do anything useful we need to find an SMTP server that will accept our message with the embedded improper terminator, then send the whole thing to another mail server that will treat the improper terminator as a real terminator, splitting what was one message into two, sent one after the other. The second mail server will see the additional mail message as coming from the first mail server, although it really came from us, and this may allow us to forge message data that we couldn't otherwise.
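The disagreement between the two servers can be shown without any networking at all. The following Python sketch is purely illustrative: both split functions are hypothetical stand-ins for how a strict and a lenient receiver might find the end of the message, the addresses are made-up example domains, and it assumes the first server passes the embedded 'LF . LF' through unchanged. The strict function also ignores the corner case of a terminator at the very start of the data.

```python
def strict_split(stream: bytes):
    """Only '<CR><LF>.<CR><LF>' ends the message."""
    marker = b"\r\n.\r\n"
    idx = stream.find(marker)
    if idx == -1:
        return None
    # Keep the CR LF that ends the last message line; drop the '.' line.
    return stream[:idx + 2], stream[idx + len(marker):]


def lenient_split(stream: bytes):
    """Turns CR LF into LF first, then treats '<LF>.<LF>' as the end."""
    normalized = stream.replace(b"\r\n", b"\n")
    idx = normalized.find(b"\n.\n")
    if idx == -1:
        return None
    return normalized[:idx + 1], normalized[idx + 3:]


# What the first server sends in its DATA phase: our message, with an
# improper 'LF . LF' terminator and further SMTP commands embedded in it.
payload = (
    b"Subject: harmless\r\n"
    b"\r\n"
    b"hello\n"
    b".\n"
    b"MAIL FROM:<admin@example.org>\r\n"
    b"RCPT TO:<someone@example.net>\r\n"
    b"DATA\r\n"
    b"Subject: smuggled\r\n"
    b"\r\n"
    b"forged text\r\n"
    b".\r\n"
)

strict_message, strict_rest = strict_split(payload)
lenient_message, lenient_rest = lenient_split(payload)
```

The strict receiver sees one message with some odd-looking text in it and nothing left over. The lenient receiver ends the first message after 'hello' and is left holding bytes that start with 'MAIL FROM', which it will then process as fresh SMTP commands from the first server.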
(There are various requirements to make this work; for example, the second mail server has to accept being handed a whole block of SMTP commands all at once. These days this is a fairly common thing due to an ESMTP extension for 'pipelining', and also because SMTP receivers have to do extra work to detect and reject getting handed a block of stuff like this. See the original article for the gory details and an extended discussion.)
What you can do with SMTP Smuggling in practice has some limitations and qualifications, but that's for another entry.