2023-12-25
Standards often provide little guidance for handling 'bad' content
In a comment on my entry on what I think SMTP Smuggling enables, Leah Neukirchen noted something important, which is that SMTP messages that contain a CR or a LF by itself aren't legal:
I disagree. The first mail server is also accepting a message with a non-CRLF LF, which violates RFC 5322 section 2.3
CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.
The capitalization in the RFC quote is original, the emphasis is mine, and the meaning of these terms is covered in RFC 2119. What it adds up to is unambiguous at one level; a SMTP message that contains a bare CR or LF isn't an RFC 5322 compliant message, much like a C program with undefined behavior isn't a valid ANSI C program.
But just like the ANSI C standard doesn't (as far as I know) put any requirements on how a C compiler handles a non-ANSI-C program, RFC 5322 provides no requirements or guidance on what you should or must do with a non-compliant message. This is quite common in standards; standards often spell out only what is within their scope and what must be done with those things. They've historically been silent about non-standard things, leaving it entirely to the implementer. When it comes to protocol elements, this generally means rejecting them (you don't try to guess what unknown SMTP commands are), but when it comes to things you don't act on like email message content, things are much fuzzier.
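The RFC 5322 rule quoted above is at least easy to check mechanically, even if the RFC doesn't say what to do with the answer. As a minimal sketch (the function name and the choice to report byte offsets are my own assumptions for illustration, not something from any RFC or any real mailer), detecting bare CRs and LFs in a message might look like this:

```python
import re

# A CR that is not followed by LF, or an LF that is not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def find_bare_line_endings(message: bytes) -> list[int]:
    """Return the byte offset of every CR or LF that is not part of a CRLF pair.

    Hypothetical helper: an empty list means the message passes this one
    check. It says nothing about what to do when the list is not empty.
    """
    return [m.start() for m in BARE_CR_OR_LF.finditer(message)]
```

The check only tells you that a message is non-compliant and where. Whether you then reject the message, rewrite it, or pass it through untouched is exactly the decision the standard leaves open.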
At this point two things often intervene. The first is Postel's Law, which suggests people accept things outside the standard. The second is that strong standards compliance is often actively inconvenient or problematic for people using the software. I've lived life behind a SMTP mailer that had strong feelings about RFC compliance (at least in some areas), and by and large we didn't like it. Strict software is often unpopular software, which pushes people writing software to appeal to Postel's Law in the absence of anything else. If you don't even have an RFC to point to that says 'you SHOULD reject this' (or 'you MUST reject this') and you have people banging on your door wanting you to be liberal, often the squeaky wheel gets the grease (or has gotten until recently; these days people are somewhat less enamored of Postel's Law, for various reasons including security issues).
(C compilers and their reaction to undefined behavior is a complex subject, but I don't know of any mainstream compiler that will actually reject code that has known undefined behavior.)
At this point there's not much we can do here. It's obviously much too late for existing RFCs and standards that don't have any requirements or guidance on what you should do about bad contents, and I'm not sure that people would agree on adding it anyway. People can attempt to be strict and hope that not much will be affected, or they can try to write rules about error recovery (which HTML eventually did in HTML5) to encourage software to all do the same, agreed-on thing. But these will probably mostly be reactive things, not proactive ones (so we're probably about to see a wave of SMTP mailers getting strict in the wake of SMTP Smuggling).
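To make the two options concrete, here is a rough sketch of what 'be strict' and 'follow an error recovery rule' could look like for bare CRs and LFs in a message body. Everything in it is an assumption for illustration: the policy names, the exception, and especially the recovery rule (turn every bare CR or LF into CRLF), which is just one rule I picked and not something any RFC specifies.

```python
import re
from enum import Enum

# A CR that is not followed by LF, or an LF that is not preceded by CR.
_BARE = re.compile(rb"\r(?!\n)|(?<!\r)\n")


class Policy(Enum):
    REJECT = "reject"        # strict: refuse non-compliant bodies outright
    NORMALIZE = "normalize"  # recovery: apply one fixed repair rule


class NonCompliantBody(ValueError):
    """Raised under Policy.REJECT when the body has a bare CR or LF."""


def handle_body(body: bytes, policy: Policy) -> bytes:
    """Return the body to pass on, or raise if the policy says to reject it."""
    if _BARE.search(body) is None:
        return body  # compliant bodies are untouched under either policy
    if policy is Policy.REJECT:
        raise NonCompliantBody("bare CR or LF in message body")
    # Hypothetical recovery rule: every bare CR or LF becomes a CRLF.
    return _BARE.sub(b"\r\n", body)
```

A recovery rule like this only helps if everyone applies the same one; two programs that repair the same bad input in different ways are still disagreeing about what the message is.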