Comments on Chris's Wiki :: blog/programming/CRLFAndBuffering
(https://utcc.utoronto.ca/~cks/space/blog/programming/CRLFAndBuffering)

By Aristotle Pagaltzis (http://plasmasturm.org/), 2017-02-14:

I don’t think it can be handled the same way as UTF-8 decoding, because a multi-byte sequence, even an invalid one, is ultimately atomic. A CR is its own character regardless of whether it’s followed by LF (and the LF is likewise its own character regardless of whether it was preceded by a CR).
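To make that concrete (a minimal sketch of my own, not code from the post; the name normalize_crlf is made up): a streaming CRLF-to-LF normalizer can never emit a chunk-final CR right away, because only the next chunk reveals whether that CR is half of a CR LF pair:

    def normalize_crlf(chunks):
        """Translate CR LF to LF across an iterable of byte chunks.

        A chunk that ends in CR cannot be flushed yet: only the next
        chunk tells us whether that CR is half of a CR LF pair or a
        character in its own right.
        """
        pending_cr = False
        for chunk in chunks:
            if pending_cr:
                chunk = b"\r" + chunk   # re-attach the held-back CR
                pending_cr = False
            if chunk.endswith(b"\r"):
                pending_cr = True       # defer the decision to the next chunk
                chunk = chunk[:-1]
            yield chunk.replace(b"\r\n", b"\n")
        if pending_cr:
            yield b"\r"                 # stream ended on a bare CR, a real character

    # list(normalize_crlf([b"one\r", b"\ntwo"])) yields [b"one", b"\ntwo"]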
It is certainly not a unique(ly difficult) problem, but the parallel I thought of is chunk-wise tokenizing, e.g. of XML. (I was going to say structured parsing, but it’s really only the tokenizing part – of course parsing XML is barely parsing, so that analogy fits.) And suddenly it’s making sense to me why the task is so surprisingly painful – it’s the most minimal form of a higher class of problem than what you’d like to be dealing with.
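To show the family resemblance (again a sketch of my own, tokenizing nothing fancier than '<...>' tags): a chunk-wise tokenizer carries the same kind of state across reads as the held-back CR above, just more of it, because a chunk can end in the middle of a token:

    def tokenize_tags(chunks):
        """Yield '<...>' tags and the text runs between them, chunk by chunk.

        A chunk may end in the middle of a tag, so the incomplete tail
        has to be buffered until more input arrives.
        """
        buf = ""
        for chunk in chunks:
            buf += chunk
            while buf:
                if buf.startswith("<"):
                    end = buf.find(">")
                    if end == -1:
                        break              # incomplete tag: wait for more input
                    yield buf[:end + 1]
                    buf = buf[end + 1:]
                else:
                    start = buf.find("<")
                    if start == -1:
                        yield buf          # plain text, safe to emit now
                        buf = ""
                    else:
                        yield buf[:start]
                        buf = buf[start:]
        if buf:
            yield buf                      # input ended mid-tag

    # list(tokenize_tags(["<a", ">hi<", "/a>"])) yields ["<a>", "hi", "</a>"]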
This all wasn’t clear to me until you prompted me to think about it, so thanks for that.
So just by unmindfully defining EOL as two characters, you accidentally teleport into a different problem space. Oops. Cut EOL back to a single character, and you drop down into a lower class of problem. So your instinctive aversion to CR+LF in new protocols (where this conversation started) is eminently warranted.
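The one-character version really is a lower class of problem (one more sketch of my own): the moment you see the EOL byte you know it is an EOL, no lookahead required, so no byte ever has to be held back across a chunk boundary:

    def split_lines_lf(chunks):
        """Split a chunked byte stream into LF-terminated lines.

        Incomplete lines are buffered, but no decision is ever deferred:
        an LF byte is an EOL the instant it is seen.
        """
        buf = b""
        for chunk in chunks:
            buf += chunk
            *complete, buf = buf.split(b"\n")
            for line in complete:
                yield line + b"\n"
        if buf:
            yield buf  # final unterminated line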