The HTML <pre> element doesn't do very much

June 5, 2019

These days I don't do too much with HTML, so every so often I wind up in a situation where I have to reach back and reconstruct things that once were entirely well known to me. Today, I wound up talking with someone about the <pre> element and what you could and couldn't safely put in it, and it took some time to remember most of the details.

The simple version is that <pre> doesn't escape markup; it only changes formatting. Many simple examples you'll see only use it on plain text, so this isn't immediately clear. It would be nice if <pre> were a general container that you could pour almost arbitrary text into and have it escaped, but it's not. If you're writing HTML by hand and you have something to put into a <pre>, you need to escape any markup and HTML entities (much like a <textarea>, although even more so). On the other hand, you can use this deliberately to write <pre> blocks that contain markup, for example links or text emphasis (you might use bold inside a <pre> to denote generic placeholders that the reader fills in with their specifics).
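As a minimal sketch of what "escape it yourself" means in practice, here is how a small tool might do it with Python's standard `html.escape`. The function names `pre_block` and `pre_with_placeholder` are made up for illustration; they aren't from any particular library.

```python
import html


def pre_block(text):
    """Wrap arbitrary text in <pre>, escaping &, < and > so it renders literally."""
    # quote=False leaves ' and " alone; they only need escaping inside attributes.
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


def pre_with_placeholder(before, placeholder, after=""):
    """Deliberately mix markup into a <pre>: bold one placeholder, escape the rest."""
    return (
        "<pre>"
        + html.escape(before, quote=False)
        + "<b>" + html.escape(placeholder, quote=False) + "</b>"
        + html.escape(after, quote=False)
        + "</pre>"
    )


if __name__ == "__main__":
    print(pre_block("grep -c '<pre>' *.html 2>&1"))
    print(pre_with_placeholder("ssh -l ", "yourlogin", " somehost"))
```

The second function is the flip side mentioned above: the `<b>` is real markup that the browser acts on, while everything around it is escaped.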

As with <textarea>, it's easy to overlook this for straightforward cases and to get away without doing any text escaping, especially in modern browsers. A lot of the command lines or code or whatever that we often put into <pre> don't contain things that can be mistaken for HTML markup or HTML entities, and modern browsers will often silently re-interpret things as plain text for you if they aren't validly formatted entities or markup. I myself have written and altered any number of <pre> blocks over the past few years without ever thinking about it, and I'm sure that some of them included '<' or '>' and perhaps '&' (all as part of Unix command lines).
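To get a feel for which stray '&' characters survive and which get quietly turned into something else, you can poke at Python's `html.unescape`, which is documented as following the HTML5 rules for character references. This is only a rough stand-in for what a browser does with text content (it knows nothing about tags), and the helper name `rendered_text` is just an illustration.

```python
import html


def rendered_text(raw):
    """Decode character references roughly the way HTML5 text content would."""
    return html.unescape(raw)


if __name__ == "__main__":
    samples = [
        "make && make install",    # '&' followed by '&' or a space is left alone
        "cmd >out 2>&1",           # '&1' is not a character reference
        "curl 'u?a=1&copy=2'",     # '&copy' is a legacy entity even without ';'
    ]
    for raw in samples:
        print(repr(raw), "->", repr(rendered_text(raw)))
```

The first two samples are typical Unix command lines that come through untouched, which is why it's so easy to get away without escaping. The third is the sort of thing that bites you: an unescaped '&' that happens to be followed by a known entity name.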

(The MDN page on <pre> includes an example with unescaped < and >. If you play around with similar cases, you'll probably find that what is rendered intact and what is treated as an unrecognized HTML element and silently swallowed is quite sensitive to details of formatting and to what is included within the '< ... >' run of raw text. Browsers clearly have a lot of heuristics here, some of which have been captured in HTML5's description of the tag open state. In HTML5, a '<' followed by anything other than an ASCII letter, '/', '!', or '?' is treated as literal text instead of the start of a tag (in any context, not just in a <pre>).)
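One way to experiment with this outside a browser is Python's standard `html.parser`. It is not a full HTML5 tokenizer, so treat this as an approximation and not a statement of what any browser does, but it does follow the same basic idea that a '<' only opens a tag when a letter follows it. The `PreProbe` class and `probe` function are hypothetical names for this sketch.

```python
from html.parser import HTMLParser


class PreProbe(HTMLParser):
    """Record which parts of a fragment are seen as text and which as start tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def probe(fragment):
    """Return (visible text, list of start tags) for a fragment of raw HTML."""
    parser = PreProbe()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.text_parts), parser.start_tags


if __name__ == "__main__":
    # '<' followed by a space stays as text; '<file>' is taken as an unknown element.
    print(probe("if x < 3 then <b>y</b>"))
    print(probe("cat <file> out"))
```

In the second sample the `<file>` disappears from the text and shows up as a start tag, which is the "silently swallowed" case: a Unix-style placeholder that looks enough like an element to be treated as one.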

I don't know how browser interpretation of various oddities in <pre> content is affected by the HTML DOCTYPE or HTML version that the page declares or the browser assumes, but I wouldn't count on all of them behaving the same outside, perhaps, of HTML5 mode (which at least has specific rules for this). Of course, if you're producing HTML with tools instead of writing it by hand, the tools should take care of this for you. That's the only reason that Wandering Thoughts has whatever HTML correctness it does; my DWikiText to HTML rendering code takes care of it all for me, <pre> blocks included.


Last modified: Wed Jun 5 21:03:51 2019