The HTML <pre> element doesn't do very much
These days I don't do too much with HTML, so every so often I wind
up in a situation where I have to reach back and reconstruct things
that once were entirely well known to me. Today, I wound up talking
with someone about the <pre>
element and what you could and
couldn't safely put in it, and it took some time to remember most
of the details.
The simple version is that <pre> doesn't escape markup, it only changes formatting, although many simple examples you'll see only use it on plain text so it's not immediately clear. Although it would be nice if <pre> was a general container that you could pour almost arbitrary text into and have it escaped, it's not. If you're writing HTML by hand and you have something to put into a <pre>, you need to escape any markup and HTML entities (much like a <textarea>, although even more so). Alternately, you can actually use this to write <pre> blocks that contain markup, for example links or text emphasis (you might deliberately use bold inside a <pre> to denote generic placeholders that the reader fills in with their specifics).
As with <textarea>, it's easy to overlook this
for straightforward cases and to get away without doing any text
escaping, especially in modern browsers. A lot of the command lines
or code or whatever that we often put into <pre> don't contain
things that can be mistaken for HTML markup or HTML entities, and
modern browsers will often silently re-interpret things as plain
text for you if they aren't validly formatted entities or markup.
I myself have written and altered any number of <pre> blocks over
the past few years without ever thinking about it, and I'm sure
that some of them included '<
' or '>
' and perhaps '&
' (all
as part of Unix command lines).
(The MDN page on <pre> includes an example with unescaped < and >. If you play around with similar cases, you'll probably find that what is rendered intact and what is considered to be an unrecognized HTML element that is silently swallowed is quite sensitive to details of formatting and what is included within the '< ... >' run of raw text. Browsers clearly have a lot of heuristics here, some of which have been captured in HTML5's description of tag open state. In HTML5, anything other than an ASCII alpha after the '<' makes it a non-element (in any context, not just in a <pre>).)
I don't know how browser interpretation of various oddities in <pre> content is affected by the declared or assumed HTML DOCTYPE or HTML version the browser assumes, but I wouldn't count on all of them behaving the same outside, perhaps, of HTML5 mode (which at least has specific rules for this). Of course if you're producing HTML with tools instead of writing it by hand, the tools should take care of this for you. That's the only reason that Wandering Thoughts has whatever HTML correctness it does; my DWikiText to HTML rendering code takes care of it all for me, <pre> blocks included.
Comments on this page:
|
|