One of XHTML's practical problems was its implications for web page generation

October 28, 2019

I recently ran across "The evolution of the web, and a eulogy for XHTML2", which takes a much more positive view of XHTML(2) than I do; my view is not positive at all. In the ensuing discussion on lobste.rs I realized a new aspect of the practical problems with XHTML, which is the page creation side.

(My usual XHTML objections focus on the web user side of things. XHTML's nominal requirement for draconian error handling, where any XHTML error would cause browsers to show you nothing of the page, clashed badly with practical usability, especially since people demonstrably mostly didn't write correct XHTML. A web full of error pages is not a good web.)

Because the consequences of invalid XHTML are so severe, XHTML and the W3C were essentially demanding that everyone change how they created web pages so that they only created valid XHTML. For individually created web pages, ones authored by people (and thus in moderate volume), this is theoretically not a huge problem; people can be pushed to run XHTML validators before they publish, or use XHTML aware editing environments that don't let them make mistakes in the first place.

It is a huge problem for dynamically generated web pages, though, or more exactly for the software that generates them. Put simply, text templating is not compatible with XHTML in practice (partly because there are a lot of ways to go wrong in XHTML). At scale, the only safe way to always end up with valid XHTML is to use a page generation API that simply doesn't allow you to do anything other than create valid XHTML. Almost no one generating dynamic pages uses or used such an API, which meant that switching to XHTML would have required modifying their software at some level.

(A page generation system that throws an error when you generate an invalid XHTML page isn't good enough. From Amazon's perspective, it doesn't matter whether it was the user's browser or their page rendering system that caused a product page to not display; either is bad.)
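
As a rough sketch of the difference between text templating and a page generation API, here is a small Python illustration. The function names and the product text are made up for this example, and it uses the standard library's xml.etree.ElementTree as a stand-in for a real page generation API. Note that building a tree this way only guarantees well-formed XML, which is a weaker property than valid XHTML; a real API would also have to enforce XHTML's element and attribute rules.

```python
import xml.etree.ElementTree as ET


def render_with_template(title, body):
    # Text templating: plain string substitution that knows nothing about XML.
    # (Hypothetical template, for illustration only.)
    return (
        "<html><head><title>%s</title></head>"
        "<body><p>%s</p></body></html>" % (title, body)
    )


def render_with_tree(title, body):
    # Tree-building API: text is attached to elements as data, so the
    # serializer escapes it and unbalanced tags cannot be produced.
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    page_body = ET.SubElement(html, "body")
    ET.SubElement(page_body, "p").text = body
    return ET.tostring(html, encoding="unicode")


def is_well_formed(text):
    # A strict XML parser rejects the whole document on any error, which is
    # roughly what draconian error handling in a browser amounts to.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Made-up product text with a bare '&' and an unclosed tag.
product = "Salt & pepper <b>grinder"

templated = render_with_template(product, product)
built = render_with_tree(product, product)

print(is_well_formed(templated))
print(is_well_formed(built))
```

The templated page fails the strict parse because the bare '&' and the unclosed '<b>' went into the output untouched, while the tree-built page passes because those characters were escaped on the way out. A templating system can of course escape its inputs too, but nothing in the string substitution itself forces it to, and nothing stops the template text itself from being broken.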

Since XHTML got web sites nothing in practice, no one of any size was ever likely to do this. And even by the late 2000s, more and more web sites were using more and more automatically generated pages. Even today a very large number of automatically generated pages are produced by text templating systems, which are and remain very popular in things like (server side) web frameworks.

(I maintain that there are very good reasons for this, but that's for another entry.)


Comments on this page:

By sam at 2019-10-29 02:48:22:

This isn't wrong, but stringly-typed page generation is also typically vulnerable to XSS; I don't know if I'd go so far as to call it deprecated today, but I would definitely consider it not good practice.

By cks at 2019-10-30 12:38:38:

The templating systems I'm familiar with are dealing with XSS and related issues by auto-escaping strings when they're inserted into the template, with some way of marking strings as 'safe' (and of inserting things that aren't strings). This isn't a complete solution by itself, since you need different sorts of escaping depending on the context, but it seems to get the job done well enough for people.

Last modified: Mon Oct 28 21:31:32 2019