Two-step updates: the best solution to the valid XHTML problem

December 12, 2008

Let us suppose that you want to create an environment that ensures that XHTML stays valid XHTML. If validation failures should punish the person actually responsible for them, the corollary is that the site author should be punished for invalid XHTML, and you need to punish them directly.

So let's make the web server itself validate your XHTML; if it's not valid, it doesn't get served. Does this punish the site author? Not necessarily, because the site author only notices if they read their own site. Otherwise the only people this punishes are the readers, just like before, except that this time the big BZZT comes from the server instead of from the browser. (Arguably this makes it worse.)

So we have a couple of goals. First, we need direct feedback to the author about invalid XHTML, and we want to give them no choice about getting it, which means that we need to force them to use our tool at some point when they update their website. Second, we want to make sure that readers are not punished for invalid content (or at least are punished as little as possible). That means we always need to serve up valid XHTML, which in turn means that we can't allow in-place editing of the live page data.

Thus we want a two-step update process, where the author first prepares the update using whatever tools they want to use and then publishes it by running a tool provided by the web server. The tool validates all of the update and only applies it if validation succeeds; if it fails, the tool complains and stops, leaving the web server serving up the old (and valid) data.

(The author can still ignore the error and the lack of updates if they really want to. Ultimately there is nothing you can do to guarantee that people pay attention.)
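As a rough illustration of the publish step described above, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than part of any real web server: the names `find_invalid` and `publish`, the `.xhtml` suffix, a staging directory that the author edits freely, and a live symlink that the web server serves from. It also only checks XML well-formedness with the standard library's parser, which is weaker than real XHTML validation (and it will reject named entities such as `&nbsp;`); a real tool would plug in a proper validator at that point.

```python
import os
import shutil
import sys
import tempfile
import xml.etree.ElementTree as ET


def find_invalid(staging_dir):
    """Return (path, error) pairs for every .xhtml file that fails to parse."""
    problems = []
    for root, _dirs, files in os.walk(staging_dir):
        for name in sorted(files):
            if not name.endswith(".xhtml"):
                continue
            path = os.path.join(root, name)
            try:
                # Well-formedness check only; stands in for full validation.
                ET.parse(path)
            except ET.ParseError as exc:
                problems.append((path, str(exc)))
    return problems


def publish(staging_dir, live_link):
    """Validate all of staging_dir; repoint live_link only if it all passes."""
    problems = find_invalid(staging_dir)
    if problems:
        # Complain directly to the author and leave the live data untouched.
        for path, err in problems:
            print(f"{path}: {err}", file=sys.stderr)
        return False

    # Copy the validated update into a fresh release directory.
    parent = os.path.dirname(os.path.abspath(live_link))
    release = tempfile.mkdtemp(prefix="release-", dir=parent)
    os.chmod(release, 0o755)  # mkdtemp creates the directory owner-only
    shutil.copytree(staging_dir, release, dirs_exist_ok=True)

    # Swap the symlink with a rename so readers never see a half-applied update.
    tmp_link = live_link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release, tmp_link)
    os.replace(tmp_link, live_link)
    return True
```

The author would run something like `publish("staging", "live")` after editing; on failure the function prints the errors and returns False, and the web server keeps serving whatever `live` pointed to before. The sketch assumes a POSIX system, since it relies on symlinks and on rename replacing an existing link.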

A two-step update process can be implemented in a number of ways. The most straightforward is probably using a version control system with checkin guards, such that you can only check in a new version if it passes validation. 'Publishing' is then checking in and signaling the web server to get the new version from the VCS.
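A checkin guard of this sort might look roughly like the following Python sketch. It is deliberately not tied to any particular VCS: the function names and the idea of handing the guard a mapping from path to proposed file contents are assumptions for illustration, and how a hook actually obtains those contents depends on the system. Many version control systems reject a checkin when the hook exits with a non-zero status, which is what `guard_exit_status` models. As before, the check is only XML well-formedness, not full XHTML validation.

```python
import xml.etree.ElementTree as ET


def check_commit(files):
    """Return one error message per .xhtml file that is not well-formed.

    `files` maps a path to the bytes of the version about to be checked in.
    """
    errors = []
    for path, content in sorted(files.items()):
        if not path.endswith(".xhtml"):
            continue  # only guard the XHTML pages
        try:
            ET.fromstring(content)
        except ET.ParseError as exc:
            errors.append(f"{path}: {exc}")
    return errors


def guard_exit_status(files):
    """Exit status for a hook: 0 accepts the checkin, 1 rejects it."""
    return 1 if check_commit(files) else 0
```

A hook script would gather the files in the pending checkin, print whatever `check_commit` returns so the author sees the errors, and exit with `guard_exit_status`.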

(However, if you want to make sure that users can't possibly edit the live data, you'll want to keep it in some sort of opaque database. Finding a justification for this is up to you.)

Comments on this page:

From at 2008-12-12 23:44:17:

Why not just treat invalid XHTML like invalid HTML? Give it your best shot and don't worry about it too much. (from the browser's perspective)

Like it or not, I have a feeling this is the only practical long term solution.

By cks at 2008-12-14 22:54:02:

This is a religious debate, but in general I agree with you; I don't think that strict XHTML validation can survive over the long term, for reasons I wrote up in XHTMLChicken. This entry was written from the perspective of the current strict-failure view of XHTML, and if you are going to have strict failure I think it should be done this way so that it can more or less actually work.

Written on 12 December 2008.

Last modified: Fri Dec 12 22:44:20 2008