Writing HTML considered harmful

January 14, 2006

The other day a weblog I read had a post (and their RSS feed) blow up because of invalid markup: an entry was quoting some text that had bare '<'s in it, nothing escaped them, and invalid HTML tags got generated and ate part of the entry. (The problem's now been fixed.)

There's nothing noteworthy about this, and that's the problem: people make this mistake all the time. HTML has a bunch of picky rules to keep track of; if you make people write HTML they will overlook something every so often, and kaboom. The conclusion is obvious: don't make people write HTML.

Not escaping a '<' is the most common error, so if you're going to make people write HTML, please automatically escape all unrecognized HTML tags. This gives you a fighting chance of not mangling your users' text too badly the next time they paste in something with a '#include <stdio.h>' or whatever. (Please especially do this if you're already only accepting limited HTML markup, for example in comments.)
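As a rough illustration of the idea, here is a minimal Python sketch. The function name escape_unknown_tags, the KNOWN_TAGS list, and the regular expression are all assumptions made up for this example, not anything a particular weblog package does. It leaves recognized tags alone and turns everything else that starts with a '<' into harmless text. It is not a security sanitizer (it doesn't look at attributes or '&' at all); it only tries to stop stray '<'s from eating the entry.

```python
import re

# Hypothetical list of tags the authoring environment recognizes;
# a real site would choose its own.
KNOWN_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "pre", "code", "blockquote"}

# Anything shaped like an opening or closing tag; group 2 is the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def escape_unknown_tags(text, known=KNOWN_TAGS):
    """Escape bare '<' and tag-like text whose name is not in `known`."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        # Text between tag-like matches: any '<' here is a bare one.
        out.append(text[pos:m.start()].replace("<", "&lt;"))
        if m.group(2).lower() in known:
            # Recognized tag: pass it through untouched.
            out.append(m.group(0))
        else:
            # Unrecognized tag: show it literally instead of emitting markup.
            out.append(m.group(0).replace("<", "&lt;").replace(">", "&gt;"))
        pos = m.end()
    out.append(text[pos:].replace("<", "&lt;"))
    return "".join(out)
```

With this, '#include <stdio.h>' comes out as '#include &lt;stdio.h&gt;', while '<b>bold</b>' passes through unchanged.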

The real solution is to use a markup language that's easier to write and avoids these errors. There are lots of choices; wikis have shown that people will happily write quite a lot in WikiText variants, for example. While these don't give you all of HTML's power, content text rarely needs more than the core markup, and in any case if you're editing through the web there's a limit on what you can write by hand and get right.

You might say 'well, people shouldn't make that error' (or 'people should preview and notice the error and fix it'). Don't. When people make a mistake all the time, the error is not in the people, it's in the interface. (You can maintain otherwise, but you are trying to swim upstream against a very, very strong current.)

Sidebar: but what about accepting unrecognized tags?

Accepting and ignoring unrecognized HTML markup is a great thing for a browser, but it's almost always the wrong thing for a simple authoring environment. For the rare times that your users need to put weird new HTML tags in, have an override option. If you're worried about new HTML tags becoming common, just let people add new HTML tags to the list of known ones.
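One way the two escape hatches might look, as a hedged Python sketch: the TagPolicy class, its add_known method, and the raw flag are hypothetical names invented for this illustration. It only deals with tag-shaped text (bare '<'s would need separate handling) and makes no attempt at security filtering.

```python
import re

# Anything shaped like an opening or closing tag; group 1 is the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


class TagPolicy:
    """Hypothetical authoring-side policy: escape tags that aren't known."""

    def __init__(self, known):
        self.known = {t.lower() for t in known}

    def add_known(self, tag):
        # For when a new HTML tag becomes common enough to recognize.
        self.known.add(tag.lower())

    def render(self, text, raw=False):
        # raw=True is the per-entry override for weird new tags.
        if raw:
            return text

        def fix(m):
            if m.group(1).lower() in self.known:
                return m.group(0)
            return m.group(0).replace("<", "&lt;").replace(">", "&gt;")

        return TAG_RE.sub(fix, text)
```

The default stays safe for ordinary writers; the override and the extendable list are there for the rare entry that really does need something new.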

Written on 14 January 2006.
