== Writing HTML considered harmful
The other day a weblog I read had a
[[post|http://blog.centresource.com/2006/01/05/the-evolution-of-a-programmer]]
(and their RSS feed) blow up because of invalid markup: an entry was
quoting some text that had bare '<'s in it, nothing escaped them, and
invalid HTML tags got generated and ate part of the entry. (The
problem's now been fixed.)

There's nothing noteworthy about this, and that's the problem: people
make this mistake all the time. HTML has a bunch of picky rules to
keep track of; if you make people write HTML they will overlook
something every so often, kaboom. The conclusion is obvious.

Not escaping a '<' is the most common error, so if you're going to
make people write HTML ~~please automatically escape all unrecognized
HTML tags~~. This gives you a fighting chance of not mangling your
users' text too badly the next time they paste in something with a
'_#include _' or whatever. (Please especially do this if
you're already only accepting limited HTML markup, for example in
comments.)
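
As a rough illustration of that escaping, here is a minimal Python sketch. The tag list, the regular expression, and the function names are all my own assumptions, not anyone's actual implementation. It only deals with '<' and '>', and it is not a security sanitizer (attributes on recognized tags pass through untouched).

```python
import re

# Hypothetical allowlist: the only tags this sketch treats as real markup.
KNOWN_TAGS = {"a", "b", "blockquote", "code", "em", "i", "p", "pre", "strong"}

# Text shaped like an opening or closing tag: <name ...> or </name>.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def _escape(s: str) -> str:
    # Only angle brackets are escaped; '&' is left alone so that entities
    # the user already wrote (such as &amp;) are not double-escaped.
    return s.replace("<", "&lt;").replace(">", "&gt;")


def escape_unknown_tags(text: str) -> str:
    """Escape every '<' and '>' that is not part of a recognized tag."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        # Text between tag-shaped matches may hold bare '<' or '>'.
        out.append(_escape(text[pos:m.start()]))
        if m.group(1).lower() in KNOWN_TAGS:
            out.append(m.group(0))           # recognized tag: keep as markup
        else:
            out.append(_escape(m.group(0)))  # unrecognized: show it literally
        pos = m.end()
    out.append(_escape(text[pos:]))
    return "".join(out)
```

With this, 'if a < b then <b>bold</b>' keeps its bold markup while the bare '<' is escaped, and an unrecognized tag such as '<blink>' shows up as literal text instead of vanishing.
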

The real solution is to use a markup language that's easier to write
and avoids these errors. There are lots of choices; wikis have shown
that people will happily write quite a lot in WikiText variants, for
example. While these don't give you all of HTML's power, content text
rarely needs more than the core markup, and in any case if you're
editing through the web there's a limit on what you can write by hand
and get right.

You might say 'well, people shouldn't make that error' (or 'people
should preview and notice the error and fix it'). Don't. When people
make a mistake all the time the error is not in the people, it's in
the interface. (You can maintain otherwise, but you are trying to swim
upstream against a very, very strong current.)

=== Sidebar: but what about accepting unrecognized tags?
Accepting and ignoring unrecognized HTML markup is a great thing for
a browser, but it's almost always the wrong thing for a simple
authoring environment. For the rare times that your users need to put
weird new HTML tags in, have an override option. If you're worried
about new HTML tags becoming common, just let people add new HTML tags
to the list of known ones.
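
One way the override and the extensible tag list might fit together, as a small Python sketch. The class 'TagPolicy', its 'allow' and 'render' methods, and the starting tag set are hypothetical names of mine. It only looks at tag-shaped text and leaves everything else alone.

```python
import re
from dataclasses import dataclass, field

# Text shaped like an opening or closing tag: <name ...> or </name>.
_TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


@dataclass
class TagPolicy:
    # Hypothetical starting list of known tags; a real site would pick its own.
    known: set = field(default_factory=lambda: {"a", "em", "p", "strong"})

    def allow(self, *names: str) -> None:
        """Add new tags to the list of known ones."""
        self.known.update(n.lower() for n in names)

    def render(self, text: str, raw: bool = False) -> str:
        """Escape unrecognized tags, unless the author asked for the override."""
        if raw:
            return text  # override: the author takes responsibility for the markup

        def fix(m: re.Match) -> str:
            if m.group(1).lower() in self.known:
                return m.group(0)
            return m.group(0).replace("<", "&lt;").replace(">", "&gt;")

        return _TAG.sub(fix, text)
```

The override is per entry and off by default, so the common case stays safe and the rare entry that really needs an odd tag can still have it.
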