Writing HTML considered harmful
The other day a weblog I read had a post (and their RSS feed) blow up because of invalid markup: an entry was quoting some text that had bare '<'s in it, nothing escaped them, and invalid HTML tags got generated and ate part of the entry. (The problem's now been fixed.)
There's nothing noteworthy about this, and that's the problem: people make this mistake all the time. HTML has a bunch of picky rules to keep track of; if you make people write HTML they will overlook something every so often, kaboom. The conclusion is obvious.
Not escaping a '<' is the most common error, so if you're going to make people write HTML, please automatically escape all unrecognized HTML tags. This gives you a fighting chance of not mangling your user's text too badly the next time they paste in something with a '#include <stdio.h>' or whatever. (Please especially do this if you're already accepting only limited HTML markup.)
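As a rough illustration of escaping unrecognized tags, here is a minimal Python sketch. The allowlist `KNOWN_TAGS` and the function name `escape_unknown_tags` are hypothetical. The sketch assumes every '&' is meant literally, passes recognized tags through with their attributes untouched (so it is not a security filter), and does not cope with a '>' inside a quoted attribute value.

```python
import html
import re

# Hypothetical allowlist: the tags this sketch treats as "recognized".
KNOWN_TAGS = {"a", "b", "i", "em", "strong", "code", "pre", "p", "blockquote"}

# Something shaped like a start or end tag: '<', optional '/', a tag name,
# then anything up to the next '>' without crossing another '<'.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def escape_unknown_tags(text, known=KNOWN_TAGS):
    """Pass through tags whose name is in `known`; escape everything else."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        # Text between tag-shaped things is always escaped, so a bare '<'
        # (as in "a < b") comes out as &lt;.
        out.append(html.escape(text[pos:m.start()], quote=False))
        if m.group(2).lower() in known:
            out.append(m.group(0))  # recognized tag: keep as markup
        else:
            # Unrecognized "tag" such as <stdio.h>: show it literally.
            out.append(html.escape(m.group(0), quote=False))
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)
```

For example, `escape_unknown_tags("Use <b>#include <stdio.h></b> here")` keeps the `<b>` tags but turns `<stdio.h>` into `&lt;stdio.h&gt;`, so the rest of the entry survives.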
The real solution is to use a markup language that's easier to write and avoids these errors. There are lots of choices; wikis have shown that people will happily write quite a lot in WikiText variants, for example. While these don't give you all of HTML's power, content text rarely needs more than the core markup, and in any case if you're editing through the web there's a limit on what you can write by hand and get right.
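Part of why such markup avoids the problem is that a converter can escape all of the user's text before applying its own few rules, so a stray '<' can only ever come out as literal text. The Python sketch below shows that ordering with a hypothetical toy syntax (`*bold*`, `_italic_`, blank lines between paragraphs); it is not any real wiki's syntax, and the names `INLINE_RULES` and `render` are made up for illustration.

```python
import html
import re

# Hypothetical toy WikiText-like rules, applied only AFTER escaping.
INLINE_RULES = [
    (re.compile(r"\*([^*]+)\*"), r"<strong>\1</strong>"),
    (re.compile(r"_([^_]+)_"), r"<em>\1</em>"),
]


def render(text):
    """Convert toy markup to HTML; user text can never create its own tags."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape first: after this, no character the user typed is markup.
        s = html.escape(block, quote=False)
        # Then the converter, not the user, inserts the only real tags.
        for pattern, repl in INLINE_RULES:
            s = pattern.sub(repl, s)
        paragraphs.append("<p>" + s + "</p>")
    return "\n".join(paragraphs)
```

With this ordering, `render("a < b")` gives `<p>a &lt; b</p>`; there is no way for the user to produce a broken tag, only text that didn't get the formatting they hoped for.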
You might say 'well, people shouldn't make that error' (or 'people should preview, notice the error, and fix it'). Don't. When people make a mistake all the time, the error is not in the people; it's in the interface. (You can maintain otherwise, but you are trying to swim upstream against a very, very strong current.)
Sidebar: but what about accepting unrecognized tags?
Accepting and ignoring unrecognized HTML markup is a great thing for a browser, but it's almost always the wrong thing for a simple authoring environment. For the rare times that your users need to put weird new HTML tags in, have an override option. If you're worried about new HTML tags becoming common, just let people add new HTML tags to the list of known ones.