== HTML character sets
I started with a simple question that was vaguely orbiting in the back
of my mind: are HTML numeric character entities always in Unicode,
regardless of the character set of the HTML web page? The answer is
yes, but it turns out that I was asking a misleading question.
Despite being called 'charset' in HTTP headers and META declarations,
what you are declaring is actually the web page's character encoding,
not its character set. HTML is specified as always using Unicode
(formally, ISO 10646) as its document character set; only the encoding
of the characters in the document can vary. So numeric character
entities always refer to Unicode code points.
(All of this can be found in the [[W3C spec
http://www.w3.org/TR/html401/charset.html]], which is even relatively
clear.)
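To make the distinction concrete, here is a small Python sketch (my own illustration, not anything from the spec). It ships the same markup in two different encodings, ISO-8859-1 and UTF-8, and uses Python's standard `html.unescape` as a stand-in for a browser's handling of character references. The declared charset only controls how bytes turn into characters; `&#8217;` comes out as U+2019 either way, even though ISO-8859-1 itself has no such character.

```python
import html


def decode_page(raw: bytes, charset: str) -> str:
    """Turn raw page bytes into text, then resolve character references."""
    # Step 1: the declared 'charset' is really an encoding; it only says
    # how to map the incoming bytes to characters.
    text = raw.decode(charset)
    # Step 2: numeric references like &#8217; name Unicode code points,
    # no matter which encoding the bytes arrived in.
    return html.unescape(text)


# The same markup, shipped in two different encodings.
markup = "caf\u00e9 &#8217;quoted&#8217;"
latin1_page = markup.encode("iso-8859-1")
utf8_page = markup.encode("utf-8")

from_latin1 = decode_page(latin1_page, "iso-8859-1")
from_utf8 = decode_page(utf8_page, "utf-8")

print(from_latin1 == from_utf8)    # True
print(hex(ord(from_latin1[5])))    # 0x2019, RIGHT SINGLE QUOTATION MARK
```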
Browsers display HTML by at least logically converting the incoming
web page into Unicode and then figuring out how to render all of the
characters. If you do not have Unicode fonts for everything, Firefox
will hunt around through the fonts that you do have in various character
set encodings, using its charset-to-Unicode maps in reverse to find one
that has the necessary Unicode character. Interesting things happen if
the fonts you have do not have all the characters that Firefox expects
them to have.
(This is probably not an issue for people who are using stock Firefox
builds with relatively stock fonts on modern Unix systems. I use a
custom-compiled Firefox with a wacky set of old-school bitmap fonts as
my default fonts, so periodically various 'smart quote' characters drop
out on me and I get to go on another hunting expedition.)
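Firefox's font hunt can be roughly sketched in Python as follows. This is a simplified illustration, not Firefox's actual code: the font names are hypothetical, and Python's codecs stand in for Firefox's charset tables. Encoding a character with a font's charset runs that charset-to-Unicode table in reverse, which gives the byte (glyph index) to ask that font for.

```python
from typing import Optional

# Hypothetical fonts, each tagged with the legacy encoding its glyphs use.
FONTS = [
    ("fixed-iso8859-1", "iso-8859-1"),
    ("fixed-cp1252", "cp1252"),
    ("fixed-koi8-r", "koi8-r"),
]


def find_font(ch: str, fonts=FONTS) -> Optional[tuple[str, bytes]]:
    """Return the first font whose encoding can represent ch, plus its bytes."""
    for name, encoding in fonts:
        try:
            # Unicode -> bytes: the charset table used in reverse.
            glyph = ch.encode(encoding)
        except UnicodeEncodeError:
            # This font's charset has no such character; keep hunting.
            continue
        return name, glyph
    # No font claims the character, so it has nowhere to be drawn from.
    return None


print(find_font("\u2019"))  # ('fixed-cp1252', b'\x92')
print(find_font("\u4e2d"))  # None
```

Note that the sketch trusts the charset table completely. If a font claims to be in cp1252 but has no real glyph at 0x92, the lookup still "succeeds" and you get a blank or wrong character, which is plausibly the kind of thing that makes smart quotes drop out with odd old bitmap fonts.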