Setting the character encoding for HTML form input
Courtesy of reading "HOWTO Use UTF-8 Throughout Your Web Stack",
I recently rediscovered the form accept-charset attribute (I say
'rediscovered' because I clearly once knew about it since I mentioned
it in passing back in 2007).
Setting an explicit accept-charset attribute on your forms solves (in
theory) one of those niggling little HTML form questions, that being
'what character encoding did the browser use for encoding this text the
user submitted?' As spelled out in the HTML 4.01 Forms specification, browsers have to
honor an explicit value. However, if one is omitted, browsers are merely
permitted (but not required) to take the character encoding for form
submissions from the character encoding of the form's HTML page.
(I don't know why this wasn't mandatory behavior; maybe there were browsers that historically used their default character set or the like.)
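To see why the question matters, here is a minimal Python sketch of the server side. It assumes an ordinary application/x-www-form-urlencoded POST and a hypothetical text field called 'name'. The body carries only percent-encoded bytes, with no indication of their encoding, so the same submission turns into different text depending on which encoding the server assumes.

```python
from urllib.parse import parse_qs, quote

# Hypothetical form markup: <form method="post" accept-charset="UTF-8">
# Simulate the body a browser sends for name=café when it encodes as UTF-8.
body = "name=" + quote("café", encoding="utf-8")  # "name=caf%C3%A9"

# The body itself doesn't say which encoding the bytes are in; the
# server has to know or guess it separately.
as_utf8 = parse_qs(body, encoding="utf-8")["name"][0]
as_latin1 = parse_qs(body, encoding="iso-8859-1")["name"][0]

print(as_utf8)    # café
print(as_latin1)  # cafÃ©  (classic mojibake from guessing wrong)
```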
According to various references, accept-charset is fully supported
by browsers, with the small
exception that if you try to use the charset 'ISO-8859-1' some versions
of Internet Explorer will decide that you meant 'Windows-1252'
instead. I haven't tested to see if an accept-charset that matches
your page's charset will cause form submissions to have an explicit
charset specified (cf), although I suspect that
it doesn't for most browsers.
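On the receiving end, one defensive policy is to use an explicit charset parameter on the request's Content-Type if there is one, otherwise fall back to the page's own encoding, and treat an ISO-8859-1 label as Windows-1252. Here is a minimal Python sketch of that policy. PAGE_ENCODING and submission_encoding() are hypothetical names, and the ISO-8859-1 to Windows-1252 substitution is an assumption modelled on the Internet Explorer behaviour above (the WHATWG Encoding Standard makes the same substitution for that label).

```python
from email.message import Message

PAGE_ENCODING = "utf-8"  # hypothetical: the encoding our form pages are served in

def submission_encoding(content_type: str) -> str:
    """Pick the encoding to decode a form POST body with (a sketch)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    charset = msg.get_content_charset()
    if charset is None:
        # No explicit charset: assume the browser used the page's encoding.
        return PAGE_ENCODING
    if charset in ("iso-8859-1", "latin1", "latin-1"):
        # Assumption: treat ISO-8859-1 as Windows-1252, its superset, since
        # some browsers send Windows-1252 bytes (0x80-0x9F are punctuation
        # there, but C1 control characters in real ISO-8859-1).
        return "cp1252"
    return charset

print(submission_encoding("application/x-www-form-urlencoded"))
print(submission_encoding("application/x-www-form-urlencoded; charset=UTF-8"))
print(submission_encoding("application/x-www-form-urlencoded; charset=ISO-8859-1"))
print(b"\x93hi\x94".decode("cp1252"))  # curly quotes, not control characters
```

If, as suspected above, most browsers don't send a charset parameter at all, the fallback branch does nearly all of the work, which is exactly why pinning the encoding down with accept-charset is attractive.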
This isn't going to cause me to immediately update any of my HTML
templates (either in DWiki or elsewhere); in practice they work today
without specifying accept-charset, at least as far as I can tell. But
when I write or update HTML in the future, I'm going to try to remember
this and put accept-charset attributes on all of my HTML forms, just
to make sure that I get what I'm expecting. It's a good practice, if
nothing else, and someday it may save me some annoyance.
(As before, what HTML generally calls 'charsets' are in fact character encodings, not character sets per se.)
(This is one of the entries that I write to get something to stick in
my head. Clearly I didn't think accept-charset was all that important
back in 2007, and I'm pretty sure I was wrong.)