== Setting the character encoding for HTML form input

Courtesy of reading [[HOWTO Use UTF-8 Throughout Your Web Stack
http://rentzsch.tumblr.com/post/9133498042/howto-use-utf-8-throughout-your-web-stack]],
I recently rediscovered the form _accept-charset_ attribute. (I say
'rediscovered' because I clearly knew about it once, since I mentioned
it in passing [[back in 2007 POSTSpecifications]].)

Setting an explicit _accept-charset_ attribute on your forms solves (in
theory) one of those niggling little HTML forms questions: 'what
character encoding did the browser use for the text that the user
submitted?' As spelled out in [[the HTML 4.01 Forms specification
http://www.w3.org/TR/html4/interact/forms.html]], browsers have to
honor an explicit value. However, if the attribute is omitted, browsers
are merely permitted (but not required) to take the character encoding
for form submissions from the character encoding of the form's HTML page.

(I don't know why this wasn't mandatory behavior; maybe there were
browsers that historically used their default character set or the
like.)
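
As a rough sketch of what this looks like from the server side, here is a small Python example. The form's action URL and field name are made up for illustration, and `urlencode` only approximates what a real browser sends for a form with _accept-charset_ set to UTF-8. The point is that if you know which encoding you asked for, you can decode the submission strictly instead of guessing.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form template; the action URL and the field name are made up.
FORM_HTML = """
<form method="post" action="/comment" accept-charset="UTF-8">
  <textarea name="comment"></textarea>
  <input type="submit" value="Post">
</form>
"""


def simulate_browser_body(fields, charset):
    """Approximate an application/x-www-form-urlencoded body in `charset`."""
    return urlencode(fields, encoding=charset)


def decode_form_body(body, charset="utf-8"):
    """Decode a form body with the charset we asked for.

    errors="strict" makes bytes that are invalid in `charset` raise
    UnicodeDecodeError instead of being silently replaced.
    """
    return parse_qs(body, keep_blank_values=True,
                    encoding=charset, errors="strict")


body = simulate_browser_body({"comment": "café"}, "utf-8")
fields = decode_form_body(body)

# Decoding the same bytes with the wrong encoding produces mojibake.
wrong = parse_qs(body, encoding="latin-1")
```

Here `fields` has the text the user typed, while `wrong` shows what you get if the server assumes a different encoding than the one the browser actually used.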

According to [[various
http://www.w3schools.com/tags/att_form_accept_charset.asp]]
[[references http://reference.sitepoint.com/html/form/accept-charset]],
_accept-charset_ is fully supported by browsers, with the small
exception that if you try to use the charset 'ISO-8859-1', some versions
of Internet Explorer will decide that you meant 'Windows-1252'
instead. I haven't tested whether an _accept-charset_ that matches
your page's charset will cause browsers to label their form submissions
with an explicit charset ([[cf POSTSpecifications]]), although I suspect
that it doesn't for most browsers.
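
To illustrate why the ISO-8859-1 versus Windows-1252 substitution can matter, here is a small Python sketch. It assumes a browser that was asked for ISO-8859-1 but actually encoded the text as Windows-1252 (Python's `cp1252` codec); the field name `q` and the sample text are made up. The two encodings agree on most bytes but differ in the 0x80 to 0x9F range, which is where Windows-1252 puts characters such as the euro sign.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: the browser encoded the text as Windows-1252 even though
# the form asked for ISO-8859-1. The euro sign is byte 0x80 in cp1252.
body = urlencode({"q": "5€"}, encoding="cp1252")

# Decoding with the encoding the browser really used recovers the text.
as_1252 = parse_qs(body, encoding="cp1252")["q"][0]

# Decoding as ISO-8859-1 'succeeds' too, but byte 0x80 becomes the
# invisible C1 control character U+0080 instead of the euro sign.
as_latin1 = parse_qs(body, encoding="latin-1")["q"][0]
```

Since every byte is valid ISO-8859-1, decoding that way never raises an error; the wrong result is silent, so a server that expects this browser behavior may be better off decoding as Windows-1252.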

This isn't going to cause me to immediately update any of my HTML
templates (either in DWiki or elsewhere); in practice they work today
without specifying _accept-charset_, at least as far as I can tell. But
when I write or update HTML in the future, I'm going to try to remember
this and put _accept-charset_ attributes on all of my HTML forms, just
to make sure that I get what I'm expecting. It's a good practice, if
nothing else, and someday it may save me some annoyance.

([[As before HTMLCharsets]], what HTML generally calls 'charsets'
are in fact character encodings, not character sets per se.)

(This is one of the entries that I write to get something to stick in
my head. Clearly I didn't think _accept-charset_ was all that important
[[back in 2007 POSTSpecifications]], and I'm pretty sure I was wrong.)