Setting the character encoding for HTML form input

September 16, 2011

Courtesy of reading 'HOWTO Use UTF-8 Throughout Your Web Stack', I recently rediscovered the form accept-charset attribute. (I say 'rediscovered' because I clearly once knew about it; I mentioned it in passing back in 2007.)

Setting an explicit accept-charset attribute on your forms solves (in theory) one of those niggling little HTML forms questions: what character encoding did the browser use for the text the user submitted? As spelled out in the HTML 4.01 Forms specification, browsers have to honor an explicit value. However, if the attribute is omitted, browsers are merely permitted (but not required) to take the character encoding for form submissions from the character encoding of the form's HTML page.

(I don't know why this wasn't mandatory behavior; maybe there were browsers that historically used their default character set or the like.)
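As a rough illustration of what an explicit accept-charset buys you on the server side, here is a minimal Python sketch. It is not taken from DWiki or any real application; the form's action URL, the field name 'text', and the decode_form_body() helper are all made up for the example, and it assumes the browser really did honor accept-charset="UTF-8".

```python
from urllib.parse import parse_qs

# Hypothetical form markup; the action URL and field name are invented.
FORM_HTML = """<form method="post" action="/comment" accept-charset="UTF-8">
  <textarea name="text"></textarea>
  <input type="submit" value="Post">
</form>"""


def decode_form_body(body: bytes, charset: str = "utf-8") -> dict:
    """Decode an application/x-www-form-urlencoded body.

    Assumes the browser used the encoding named in accept-charset.
    """
    # A urlencoded body is plain ASCII with percent-escapes, so decode the
    # raw bytes as ASCII first; parse_qs then unescapes each value and
    # decodes the resulting bytes with the charset we asked for.
    # errors="strict" makes a mismatched encoding fail loudly instead of
    # being silently replaced.
    return parse_qs(body.decode("ascii"), encoding=charset, errors="strict")


# 'café' submitted as UTF-8: the é arrives as the two bytes C3 A9.
fields = decode_form_body(b"text=caf%C3%A9")
print(fields["text"][0])

# The same word submitted as ISO-8859-1 (é as the single byte E9) is not
# valid UTF-8, so strict decoding rejects it.
try:
    decode_form_body(b"text=caf%E9")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True
print(mismatch_detected)
```

Decoding strictly is a deliberate choice in this sketch: if a browser ignores the attribute, I would rather find out through an error than through quietly mangled text.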

According to various references, accept-charset is fully supported by browsers, with one small exception: if you specify the charset 'ISO-8859-1', some versions of Internet Explorer will decide that you meant 'Windows-1252' instead. I haven't tested whether an accept-charset that matches your page's charset causes form submissions to carry an explicit charset, although I suspect that it doesn't for most browsers.
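The ISO-8859-1 versus Windows-1252 substitution is easier to see with actual bytes. The following is only a small sketch using Python's standard codecs (the sample bytes are my own invention, not captured from any browser): the two encodings agree on the bytes 0xA0 through 0xFF but disagree about the range 0x80 through 0x9F, which is where Windows-1252 keeps things like curly quotes.

```python
# The bytes a Windows-1252 browser would send for "hi" wrapped in
# curly double quotes (0x93 and 0x94 are the curly quotes there).
raw = bytes([0x93, 0x68, 0x69, 0x94])

# Decoded as Windows-1252 we get the curly quotes U+201C and U+201D.
as_cp1252 = raw.decode("cp1252")

# Decoded as real ISO-8859-1, the same bytes become the C1 control
# characters U+0093 and U+0094, which is almost never what was meant.
as_latin1 = raw.decode("iso-8859-1")

print(ascii(as_cp1252))
print(ascii(as_latin1))

# The upper range 0xA0-0xFF decodes identically in both encodings,
# so ordinary accented Western European text is unaffected.
upper = bytes(range(0xA0, 0x100))
same_upper_range = upper.decode("cp1252") == upper.decode("iso-8859-1")
print(same_upper_range)
```

So if you ask for 'ISO-8859-1' and a browser quietly sends Windows-1252, most text will look fine and only characters from that 0x80 to 0x9F range will come out wrong when decoded strictly as ISO-8859-1.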

This isn't going to cause me to immediately update any of my HTML templates (either in DWiki or elsewhere); in practice they work today without specifying accept-charset, at least as far as I can tell. But when I write or update HTML in the future, I'm going to try to remember this and put accept-charset attributes on all of my HTML forms, just to make sure that I get what I'm expecting. It's a good practice, if nothing else, and someday it may save me some annoyance.

(As before, what HTML generally calls 'charsets' are in fact character encodings, not character sets per se.)

(This is one of the entries that I write to get something to stick in my head. Clearly I didn't think accept-charset was all that important back in 2007, and I'm pretty sure I was wrong.)

Written on 16 September 2011.
