Where to find specifications on HTTP POST behavior

September 5, 2007

Some IP addresses (probably not friendly ones) have recently taken to making POST submissions to various 'write comments' URLs here with a Content-Type of 'application/x-www-form-urlencoded; charset=UTF-8'. These get rejected by DWiki, because I was quite paranoid when I wrote the POST handling code and so DWiki is conservative about what it will accept.
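To make the difference concrete, here is a minimal Python sketch of a strict exact-match check next to a parse that separates the media type from its parameters. This is not DWiki's actual code (which isn't shown here); the names FORM_TYPE, strict_accepts and split_content_type are hypothetical, and using the standard library's email.message module for the parsing is just one convenient option.

```python
from email.message import Message

# Hypothetical constant: the bare media type a form POST normally carries.
FORM_TYPE = "application/x-www-form-urlencoded"

def strict_accepts(content_type):
    # Exact-match check: any parameter, including charset, causes rejection.
    return content_type == FORM_TYPE

def split_content_type(content_type):
    # Let the standard library's MIME header parsing split the value into
    # (media type, charset). Both come back lowercased; charset is None
    # when the header has no charset parameter.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_content_charset()
```

With the header from these requests, strict_accepts() says no, while split_content_type() reports the usual form media type plus a charset of 'utf-8'.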

While I was pretty certain that I wasn't losing anything by rejecting these requests, I got curious about whether adding a character set to a form POST's Content-Type this way is actually legal, which meant running down where this is specified.

(In general, including a charset in the Content-Type of a POST is unambiguously allowed by the HTTP specification, so the only question is whether you are allowed to do it specifically in HTML form POSTs.)

The primary specification of form POST behavior is in the HTML 4.01 specification, which should not have surprised me but did (I looked at the HTTP spec first). Section 17.13.3 describes the process of submitting a form, but you also need 17.13.4 and the definition of the enctype attribute. Unfortunately this doesn't clearly answer the question, since the specification uses very general language.

However, I think that adding a charset parameter has to be allowed by implication. Forms may specify that the server can accept more than one character encoding and leave it up to the client to decide which one to use (the accept-charset <form> attribute). This implies that the client must tell the server which character set it picked, and the form encoding rules provide no place to put this except as a charset parameter on the POST's Content-Type.
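As an illustration of why the server needs to know which character set the client picked, here is a small Python sketch that decodes a urlencoded form body under a given charset. The function name decode_form is hypothetical, and it assumes the charset has already been extracted from the Content-Type; it leans on urllib.parse.parse_qs from the standard library, with strict error handling so that a mismatched charset fails loudly instead of producing mangled text.

```python
from urllib.parse import parse_qs

def decode_form(body, charset="utf-8"):
    # A urlencoded body should be pure ASCII on the wire; the percent-escaped
    # bytes inside it are what get interpreted in the given charset.
    text = body.decode("ascii")
    return parse_qs(text, keep_blank_values=True,
                    encoding=charset, errors="strict")

# The same character arrives as different bytes depending on the charset.
print(decode_form(b"comment=caf%C3%A9", "utf-8"))
print(decode_form(b"comment=caf%E9", "iso-8859-1"))
```

Decoding the second body as UTF-8 raises UnicodeDecodeError, which is the sort of mismatch a server has to decide how to handle.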

(Browsers are encouraged to interpret a missing accept-charset as implying the character set of the HTML page with the form, which is UTF-8 in the case of WanderingThoughts. However, including a charset at all in this case is vanishingly rare.)

I'm still not going to fix DWiki's code right away, since I want to think through what I can and should do if the character set doesn't match. (Bearing in mind that my tolerance for people playing weird HTTP and HTML games is fairly low, since most of them are up to no good.)


Last modified: Wed Sep 5 23:19:15 2007