Something to remember: HTML forms are anonymous

July 15, 2011

By and large, web programming frameworks have settled on a common model of handling HTML forms. You have named (or typed) forms with named and typed fields, and you use the framework to render them into HTML and extract them from POST requests. Django programmers, for example, have a familiar, reflexive idiom:

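from django import forms

class MyForm(forms.Form):
  name = forms.CharField(...)

def handle_my_url(request):
  if request.method == "POST":
    # Bind the submitted data to the form class we expect at this URL.
    form = MyForm(request.POST)
    if form.is_valid():
      ...  # act on form.cleaned_data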

This simple, clear approach is misleading. It makes the whole process look rather like storing and retrieving objects, which suggests that of course you will only get a valid MyForm back from the HTTP POST if you actually put one there in the first place (or if the user made up the POST data themselves to fool you, a case you can ignore).

The thing is, HTML forms are anonymous. In their natural state, the only way you can tell different types of forms apart is by the URL they are sent to and the names of the form fields that they have. There is nowhere natural in an HTML form where you can say 'this is a MyForm form'; in general, you have to infer that from the fact that it has all the fields that MyForm has and is POST'd to a URL that expects a MyForm form.

(Your web framework may add a hidden field that labels the form so that it can be sure, but you have to check the generated HTML to know whether it does.)
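To make this concrete, here is a small sketch of what a form submission looks like on the wire. It uses only Python's standard library, not Django, and the field value is made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

# A hypothetical submission of a form with a single 'name' field, encoded
# the way a browser encodes application/x-www-form-urlencoded data.
body = urlencode({"name": "example"})

# The body is nothing but field names and values; there is no slot in it
# that says which form class the fields came from.
fields = parse_qs(body)

print(body)    # name=example
print(fields)  # {'name': ['example']}
```

Anything that turns this back into a 'MyForm' is going purely on the URL and the field names.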

So suppose that you have two different forms with the same form fields. The only way to tell these forms apart is then by the URL that each of them uses. If they use the same URL (for example, because there are alternate versions of the page with forms that mean different things), you can't tell them apart at all. You can render a page with a 'MyForm1', have the user POST it back, and happily retrieve a 'MyForm2' from the POST data. Although the two forms look like distinct and different objects in your code, in HTML they are actually the same thing.

(It's as if your programming language ignored the type of things when doing 'is-a' and equivalence checks and only checked that two instances had all of the same fields. There are languages that work this way; I believe the terms of art are 'structural typing' for the is-a check and 'structural equality' for the equivalence check.)
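Python has a small corner that behaves like this, which makes for a quick illustration. In the sketch below, two namedtuple types stand in for the two forms (they are not Django forms, and the field value is invented); equality between them only looks at the field values, not at which type each instance is.

```python
from collections import namedtuple

# Two distinct types that happen to have exactly the same field.
MyForm1 = namedtuple("MyForm1", ["name"])
MyForm2 = namedtuple("MyForm2", ["name"])

one = MyForm1(name="example")
two = MyForm2(name="example")

# Equality is plain tuple equality, so the differing types are ignored.
print(one == two)                # True
# The types really are different as far as Python is concerned.
print(isinstance(one, MyForm2))  # False
```

HTML forms give you the first kind of comparison and nothing like the second.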

All of this is abstract sounding, so let me give a concrete example where I almost shot my foot off this way. Our account request system allows privileged users to do two very special operations to requests: if a request is marked as having been either accepted or rejected, you can reset it to 'pending', and if a request is pending you can immediately delete it. In both cases you need to confirm that you really do want to do this by ticking off a checkbox, and both operations are done from the same 'detailed information about this request' URL; which option the page gives you depends on the request's state.

So we can create two forms:

class ReallyRevive(forms.Form):
  yes_really = forms.BooleanField(...)

class ReallyDelete(forms.Form):
  yes_really = forms.BooleanField(...)

Then we write code that tries to get and validate a ReallyRevive form from the POST data if the request is not pending, and a ReallyDelete form instead if the request is pending. And we have just created a dangerous race.

Suppose that two privileged users are both trying to revive the same request at the same time. Both see the page rendered with a ReallyRevive form, both tick the checkbox, and both submit the form, one somewhat after the other. In the first form submission, the code retrieves a valid ReallyRevive form and sets the request back to the pending state. In the second form submission, the code successfully retrieves a valid ReallyDelete form from the POST data despite the fact that what was submitted is actually a ReallyRevive form, and immediately deletes the just-revived request. Oops.
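Here is a rough simulation of that sequence in plain Python, with no Django involved. The validator functions, the `requests_db` dictionary, and request id 42 are all invented for the illustration; the only thing carried over from the real code is that both forms accept a ticked yes_really field.

```python
# Stand-ins for the two Django forms: each is "valid" when the POST data
# has a ticked 'yes_really' checkbox. Nothing tells one from the other.
def really_revive_is_valid(post):
    return post.get("yes_really") == "on"

def really_delete_is_valid(post):
    return post.get("yes_really") == "on"

requests_db = {42: "rejected"}  # hypothetical request id -> state

def handle_post(req_id, post):
    """Pick which form to validate based on the request's current state."""
    state = requests_db.get(req_id)
    if state is None:
        return "gone"
    if state != "pending":
        if really_revive_is_valid(post):
            requests_db[req_id] = "pending"
            return "revived"
    else:
        if really_delete_is_valid(post):
            del requests_db[req_id]
            return "deleted"
    return "invalid"

# Both users were shown the revive form and ticked its checkbox.
revive_post = {"yes_really": "on"}
first = handle_post(42, revive_post)
second = handle_post(42, revive_post)
print(first, second)  # revived deleted
```

The second submission is byte-for-byte a revive confirmation, but by the time it arrives the code is looking for a delete confirmation and finds one.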

(You can see this as a REST violation if you want to. My view is that these things happen in practice so I should be aware of the bear traps waiting in the underbrush.)

The solution is to give your forms different field names; here we would have a really_revive checkbox in one form and a really_delete checkbox in the other.
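As a sketch of why distinct field names close the race, here is the same kind of plain-Python stand-in (again not Django; `is_ticked` and `handle_post` are hypothetical helpers), this time keyed on really_revive and really_delete.

```python
def is_ticked(post, field):
    # Stand-in for validating a one-checkbox form whose field is `field`.
    return post.get(field) == "on"

def handle_post(state, post):
    """Return the new state, or the old one if the POST doesn't match."""
    if state != "pending":
        return "pending" if is_ticked(post, "really_revive") else state
    return "deleted" if is_ticked(post, "really_delete") else state

# Both users submit the revive form they were originally shown.
revive_post = {"really_revive": "on"}
state = handle_post("rejected", revive_post)    # first user
after_second = handle_post(state, revive_post)  # second user's stale form
print(state, after_second)  # pending pending
```

The stale revive submission has no really_delete field, so it can no longer pass for a delete confirmation. In real Django code I believe the same effect falls out of BooleanField being required by default, so that a ReallyDelete form bound to POST data without really_delete fails is_valid().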

Last modified: Fri Jul 15 01:32:48 2011