Stupid web spammer tricks

September 5, 2006

I'd call these stupid spider tricks, except that these were so visibly committed by web spammers. (In one case they left me their spam, clearly visible.)

  • you cannot take a POST form's form elements, turn them into query parameters, and then try to GET the result.
  • especially if you remove the existing query parameter on the URL of the form's target.
  • you get modest bonus points if you POST your query parameter laden URL instead of GET'ing it. Not enough bonus points to make it work, though.
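The three request shapes above can be told apart on the server with a few checks. What follows is only a rough sketch, not how any particular blog software does it: the field names `comment` and `name` and the target query parameter `writecomment` are made up for illustration, since the real form is not shown here.

```python
from urllib.parse import parse_qs

# Hypothetical names: the real form's fields and target are assumptions.
FORM_FIELDS = {"comment", "name"}
TARGET_QUERY = "writecomment"  # query parameter on the form's target URL


def check_submission(method, query_string, body):
    """Return None if the request looks like a real form POST, else a reason."""
    # keep_blank_values=True so a bare "writecomment" (no "=") is still seen.
    query = parse_qs(query_string, keep_blank_values=True)
    form = parse_qs(body, keep_blank_values=True)
    leaked = FORM_FIELDS.intersection(query)

    if method != "POST":
        # Form elements turned into query parameters and then fetched.
        return "form fields sent with GET" if leaked else "not a form submission"
    if leaked:
        # The 'bonus points' case: a POST of the parameter-laden URL.
        return "form fields in the URL of a POST"
    if TARGET_QUERY not in query:
        return "target's query parameter was removed"
    if not FORM_FIELDS.issubset(form):
        return "form fields missing from body"
    return None
```

Only a POST to the unmodified target URL, with the fields in the request body, gets through; everything else is rejected with a reason that could be logged.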

I have to admire the determination necessary to carefully program your software to do stuff like this. Or, alternately, the gleeful blindness required to ignore the fact that there are two ways of submitting form data, implement only the easier one, and use it for everything. (In this view, the POST to GET person is at least being consistent; his software may not implement POST at all.)

The existence of these things depresses me, because the fact that the web comment spammers do them suggests that they actually work against some blog software. And that's just sad, but then a lot of web software (starting with Apache) is very sloppy about this stuff.

(Accepting POST requests in GET form is especially bad because it opens you up to lovely cross-site attacks if I can so much as persuade you to click on a link. If you think this is obscure, consider how it could be combined with cross-site authentication like OpenID to let it be targeted. Add JavaScript, and I probably don't even need to get you to explicitly click something.)
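To make the cross-site problem concrete, here is a small sketch under stated assumptions: the hostname `blog.example`, the `/comment` path, and the `text` and `post` fields are all invented, and the two functions stand in for a lenient and a strict way of reading form data rather than for any real framework's API.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit


def lenient_fields(method, url, body=""):
    """Merge URL and body parameters, ignoring the request method."""
    fields = parse_qs(urlsplit(url).query)
    fields.update(parse_qs(body))
    return fields


def strict_fields(method, url, body=""):
    """Only the body of a POST counts as submitted form data."""
    return parse_qs(body) if method == "POST" else {}


# Hypothetical target; the hostname and field names are made up.
attack_url = "http://blog.example/comment?" + urlencode(
    {"text": "buy pills", "post": "1"}
)

# Any page can make a visitor's browser GET this URL without a click,
# for example by using it as an image source.
img_tag = '<img src="%s">' % escape(attack_url, quote=True)
```

With the lenient reader, the GET that the image tag triggers carries a complete "form submission"; with the strict reader, the same request carries nothing.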

Comments on this page:

From at 2006-09-06 02:31:51:

you cannot take a POST form's form elements, turn them into query parameters, and then try to GET the result.

You might be surprised about this -- PHP introduced the $_REQUEST superglobal a while back, which behaves like a merged array of the $_POST and $_GET superglobals, containing all the combined information.

On many (well, okay, some) PHP web scripts, I'd venture a guess that a GET request will in fact work similarly to a POST.

From at 2006-09-06 02:32:46:

Okay, I will actually read more than one sentence of your post before commenting next time, I promise.

By cks at 2006-09-06 15:42:51:

<Insert rant about PHP encouraging bad web programming habits>

I really think that a framework or web programming language should relatively strongly separate the GET and POST mechanisms, just because it's a lot safer for everyone concerned that way. (Look at how much fun the Ruby on Rails people had with Google Web Accelerator.)

This has little to do with the intellectual purity of REST and idempotent GETs (although they're important too), and everything to do with pragmatically avoiding explosions. There is such a thing as making things too easy.

By DanielMartin at 2006-09-25 13:22:41:

You know, occasionally even the big guys get this wrong, with stupid consequences. GET is explicitly specified to have the semantics of pure information retrieval; it's been in the specs since HTTP 1.0 that server implementors need to be aware that any side-effects of a GET request are totally unintentional (and that the user should not be blamed for or suffer the consequences of any GET request). One reason, which I haven't seen mentioned anywhere, is that any HTML page can tell a user's browser to execute multiple other GET requests (via the IMG or OBJECT tags).

Now as an example of where a really big name gets it wrong, this link will take you to my homepage:

After clicking on it, go do a Google search. Notice the text around the number of search results and elsewhere (or go to the Google homepage).

That's right, your Google language preferences have been borked. From what I can tell, this works even with image tags. Combine this with fora that allow people to post images, and you get to screw with anyone who views the forum page, even if they have JavaScript completely turned off. Return only a single result per page, in a new window, and only search for pages in Arabic? Done, and all from pulling up a page in your browser.

Written on 05 September 2006.

Last modified: Tue Sep 5 21:44:08 2006