Web analytics versus GET parameter security

March 20, 2010

I have recently run into an interesting collision between typical web analytics practices (as applied to arbitrary URLs) and good security and robustness. The visible symptom is that links to WanderingThoughts entries from the Planet Sysadmin Twitter feed don't work; trying to follow one gets you a remarkably terse error message from DWiki (the software behind WanderingThoughts).

DWiki is very cautious. One of the ways that this manifests is that it doesn't accept random query parameters on requests; it knows what query parameters each URL accepts, and anything else is an error. I maintain that this is both secure and robust; certainly my logs have a constant parade of attempts to exploit the willingness of bad PHP applications to accept additional random (and, as it turns out, dangerous) query parameters. The terse error messages are happening because the links in the Twitter feed arrive with extra query parameters.
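To make the idea concrete, here is a minimal sketch of this sort of strict checking. It is not DWiki's actual code; the paths, the parameter names, and the `ALLOWED_PARAMS` table are all made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical table of which query parameters each URL path accepts.
# These paths and names are illustrative assumptions, not DWiki's real ones.
ALLOWED_PARAMS = {
    "/blog": {"page", "atom"},
    "/search": {"q"},
}

def unknown_params(url):
    """Return a sorted list of query parameter names the path does not accept."""
    parts = urlsplit(url)
    allowed = ALLOWED_PARAMS.get(parts.path, set())
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return sorted(names - allowed)

def check_request(url):
    """Return an (HTTP status, message) pair; any unknown parameter is an error."""
    bad = unknown_params(url)
    if bad:
        return 400, "unknown query parameters: " + ", ".join(bad)
    return 200, "ok"
```

Under this sketch, `check_request("/blog?page=2")` is accepted, while the same path with `utm_source` and `utm_medium` tacked on is rejected outright instead of being quietly ignored.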

The extra query parameters aren't directly visible in the URLs in the Twitter feed, which uses bit.ly to shorten the URLs, and they aren't in the original form of the entries on Planet Sysadmin. Instead the shortening process is adding them on.

The extra query parameters are always '?utm_source=twitterfeed&utm_medium=twitter'. Some web searching suggests that this sort of query parameter is added to URLs so that JavaScript-based, on-page web analytics packages can track the source of inbound links (I think especially advertising-based stuff, but I'm not sure). This hijacking of query parameters for web analytics requires that the target application ignore random extra query parameters. As this incident nicely illustrates, that is not an assumption that you can or should make in general about other people's URLs (and you'll want to test it for your own web application).

(I suspect that the direct culprit is twitterfeed's analytics features, which I further suspect are enabled by default.)
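As a rough illustration of the mechanics, the following sketch shows how a link-tagging step could append these parameters to an arbitrary URL, and how the same parameters could be stripped again. This is a guess at the general technique, not twitterfeed's or bit.ly's actual code; `add_tracking`, `strip_tracking`, and the `example.org` URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_tracking(url, source="twitterfeed", medium="twitter"):
    """Append utm_* parameters to a URL, keeping any existing query parameters."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [("utm_source", source), ("utm_medium", medium)]
    return urlunsplit(parts._replace(query=urlencode(params)))

def strip_tracking(url):
    """Drop every query parameter whose name starts with 'utm_'."""
    parts = urlsplit(url)
    params = [(name, value)
              for name, value in parse_qsl(parts.query, keep_blank_values=True)
              if not name.startswith("utm_")]
    # Note: re-encoding may normalize the query (a bare 'x' becomes 'x=').
    return urlunsplit(parts._replace(query=urlencode(params)))
```

The tagging side has no way of knowing whether the target accepts the extra parameters; it just rewrites the URL and hopes. A link sharer who wants to be safe with other people's URLs would either leave them alone or run something like `strip_tracking` before publishing them.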

PS: I've let the Planet Sysadmin people know about this, so it'll presumably get fixed at some point, assuming that twitterfeed and all of the other moving parts involved allow you to turn it off.

Comments on this page:

From at 2010-03-20 08:24:09:

It's a reasonably safe assumption that random websites ignore unknown GET parameters. What examples of sites beyond your own don't?

By cks at 2010-03-22 14:29:13:

I don't specifically know of anything besides my own code, but I haven't particularly been looking for examples. I'd like to hope that I'm not the only cautious and conservative person in the world, but maybe I am.

(PermissiveWebApps covers why I think the conservative approach is the right one, or at least was until people started exploiting the situation to add extra information to requests.)

Written on 20 March 2010.

Last modified: Sat Mar 20 03:13:48 2010