I've surrendered on utm_* query parameters in URLs
I've written before about the various extra utm_* query parameters that show up on a lot of URLs these days (it turns out that they are apparently due to a Google product called Urchin). Back then I was optimistic that a change in the Planet Sysadmin Twitter feed would make them go away. It turns out, well, no such luck. These days, all sorts of things 'helpfully' add these query parameters to URLs that are shared through the various parts of the social web; in fact, my strong impression is that it's hard to share links without this happening. I've continued to see a steady dribble of such requests every day and while many of them are from various Twitter-related robots, some of them show every sign of being from real human beings.
(It looks as if any time a URL is mentioned for the first time in
anyone's Twitter stream some number of robots wake up and poke it,
I've disliked these query parameters ever since I saw them. They're a hack and fundamentally wrong and they only work because almost every web server in the world is terribly sloppy (per the original discussion). I stuck to my guns on this for a long time. But. Real people are out there innocently using stuff that feeds them URLs with these unsightly query strings and trying to see my content, and I was giving them error pages instead. I don't care about robots, but I do care about people (eventually).
So I've surrendered. DWiki now accepts URLs with utm_* query parameters, no matter how annoyed this makes me.
However, this isn't a complete surrender, as I'm handling this the right way. If you use a URL with these query parameters, you don't get the page itself. Instead DWiki immediately returns a redirection to the page's proper URL (ie the one without all of the ugly parameters that exist to either track your activities or inflate the nominal influence of various traffic sources, depending on who you ask). This removes all of the utm_* ugliness from the URL that people actually see and ensures that various sorts of web crawlers get the canonical URL for the page instead of seeing duplicate content across several different URLs.
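To make the redirect approach concrete, here is a minimal sketch in Python using only the standard library. It is not DWiki's actual code: the function names `canonical_redirect` and `respond`, the example paths, and the choice of a 301 (permanent) status are all assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the URL with all utm_* query parameters removed,
    or None if the URL has no such parameters (no redirect needed).

    Hypothetical helper for illustration; not DWiki's real code.
    """
    parts = urlsplit(url)
    # keep_blank_values so a bare 'utm_source' (no '=') is still seen.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not k.startswith("utm_")]
    if len(kept) == len(pairs):
        return None
    # Note: urlencode() may re-escape the surviving parameters
    # (for example, '%20' becomes '+').
    return urlunsplit(parts._replace(query=urlencode(kept)))


def respond(url):
    """Return a (status, location) pair for a request URL.

    The 301 status is an assumption; the point is only that the
    page itself is never served under the utm_* URL.
    """
    target = canonical_redirect(url)
    if target is not None:
        return (301, target)
    return (200, url)
```

With this sketch, `respond("/blog/Entry?utm_source=twitter&utm_medium=social")` redirects to `/blog/Entry`, while a request for `/blog/Entry?page=2` is served as is. Any other query parameters survive the redirect, so only the tracking junk is stripped.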