URL query parameters and how laxness creates de facto requirements on the web

September 7, 2020

One of the ways that DWiki (the code behind Wandering Thoughts) is unusual is that it strictly validates the query parameters it receives on URLs, including on HTTP GET requests for ordinary pages. If a HTTP request has unexpected and unsupported query parameters, such a GET request will normally fail. When I made this decision it seemed the cautious and conservative approach, but this caution has turned out to be a mistake on the modern web. In practice, all sorts of sites will generate versions of your URLs with all sorts of extra query parameters tacked on, give them to people, and expect them to work. If your website refuses to play along, (some) people won't get to see your content. On today's web, you need to accept (and then ignore) arbitrary query parameters on your URLs.

(Today's new query parameter is 's=NN', for various values of NN like '04' and '09'. I'm not sure what's generating these URLs, but it may be Slack.)

You might wonder how we got here, and that is a story of lax behavior (or, if you prefer, being liberal in what you accept). In the beginning, both Apache (for static web pages) and early web applications often ignored extra query parameters on URLs, at least on GET requests. I suspect that other early web servers also imitated Apache here, but I have less exposure to their behavior than Apache's. My guess is that this behavior wasn't deliberate, it was just the simplest way to implement both Apache and early web applications; you paid attention to what you cared about and didn't bother to explicitly check that nothing else was supplied.

When people noticed that this behavior was commonplace and widespread, they began using it. I believe that one of the early uses was for embedding 'where this link was shared' information for your own web analytics (cf), either based on your logs or using JavaScript embedded in the page. In the way of things, once this was common enough other people began helpfully tagging the links that were shared through them for you, which is why I began to see various 'utm_*' query parameters on inbound requests to Wandering Thoughts even though I never published such URLs. Web developers don't leave attractive nuisances alone for long, so soon enough people were sticking on extra query parameters to your URLs that were mostly for them and not so much for you. Facebook may have been one of the early pioneers here with their 'fbclid' parameter, but other websites have hopped on this particular train since then (as I saw recently with these 's=NN' parameters).

At this point, the practice of other websites and services adding random query parameters to your URLs that pass through them is so wide spread and common that accepting random query parameters is pretty much a practical requirement for any web content serving software that wants to see wide use and not be irritating to the people operating it. If, like DWiki, you stick to your guns and refuse to accept some or all of them, you will drop some amount of your incoming requests from real people, disappointing would be readers.

This practical requirement for URL handling is not documented in any specification, and it's probably not in most 'best practices' documentation. People writing new web serving systems that are tempted to be strict and safe and cautious get to learn about it the hard way.

In general, any laxness in actual implementations of a system can create a similar spiral of de facto requirements. Something that is permitted and is useful to people will be used, and then supporting that becomes a requirement. This is especially the case in a distributed system like the web, where any attempt to tighten the rules would only be initially supported by a minority of websites. These websites would be 'outvoted' by the vast majority of websites that allow the lax behavior and support it, because that's what happens when the vast majority work and the minority don't.

Written on 07 September 2020.
« Daniel J. Bernstein's IM2000 email proposal is not a good idea
Why Fedora version upgrades are complicated and painful for me »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Sep 7 00:17:18 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.