URL query parameters and how laxness creates de facto requirements on the web
One of the ways that DWiki (the code behind Wandering Thoughts) is unusual is that it strictly validates the query parameters it receives on URLs, including on HTTP GET requests for ordinary pages. If an HTTP request has unexpected and unsupported query parameters, such a GET request will normally fail. When I made this decision it seemed like the cautious and conservative approach, but this caution has turned out to be a mistake on the modern web. In practice, all sorts of sites will generate versions of your URLs with all sorts of extra query parameters tacked on, give them to people, and expect them to work. If your website refuses to play along, (some) people won't get to see your content. On today's web, you need to accept (and then ignore) arbitrary query parameters on your URLs.
(Today's new query parameter is 's=NN', for various values of NN like '04' and '09'. I'm not sure what's generating these URLs, but it may be Slack.)
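To make the lenient approach concrete, here is a minimal sketch in Python (not DWiki's actual code): pull out only the query parameters you recognize and silently discard everything else, rather than rejecting the whole request. The KNOWN_PARAMS set and the parameter names in it are purely hypothetical.

  from urllib.parse import parse_qs

  # Hypothetical set of query parameters this application actually acts on.
  KNOWN_PARAMS = {"page", "atom"}

  def accepted_params(query_string):
      """Return only the query parameters we recognize, dropping the rest."""
      parsed = parse_qs(query_string, keep_blank_values=True)
      return {name: values for name, values in parsed.items()
              if name in KNOWN_PARAMS}

  # The strict alternative (in the spirit of what DWiki does) would instead
  # reject the request when anything unexpected shows up:
  #   if set(parsed) - KNOWN_PARAMS:
  #       return an error response
  # which is exactly what drops requests carrying utm_*, fbclid, s=NN, etc.

  print(accepted_params("page=2&utm_source=example&fbclid=abc123&s=09"))
  # -> {'page': ['2']}

The point of the sketch is the filtering step: the unknown parameters are simply never looked at, so a link decorated by some third party still resolves to the same page.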
You might wonder how we got here, and that is a story of lax behavior (or, if you prefer, of being liberal in what you accept). In the beginning, both Apache (for static web pages) and early web applications often ignored extra query parameters on URLs, at least on GET requests. I suspect that other early web servers also imitated Apache here, but I have less exposure to their behavior than to Apache's. My guess is that this behavior wasn't deliberate; it was just the simplest way to implement both Apache and early web applications: you paid attention to what you cared about and didn't bother to explicitly check that nothing else was supplied.
When people noticed that this behavior was commonplace and widespread, they began using it. I believe that one of the early uses was for embedding 'where this link was shared' information for your own web analytics (cf), either based on your logs or using JavaScript embedded in the page. In the way of things, once this was common enough, other people began helpfully tagging the links that were shared through them for you, which is why I began to see various 'utm_*' query parameters on inbound requests to Wandering Thoughts even though I never published such URLs.
Web developers don't leave attractive nuisances alone for long, so soon enough people were sticking extra query parameters on your URLs that were mostly for them and not so much for you. Facebook may have been one of the early pioneers here with their 'fbclid' parameter, but other websites have hopped on this particular train since then (as I saw recently with these 's=NN' parameters).
At this point, the practice of other websites and services adding random query parameters to your URLs as they pass through them is so widespread and common that accepting random query parameters is pretty much a practical requirement for any web content serving software that wants to see wide use and not be irritating to the people operating it. If, like DWiki, you stick to your guns and refuse to accept some or all of them, you will drop some amount of your incoming requests from real people, disappointing would-be readers.
This practical requirement for URL handling is not documented in any specification, and it's probably not in most 'best practices' documentation either. People writing new web serving systems who are tempted to be strict, safe, and cautious get to learn about it the hard way.
In general, any laxness in actual implementations of a system can create a similar spiral of de facto requirements. Something that is permitted and is useful to people will be used, and then supporting that becomes a requirement. This is especially the case in a distributed system like the web, where any attempt to tighten the rules would only be initially supported by a minority of websites. These websites would be 'outvoted' by the vast majority of websites that allow the lax behavior and support it, because that's what happens when the vast majority work and the minority don't.