Caution is a mistake in modern web servers and apps

November 4, 2016

When I wrote DWiki (the code behind Wandering Thoughts), I made it pretty cautious and conservative about what it would accept versus reject, and about how it handled any number of things. POSTs to GET-only URLs, or GETs of POST-only URLs? Rejected with errors. Unexpected query parameters on GETs? Rejected with an error. An If-Modified-Since time that didn't exactly match the resource's current modification timestamp? Well, better declare that a conditional GET miss and give you the full resource (cf). And so on.
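(As a rough Python sketch of that strict approach, and definitely not DWiki's actual code: the WSGI environ keys below are real, but the parameter whitelist and the function and argument names are made up purely for illustration.)

    from email.utils import parsedate_to_datetime

    ALLOWED_PARAMS = {"page", "atom"}   # purely hypothetical whitelist

    def strict_get(environ, resource_mtime, render):
        # This is a GET-only URL: anything else is rejected outright.
        if environ["REQUEST_METHOD"] != "GET":
            return "405 Method Not Allowed", []
        # Query parameters I've never heard of are also rejected.
        qs = environ.get("QUERY_STRING", "")
        params = {p.split("=", 1)[0] for p in qs.split("&") if p}
        if params - ALLOWED_PARAMS:
            return "400 Bad Request", []
        # Only an exact If-Modified-Since timestamp match counts as a
        # conditional GET hit; anything else gets the full resource.
        ims = environ.get("HTTP_IF_MODIFIED_SINCE")
        if ims is not None:
            try:
                if parsedate_to_datetime(ims) == resource_mtime:
                    return "304 Not Modified", []
            except (TypeError, ValueError):
                pass
        return "200 OK", [render()]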

If I was doing a web app from scratch today, I wouldn't do that. Send me the wrong sort of operation or a nonsensical one that I don't understand? Send me query parameters I've never heard of? Whatever, have an HTTP redirect to the canonical URL. Maybe this is not what you actually wanted, but if so it's not my problem; you're the one sending the broken request, and I'm giving you an answer that's more useful (to a person) than a 4xx or a 5xx. Send me a random If-Modified-Since? I'll do a time-based comparison on it if I can, and if that results in you not realizing there's a new version of the page, well, you could have used an ETag-based check instead.
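(The lenient version, again purely as an illustrative sketch with made-up names and a hypothetical canonical_url and etag handed in from elsewhere, would look more like this.)

    from email.utils import parsedate_to_datetime

    KNOWN_PARAMS = {"page", "atom"}     # same made-up whitelist as above

    def lenient_get(environ, canonical_url, resource_mtime, etag, render):
        # Wrong method or unknown query parameters? Point the client back
        # at the canonical URL instead of handing out an error.
        qs = environ.get("QUERY_STRING", "")
        params = {p.split("=", 1)[0] for p in qs.split("&") if p}
        if environ["REQUEST_METHOD"] != "GET" or params - KNOWN_PARAMS:
            return "302 Found", [("Location", canonical_url)], []
        # Prefer an ETag check when the client supplies one.
        inm = environ.get("HTTP_IF_NONE_MATCH")
        if inm is not None and etag in {t.strip() for t in inm.split(",")}:
            return "304 Not Modified", [], []
        # Otherwise treat If-Modified-Since as a plain time comparison,
        # not an exact-match token.
        ims = environ.get("HTTP_IF_MODIFIED_SINCE")
        if ims is not None:
            try:
                if parsedate_to_datetime(ims) >= resource_mtime:
                    return "304 Not Modified", [], []
            except (TypeError, ValueError):
                pass
        return "200 OK", [("ETag", etag)], [render()]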

You might wonder why I say this. Well, sadly it's simple. The reality is that on the modern web, being cautious about what you accept is a mistake.

The modern web is a mess in practice, and part of that mess is that people will cheerfully write and distribute software that shoves all sorts of crazy, sloppy, stupid things at your server. Sometimes it's just an accident, which goes unnoticed and unfixed for the usual reason (namely that almost nothing else notices or complains). Sometimes it's a deliberate choice because they can usually get away with it and use it for something useful (to them). Sometimes it's folklore that people are blindly following. And honestly, it doesn't really matter why it happens, just that it does and that it affects real software used by real people.

(Some of it is the malicious attempting to attack your server, but so what? Everyone gets attacked all the time on today's Internet.)

As you might guess, there was a precipitating incident that led me to write this entry. To wit, today I saw some POST requests (to GET-only pages) with a content type of text/ping. This is apparently a new (proposed) standard. Yes, really.
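(For what it's worth, these POSTs come from the HTML <a ping> hyperlink auditing feature that the comments below discuss. If you wanted to quietly absorb them rather than log errors, a hypothetical WSGI middleware along these lines would do it; this is not anything DWiki actually does.)

    def absorb_ping(app):
        # Wrap a WSGI app so that hyperlink-auditing POSTs (Content-Type
        # text/ping) get a quiet, empty reply; the browser discards the
        # response anyway, so nothing of value is lost.
        def wrapper(environ, start_response):
            ctype = environ.get("CONTENT_TYPE", "")
            if environ["REQUEST_METHOD"] == "POST" and ctype.startswith("text/ping"):
                start_response("204 No Content", [])
                return [b""]
            return app(environ, start_response)
        return wrapper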

I could write a lot here, as I have in the past (I'll spare you links to the entries), but there's no point. I give up. You win, modern web, or more exactly you have simply run everyone over like a giant lawnmower. Given that there are browsers out there implementing this today and sending these POST requests to my pages here, my views are irrelevant.

(With that said, I don't have any plans to change DWiki's current cautious behaviors. That would be more work than just leaving it alone.)


Comments on this page:

Hmm, browsers shouldn't care if the ping request returns an error; the result will be discarded anyway.

All else being equal, an error result should be better than a redirect. The draft refers to the same spec used to fetch page resources, which implies redirects should be followed. No redirect -> less erroneous traffic.

Presumably the reason it isn't equal is that it cluttered your error logs with yet another puzzle :(. Dunno what the best way to ignore it would be.

I've seen <a ping> mentioned before, but I'm surprised sites are pointing the ping at your pages.

I would claim POST to a GET-only URL deserves an error, since the server is potentially throwing away a whole message. (Also GET is now supposed to be "safe", so the example of ignoring a GET parameter doesn't involve quite the same severity of data loss).

OTOH I suppose you could say it's only a convenience for developers. In which case, if you don't find it worthwhile as the site developer, it's your prerogative to choose which errors to generate, and which not to.

By skeeto at 2016-11-04 07:56:02:

Fortunately, it looks like the ping attribute was rejected back in 2010, no longer appearing in more recent HTML5 specifications. Despite this, WebKit browsers still support it, and Firefox has disabled-by-default support. I'd say I'm surprised such a weird feature made it as far as it did, but it's not nearly the strangest thing to come out of web standards.

By Daniel at 2020-09-13 12:59:49:

Pardon me for commenting on a four-year-old blog entry.

After reading the relevant standard behind this "ping", it is not clear to me why you and the other commenters consider this a weird or undesirable standard.

If my understanding is correct, it is intended to facilitate the use case of platforms (Google search, off the top of my head) that want to track which outgoing links are "popular" with visitors. For a search engine, that seems reasonable (one's opinions on privacy and tracking notwithstanding), and there are provisions built into the standard for UAs to disregard the ping= attribute.

The standard might have failed to be adopted, but the reality is that everyone who wants to track outgoing links still does so regardless, employing even more undesirable techniques in the process, such as JS link rewriting, routing you through a tracking URL that then redirects, etc.

How someone ended up writing a hyperlink on their webpage with a ping= attribute that resolved to your site (that's the only way you'd see those entries in your webserver log?) eludes me, though.

As for your other thought, that the current web ecosystem is a mess best touched with lead-lined gloves lest one catch an unpleasant disease, I wholly agree.
