Who or what your website is for, and more on HTTP errors

August 5, 2013

Aristotle Pagaltzis commented on my entry about the pragmatics of HTTP errors and I want to reply to a few things.

First off, I want to say that I fully agree with Aristotle's characterization that the real question to ask is what the practical effects of using any particular status code will be. This is an excellent way of putting it and if I'd been clever enough to think of it I would have framed my entire (accidental) series around this question.

A "website used by people" is really a discoverable, self-documenting web API. The distinction isn't between whether they are used by people or programs, it is really only whether the primary generated content format is more human- or more machine-friendly (HTML vs JSON, say).

I disagree with this on a philosophical basis. A website used by people can be treated as a discoverable web API but it is not one; it has not been designed as one and it probably won't evolve as one. To put it one way, people will read and machines won't. A real web API needs machine parseable results (including HTTP error codes), stability, versioning, and a bunch of other things. A website designed for people is unlikely to have those (for good reasons).

(Yes, search engines parse HTML and that's a good thing. But I think that this is worlds away from an actual API.)

I think that this distinction is important to draw because it drastically shapes how your web application responds to errors (at least for general errors). If you really are creating an API then you need to somehow make the responses machine-parseable and unambiguous, which may even require making up your own new HTTP error codes (or the less extreme version of embedding an additional status header with more details in the HTTP response). If you're creating a web application for people what matters is what people will read; actual HTTP error codes are important only for their effects (if any) on caches, web crawlers, and so on if you care about any of those.

(You may not. A web application that is used over HTTPS and only interacts with authenticated users makes caches and web crawlers irrelevant.)
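To make the contrast concrete, here is a minimal sketch of the two kinds of error response. Everything in it is an assumption for illustration: the function names, the `X-App-Error` header, and the JSON field names are all made up, not part of any real API. The machine-oriented version keeps a standard status code and puts the application-specific detail in a structured body (plus, optionally, an extra header); the people-oriented version only cares that the page is readable.

```python
import json
from html import escape
from http import HTTPStatus


def api_error(status, code, detail):
    """Error meant for programs: standard status, structured JSON body."""
    body = json.dumps({"error": code, "detail": detail}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        # Hypothetical extra header repeating the app-specific error code.
        ("X-App-Error", code),
        ("Content-Length", str(len(body))),
    ]
    return f"{status.value} {status.phrase}", headers, body


def human_error(status, message):
    """Error meant for people: what matters is the text they will read."""
    page = (
        "<!DOCTYPE html>"
        f"<title>{escape(status.phrase)}</title>"
        f"<h1>{escape(status.phrase)}</h1>"
        f"<p>{escape(message)}</p>"
    )
    body = page.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return f"{status.value} {status.phrase}", headers, body
```

Both functions return a (status line, headers, body) triple of the sort a WSGI application would hand to its server. Whether the machine-readable detail belongs in the body, a header, or both is a design choice this sketch does not settle.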

In response to my hypothetical of getting a DELETE for a non-existent URL when your application doesn't even support DELETE, Aristotle gave the web-standards-correct answer:

There is no resource at the URL in the request, so 405 would be sort of perverse to respond with. I'd argue against 403 on similar grounds but couldn't really object to it. Between 404 and 501 it's a toss-up though.

Here is where web standards run into security engineering. If your application doesn't support DELETE at all, the wise thing to do is to reject all DELETE requests out of hand before you attempt to parse the URL, decode query arguments, and so on. Often this is also by far the easiest thing. There is also a strong argument that the specific error code chosen (and the text that accompanies the HTTP response) should be as uninformative as possible, since anyone who tries a DELETE against your web app is trying a destructive operation that you do not support at all.

In general when web clients are attempting something which you don't support, have never advertised, and that can't be an innocent mistake I think that you have almost completely free license to do whatever is convenient from a programming or security standpoint. People who are trying to rattle the doorknobs or even kick the door in do not get any courtesies.
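As a rough sketch of rejecting unsupported methods before any URL parsing happens, here is a small WSGI wrapper. It assumes a WSGI application; the names `reject_unsupported_methods` and `ALLOWED_METHODS` are hypothetical, and the 405 status is just one defensible choice that can be swapped for whatever you find most convenient or least informative.

```python
# Hypothetical allow-list; a real application would list what it actually serves.
ALLOWED_METHODS = frozenset({"GET", "HEAD", "POST"})


def reject_unsupported_methods(app, allowed=ALLOWED_METHODS):
    """Wrap a WSGI app so unsupported methods never reach URL or query parsing."""
    allow_header = ", ".join(sorted(allowed))

    def middleware(environ, start_response):
        if environ.get("REQUEST_METHOD", "") not in allowed:
            # Only the method is inspected: PATH_INFO and QUERY_STRING are
            # never looked at, so every URL gets the same terse answer
            # whether or not anything exists there.
            body = b"Method not allowed\n"
            start_response("405 Method Not Allowed", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                # A 405 response is expected to carry an Allow header.
                ("Allow", allow_header),
            ])
            return [body]
        # Supported methods pass through to the real application untouched.
        return app(environ, start_response)

    return middleware
```

Wrapping the application once at startup (for example `application = reject_unsupported_methods(application)`) means a DELETE is answered before any of the application's own routing code runs.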


Comments on this page:

From 87.79.78.105 at 2013-08-06 07:49:06:

A real web API needs machine parseable results (including HTTP error codes), stability, versioning, and a bunch of other things.

This is ground covered in Fielding’s dissertation. Mostly it needs content formats that are at once fine-grained and coarse-grained enough.

Versioning, if you refer to URL structures, is at best irrelevant, and easily leads to tight coupling. The whole and sole point of ReST is to be able to evolve both clients and servers independently from each other.

Stability, well that’s something human-oriented websites profit from no less than machine-oriented ones do.

A website designed for people is unlikely to have those (for good reasons).

I disagree that human-oriented websites do not require good status codes. Sure, they do not need them, but they will interoperate badly with the web at large. And as is true at all levels when it comes to the web: if you screw the pooch on interop, things will work badly but not not-at-all. It is very resilient. Is that a good reason to say “screw it”? Your call.

If you really are creating an API then you need to somehow make the responses machine-parseable and unambiguous, which may even require making up your own new HTTP error codes (or the less extreme version of embedding an additional status header with more details in the HTTP response).

… or returning them in some machine-readable way inside the body… remember the body? HTTP messages have those too, you know. :-)

Inventing new status codes should really be a measure of very last resort, because you are subsetting HTTP. New headers are slightly less bad, but still bad. Most people who do either really need neither… everyone just thinks their app or protocol is special.

If you're creating a web application for people what matters is what people will read; actual HTTP error codes are important only for their effects (if any) on caches, web crawlers, and so on if you care about any of those.

I am not sure how this is supposed to be an argument about anything. There is no contradiction nor any restriction from either part on the other. Both of those things are important.

Maybe the argument you are trying to make is that “interoperability is not the same as usability”? In which case the response is obviously “of course not”.

And if the argument you have been trying to make is something like “interoperability only matters for machine-oriented websites, whereas only usability matters for human-oriented websites”, well, when put that way, I think the answer to that is obvious.

Here is where web standards run into security engineering.

Do they? What I said (or at least meant! I may have failed to communicate) is that in terms of HTTP, all of 403, 404 and 501 would be equally valid choices in your scenario, so basing your choice among them on criteria outside HTTP itself and then picking any one of them is fine. The exception I made about 405 does not seem to conflict with your security concern either, since 405 is the most specific and informative status code among the ones you listed – so to the degree I understand your argument, you wouldn’t want to return that one anyway.

Aristotle Pagaltzis

By cks at 2013-08-06 10:22:34:

I want to mull over most of your comment and come up with a coherent reply, but on the issue of the error code example here's my thinking:

  • 501 is inappropriate because it implies that this is a server failure instead of a client failure. (Although I think it's very common to return 501s here.)
  • returning 404 for all DELETE requests is technically incorrect if some of the URLs exist (although it is justifiable under the '404 is a generic error' view).
  • returning 405 does actually tell the client what it's doing wrong: it's making an unsupported request. This is technically true even if the URL doesn't exist, and it applies even to, eg, PUT requests (which don't necessarily require the target URL to exist in advance).

My view on error codes in response to requests from nasty people is that you can do whatever you want, including what's most convenient or what you feel is most secure. But I still think you can decide to do something that 'makes sense' in your view if you want to.

I like 405 as a response partly because it's also applicable for more innocent requests like OPTIONS or TRACE.

From 87.79.78.105 at 2013-08-07 02:52:30:

501 is inappropriate because it implies that this is a server failure instead of a client failure. (Although I think it's very common to return 501s here.)

Why? :-) Because that is what 5xx is specified to mean in the spec? If I may remind you:

First off, I want to say that I fully agree with Aristotle's characterization that the real question to ask is what the practical effects of using any particular status code will be.

You said that. ;-)

So, OK. As the server’s author you don’t want to “accept the blame”… but I ask, what will clients do in response to 501?

This is not a rhetorical question, mind you – I really do not know. I think clients by and large are likely to behave in the way you’d like them to, namely, to cease trying (at least in short order – likely not immediately if the intent is malicious). But while I know how some of the other status codes get treated in some of the HTTP infrastructure, I don’t know much about 501. (Nor, for that matter, if shutting up the malevolent client is actually what you want.)

It would be useful to have some data on this.

Aristotle Pagaltzis

Written on 05 August 2013.


Last modified: Mon Aug 5 22:28:50 2013