The pragmatic issues around HTTP error codes mattering (or not)

August 3, 2013

When I posed the question of whether specific HTTP error codes actually mattered, I put it rather abstractly. But it's really a pragmatic question, which I can put this way: for a web application that is used by people, how much effort is it worth putting into getting your HTTP error codes exactly and completely 'right'?

(I'm biased towards the server perspective because that's what I write, but if you write HTTP clients there's a mirror image question of how much sense it makes to write code that takes action based on fine distinctions in the response codes you get.)

I'll use DWiki (the ball of code behind my techblog) as an example. Broadly speaking DWiki can generate what are conceptually errors when people feed it all sorts of bad or mangled requests, when they ask for URLs that do not exist, when there are internal errors in page rendering (such as a bad template), when they are people we don't like, and when they don't have permission to access the URL. Today DWiki responds to almost all of these situations with either a 403 or a 404 status code, and some permission failures don't generate errors at all (instead you get a page with a message about it). Usually (but not always) DWiki generates 404 errors if the problem is something that could plausibly happen innocently and 403 errors otherwise.
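
As a rough illustration of that '404 if plausibly innocent, 403 otherwise' rule, here is a minimal sketch. It is not DWiki's actual code; the `Problem` names and which ones count as plausibly innocent are assumptions made up for the example.

```python
from enum import Enum
from http import HTTPStatus


class Problem(Enum):
    """Hypothetical labels for the kinds of errors described above."""
    MISSING_PAGE = "URL does not exist"
    RENDER_FAILURE = "internal error in page rendering"
    MANGLED_REQUEST = "bad or mangled request"
    UNWANTED_CLIENT = "people we don't like"
    NO_PERMISSION = "no permission to access the URL"


# Assumption for illustration only: which problems could plausibly
# happen innocently. The real split in DWiki is not spelled out here.
PLAUSIBLY_INNOCENT = {Problem.MISSING_PAGE, Problem.RENDER_FAILURE}


def status_for(problem):
    """Collapse every problem into one of two status codes."""
    if problem in PLAUSIBLY_INNOCENT:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.FORBIDDEN
```

The point of the sketch is how little of the status code space such a policy uses: two codes cover everything, and the interesting decision is the membership of one set.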

(It would be nice to generate 403 errors for all permission denied situations but DWiki's architecture makes it quite hard for reasons that don't fit in the margins of this entry.)

Could DWiki do 'better', whatever that means? Perhaps. It could use error codes 400, 405, and maybe 505 in some situations, but these are all around the edges. Some uncommon issues should perhaps produce some 5xx error instead of a 404 because they are really a server-side problem with the page.

(DWiki also punts completely on a lot of semi-impossible situations. For example it assumes that either all clients can handle any format it wants to return or that the real web server will worry about checking for this and generating 406 errors when appropriate.)

I could go through every error generation in DWiki along with the list of HTTP status codes and try to match up each error with the closest match (which is often not very close; there are very few HTTP codes for relatively generic errors). But it very much appears to me that this would be both a lot of work for very little gain and also quite subjective (in that what I think is the right error code for a situation might not match up with what someone else expects).

Note again that this is for typical web apps and websites that are used by people instead of web APIs that are used by programs. Web APIs need as much thought put into their error codes as any API needs put into its error responses. But even then, the best way to communicate most errors to API clients may not be through the very limited channel of HTTP error codes.

Sidebar: the subjectivity of HTTP error codes

Here are three examples of what I mean by subjectivity in HTTP error code choice.

  • A client requests a non-directory page but puts a '/' on the end of the URL (eg '/page/' instead of '/page'). Should this be a 400, a 404, ignored (with the plain page being served), answered with a redirection to the slashless version, or something else?

  • The server validates pages in some way before serving them to clients and a requested page exists but fails to validate. Is this 500, 503, 404, or something else?

    (Note that 404 is basically the generic error response and is documented as such in the HTTP/1.1 RFC.)

  • A client does a DELETE for a URL that doesn't exist and your web app doesn't support DELETE; in fact it only does plain old HEAD, GET, and maybe POST. Bearing in mind that this is extremely unlikely to be just an innocent mistake, is this 405, 501, 404, 403, or something else?

If you poll a bunch of web developers I think you will get a bunch of different answers for all of these.
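
To make the subjectivity concrete, here is one possible answer to the DELETE example, sketched as code. The function name, the `IMPLEMENTED` set, and the order of the checks are all assumptions; swapping the first two checks gives a different and equally defensible answer.

```python
from http import HTTPStatus

# Methods this hypothetical web app implements at all.
IMPLEMENTED = {"GET", "HEAD", "POST"}


def status_for_request(method, url_exists, allowed_here=frozenset({"GET", "HEAD"})):
    """Return (status, extra_headers) under one arbitrary ordering of checks."""
    if method not in IMPLEMENTED:
        # The app never handles this method, whatever the URL is.
        return HTTPStatus.NOT_IMPLEMENTED, {}
    if not url_exists:
        return HTTPStatus.NOT_FOUND, {}
    if method not in allowed_here:
        # A 405 response is supposed to list the methods the resource allows.
        return HTTPStatus.METHOD_NOT_ALLOWED, {"Allow": ", ".join(sorted(allowed_here))}
    return HTTPStatus.OK, {}
```

With this ordering a DELETE for a nonexistent URL gets a 501; check `url_exists` first and the same request gets a 404 instead.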


Comments on this page:

From 87.79.78.105 at 2013-08-04 18:32:33:

Note again that this is for typical web apps and websites that are used by people instead of web APIs that are used by programs.

A “website used by people” is really a discoverable, self-documenting web API. The distinction isn’t between whether they are used by people or programs, it is really only whether the primary generated content format is more human- or more machine-friendly (HTML vs JSON, say). As in my previous comment in this series of posts, the obvious go-to example is search engines; as well, there are intermediaries of all sorts, particularly caching proxies, all of which can sensibly do something smart about e.g. the distinction between 4xx and 5xx.

A client requests a non-directory page but puts a '/' on the end of the URL (eg '/page/' instead of '/page'). Should this be a 400, a 404, ignored (with the plain page being served), answered with a redirection to the slashless version, or something else?

404. URLs aren’t filesystem paths.

But if you want to treat them that way anyway, you should really send a redirect, otherwise browsers will resolve relative URLs the wrong way. You’d also be serving the same content from two different URLs, with all the consequences that entails, like search engines getting annoyed, people inadvertently bookmarking the same page twice, etc. – none of which are fatal, but all of which will be irritating to someone or other in some use case or other.
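
The relative URL problem can be seen with a small sketch using Python's standard `urllib.parse.urljoin`, which follows the same resolution rules browsers use. The `example.com` URLs and the `resolve` helper are made up for the illustration.

```python
from urllib.parse import urljoin


def resolve(page_url, relative_link):
    """Resolve a relative link the way a browser would for a page at page_url."""
    return urljoin(page_url, relative_link)


# The same relative link lands in two different places depending on
# whether the page was fetched with or without the trailing slash.
print(resolve("http://example.com/page", "other"))   # http://example.com/other
print(resolve("http://example.com/page/", "other"))  # http://example.com/page/other
```

So if the same content is served at both '/page' and '/page/', any relative links in it can only be right for one of the two.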

The server validates pages in some way before serving them to clients and a requested page exists but fails to validate. Is this 500, 503, 404, or something else?

Does the outcome of that validation depend in any way on the request? It sounds like you’re asking about the case where it doesn’t, in which case, 500. 503 means “this error was due to a hiccup, please try again in a little while” so it is totally inapplicable here.

(Note that 404 is basically the generic error response and is documented as such in the HTTP/1.1 RFC.)

A client that does something specific with 404 will eventually take it to mean there is nothing there under that URL. E.g. a search engine will flag the cached content for removal after some time; a browser user clicking an old bookmark might summarily delete it. Is that what you want to achieve? If not, then it’s more useful to tell the client that the problem with the request is something on the server end, and to hold on to whatever memory it has of that page in the meantime.
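
A minimal sketch of the kind of client logic this describes, assuming a hypothetical client that keeps a stored copy of each page (a search index entry, a cache, a bookmark). The function name and the action strings are invented for the example and do not describe any particular search engine or browser.

```python
def stored_copy_action(status):
    """Decide what to do with a remembered page given a fresh status code."""
    if 200 <= status < 300:
        return "refresh"               # new content is available
    if status == 404:
        return "schedule removal"      # nothing lives at this URL any more
    if 500 <= status < 600:
        return "keep and retry later"  # the server's problem; hold on to the copy
    return "keep"                      # anything else: leave the copy alone
```

Under logic like this, answering a server-side failure with 404 eventually costs the page its place in the client's memory, while a 5xx does not.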

A client does a DELETE for a URL that doesn't exist and your web app doesn't support DELETE; in fact it only does plain old HEAD, GET, and maybe POST. Bearing in mind that this is extremely unlikely to be just an innocent mistake, is this 405, 501, 404, 403, or something else?

There is no resource at the URL in the request, so 405 would be sort of perverse to respond with. I’d argue against 403 on similar grounds but couldn’t really object to it. Between 404 and 501 it’s a toss-up though.

Aristotle Pagaltzis

From 87.79.78.105 at 2013-08-04 18:49:53:

In general, the question to ask is not, “which status code is the correct one to use based on how closely its definition matches the nature of the situation?” but rather, “what effect will the use of any particular status code have on other software whose implementation choices are guided by the definition of that status code?” (Also “what is the most useful thing to emit in terms of the web infrastructure ecosystem?” E.g. when it comes to the irritations caused by serving identical content under several URLs.) The goal is interop, not some decontextualized platonic ideal of webapp-ness.

Aristotle Pagaltzis

Written on 03 August 2013.


Last modified: Sat Aug 3 02:28:37 2013