2013-08-03
The paucity of generally useful HTTP error codes
One of the things that I didn't appreciate until I really looked at HTTP error codes is how few generally useful ones there are. To start with, we can divide HTTP error codes into two categories: specific technical failings and general errors. Specific technical failings are things like an Accept: header that the server can't satisfy. There's a bunch of 4xx errors for these cases (and a few 5xx errors), but they aren't useful in general since you're only supposed to generate them in specific technical circumstances.
Once you get into the officially specified general errors, though, there simply aren't that many: 403 Forbidden, 404 Not Found, 410 Gone, 429 Too Many Requests, and maybe 451 Unavailable For Legal Reasons (if you accept Internet drafts) and 400 Bad Request (if you stretch it). On the server error side, 500 Internal Server Error and 501 Not Implemented are basically it. Of the 4xx errors, only 403 and 404 are really general.
(It's striking how many unofficial HTTP error codes there are in the Wikipedia list. Apparently a lot of people have found the current set inadequate.)
This limited set of error responses means that a web application can't really tell clients very much about what went wrong using error codes alone (at least officially assigned ones). Consider, for instance, a web application that both has access-restricted content and blocks certain clients from some or all content. HTTP error codes alone provide no real way to distinguish between 'you can't have this content because you aren't properly authenticated' and 'you can't have this content because I think you're a robot and robots shouldn't be asking for this' (especially if the web app also has rate limiting and so uses 429).
This has wound up tied into my feeling that specific HTTP errors may not matter that much. If the available HTTP error codes are too limited to really communicate what you mean to the client, your choice of what specific error code you use from the limited general-use set is not necessarily very important.
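To make the access-restriction example concrete, here is a minimal Python sketch. The reason names and the mapping are invented for illustration, and using 403 for both kinds of denial is only one plausible choice. What it shows is that once distinct reasons are folded into the small general-use set, a client can't recover the reason from the status code alone.

```python
from collections import defaultdict
from http import HTTPStatus

# Hypothetical internal reasons for refusing a request. These names are
# assumptions for illustration, not taken from any real application.
REASON_TO_STATUS = {
    "not_authenticated": HTTPStatus.FORBIDDEN,
    "suspected_robot": HTTPStatus.FORBIDDEN,
    "rate_limited": HTTPStatus.TOO_MANY_REQUESTS,
    "no_such_page": HTTPStatus.NOT_FOUND,
}


def reasons_by_status(mapping):
    """Group reasons by the status code a client would actually see."""
    grouped = defaultdict(list)
    for reason, status in mapping.items():
        grouped[int(status)].append(reason)
    return dict(grouped)


def ambiguous_statuses(mapping):
    """Return the status codes that stand for more than one reason."""
    grouped = reasons_by_status(mapping)
    return sorted(code for code, reasons in grouped.items() if len(reasons) > 1)
```

With this mapping, ambiguous_statuses(REASON_TO_STATUS) reports 403 as the one code that hides two different reasons.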
Sidebar: technical failings versus general errors
I've realized that I draw a big, personal distinction between these two that doesn't necessarily exist. I consider technical failings to be the job of the web server (and the framework if any) to worry about and I basically ignore them when writing an application. The errors I care about are general errors.
Thus I need to clarify and effectively walk back some stuff I said when I asked whether specific HTTP error codes mattered. Getting the error code right for specific technical failings does matter (at least in theory). I was intending to focus on general, application level errors, but I neither made that clear nor appreciated just how many 4xx errors there are for technical failings until I'd looked at a list.
The pragmatic issues around HTTP error codes mattering (or not)
When I posed the question of whether specific HTTP error codes actually mattered I put the question rather abstractly. But it's really a pragmatic question which I can put this way: in a web application that is used by people, how much effort is it worth putting into getting your HTTP error codes exactly and completely 'right'?
(I'm biased towards the server perspective because that's what I write, but if you write HTTP clients there's a mirror image question of how much sense it makes to write code that takes action based on fine distinctions in the response codes you get.)
I'll use DWiki (the ball of code behind my techblog) as an example. Broadly speaking DWiki can generate what are conceptually errors when people feed it all sorts of bad or mangled requests, when they ask for URLs that do not exist, when there are internal errors in page rendering (such as a bad template), when they are people we don't like, and when they don't have permission to access the URL. Today DWiki responds to almost all of these situations with either a 403 or a 404 status code, and some permission failures don't generate errors at all (instead you get a page with a message about it). Usually (but not always) DWiki generates 404 errors if the problem is something that could plausibly happen innocently and 403 errors otherwise.
(It would be nice to generate 403 errors for all permission denied situations but DWiki's architecture makes it quite hard for reasons that don't fit in the margins of this entry.)
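The 404-versus-403 rule of thumb can be sketched roughly as follows. This is not DWiki's actual code; the category names, and especially the guess about which problems count as 'plausibly innocent', are assumptions made purely for illustration.

```python
from enum import Enum
from http import HTTPStatus


class Problem(Enum):
    # Hypothetical categories, loosely following the list in the text.
    MANGLED_REQUEST = "mangled request"
    NO_SUCH_URL = "URL does not exist"
    RENDER_FAILURE = "internal error while rendering"
    UNWANTED_CLIENT = "client we do not want"
    PERMISSION_DENIED = "no permission for this URL"


# Assumption: these are the problems that could plausibly happen innocently.
PLAUSIBLY_INNOCENT = {Problem.NO_SUCH_URL, Problem.RENDER_FAILURE}


def status_for(problem):
    """Return 404 for problems that could be innocent, 403 for the rest."""
    if problem in PLAUSIBLY_INNOCENT:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.FORBIDDEN
```

Where exactly the 'innocent' line falls is a judgment call, which is part of why the choice of code ends up subjective.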
Could DWiki do 'better', whatever that means? Perhaps. It could use error codes 400, 405, and maybe 505 in some situations, but these are all around the edges. Some uncommon issues should perhaps produce some 5xx error instead of a 404 because they are really a server-side problem with the page.
(DWiki also punts completely on a lot of semi-impossible situations. For example it assumes that either all clients can handle any format it wants to return or that the real web server will worry about checking for this and generating 406 errors when appropriate.)
I could go through every error generation in DWiki along with the list of HTTP status codes and try to pair each error with the closest status code (which is often not very close; there are very few HTTP codes for relatively generic errors). But it very much appears to me that this would be both a lot of work for very little gain and also quite subjective (in that what I think is the right error code for a situation might not match up with what someone else expects).
Note again that this is for typical web apps and websites that are used by people instead of web APIs that are used by programs. Web APIs need to have as much thought put into error codes as any API needs into its error responses. But again, the best way to communicate most errors to API clients may not be through the very limited channel of HTTP error codes.
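One common way to get past the limits of status codes in a web API is to put a machine-readable error in the response body and let the status code carry only the broad category. The following is a minimal Python sketch of that idea; the JSON layout and the error code strings are hypothetical and don't follow any particular standard.

```python
import json
from http import HTTPStatus


def api_error(status, code, detail):
    """Build (status_line, headers, body) for an API error response.

    `code` is an application-defined string (an assumption of this sketch);
    the HTTP status only gives the broad category of the failure.
    """
    body = json.dumps({"error": {"code": code, "detail": detail}})
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body.encode("utf-8")))),
    ]
    status_line = f"{status.value} {status.phrase}"
    return status_line, headers, body
```

Two calls such as api_error(HTTPStatus.FORBIDDEN, "login_required", ...) and api_error(HTTPStatus.FORBIDDEN, "robot_blocked", ...) produce the same status line but bodies that a client program can tell apart.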
Sidebar: the subjectivity of HTTP error codes
Here are three examples of what I mean by subjectivity in HTTP error code choice.
- A client requests a non-directory page but puts a '/' on the end
of the URL (eg '/page/' instead of '/page'). Should this get a 400,
a 404, the plain page (ignoring the slash), a redirection to the
slashless version, or something else?
- The server validates pages in some way before serving them to
clients and a requested page exists but fails to validate. Is
this 500, 503, 404, or something else?
(Note that 404 is basically the generic error response and is documented as such in the HTTP/1.1 RFC.)
- A client does a DELETE for a URL that doesn't exist and your web app
doesn't support DELETE; in fact it only does plain old HEAD, GET,
and maybe POST. Bearing in mind that this is extremely unlikely to
be just an innocent mistake, is this 405, 501, 404, 403, or
something else?
If you poll a bunch of web developers I think you will get a bunch of different answers for all of these.
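As an illustration of how the first example comes down to policy rather than correctness, here is a small Python sketch where the trailing-slash behaviour is just a parameter. The function name and the policy names are made up for this example, and using a permanent (301) redirect is itself one more choice that could go differently.

```python
from http import HTTPStatus


def respond_to_trailing_slash(path, policy):
    """Return (status, headers) for a non-directory page asked for with a '/'.

    `policy` is a hypothetical setting naming one of the defensible answers.
    """
    bare = path.rstrip("/") or "/"
    if policy == "redirect":
        # Send the client to the slashless version of the URL.
        return HTTPStatus.MOVED_PERMANENTLY, [("Location", bare)]
    if policy == "ignore":
        # Serve the plain page as if the slash were not there.
        return HTTPStatus.OK, []
    if policy == "not_found":
        return HTTPStatus.NOT_FOUND, []
    if policy == "bad_request":
        return HTTPStatus.BAD_REQUEST, []
    raise ValueError(f"unknown policy: {policy!r}")
```

Every branch here is something a reasonable web developer might pick, and nothing in the status code list itself says which one is right.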