Do specific HTTP error codes actually matter?

July 17, 2013

There are a few HTTP 4xx and 5xx responses that provoke specific browser reactions; the classic case is a 401 response, which makes the browser put up an authentication challenge for HTTP-level authentication. But apart from those specific responses, does it actually matter what error code you return for a failed or rejected HTTP request?
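For concreteness, here is a minimal sketch of the 401 case as a Python WSGI application. It is the 401 status together with a `WWW-Authenticate` header that makes a browser put up its login dialog; the response body is typically only shown if the user cancels. The username, password, and realm are hypothetical placeholders.

```python
import base64

# Hypothetical credentials and realm, for illustration only.
EXPECTED_USER = "alice"
EXPECTED_PASSWORD = "secret"
REALM = "example"


def app(environ, start_response):
    """Minimal WSGI app protected by HTTP Basic authentication."""
    header = environ.get("HTTP_AUTHORIZATION", "")
    if header.startswith("Basic "):
        try:
            decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
        except ValueError:  # covers both bad base64 and bad UTF-8
            decoded = ""
        user, _, password = decoded.partition(":")
        if user == EXPECTED_USER and password == EXPECTED_PASSWORD:
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"Welcome.\n"]
    # The 401 status plus WWW-Authenticate is what triggers the browser's
    # login prompt; this body is what the user sees if they cancel.
    start_response("401 Unauthorized", [
        ("Content-Type", "text/plain"),
        ("WWW-Authenticate", f'Basic realm="{REALM}"'),
    ])
    return [b"Authentication required.\n"]
```

You could serve this with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` to see the browser prompt. A real deployment would use HTTPS and a constant-time comparison such as `hmac.compare_digest`.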

If it's a human making the request, the HTML of the error page they see does matter; it's what they'll read to understand what went wrong. But that error text is unrelated to the HTTP status code and in fact I'm not certain that a lot of people even notice or know what HTTP status is attached to a 'page not found' page. As far as they're concerned you could probably return a 2xx series response and the error text and it would all be the same to them.

(I hope that browsers behave a little differently for 2xx versus 4xx pages; for example, I sort of hope that browsers don't enter the latter into your browsing history as a visited link. But maybe they do.)

If it's code making the request, it's clear that returning some 4xx or 5xx error for failed requests is important so that the code can easily tell them from successful ones. But I'm not at all convinced that there's very much (general) code that cares about what the specific error return is from some random website that it is poking, as opposed to the fact that there was an error.

I have two reasons to be dubious. The first is that treating different error codes differently in your code takes additional work and also requires having something different that you can do. If a failure is a failure, well, it doesn't matter just why it failed. The second is that I doubt web sites are any more careful about which HTTP error code they generate than code is about handling them. In an environment where any old HTTP error is just as good as another one, client code can't trust a random server's error code to be meaningful in the first place.
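The 'a failure is a failure' approach fits in a few lines of Python: the only question the client asks is whether the status is in the 2xx range. The function name and the return-None convention are illustrative assumptions, not any particular library's API.

```python
from typing import Optional


def result_or_none(status: int, body: bytes) -> Optional[bytes]:
    """Return the body on success, or None on any failure.

    Only the status class matters here: a 403, a 404, and a 503 are all
    handled identically, because this client has nothing different to do
    for any of them.
    """
    if 200 <= status < 300:
        return body
    return None
```

Python's own `urllib.request.urlopen` works in roughly this spirit by default, raising `urllib.error.HTTPError` for error responses so that a single `except` clause can treat them all alike.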

(Error codes can be meaningful in more confined situations, where you have a specific client talking to a specific server and you know both ends.)

I certainly think it would be nice if, say, Google's web spider behaved somewhat differently in response to '410 Gone' than to, e.g., '504 Gateway Timeout'. But I'm not certain it makes sense for even Google to bother. People can misconfigure servers to generate 410s when they don't actually mean it, and in the general case what matters is whether the error really is permanent rather than whether it just claims to be.

(This was partly brought to mind by the comment on my entry about banning a feed reader. By the way, that feed reader is still banned, still getting 403s, and still trying every ten minutes. That's a clear case of not paying attention to errors at all.)
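Even a client that treats every error the same way can avoid hammering a server like this by backing off. Below is a minimal sketch of a poller that doubles its interval after each consecutive failed fetch and resets on success (including 304 Not Modified from a conditional GET). The ten-minute base interval comes from the entry; the one-week cap and the class and function names are hypothetical.

```python
BASE_INTERVAL = 10 * 60          # seconds: the ten-minute poll from the entry
MAX_INTERVAL = 7 * 24 * 60 * 60  # hypothetical cap of one week


def next_poll_interval(consecutive_failures: int) -> int:
    """Seconds to wait before the next poll, doubling per consecutive failure."""
    if consecutive_failures <= 0:
        return BASE_INTERVAL
    return min(BASE_INTERVAL * 2 ** consecutive_failures, MAX_INTERVAL)


class FeedPoller:
    """Tracks consecutive failures for one feed (hypothetical helper)."""

    def __init__(self) -> None:
        self.failures = 0

    def record(self, status: int) -> int:
        """Record a fetch's status and return the delay before the next fetch."""
        # 304 Not Modified is a successful conditional GET, not an error.
        if 200 <= status < 300 or status == 304:
            self.failures = 0
        else:
            self.failures += 1
        return next_poll_interval(self.failures)
```

With these numbers a reader stuck on 403s would hit the one-week cap after ten consecutive failures, instead of making 144 requests a day indefinitely.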


Comments on this page:

From 71.80.128.33 at 2013-07-17 14:35:30:

You can argue that there's more than we need, but they're quite useful, even for humans.

40x means the file is gone. 50x means the server won't allow me to see the file.

It makes a lot of difference to me, and if it's MY web server that I have to fix, I want to know which it is before I waste my time trying to resolve the wrong one.

From 87.79.78.105 at 2013-07-17 15:07:46:

There certainly is a difference between the handling a client can apply to 4xx vs 5xx responses. The upshot of 5xx responses is “try again later” whereas 4xx means some form of “that was silly, don’t do that again”, variously because the request was ill-formed or lacking credentials or addressed to the void or at something impermissible or what have you.
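The 4xx versus 5xx split described here can be sketched as a small classifier. Treating 408 Request Timeout and 429 Too Many Requests as retryable is an addition not in the comment, reflecting what those two codes ask of the client; redirects and other non-final codes are assumed to be handled before this point, and the names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SUCCESS = "success"
    RETRY_LATER = "retry later"    # server-side trouble; may clear up
    DONT_REPEAT = "don't repeat"   # the request itself was at fault


# 4xx codes that explicitly invite a later retry.
RETRYABLE_4XX = {408, 429}  # Request Timeout, Too Many Requests


def classify(status: int) -> Action:
    """Map a final HTTP status to what a client should do next."""
    if 200 <= status < 300:
        return Action.SUCCESS
    if 500 <= status < 600 or status in RETRYABLE_4XX:
        return Action.RETRY_LATER
    if 400 <= status < 500:
        return Action.DONT_REPEAT
    # 1xx and 3xx are assumed to be handled by the HTTP library already.
    raise ValueError(f"not a final success or error status: {status}")
```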

Browsers cache things differently based on status code. Crawlers do respond differently to different errors as well. And HTTP is more than clients and servers, there are intermediaries too. Caching intermediaries are especially dependent on good status codes to function optimally.
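As one concrete illustration of caches depending on status codes: RFC 9110 defines certain status codes as heuristically cacheable, meaning a cache may reuse such a response even without explicit freshness information (RFC 2616, current in 2013, had a shorter list). The sketch below checks only that rule plus a `no-store` directive; real caches consider much more (method, `Authorization`, `Vary`, and so on), and the function name is hypothetical.

```python
# Status codes that RFC 9110 defines as heuristically cacheable: a cache
# may reuse such a response even when it carries no explicit freshness
# information (no max-age, no Expires).
HEURISTICALLY_CACHEABLE = frozenset(
    {200, 203, 204, 206, 300, 301, 308, 404, 405, 410, 414, 501}
)


def may_cache_heuristically(status: int, cache_control: str = "") -> bool:
    """Rough check: could a cache store this response without explicit freshness?

    Only the status code and a no-store directive are considered.
    """
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return False
    return status in HEURISTICALLY_CACHEABLE
```

One consequence: a 404 or 410 served during a temporary outage can be stored and reused by a cache, while a 503 without explicit caching headers will not be, which is a concrete reason to pick the code that matches the real situation.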

The biggest distinction on the webapp end is 200 vs everything else, of course. But once you have to pay that much attention to the status code, it is essentially no further effort to also pick a proper one.
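One way to make picking a proper status code nearly free on the webapp side is to map application exceptions to codes in a single place. In this sketch the exception classes and the `run_handler` wrapper are hypothetical; `http.HTTPStatus` is from Python's standard library.

```python
from http import HTTPStatus


# Hypothetical application-level exceptions.
class NotFound(Exception):
    pass


class Forbidden(Exception):
    pass


class Gone(Exception):
    pass


ERROR_STATUS = {
    NotFound: HTTPStatus.NOT_FOUND,
    Forbidden: HTTPStatus.FORBIDDEN,
    Gone: HTTPStatus.GONE,
}


def status_for(exc: Exception) -> HTTPStatus:
    """Pick the status for an exception, defaulting to 500 for anything unexpected."""
    for exc_type, status in ERROR_STATUS.items():
        if isinstance(exc, exc_type):
            return status
    return HTTPStatus.INTERNAL_SERVER_ERROR


def run_handler(handler, environ, start_response):
    """Call a handler and turn its result or exception into a WSGI response."""
    try:
        body = handler(environ)
        status = HTTPStatus.OK
    except Exception as exc:  # deliberately broad: every error gets some status
        status = status_for(exc)
        body = f"{status.value} {status.phrase}\n".encode("utf-8")
    start_response(f"{status.value} {status.phrase}",
                   [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

Once the 200-versus-error check exists, adding a new mapping is one dictionary entry.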

And different use cases at the client end have different tolerances for how dependent they are on good status codes. E.g. desktop aggregators can get by on essentially brain-dead treatment of HTTP, which even browsers cannot afford (but a good aggregator will invest quite a bit of smarts in that area). So clients can adjust the level of effort they want to expend on status code handling… but they cannot do better than what the server hands them. So servers should make the effort.

You do have a point that clients shouldn’t trust the server blindly on things like 410 of course. I’m sure Google’s crawlers likewise don’t take status codes on individual responses for gospel either. But if you request a resource repeatedly, or crawl many “nearby” resources in quick succession, you can take a (quasi-)stochastic approach, and you should. And AFAIK Google’s crawlers do.
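A simple evidence-accumulating policy in the spirit of this suggestion might drop a URL only after several consecutive 410s, and not at all when a large fraction of the same site suddenly claims to be gone (which looks more like misconfiguration). This is a hypothetical sketch, not a description of what Google's crawlers actually do; the thresholds are arbitrary.

```python
from typing import Sequence


def should_drop(url_statuses: Sequence[int], site_gone_fraction: float,
                confirmations: int = 3, site_limit: float = 0.5) -> bool:
    """Decide whether to stop crawling a URL that claims to be gone.

    url_statuses: statuses seen for this URL on successive crawls, oldest first.
    site_gone_fraction: share of recently crawled URLs on the same host that
        returned 410 (a stand-in for looking at "nearby" resources).
    confirmations, site_limit: arbitrary, hypothetical thresholds.
    """
    recent = list(url_statuses)[-confirmations:]
    consistently_gone = len(recent) == confirmations and all(s == 410 for s in recent)
    # A sudden flood of 410s across a whole site looks more like a
    # misconfigured server than many genuinely deleted pages.
    looks_misconfigured = site_gone_fraction > site_limit
    return consistently_gone and not looks_misconfigured
```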

Bottom line is, if you’re writing something that generates HTTP responses, you should put some thought into providing good status codes, because there are pieces of machinery that will profit, but they can’t if you don’t. If you’re writing something that consumes HTTP responses, the amount of effort you put into handling status codes depends on requirements and can well be zero.

Aristotle Pagaltzis

From 91.213.91.28 at 2013-07-23 08:23:56:

Your thoughts are starting at the wrong point. The question is not only whether they are used by a client, but whether they represent the different situations HTTP can encounter.

Yes, first point: error codes are about HTTP. There might be a link to the application using it, but they are mainly about access to the resources (which are unavailable, located somewhere else, or successfully served). The response page shown to a user has nearly nothing to do with the response code (like 200, 206, 302, 301, 304, ...). Printing a big bold 404 has no meaning, but the page must use the correct response code.

Second point: saying that the absence of different error handling justifies getting rid of error granularity is stupid. Think about the logs. If you lose the nature of the error, debugging and maintenance can be seriously compromised.
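To make the logging point concrete: with distinct status codes, a quick tally of an access log separates, say, 403s from access rules, 404s from broken links, and 5xx responses from server bugs. This sketch assumes logs in the Common Log Format and does not handle escaped quotes inside the request field.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) \S+')


def status_counts(lines):
    """Count requests per HTTP status code, skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        match = CLF_LINE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Something like `sorted(status_counts(open("access.log")).items())` would then give a per-code summary; collapse everything into one generic error code and that summary loses most of its value.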

Some code can be very stupid, of course. But just because some people don't read the manual doesn't mean you should only get the message 'An error occurred (deal with it on your own; we have information on what went wrong but you don't deserve it)'.

PS: The only response code I don't know of is 418, and there are voices calling for a new code, 451. PPS: And how can I register?!

Written on 17 July 2013.

Last modified: Wed Jul 17 01:12:04 2013