2013-07-25
An important thing about security issues in HTTP error responses
Although I mentioned this in passing in my first entry on this subject, it is worth talking about how Unix login attempts differ from HTTP requests for most websites. The obvious difference is that innocent people make requests for bad URLs much more frequently than they use bad logins on Unix machines. The second difference can be put in two ways, either that people make plenty of unauthenticated requests or that (unlike in Unix) it is not necessarily obvious when you need to authenticate to do something unless you're told explicitly.
Unless you're running an unusual website, the overall goal of your site is to be useful to people. Being secure is part of being useful but not the only part (otherwise you wouldn't have a website at all). The most secure approach is clearly to pick one error code and one generic set of error text and present that to users regardless of what's wrong. But this is not really useful to actual people who are making innocent mistakes. By extension, more useful error responses are for those actual people (and sometimes actual software) and the goal is thus to pick responses that are useful to people while leaving you as secure as possible.
There is clearly a balance here and where you put it depends on a lot of factors. As with all security balance situations there is no single global right answer. If you're running a mostly public site with low-sensitivity information that has unpredictable (by users) areas of access restrictions, you're likely to tilt strongly towards giving site visitors a lot of information in error responses. Since this describes a lot of websites it's perhaps no surprise that HTTP itself is biased in this direction (as shown by even having separate error codes for this).
(Note that error codes are only one part of the information you give people in HTTP error responses. Don't assume that an attacker is, eg, ignoring the actual text of those responses.)
2013-07-19
Thinking about the security issues in HTTP 403 versus 404 errors
In the Unix world, the classical view of error responses for login attempts is that you should behave exactly the same for invalid logins as for valid logins; you should prompt for a password, you should spend just as long 'verifying' the password, and then you should tell whatever it is making the login attempt simply 'you didn't succeed'. To do otherwise is to leak information to an attacker about what logins are actually valid, which is potentially useful in all sorts of contexts.
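As a rough illustration of that classical view, here is a minimal Python sketch of a login check that does the same work and gives the same answer whether or not the login exists. The account table, the dummy credentials, and the function names are all made up for the example; this is not how any particular Unix login program is implemented.

```python
import hashlib
import hmac
import os

_ITERATIONS = 100_000  # arbitrary work factor for this sketch


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a password hash; deliberately slow-ish, like real password hashing."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)


# Hypothetical in-memory account table: login -> (salt, password hash).
_SALT = os.urandom(16)
ACCOUNTS = {"alice": (_SALT, _hash("correct horse", _SALT))}

# Dummy credentials, used for unknown logins so we still do the hashing work.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = _hash("not a real password", _DUMMY_SALT)


def check_login(login: str, password: str) -> bool:
    """Verify a login, doing one hash and one comparison in every case."""
    known = login in ACCOUNTS
    salt, expected = ACCOUNTS[login] if known else (_DUMMY_SALT, _DUMMY_HASH)
    candidate = _hash(password, salt)
    matches = hmac.compare_digest(candidate, expected)
    # An unknown login never succeeds, even if it happens to match the dummy.
    return known and matches


def login_message(login: str, password: str) -> str:
    """One failure message, with no hint about which part was wrong."""
    return "Welcome." if check_login(login, password) else "Login incorrect."
```

The point of the dummy hash is only that a bad login and a bad password take roughly the same path through the code; it makes no promise about precise timing.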
Assuming that HTTP error codes actually matter and that people you do not like will accurately interpret small differences in your error responses, there is an equivalent issue in the HTTP world; HTTP explicitly has a difference between 'thing does not exist' (404) and 'you do not have access to thing' (403). Under some circumstances this can leak information to attackers.
Part of what makes HTTP tricky here is the question of what information you want to protect. Put simply, is it more important to hide the existence or non-existence of URLs or to hide the existence of (some) restricted access URLs? This has no general answer; what you choose depends on your local circumstances and security needs.
Now, it's worth mentioning an important difference between HTTP requests and login attempts: in HTTP requests on normal sites, it's routine for people to innocently make requests for bad URLs. Bad URLs come from all over, including you making them bad by reorganizing your site and breaking all of those links. This is unlike logins, where bad logins are frequently from attackers.
(There are situations where the URLs are strictly internal constructs inside your complex web app and no human should ever save such a URL or pass it around. In these cases bad URLs are much more like bad logins.)
There are of course some pragmatic considerations. First, it's generally safer to check authentication very early rather than run substantial parts of your code at the behest of an unauthenticated person; in the jargon, you're said to have less exposed attack surface. In web apps this often means that you have no idea whether or not a URL is valid when you reject the request. As a result you probably can't hide the presence of authentication from a sophisticated attacker but you often do effectively wind up hiding what URLs under the authentication point actually exist.
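A small Python sketch of what checking authentication early looks like in practice. The site layout (everything under /private/ needs authentication), the page list, and the choice of 403 for the rejection are assumptions made up for the example.

```python
# Hypothetical site layout: everything under /private/ requires authentication.
PROTECTED_PREFIX = "/private/"
PAGES = {"/", "/about", "/private/report"}


def respond(path: str, authenticated: bool) -> int:
    """Return the HTTP status for a request, checking authentication first."""
    # The authentication check runs before any URL lookup, so at this point
    # we do not know (or care) whether the path names a real page.
    if path.startswith(PROTECTED_PREFIX) and not authenticated:
        return 403
    # Only requests that got past the check reach the URL lookup.
    return 200 if path in PAGES else 404
```

With this ordering an unauthenticated request for /private/report and one for /private/no-such-thing both get a 403, so the existence of the protected area is visible but its contents are not. Only an authenticated request finds out that the second URL is a 404.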
Second, if you want to deal with this issue by using only one generic error code for 'you can't have this URL for whatever reason', I think you probably want to use a 404 unless your entire site is protected by authentication (in which case 403 is better). You will get innocent bad URLs for things that aren't protected by authentication and returning anything other than a 'page not found' report of some sort will just confuse the poor user.
(Technically you could return a 403 status but with HTML text that says that the page is not accessible for some reason. Most or all actual people will read the text and never know the status code.)
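To make the single generic error idea concrete, here is a minimal Python sketch. The function names, the reason strings, and the HTML text are invented for illustration; the only thing taken from the discussion above is the choice of 404 for a mostly public site and 403 for a fully protected one.

```python
def generic_error(whole_site_protected: bool):
    """Return the one (status, body) pair used for every refused URL."""
    if whole_site_protected:
        return 403, "<h1>Not available</h1><p>You need to log in to use this site.</p>"
    return 404, "<h1>Page not found</h1><p>There is nothing at this URL.</p>"


def error_for(reason: str, whole_site_protected: bool = False):
    """Map any failure reason to the same generic response.

    `reason` might be "missing" or "forbidden"; it is deliberately ignored
    so that the response cannot reveal which one it was.
    """
    return generic_error(whole_site_protected)
```

Since the status code and the HTML body are just two values returned together, nothing forces them to agree with each other, which is what makes the 403-with-different-text variation possible.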
(What started me thinking about this whole issue was a Tweet from Sean M Puckett.)
2013-07-17
Do specific HTTP error codes actually matter?
There are a few HTTP 4xx and 5xx responses that provoke specific browser reactions; the classic case is a 401 response to provoke an authentication challenge for HTTP level authentication. But apart from those specific responses, does it actually matter what error code you return for a failed or rejected HTTP request?
If it's a human making the request, the HTML of the error page they see does matter; it's what they'll read to understand what went wrong. But that error text is unrelated to the HTTP status code and in fact I'm not certain that a lot of people even notice or know what HTTP status is attached to a 'page not found' page. As far as they're concerned you could probably return a 2xx series response and the error text and it would all be the same to them.
(I hope that browsers behave a little differently for 2xx versus 4xx pages; for example, I sort of hope that browsers don't enter the latter into your browsing history as a visited link. But maybe they do.)
If it's code making the request, it's clear that returning some 4xx or 5xx error for failed requests is important so that the code can easily tell them from successful ones. But I'm not at all convinced that there's very much (general) code that cares about what the specific error return is from some random website that it is poking, as opposed to the fact that there was an error.
I have two reasons to be dubious. The first is that treating different error codes differently in your code takes additional work and also requires having something different that you can do. If a failure is a failure, well, it doesn't matter just why it failed. The second is that I doubt web sites are any more careful about which HTTP error code they generate than code is about handling them. In an environment where any old HTTP error is just as good as another one, client code can't trust a random server's error code to be meaningful in the first place.
(Error codes can be meaningful in more confined situations, where you have a specific client talking to a specific server and you know both ends.)
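For illustration, this is roughly the level of attention I would guess a lot of general client code pays to status codes, written as a small Python function. The function name and the three labels are hypothetical, not taken from any real client.

```python
def outcome(status: int) -> str:
    """Collapse an HTTP status code into the coarse result generic code cares about."""
    if 200 <= status < 300:
        return "ok"
    if 400 <= status < 600:
        # 403, 404, 410, 504 and so on all land here; a failure is a failure.
        return "failed"
    # 1xx and 3xx responses need handling of their own (eg following redirects).
    return "other"
```

Anything finer-grained than this means more branches in the client and something different to actually do in each of them.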
I certainly think it would be nice if, say, Google's web spider behaved somewhat differently in response to '410 Gone' than to, eg, '504 Gateway Timeout'. But I'm not certain it makes sense for even Google to bother. People can misconfigure servers to generate 410s when they don't actually mean it and in the general case what matters is whether the error really is permanent instead of whether it just claims to be.
(This was partly brought to mind by the comment on my entry about banning a feed reader. By the way, that feed reader is still banned, still getting 403s, and still trying every ten minutes. That's a clear case of not paying attention to errors at all.)