Thinking about the security issues in HTTP 403 versus 404 errors

July 19, 2013

In the Unix world, the classical view of error responses for login attempts is that you should behave exactly the same for invalid logins as for valid logins; you should prompt for a password, you should spend just as long 'verifying' the password, and then you should tell whatever it is making the login attempt simply 'you didn't succeed'. To do otherwise is to leak information to an attacker about what logins are actually valid, which is potentially useful in all sorts of contexts.
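As a rough illustration of 'spend just as long verifying the password', here is a minimal Python sketch. The user table, the `check_login` name, and the iteration count are all made up for this example; the point is only that a nonexistent login still goes through the same hashing and comparison as a real one.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Hypothetical user database: login -> (salt, password hash).
_ALICE_SALT = os.urandom(16)
USERS = {"alice": (_ALICE_SALT, hash_password("correct horse", _ALICE_SALT))}

# Dummy credentials that are checked when the login does not exist,
# so an invalid login costs the same hashing work as a valid one.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("no such password", _DUMMY_SALT)


def check_login(login: str, password: str) -> bool:
    """Return True only for a valid login with the right password."""
    known = login in USERS
    salt, expected = USERS[login] if known else (_DUMMY_SALT, _DUMMY_HASH)
    candidate = hash_password(password, salt)
    # compare_digest avoids leaking how many leading bytes matched.
    matches = hmac.compare_digest(candidate, expected)
    # The caller gets a single yes/no answer with no reason attached.
    return known and matches
```

Whatever calls `check_login` only ever sees a boolean, so it has nothing to report beyond 'you didn't succeed'.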

Assuming that HTTP error codes actually matter and that people you do not like will accurately interpret small differences in your error responses, there is an equivalent issue in the HTTP world; HTTP explicitly has a difference between 'thing does not exist' (404) and 'you do not have access to thing' (403). Under some circumstances this can leak information to attackers.
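To make the leak concrete, here is a small Python sketch of a handler that answers 'honestly'. The page table and the function names are invented for illustration and don't come from any real site.

```python
# Hypothetical site layout: path -> whether it needs authentication.
PAGES = {
    "/": False,
    "/about": False,
    "/admin/payroll": True,
}


def naive_status(path: str, authenticated: bool) -> int:
    """Return the 'honest' HTTP status code for a request."""
    if path not in PAGES:
        return 404  # thing does not exist
    if PAGES[path] and not authenticated:
        return 403  # thing exists, but you do not have access to it
    return 200


def probe(paths):
    """What an unauthenticated client learns purely from status codes:
    the guessed paths that exist but are restricted."""
    return [p for p in paths if naive_status(p, authenticated=False) == 403]
```

With this table, `probe(["/admin/payroll", "/admin/secrets"])` returns only `/admin/payroll`; the 403 has confirmed that the URL exists even though the client can't read it.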

Part of what makes HTTP tricky here is the question of what information you want to protect. Put simply, is it more important to hide the existence or non-existence of URLs or to hide the existence of (some) restricted access URLs? This has no general answer; what you choose depends on your local circumstances and security needs.

Now, it's worth mentioning an important difference between HTTP requests and login attempts: in HTTP requests on normal sites, it's routine for people to innocently make requests for bad URLs. Bad URLs come from all over, including you making them bad by reorganizing your site and breaking all of those links. This is unlike logins, where bad logins are frequently from attackers.

(There are situations where the URLs are strictly internal constructs inside your complex web app and no human should ever save such a URL or pass it around. In these cases bad URLs are much more like bad logins.)

There are of course some pragmatic considerations. First, it's generally safer to check authentication very early rather than run substantial parts of your code at the behest of an unauthenticated person; in the jargon, you're said to have less exposed attack surface. In web apps this often means that you have no idea whether or not a URL is valid when you reject the request. As a result you probably can't hide the presence of authentication from a sophisticated attacker, but you often do effectively wind up hiding what URLs under the authentication point actually exist.
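One way to picture 'check authentication very early' is a WSGI wrapper that refuses unauthenticated requests before any URL routing happens. This is only a sketch under assumptions: the `X-Demo-Token` header check is a stand-in for whatever real authentication you use, and the two valid paths are invented.

```python
def inner_app(environ, start_response):
    """The real application; it only ever runs for authenticated requests."""
    if environ.get("PATH_INFO") in ("/", "/reports"):
        status, body = "200 OK", b"hello\n"
    else:
        status, body = "404 Not Found", b"no such page\n"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def require_auth(app, is_authenticated):
    """Wrap a WSGI app so the auth check happens before any routing."""
    def middleware(environ, start_response):
        if not is_authenticated(environ):
            # We reject here without knowing whether the URL is valid.
            body = b"access denied\n"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)
    return middleware


def demo_is_authenticated(environ):
    # Placeholder check, an assumption for this sketch only; a real site
    # would verify a session or proper credentials here.
    return environ.get("HTTP_X_DEMO_TOKEN") == "let-me-in"


application = require_auth(inner_app, demo_is_authenticated)
```

An unauthenticated request gets the same 403 for `/reports` and for `/nope`, so the existence of authentication is visible but the set of URLs behind it is not. Only an authenticated request ever reaches the code that can tell a 200 from a 404.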

Second, if you want to deal with this issue by using only one generic error code for 'you can't have this URL for whatever reason', I think you probably want to use a 404 unless your entire site is protected by authentication (in which case 403 is better). You will get innocent bad URLs for things that aren't protected by authentication, and returning anything other than a 'page not found' report of some sort will just confuse the poor user.
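The 'one generic error code' policy could be sketched like this in Python. The page table, the `generic_status` name, and the `whole_site_protected` flag are assumptions made up for the example.

```python
# Hypothetical site layout: path -> whether it needs authentication.
PAGES = {
    "/": False,
    "/about": False,
    "/admin/payroll": True,
}


def generic_status(path: str, authenticated: bool,
                   whole_site_protected: bool = False) -> int:
    """Collapse 'missing' and 'forbidden' into a single refusal code."""
    # 404 on a mostly public site, 403 when everything needs authentication.
    refusal = 403 if whole_site_protected else 404
    exists = path in PAGES
    needs_auth = whole_site_protected or PAGES.get(path, False)
    if not exists or (needs_auth and not authenticated):
        return refusal
    return 200
```

On the mostly public site, an unauthenticated request for `/admin/payroll` and one for a path that was never there both get 404, so a client can't tell them apart from the status code alone, while someone following a stale link still sees a plain 'page not found'.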

(Technically you could return a 403 status but with HTML text that says that the page is not accessible for some reason. Most or all actual people will read the text and never know the status code.)
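A rough sketch of that combination in Python: the status line says 403, while the HTML that people actually read just says the page isn't available. The `refusal_response` name and the wording of the page are invented for this example.

```python
from html import escape


def refusal_response(path: str):
    """Build a (status line, headers, body) triple: a 403 status code
    paired with human-friendly 'not available' text."""
    body = (
        "<html><head><title>Page not available</title></head>"
        "<body><h1>Page not available</h1>"
        # Escape the path since it comes straight from the request.
        f"<p>Sorry, {escape(path)} is not accessible.</p>"
        "</body></html>"
    ).encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return "403 Forbidden", headers, body
```

The triple is shaped so that it can be handed to a WSGI `start_response` call followed by returning `[body]`.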

(What started me thinking about this whole issue was a Tweet from Sean M Puckett.)


Comments on this page:

From 87.79.78.105 at 2013-07-19 22:28:53:

I am assuming you are aware of this, but since it’s relevant and you did not mention it, I’ll make a note of it still: the definition of status code 403 in RFC 2616 expressly states that “If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.” So this need to conceal information from attackers is explicitly acknowledged – and the choice not to, sanctioned – by the HTTP spec.

Aristotle Pagaltzis

Written on 19 July 2013.


Last modified: Fri Jul 19 00:49:35 2013