How software makes reverse proxying hard

July 3, 2009

Our user-run web servers rely on being able to run the various web applications that people want to use behind a reverse proxy. Well, on the theoretical ability to do so, because it turns out that there are a couple of things that programs do that make reverse proxying hard (and that they could do differently to make things easier).

The first is that they should be willing to use the HTTP proxy headers added by Apache (such as X-Forwarded-For) to get certain bits of information about the request, most notably the origin IP address. Because anyone who can talk to the application directly can forge these headers, they should do this only when specifically configured to do so.

(Possibly there is an Apache setting for lying to CGIs and other applications about this sort of stuff, but if so we haven't stumbled across it.)
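As a rough illustration of the opt-in behaviour described above, here is a minimal sketch for a CGI or WSGI style application. The function name and the trust_proxy_headers switch are hypothetical; the sketch assumes a single trusted front-end Apache, which appends the address it saw to X-Forwarded-For, so the last entry is the one to believe.

```python
def client_address(environ, trust_proxy_headers=False):
    """Return the best guess at the client's IP address.

    environ is a CGI/WSGI style environment dictionary.
    trust_proxy_headers is a hypothetical configuration switch; it should
    be on only when the application really does sit behind a reverse proxy.
    """
    remote = environ.get("REMOTE_ADDR", "")
    if not trust_proxy_headers:
        # Default: ignore proxy headers entirely, since clients can forge them.
        return remote

    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    # Assumption: exactly one trusted proxy in front of us. It appends the
    # address it received the request from, so the last entry is the only
    # one that the client could not have made up.
    return hops[-1] if hops else remote
```

With the switch off, a forged header has no effect; with it on, only the entry added by the front-end proxy is used.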

The less obvious thing is that applications need to distinguish between what I will call 'input' URLs and 'output' URLs, or at least between an input URL prefix and an output URL prefix. Input URLs are what you see on requests after they have been remapped by the proxying process; output URLs are the external, pre-proxying, public URLs that should appear in your output (in HTML, in redirects, in Atom feeds, etc).
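One way an application could keep the two kinds of URLs apart is to carry two separately configured prefixes. The following is only a sketch under that assumption; the UrlMapper class, its method names, and the example prefixes are all made up for illustration.

```python
class UrlMapper:
    """Keep 'input' and 'output' URL prefixes separate (hypothetical helper)."""

    def __init__(self, input_prefix, output_prefix):
        # input_prefix: the path prefix seen on requests after proxying.
        # output_prefix: the public URL prefix that must appear in output.
        self.input_prefix = input_prefix.rstrip("/")
        self.output_prefix = output_prefix.rstrip("/")

    def relative_path(self, request_path):
        """Turn an incoming request path into an application-relative path."""
        if request_path == self.input_prefix:
            return "/"
        if not request_path.startswith(self.input_prefix + "/"):
            raise ValueError("request path is outside the input prefix")
        return request_path[len(self.input_prefix):]

    def output_url(self, app_path):
        """Build the public URL for an application-relative path."""
        return self.output_prefix + "/" + app_path.lstrip("/")


# Example with invented prefixes: the backend sees /app/..., the public
# sees https://www.example.org/~user/app/...
mapper = UrlMapper("/app", "https://www.example.org/~user/app/")
page = mapper.relative_path("/app/page/one")
public = mapper.output_url(page)
```

The point of the split is that nothing generated for the outside world (links, redirects, feed entries) is ever derived from the request path directly; it always goes through output_url().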

Applications with no such distinction are, unfortunately, very common. We've tried a couple of ways to hack around it:

  • Apache's ProxyPassReverse directive is a very limited attempt to patch up this problem: it only rewrites URLs in the Location, Content-Location and URI headers of responses, not anything in the response body. In my opinion, it actually does more harm than good in most situations, since it papers over only part of the problem; better to have no papering over at all, so that everything breaks immediately.

  • One can often make the absolute path on the user-run web server the same as it is on the real web server; this leaves you with just the port being different. If you're willing to do some hacking, you can configure Apache to lie about that too.

(This works even when the absolute path has a '/~user/' component. If you disable UserDirs, Apache is perfectly happy to have a literal ~user/ directory in your document root and to serve things from it.)
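To see why header-only rewriting papers over just part of the problem, here is a deliberately simplified model of what ProxyPassReverse does. It is not Apache's code; the function name and the example URLs are invented, and the only behaviour borrowed from Apache is that Location, Content-Location and URI headers are adjusted while the body is passed through.

```python
# Headers that ProxyPassReverse adjusts on responses (compared lowercased).
REWRITTEN_HEADERS = ("location", "content-location", "uri")


def rewrite_headers_only(headers, body, backend_prefix, public_prefix):
    """Simplified model: fix up backend URLs in a few headers, never the body."""
    new_headers = {}
    for name, value in headers.items():
        if name.lower() in REWRITTEN_HEADERS and value.startswith(backend_prefix):
            value = public_prefix + value[len(backend_prefix):]
        new_headers[name] = value
    # The body is returned untouched, so any backend URLs the application
    # wrote into HTML or feeds still leak out to the client.
    return new_headers, body


headers, body = rewrite_headers_only(
    {"Location": "http://localhost:8080/app/login"},
    '<a href="http://localhost:8080/app/">home</a>',
    "http://localhost:8080/",
    "https://www.example.org/",
)
```

After this, the redirect points at the public server but the link in the HTML still points at the backend, which is exactly the half-working state that makes problems harder to notice.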

I'm honestly surprised that more web applications don't make it easy to use them behind a reverse proxy; I had the impression that various forms of reverse proxies were relatively common in high-load environments. Maybe those proxies are deliberately set up to be more transparent than ours is, so that they look more like load balancers than actual reverse proxies.


Comments on this page:

From 195.188.152.14 at 2009-07-17 05:13:28:

Hacking in support for using the appropriate headers [for origin IP addresses] when configured isn't generally that tricky, and is another option (for source-provided apps).

I have the same problem with some apps that I work on, and hacking the app and possibly sending patches in (for open-source apps) is about the best that can be done in a lot of cases.

From 195.188.152.14 at 2009-07-17 05:14:26:

For a touch of irony, the IP address shown on comments posted on here is that given by my ISP web proxy, not my actual IP :)

Written on 03 July 2009.


Last modified: Fri Jul 3 01:06:46 2009