Why I don't like HTTP as a frontend to backend transport mechanism

August 26, 2014

An extremely common deployment pattern in modern web applications is not to have the Internet talk HTTP directly to your application but instead to put it behind a (relatively lightweight) frontend server like Apache, nginx, or lighttpd. While this approach has a number of advantages, it does leave you with the question of how you're going to transport requests and replies between your frontend web server and your actual backend web application. While there are a number of options, one popular answer is to just use HTTP; effectively you're using a reverse proxy.

(This is the method of, for example, Ruby's Unicorn and clones of it in other languages such as Gunicorn.)

As it happens, I don't like using HTTP for a transport this way; I would much rather use something like SCGI or FastCGI or basically any protocol that is not HTTP. My core problem with using HTTP as a transport protocol can be summarized by saying that standard HTTP does not have transparent encapsulation. Sadly that is jargon, so I'd better explain what I mean.

An incoming HTTP request from the outside world comes with a bunch of information and details; it has a source IP, a Host:, the URL it was requesting, and so on. Often some or many of these are interesting to your backend. However, many of them are going to be basically overwritten when your frontend turns around and makes its own HTTP request to your backend, because the new HTTP request has to use them itself. The source IP will be that of the frontend, the URL may well be translated by the frontend, the Host: may be forced to the name of your backend, and so on. The problem is that standard HTTP doesn't define a way to take the entire HTTP request, wrap it up intact and unaltered, and forward it off to some place for you to unwrap. Instead these fields have to be reused and overwritten in the frontend to backend HTTP request, so what your backend sees is a mixture of the original request plus whatever changes the frontend had to make in order to make a proper HTTP request to you.

You can hack around this; for example, your frontend can add special headers that contain copies of the information it has to overwrite and the backend can know to fish the information out of these headers and pretend that the request had them all the time. But this is an extra thing on top of HTTP, not a standard part of it, and there are all sorts of possibilities for incomplete and leaky abstractions here.
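As a rough sketch of what this hack looks like on the backend side, here is a small Python function that patches up a WSGI environment from such headers. The header names (X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto) are common conventions rather than anything this entry specifies, and the function name, the trusted frontend address, and the assumption that the frontend appends the client address it saw as the last X-Forwarded-For entry are all purely illustrative.

```python
def restore_forwarded(environ, trusted_frontends=("127.0.0.1",)):
    """Return a copy of a WSGI environ with the original client details restored.

    Hypothetical helper: the header names and trust policy are assumptions.
    """
    fixed = dict(environ)

    # Only believe the extra headers if the connection really came from our
    # frontend; otherwise any client could forge them.
    if environ.get("REMOTE_ADDR") not in trusted_frontends:
        return fixed

    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR")
    if forwarded_for:
        # Assumes the frontend appends the address it saw, so the last entry
        # is the only one our own frontend vouches for.
        fixed["REMOTE_ADDR"] = forwarded_for.split(",")[-1].strip()

    forwarded_host = environ.get("HTTP_X_FORWARDED_HOST")
    if forwarded_host:
        fixed["HTTP_HOST"] = forwarded_host

    forwarded_proto = environ.get("HTTP_X_FORWARDED_PROTO")
    if forwarded_proto in ("http", "https"):
        fixed["wsgi.url_scheme"] = forwarded_proto

    return fixed
```

Even this small version has to make a trust decision and pick which of several addresses to believe, which is exactly the kind of place where the abstraction can leak.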

A separate transport protocol avoids all of this by completely separating the client's HTTP request to the frontend from the way it's transported to the backend. There's no choice but to completely encapsulate the HTTP request (and the reply) somehow and this enforces a strong separation between HTTP request information and transport information. In any competently designed protocol you can't possibly confuse one for the other.
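To make the encapsulation concrete, here is a minimal Python sketch of how an SCGI request is framed, as I understand the SCGI specification: the request's CGI-style variables travel as NUL-separated name and value pairs inside a netstring, followed by the body. The function names and the sample variables are made up for illustration, and this is not a complete or hardened implementation.

```python
def encode_scgi_request(cgi_vars, body=b""):
    """Frame CGI-style variables and a body as an SCGI request (sketch)."""
    # SCGI requires CONTENT_LENGTH to come first and an SCGI=1 marker.
    headers = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    headers += [(name, value) for name, value in cgi_vars.items()
                if name not in ("CONTENT_LENGTH", "SCGI")]
    payload = b"".join(
        name.encode("latin-1") + b"\0" + value.encode("latin-1") + b"\0"
        for name, value in headers
    )
    # Netstring framing: "<length>:<payload>," and then the raw body.
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


def decode_scgi_request(data):
    """Split an SCGI request back into (variables, body) (sketch)."""
    length, _, rest = data.partition(b":")
    size = int(length)
    payload, comma, body = rest[:size], rest[size:size + 1], rest[size + 1:]
    if comma != b",":
        raise ValueError("malformed netstring")
    # The payload ends with a NUL, so drop the empty trailing field.
    fields = [f.decode("latin-1") for f in payload.split(b"\0")[:-1]]
    variables = dict(zip(fields[0::2], fields[1::2]))
    return variables, body[:int(variables["CONTENT_LENGTH"])]


if __name__ == "__main__":
    wire = encode_scgi_request(
        {"REMOTE_ADDR": "203.0.113.7", "HTTP_HOST": "www.example.org",
         "REQUEST_URI": "/blog/"},
    )
    print(decode_scgi_request(wire))
```

The point to notice is that the client's address and Host: ride along as ordinary data inside the envelope; nothing about the frontend's own connection to the backend can overwrite them.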

Of course you could do the same thing with HTTP by defining an HTTP-in-HTTP encapsulation protocol. But as far as I know there is no official or generally supported protocol for this, so you won't find standard servers or clients for such a thing the way you can for SCGI, FastCGI, and so on.

(I feel that there are other pragmatic benefits of non-HTTP transport protocols, but I'm going to defer them to another entry.)

Sidebar: another confusion that HTTP as a transport causes

So far I've talked about HTTP requests, but there's an issue with HTTP replies as well because they aren't encapsulated either. In a backend server you have two sorts of errors: errors for the client (which should be passed through to them) and errors for the frontend server, which tell it, for example, that something has gone terribly wrong in the backend. Because replies are not encapsulated you have no really good way of telling these apart. Is a 404 error a report from the web application to the client or an indication that your frontend is trying to talk to a missing or misconfigured endpoint on the backend server?


Comments on this page:

By Anonymous at 2014-09-02 11:34:19:

It also makes sense from a security perspective. SCGI is really simple and even an assembly language implementation would be auditable. FastCGI is more complex, but also auditable. The HTTP/1.1 RFCs on the other hand are over 500 pages long and HTTP's grammar is most likely longer than the source code of an SCGI implementation. And let's not even talk about incompatibilities between the often simplistic "HTTP backend implementations", the supported HTTP subsets of the various implementations and their interpretations of the HTTP specification. It is hard enough to write a secure and compatible HTTP server.

I don't understand how you come up with the idea to transport a few bytes of key value pairs and a byte stream over HTTP. It seems that these days HTTP is the new TCP and if your software doesn't use HTTP it is considered to be legacy software. And let's not talk about "if it's not HTTP, it won't pass the firewall" types of arguments.

Another advantage of SCGI and FastCGI is that with many HTTP servers both can be used over Unix sockets and thus can be cheaply protected by the operating system's access control mechanisms (Unix permissions, AppArmor, SELinux etc.) without the complexity and overhead of TLS (technically HTTP can also be used over Unix sockets, see e.g. nginx, but it is rarely supported).

By James (trs80) at 2017-04-25 11:45:28:

There was, amazingly, at the time of this post, RFC 7239, which standardises the Forwarded: header to pass on the source address, Host: header etc. to assist in this case (although perhaps more the case of load balancing/reverse proxying). Of course I only heard about it today, nearly three years later.

Written on 26 August 2014.
Last modified: Tue Aug 26 00:13:42 2014