Why I don't like HTTP as a frontend to backend transport mechanism
An extremely common deployment pattern in modern web applications is not to have the Internet talk HTTP directly to your application but instead to put it behind a (relatively lightweight) frontend server like Apache, nginx, or lighttpd. While this approach has a number of advantages, it does leave you with the question of how you're going to transport requests and replies between your frontend web server and your actual backend web application. There are a number of options, but one popular answer is to just use HTTP, which effectively makes your frontend a reverse proxy.
(This is the method of, for example, Ruby's Unicorn and clones of it in other languages such as Gunicorn.)
As it happens, I don't like using HTTP for a transport this way; I would much rather use something like SCGI or FastCGI or basically any protocol that is not HTTP. My core problem with using HTTP as a transport protocol can be summarized by saying that standard HTTP does not have transparent encapsulation. Sadly that is jargon, so I'd better explain it.
An incoming HTTP request from the outside world comes with a bunch of information and details: it has a source IP, a Host: header, the URL it was requesting, and so on. Often some or many of these are interesting to your backend. However, many of them are going to be basically overwritten when your frontend turns around and makes its own HTTP request to your backend, because the new HTTP request has to use them itself. The source IP will be that of the frontend, the URL may well be translated by the frontend, the Host: header may be forced to the name of your backend, and so on. The problem is that standard HTTP doesn't define a way to take the entire HTTP request, wrap it up intact and unaltered, and forward it off to some place for you to unwrap. Instead things are (and have to be) reused and overwritten in the frontend to backend HTTP request, so what your backend sees is a mixture of the original request plus whatever changes the frontend had to make in order to make a proper HTTP request to you.
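To make this concrete, here is a minimal Python sketch of what a plain reverse proxy does to a request on its way to the backend. The Request class, the proxy_to_backend() function, and the prefix-stripping rule are all hypothetical simplifications for illustration, not how Apache, nginx, or lighttpd actually implement proxying.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Request:
    """Only the parts of an HTTP request this example cares about."""
    source_ip: str  # comes from the TCP connection, not from any header
    url: str
    headers: dict[str, str] = field(default_factory=dict)


def proxy_to_backend(incoming: Request, frontend_ip: str,
                     backend_host: str, strip_prefix: str = "") -> Request:
    """Build the new request a plain (hypothetical) reverse proxy sends onward."""
    headers = dict(incoming.headers)
    # The new request needs its own Host:, so the client's value is overwritten.
    headers["Host"] = backend_host
    url = incoming.url
    # Frontends often map a public URL prefix onto the backend's own URL space.
    if strip_prefix and (url == strip_prefix or url.startswith(strip_prefix + "/")):
        url = url[len(strip_prefix):] or "/"
    # The backend's TCP connection comes from the frontend, so that's the source IP.
    return Request(source_ip=frontend_ip, url=url, headers=headers)


client_req = Request("203.0.113.5", "/app/login", {"Host": "www.example.com"})
seen = proxy_to_backend(client_req, "127.0.0.1", "localhost:8000", strip_prefix="/app")
print(seen)
# Request(source_ip='127.0.0.1', url='/login', headers={'Host': 'localhost:8000'})
```

Nothing in the request the backend receives still records that the client was 203.0.113.5, that it asked for www.example.com, or that it requested /app/login.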
You can hack around this; for example, your frontend can add special headers that contain copies of the information it has to overwrite, and the backend can know to fish the information out of these headers and pretend that the request had them all along. But this is an extra thing on top of HTTP, not a standard part of it, and there are all sorts of possibilities for incomplete and leaky abstractions here.
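Here's a rough sketch of that hack and one way it leaks. The X-Forwarded-For and X-Forwarded-Host header names follow a convention that many frontends use, but X-Original-URI is a name I've made up for this example, and both functions are hypothetical. The leak illustrated is that the backend can't tell a copy the frontend added from a forged one the client sent itself, unless the frontend remembers to strip incoming copies first.

```python
from __future__ import annotations

# Header names the frontend uses to smuggle original request details through.
# X-Forwarded-For/-Host are a common convention; X-Original-URI is made up here.
SIDE_CHANNEL = ("X-Forwarded-For", "X-Forwarded-Host", "X-Original-URI")


def frontend_headers(client_ip: str, headers: dict[str, str], url: str,
                     backend_host: str, strip_incoming: bool = True) -> dict[str, str]:
    """Headers a (hypothetical) frontend sends on its own request to the backend.

    Header names are matched case-sensitively, a simplification of real HTTP.
    """
    out = dict(headers)
    if strip_incoming:
        # Drop any side-channel headers the client sent itself.
        for name in SIDE_CHANNEL:
            out.pop(name, None)
    # Save copies of the details that are about to be overwritten.
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    out["X-Forwarded-Host"] = headers.get("Host", "")
    out["X-Original-URI"] = url
    # The frontend's request needs its own Host:, so the original is replaced.
    out["Host"] = backend_host
    return out


def backend_view(headers: dict[str, str]) -> dict[str, str]:
    """Naively 'unwrap' the original request from the side-channel headers."""
    # Trusting the first X-Forwarded-For entry is exactly where forgeries slip in.
    client_ip = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    return {
        "client_ip": client_ip,
        "host": headers.get("X-Forwarded-Host", headers.get("Host", "")),
        "uri": headers.get("X-Original-URI", ""),
    }


forged = {"Host": "www.example.com", "X-Forwarded-For": "10.6.6.6"}
careful = backend_view(frontend_headers("203.0.113.5", forged, "/app/x", "backend:8080"))
sloppy = backend_view(frontend_headers("203.0.113.5", forged, "/app/x", "backend:8080",
                                        strip_incoming=False))
print(careful["client_ip"], sloppy["client_ip"])  # 203.0.113.5 10.6.6.6
```

Both the frontend and the backend have to agree on every one of these conventions; get either side slightly wrong and the backend quietly believes whatever the client claimed.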
A separate transport protocol avoids all of this by completely separating the client's HTTP request to the frontend from the way it's transported to the backend. There's no choice but to completely encapsulate the HTTP request (and the reply) somehow and this enforces a strong separation between HTTP request information and transport information. In any competently designed protocol you can't possibly confuse one for the other.
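As an illustration of real encapsulation, here's a minimal Python sketch of SCGI's request framing as I understand the SCGI specification: the request's information is sent as a netstring of NUL-separated name/value pairs (CONTENT_LENGTH must come first and an SCGI header with the value 1 must be present), followed by the body. The encode_scgi() and decode_scgi() names are my own, and the sketch assumes the whole request is already in memory rather than being read incrementally off a socket.

```python
from __future__ import annotations


def encode_scgi(env: dict[str, str], body: bytes = b"") -> bytes:
    """Wrap a request's CGI-style variables and body in SCGI framing."""
    # SCGI requires CONTENT_LENGTH as the first header and SCGI set to "1".
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in env.items() if k not in ("CONTENT_LENGTH", "SCGI")]
    # Each header is name NUL value NUL; latin-1 maps str to bytes one-to-one.
    payload = b"".join(k.encode("latin-1") + b"\0" + v.encode("latin-1") + b"\0"
                       for k, v in items)
    # Netstring: "<length>:<payload>," followed directly by the request body.
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


def decode_scgi(data: bytes) -> tuple[dict[str, str], bytes]:
    """Unwrap a complete in-memory SCGI request into (variables, body)."""
    length, colon, rest = data.partition(b":")
    if not colon or not length.isdigit():
        raise ValueError("bad netstring length")
    n = int(length)
    payload, comma, body = rest[:n], rest[n:n + 1], rest[n + 1:]
    if comma != b",":
        raise ValueError("netstring not terminated by ','")
    fields = payload.split(b"\0")
    # A well-formed payload ends in NUL, leaving one empty trailing field.
    if fields[-1] != b"" or len(fields) % 2 != 1 or fields[0] != b"CONTENT_LENGTH":
        raise ValueError("malformed SCGI headers")
    env = {fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
           for i in range(0, len(fields) - 1, 2)}
    if env.get("SCGI") != "1":
        raise ValueError("missing SCGI: 1 header")
    if not env["CONTENT_LENGTH"].isdigit() or int(env["CONTENT_LENGTH"]) != len(body):
        raise ValueError("body length does not match CONTENT_LENGTH")
    return env, body


env = {"REMOTE_ADDR": "203.0.113.5", "HTTP_HOST": "www.example.com",
       "REQUEST_METHOD": "GET", "REQUEST_URI": "/app/login"}
wire = encode_scgi(env)
unwrapped, body = decode_scgi(wire)
print(unwrapped["HTTP_HOST"], unwrapped["REMOTE_ADDR"])  # www.example.com 203.0.113.5
```

The client's Host: travels as HTTP_HOST and its address as REMOTE_ADDR, both untouched, while the connection to the backend has no HTTP headers of its own that could collide with them.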
Of course you could do the same thing with HTTP by defining an HTTP-in-HTTP encapsulation protocol. But as far as I know there is no official or generally supported protocol for this, so you won't find standard servers or clients for such a thing the way you can for SCGI, FastCGI, and so on.
(I feel that there are other pragmatic benefits of non-HTTP transport protocols, but I'm going to defer them to another entry.)
Sidebar: another confusion that HTTP as a transport causes
So far I've talked about HTTP requests, but there's an issue with HTTP replies as well, because they aren't encapsulated either. In a backend server you have two sorts of errors: errors for the client (which should be passed through to them) and errors for the frontend server that tell it, for example, that something has gone terribly wrong in the backend. Because replies are not encapsulated, you have no really good way of telling these apart. Is a 404 error a report from the web application to the client, or an indication that your frontend is trying to talk to a missing or misconfigured endpoint on the backend server?
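Here's a hedged sketch of how an encapsulating transport lets the frontend tell these two apart. It assumes an SCGI-style backend that writes a CGI-style reply (an optional Status: header, other headers, a blank line, then the body), which is the form shown in the SCGI specification's example; classify_backend_reply() and its choice of 502 for transport failures are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def classify_backend_reply(raw: Optional[bytes]) -> tuple[str, int]:
    """Decide whether a backend reply is for the client or about the transport.

    raw is the complete reply read from the backend, or None if the frontend
    could not connect to or read from the backend at all.
    Returns ("client", status) to pass through, or ("transport", 502).
    """
    if raw is None:
        return ("transport", 502)  # connection refused, reset, timed out, ...
    head, blank, _body = raw.partition(b"\r\n\r\n")
    if not blank:
        return ("transport", 502)  # not a well-formed CGI-style reply
    status = 200  # CGI convention: no Status: header means 200 (redirects ignored here)
    for line in head.split(b"\r\n"):
        name, colon, value = line.partition(b":")
        if not colon:
            return ("transport", 502)
        if name.strip().lower() == b"status":
            code = value.strip().split(b" ", 1)[0]
            if not (len(code) == 3 and code.isdigit()):
                return ("transport", 502)
            status = int(code)
    # Whatever status the application chose, including 404, belongs to the client.
    return ("client", status)


print(classify_backend_reply(b"Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nno such page"))
# ('client', 404)
print(classify_backend_reply(None))
# ('transport', 502)
```

Here a 404 can only have come from the application, so the frontend passes it through; a connection failure or an unparseable reply is unambiguously the frontend's problem.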