2014-09-04
Some other benefits of using non-HTTP frontend to backend transports
A commentator left a very good comment on my entry on why I don't like HTTP as a frontend to backend transport, pointing out the security benefits of using a simple protocol instead of a complex one. That makes a good starting point for talking about the general benefits of using a non-HTTP transport, beyond HTTP's basic lack of encapsulation that I talked about in my entry.
The straightforward security benefit (as noted by the commentator) is that a simple protocol exposes less attack surface and will lead to implementations that are easier to audit. Closely related to this is that full HTTP is a very complicated protocol with many dark corners, especially if people start trying tricky exploits against you. In practice, HTTP used as a transport is unlikely to use full HTTP (and hopefully many of the perverse portions will be sanitized by the frontend); however, just what subset of HTTP it uses is going to be somewhat unpredictable, generally undocumented, and variable between frontends. As a result, if you're implementing HTTP for a backend you have a problem; put simply, where do you stop? You probably don't want to try to implement the warts and all version of full HTTP, if only because some amount of the code you're writing will never get used in practice, but you don't necessarily know where it's safe to stop.
(Related to that is how do you test your implementation, especially its handling of errors and wacky requests? In a sense, real HTTP servers have a simple life here; you can simply expose them on the Internet and see what crazy things clients send you and expect to work.)
A transport protocol, even a relatively complicated one like FastCGI, gives you a definite answer to this. The protocol is much simpler than HTTP and much more bounded; you know what you have to write and what you have to support.
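To give a feel for how bounded this is, here is a minimal Python sketch of splitting one FastCGI record off a byte buffer. The fixed 8-byte header layout and the record type numbers come from the FastCGI 1.0 specification; the function name parse_record and its return shape are my own choices for illustration, and a real backend would also need to handle the name-value pair encoding and the other record types.

```python
import struct

# Fixed 8-byte FastCGI record header: version, type, request id,
# content length, padding length, and one reserved byte (skipped by "x").
FCGI_HEADER = struct.Struct("!BBHHBx")

# A few record types from the FastCGI 1.0 specification.
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5


def parse_record(buf):
    """Split one record off the front of buf (hypothetical helper).

    Returns (record_type, request_id, content, rest), or None if buf
    does not yet hold a complete record.
    """
    if len(buf) < FCGI_HEADER.size:
        return None
    version, rtype, request_id, clen, plen = FCGI_HEADER.unpack_from(buf)
    if version != 1:
        raise ValueError("unsupported FastCGI version %d" % version)
    end = FCGI_HEADER.size + clen + plen
    if len(buf) < end:
        return None
    content = buf[FCGI_HEADER.size:FCGI_HEADER.size + clen]
    return rtype, request_id, content, buf[end:]
```

Every record on the wire has this same shape, so the framing layer has nothing comparable to HTTP's dark corners; there is a version byte to check and two lengths to honour.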
(There is a devil's advocate take on this that I will get to in a sidebar.)
Another pragmatic advantage is that using a separate transport protocol imposes a strong separation between the HTTP URL, host, port, and so on of the original request and the TCP port and endpoint (if any) of the transport protocol. Your backend software has to work very hard to confuse the two and thus to generate URLs that use the backend information instead of the real frontend request information. By contrast, software behind an HTTP reverse proxy has to take very special care to use the right host, URL, and so on; in many configurations it needs to be specifically configured with the appropriate frontend URL information instead of being able to pull it from the request. This is a perennial issue with software that sits behind reverse proxies.
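As a rough illustration of that separation, here is a Python sketch that rebuilds the URL the client asked for purely from CGI-style request parameters, which is what a backend typically receives over a transport like FastCGI. The function name external_url is hypothetical. SERVER_NAME, SERVER_PORT, SCRIPT_NAME, PATH_INFO, and QUERY_STRING are standard CGI variables; HTTPS is a common convention rather than part of the CGI standard, and exactly what your frontend passes is something to check rather than assume.

```python
def external_url(params):
    """Rebuild the client-visible URL from CGI-style params (a sketch)."""
    # HTTPS is a widespread convention; assumed here, not guaranteed.
    https = params.get("HTTPS", "").lower() in ("on", "1")
    scheme = "https" if https else "http"
    host = params["SERVER_NAME"]
    port = params.get("SERVER_PORT", "")
    default_port = "443" if https else "80"
    # Only mention the port when it is not the default for the scheme.
    if port and port != default_port:
        host = "%s:%s" % (host, port)
    path = params.get("SCRIPT_NAME", "") + params.get("PATH_INFO", "")
    url = "%s://%s%s" % (scheme, host, path or "/")
    query = params.get("QUERY_STRING", "")
    if query:
        url += "?" + query
    return url
```

Nothing in this function can see the address or port the backend itself is listening on, so there is no backend information available to leak into generated URLs by accident.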
Sidebar: the devil's advocate view of full HTTP complexity
Using a non-HTTP transport protocol doesn't necessarily mean that you avoid all of the complexity of HTTP, because after all your application is still dealing with HTTP requests. What your backend gets to avoid is some amount of parsing the HTTP request and sanitizing it; with some frontend servers you can also avoid handling things like compressing your response in a way that the client can deal with. Even under the best of circumstances this still leaves your backend (generally your application and framework) to handle the contents of HTTP headers and the HTTP request itself. This is where a great deal of the complexity of HTTP resides and it's relatively irreducible because the headers contain application level information.
You are also at the mercy of your frontend for how much sanitization is done for you, and this may not be well documented. Is there a header size limit? Does it turn multi-line headers (if a client is perverse enough to send them) into single-line entities or does it leave them with embedded newlines? And so on.
(Let's not even ask about, say, chunked input.)
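To make the multi-line header question concrete, here is a Python sketch of the sort of sanitization you would have to do yourself if the frontend leaves folded headers alone. It joins continuation lines (lines starting with a space or tab) onto the header they extend and enforces an overall size cap. The function name unfold_headers and the 8192 limit are assumptions made up for illustration, not the behaviour of any particular frontend.

```python
MAX_HEADER_BYTES = 8192  # hypothetical limit, not from any real frontend


def unfold_headers(raw_lines):
    """Join continuation lines onto the header line they extend.

    raw_lines is a list of header lines with the line endings already
    removed. A line starting with a space or tab continues the
    previous header.
    """
    headers = []
    total = 0
    for line in raw_lines:
        total += len(line)
        if total > MAX_HEADER_BYTES:
            raise ValueError("header block too large")
        if line[:1] in (" ", "\t"):
            if not headers:
                raise ValueError("continuation line with no header before it")
            # Replace the fold with a single space.
            headers[-1] += " " + line.strip()
        else:
            headers.append(line.rstrip())
    return headers
```

For example, unfold_headers(["X-Long: first", "\tsecond"]) gives ["X-Long: first second"]. Whether your frontend has already done something like this for you is exactly the kind of thing that tends not to be documented.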
If you're dealing with the contents of HTTP headers and so on anyways, I think that you can rationally ask if not having to parse the HTTP request is such a great advantage.