Wandering Thoughts archives

2014-08-26

Why I don't like HTTP as a frontend to backend transport mechanism

An extremely common deployment pattern in modern web applications is not to have the Internet talk HTTP directly to your application but instead to put it behind a (relatively lightweight) frontend server like Apache, nginx, or lighttpd. While this approach has a number of advantages, it does leave you with the question of how you're going to transport requests and replies between your frontend web server and your actual backend web application. While there are a number of options, one popular answer is to just use HTTP; effectively you're using a reverse proxy.

(This is the method of, for example, Ruby's Unicorn and clones of it in other languages such as Gunicorn.)

As it happens, I don't like using HTTP for a transport this way; I would much rather use something like SCGI or FastCGI or basically any protocol that is not HTTP. My core problem with using HTTP as a transport protocol can be summarized by saying that standard HTTP does not have transparent encapsulation. Sadly that is jargon, so I'd better explain what I mean.

An incoming HTTP request from the outside world comes with a bunch of information and details; it has a source IP, a Host:, the URL it was requesting, and so on. Often some or many of these are interesting to your backend. However, many of them are going to be basically overwritten when your frontend turns around and makes its own HTTP request to your backend, because the new HTTP request has to use them itself. The source IP will be that of the frontend, the URL may well be translated by the frontend, the Host: may be forced to the name of your backend, and so on. The problem is that standard HTTP doesn't define a way to take the entire HTTP request, wrap it up intact and unaltered, and forward it off to some place for you to unwrap. Instead these fields are reused and overwritten in the frontend to backend HTTP request (and they have to be), so what your backend sees is a mixture of the original request plus whatever changes the frontend had to make in order to make a proper HTTP request to you.

You can hack around this; for example, your frontend can add special headers that contain copies of the information it has to overwrite and the backend can know to fish the information out of these headers and pretend that the request had them all the time. But this is an extra thing on top of HTTP, not a standard part of it, and there are all sorts of possibilities for incomplete and leaky abstractions here.
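As a rough sketch of what this workaround looks like on the backend side, here is a small Python function that takes a WSGI-style environ and tries to recover the original client IP, Host:, and scheme. The header names (X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto) are widely used conventions, not something standard HTTP requires, and TRUSTED_FRONTENDS and original_request_info are hypothetical names made up for this illustration; your frontend may well use different headers.

```python
# Hypothetical: the addresses our own frontend connects from.
TRUSTED_FRONTENDS = {"127.0.0.1", "::1"}


def original_request_info(environ):
    """Best-effort recovery of what the client originally sent."""
    peer = environ.get("REMOTE_ADDR", "")
    info = {
        "client_ip": peer,
        "host": environ.get("HTTP_HOST", ""),
        "scheme": environ.get("wsgi.url_scheme", "http"),
    }
    # Anyone can send X-Forwarded-* headers, so only believe them when
    # the connection really does come from our frontend.
    if peer not in TRUSTED_FRONTENDS:
        return info
    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded_for:
        # Each proxy appends the peer it saw. With a single trusted
        # frontend, the last entry is the one our frontend added.
        info["client_ip"] = forwarded_for.split(",")[-1].strip()
    info["host"] = environ.get("HTTP_X_FORWARDED_HOST", info["host"])
    info["scheme"] = environ.get("HTTP_X_FORWARDED_PROTO", info["scheme"])
    return info
```

Even this small sketch shows where the abstraction leaks: the backend has to know which peers to trust, it has to agree with the frontend on header names, and anything the frontend didn't copy into a header (such as the untranslated URL) is simply gone.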

A separate transport protocol avoids all of this by completely separating the client's HTTP request to the frontend from the way it's transported to the backend. There's no choice but to completely encapsulate the HTTP request (and the reply) somehow and this enforces a strong separation between HTTP request information and transport information. In any competently designed protocol you can't possibly confuse one for the other.
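To make the separation concrete, here is a minimal sketch of SCGI request framing, following my reading of the SCGI specification: the request information travels as NUL-separated name/value pairs packed into a netstring (with CONTENT_LENGTH first and an SCGI field set to 1), followed by the raw request body. The function names encode_scgi_request and decode_scgi_request are hypothetical, and real SCGI servers do considerably more validation than this.

```python
def encode_scgi_request(env, body=b""):
    """Frame request information and a body as an SCGI request."""
    # CONTENT_LENGTH must come first, and "SCGI" must be present.
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in env.items()
              if k not in ("CONTENT_LENGTH", "SCGI")]
    blob = b"".join(
        k.encode("latin-1") + b"\x00" + v.encode("latin-1") + b"\x00"
        for k, v in items
    )
    # Netstring framing: <length>:<data>, and then the body follows.
    return str(len(blob)).encode("ascii") + b":" + blob + b"," + body


def decode_scgi_request(data):
    """Inverse of encode_scgi_request; returns (env, body)."""
    length, colon, rest = data.partition(b":")
    if not colon:
        raise ValueError("missing netstring length")
    n = int(length)
    blob, comma, tail = rest[:n], rest[n:n + 1], rest[n + 1:]
    if comma != b",":
        raise ValueError("missing netstring terminator")
    fields = blob.split(b"\x00")[:-1]  # drop the piece after the final NUL
    env = {
        fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
        for i in range(0, len(fields), 2)
    }
    return env, tail[:int(env["CONTENT_LENGTH"])]
```

The point is that things like REMOTE_ADDR and HTTP_HOST here are just data describing the client's request. The frontend's own connection to the backend has no Host: or URL of its own that could overwrite them.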

Of course you could do the same thing with HTTP by defining an HTTP-in-HTTP encapsulation protocol. But as far as I know there is no official or generally supported protocol for this, so you won't find standard servers or clients for such a thing the way you can for SCGI, FastCGI, and so on.

(I feel that there are other pragmatic benefits of non-HTTP transport protocols, but I'm going to defer them to another entry.)

Sidebar: another confusion that HTTP as a transport causes

So far I've talked about HTTP requests, but there's an issue with HTTP replies as well because they aren't encapsulated either. In a backend server you have two sorts of errors: errors for the client (which should be passed through to them) and errors for the frontend server that tell it, for example, that something has gone terribly wrong in the backend. Because replies are not encapsulated you have no really good way of telling these apart. Is a 404 error a report from the web application to the client or an indication that your frontend is trying to talk to a missing or misconfigured endpoint on the backend server?
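For contrast, here is a small sketch of what an encapsulated reply could look like from the frontend's point of view. Everything in it (the BackendReply type, its fields, and the describe function) is hypothetical and not taken from any particular protocol; it only illustrates keeping the transport's own status separate from the HTTP status that the application wants sent to the client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackendReply:
    # Set only when the transport itself failed (hypothetical field).
    transport_error: Optional[str] = None
    # The application's own HTTP status, meant for the client.
    http_status: Optional[int] = None


def describe(reply):
    """Say who a reply is really addressed to."""
    if reply.transport_error is not None:
        return "frontend problem: " + reply.transport_error
    return "pass through to client: HTTP %d" % reply.http_status
```

With this sort of split, a 404 in http_status can only be the application talking to the client, while a missing or misconfigured backend endpoint shows up as a transport_error instead.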

WhyNotHTTPAsTransport written at 00:13:42

2014-08-06

A peculiarity: I'm almost never logged in to websites

Over time, I've come to understand that normal people are almost always logged in to a whole host of big websites out there, even when they're not actually using them. Facebook, Twitter, Flickr, LinkedIn, Google, and so on and so forth, all of these pervasive behemoths that many people sooner or later wind up with accounts on (undoubtedly plus many other sites).

As you may have guessed, I don't do this. I especially don't do this with behemoths that have their tendrils of social buttons stretching across the Internet because I don't trust them. The only sites I'm willing to stay logged in on are ones that I trust a fair bit (and generally also use a decent amount). On top of this, my regular browser is so heavily protected from being infected with cookies that it's often easier to use my testing browser or Incognito Chrome for logins on new sites; as a consequence I wind up effectively 'logging out' when I close the browser and the cookies are discarded.

(I've wound up with a few sites that I use relatively regularly but that I don't trust in my main browser, sometimes because they require way too much JavaScript. So far I've isolated these into separate profiles in my testing browser, profiles that don't flush cookies.)

Among other effects, this sometimes gives me a skewed perspective on using popular sites, especially sites that I'm a member of. For instance, when I look at people's Twitter profiles in my browser I'm sure I'm getting a somewhat different experience than I would be if I was logged in. One of the differences, relevant to a recent issue, is that I cannot accidentally betray to a website that member X followed link Y from social media, unassociated email, or whatever.

(Twitter is actually an especially interesting case of logged in versus not logged in because of how blocks work. If someone with an open account blocks you and you're logged in on Twitter, you apparently can't even look at their profile to see their tweets in your browser, although if you log out you can. Of course it goes the other way too; if I'm not logged in I can't see into private accounts that I do have access to through my Twitter accounts. This sometimes matters because looking at tweets in a browser can be the best way to see the thread of a conversation.)

NotLoggedIn written at 22:43:10


