2015-06-12
My pragmatic view of HTTPS versus caching
One of the criticisms of going all HTTPS on the web is that it pretty much destroys caching. As Aristotle Pagaltzis commented on my entry, caching somewhat obscures traffic flow by itself (depending on where the cache is and who is watching), and as other people have commented in various places (cf), caching can serve valuable bandwidth reduction purposes. I and other people advocating an all HTTPS world should not ignore or understate this. On the contrary, we should be explicit and admit that by advocating all HTTPS we are throwing some number of people who use caching under the bus.
The problem is that there is no good choice; regardless of what we choose here someone is getting thrown under the bus. If we go HTTPS and lose caching, we throw cache users under the bus. But everything that stays HTTP throws a significant number of other people under the bus of ISP traffic inspection, interception, tampering, and general monetization through various means. Our only choice is who gets thrown under in what circumstances, and what the effects of getting run over by the bus are. We cannot in any way pretend that there are no downsides of staying with HTTP, because there clearly are and they are happening today.
The effects of losing caching are mostly that for some people web browsing gets slower and perhaps more expensive due to bandwidth charges. The effects of losing privacy and content integrity are that for lots of people, well, they lose privacy, have their activities tracked quite intrusively, have advertising shoved down their throats, and sometimes have their browsing weaponized and so on.
Faced with this tradeoff, I pick throwing people using caching under the bus of slower access. Sorry, cache users, I regret that you're going to have this happen to you (at least until people develop some more sophisticated HTTPS-capable caches and systems), but as far as I'm concerned it's clearly the lesser of two evils (as seen from my position, which is biased in some ways).
(I will not go so far as saying that cache users who insist that everyone else continue to have traffic intercepted, monitored, and monetized in order for the cache users to have an easier time are being selfish, partly because of the cost issues. But sometimes I do sort of feel that way.)
2015-06-11
HTTP should be dropped even as a pure Internet transport mechanism
In a comment on my pragmatic view on switching to HTTPS, Aristotle Pagaltzis wrote in part (in two bits I'm replying to separately):
HTTPS basically disables caching. And caching obscures traffic flow by terminating it locally, dispersing and diffusing it.
Plain HTTP caching today has two essentially fatal limits for most people: your 'last mile' ISP can both snoop your traffic and alter it to do things like insert malicious content. Yes, I consider 'ride along' JavaScript ads to be malicious content. The reality is that on today's Internet, your ISP is a threat.
(Your last mile ISP may be doing this on its own behalf, it may be doing it under orders from someone, or it may have had its network quietly altered to do this.)
Since all of these are happening today, it is my view that plain HTTP caching is not worth keeping. It is possible to use it for good, but in practice it and the closely related issue of forced HTTP proxying are too often either used for evil or exploited for evil.
Then:
Now you must reveal to someone that you are interested in certain public content. But if you can verify the integrity of that content you could have a choice of intermediaries to fetch it from, depending on whom you want to reveal your interest to, and whom you want to conceal it from. And no one stops you from using TLS to them in order to shut out intrepid eavesdroppers, of course.
This is clearly a new protocol that carries integrity information along with it and simply uses HTTP as a transport. It's my view that you must use TLS even with this, for what are ultimately pragmatic reasons. If the transport protocol uses unencrypted links such as HTTP, there are two issues.
First, the transport protocol leaks information about what you're reading to your 'last mile' ISP (and possibly others). We already know that ISPs will monitor and exploit this information if it is available. It doesn't matter if you obscure information about where you fetch the resource from; the mere fact that you are receiving web pages about specific subject matter is a deadly giveaway. Expect this information about your browsing habits and your interests to be sold to the usual suspects.
Second, let's consider the user experience of what happens if the ISP takes advantage of this plain text transfer to actually inject its own content. Of course the resource your web browser has fetched fails integrity checks, so it should not show it to you, right? Well, this is the XHTML problem, or perhaps the HTTPS certificate alert problem. The content is there (as the user sees it) except that the browser is not showing it to the user for essentially arbitrary reasons. In my jaundiced view this is not a stable situation and not one that users are going to enjoy. All it needs is one browser to defect to allowing 'show me the content anyway' and then all ISPs are helpfully telling their users to use that option, honest.
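(To ground the two issues a bit, here is a rough sketch in Python of the kind of integrity-checked fetch over plain HTTP that this hypothetical protocol implies. Everything in it is a placeholder; in particular, how the expected digest reaches you over some trusted channel is the hard part and is simply assumed here.)

```python
# Hypothetical integrity-checked fetch over plain HTTP. The URL and the
# expected digest are placeholders; how the digest reaches you over a
# trusted channel is assumed, not shown.
import hashlib
import hmac
from urllib.request import urlopen

EXPECTED_SHA256 = "0" * 64      # stand-in for the out-of-band digest

def fetch_checked(url):
    body = urlopen(url).read()  # unencrypted fetch; the whole path sees the URL
    digest = hashlib.sha256(body).hexdigest()
    if not hmac.compare_digest(digest, EXPECTED_SHA256):
        # All we can honestly do here is refuse to hand anything back.
        raise ValueError("content failed its integrity check")
    return body

# fetch_checked("http://example.org/some/public/resource")
```

Both problems show up in miniature: the plain fetch tells everyone in the path exactly which URL you wanted, and when the check fails the only honest thing the code can do is hand back nothing.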
Of course an ISP could try to MITM your HTTPS traffic. But we've seen how that show plays out and it doesn't go well for the people doing the interception, primarily but not entirely for social reasons. Without the ability to see and alter cleartext, your ISP is essentially helpless to make alterations; blindly altering the encrypted stream generally creates totally garbled and corrupt results (even ignoring the integrity checks, which will fail).
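(For the 'blindly altering the encrypted stream' point, here's a toy illustration using the third-party Python cryptography package's AES-GCM; TLS's record protection is more elaborate than this, but the authenticated-encryption behaviour is the same idea. Flipping even one bit in transit doesn't get an attacker a modified page, it gets the receiver an integrity failure and no plaintext at all.)

```python
# Toy demonstration: flip one bit of an AES-GCM ciphertext and decryption
# refuses to produce anything at all. Requires the third-party
# 'cryptography' package.
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)
aead = AESGCM(key)

ciphertext = aead.encrypt(nonce, b"<p>the real page content</p>", None)

tampered = bytearray(ciphertext)
tampered[10] ^= 0x01            # the 'ISP' blindly flips one bit in transit

try:
    aead.decrypt(nonce, bytes(tampered), None)
except InvalidTag:
    print("tampered record rejected; no usable plaintext comes out")
```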
So my view is that HTTP cannot be used as a transport across the open Internet for anything. At a minimum it can't be used for anything that will wind up in a user browser, and I feel that even with strong integrity checks it leaks too much information to your ISP. Put a fork in it, it's done.
(There are plenty of legacy transport uses of HTTP across the open Internet, of course, so in practice it's not going away any time soon as far as tools are concerned. Browsers are another matter.)
2015-06-10
My pragmatic view on switching to HTTPS and TLS everywhere
Roy Fielding is not a fan of using TLS everywhere (via Aristotle Pagaltzis, among others). He argues, as far as I can understand, that TLS mostly provides confidentiality (and a certain amount of integrity), not privacy; if I have it right, his view of 'privacy' seems to be 'privacy even from the site operator'.
My view on all of this is that I'm a pragmatist. Right now there are real, non-theoretical intermediaries between me and a lot of the people who are reading this who are intercepting and logging HTTP traffic. Some of them are adding individually identifiable tracking information that I could take note of if I wanted to, and some are altering the content for various reasons (including the alteration of 'we'll block access to this now that we see what it is'). None of this is theoretical or obscure and increasingly it's not even uncommon. And it's all going to get worse if we let it because all of the intermediaries involved gain value from doing this kind of stuff; they have been restrained so far only by some combination of lingering legal concerns and technical (or budget) limits.
So I strongly disagree with Fielding when he says:
TLS is NOT desirable for access to public information, except in that it provides an ephemeral form of message integrity that is a weak replacement for content integrity.
First off, that 'except' is a really important thing, as we've seen. By itself I feel that preventing third parties from tampering with web-fetched resources in flight is now a vital concern, since third parties are actually doing it now. But I also disagree about the general issue of access to public information.
Libraries are full of public information, pretty much by definition. Yet librarians zealously guard (and block) access to information about who has checked out what, because they understand that revealing that information can be damaging. What public information you access says a lot about you and your concerns. To stretch the analogy even further, it's useful for librarians to protect your borrowing records even though a sufficiently dedicated third party could deduce much of the information given enough work.
Are HTTPS and TLS perfect? Of course not. Do they still betray some information about your requests? Of course. But they are still the best tool we have at hand to deal with the serious problems that we are having right now. HTTPS everywhere will unquestionably cramp the style of a bunch of people who are up to no good, which beats letting them continue on undisturbed.
(In security, as in much else, the perfect is the enemy of the good.)
(It's also my outsider's opinion that the IETF is probably the wrong place to come up with new cryptography and privacy standards. I suspect that in practice the IETF is better served by recommending and using existing practices such as TLS. Partly this is because TLS already exists in widely available form, making it easy to adopt and use.)
2015-06-03
What makes for a simple web application environment
Suppose that you want to create a simple web application environment, something that can compete with CGI programs and PHP to attract people who have modest needs and just want to throw something programmable up on their website. What does it need to have? As it happens I have some views on this, formed from my long run of using and abusing CGI programs (although I haven't done PHP).
In my opinion, anything that wants to displace CGI or PHP in people's affections needs to be as simple to get going as they are (see eg the attractions of CGI scripts). This means:
- A simple deployment scheme that is as close to 'copy a single
thing and go' as possible. If one step of deploying a new web app
in your system is 'edit the web server configuration file', you've
already lost.
I don't think this requires an in-server environment, the way PHP
is in Apache, but I think it does require that any service daemons
get auto-started and auto-reloaded, and that you have some simple
way of hooking them up to the main server.
These days I would hold my nose and say that the right answer for simple multi-file deployments is to let people push .zip files to the server and have the server run things from them in place (and fetch things from them and so on). Python already does this to some degree, so there's an existence proof.
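(As a hedged sketch of what that could look like with Python's standard zipapp module: the myapp/ directory and its app.main() entry point below are made-up names, and zipping up a directory with a __main__.py by hand gets you essentially the same thing.)

```python
# Sketch of 'push a zip and run it in place' with the stdlib zipapp module.
# 'myapp/' and its app.main() entry point are made-up names.
import zipapp

zipapp.create_archive(
    "myapp",              # source directory of modules and data files
    target="myapp.pyz",   # the single deployable artifact
    main="app:main",      # generates a __main__.py that calls app.main()
)

# The server then runs the whole application from that one file:
#     python myapp.pyz
# and code inside it can read bundled data with pkgutil.get_data() without
# anything being unpacked onto disk.
```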
- No persistent state between requests,
at least by default. Not having persistent state means that code
can still work fine even though it's sloppy, and people are going
to write sloppy code in a simple, 'get things done quick' web app
environment.
(I feel somewhat divided about this because it immediately hammers a language I rather like. Maybe you can set up your programming environment to strongly discourage global state and have that be good enough, but I'm kind of dubious. Or maybe Go will turn out to be a special case where goroutines are good enough isolation.)
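(A tiny, made-up sketch of the sloppy-code hazard I mean: under CGI the process is new for every request, so the module-level dict below is always empty and the sloppiness is harmless; in a persistent worker it quietly accumulates state that is shared, and racy, across requests.)

```python
# Made-up handler showing the hazard. Under CGI this module-level dict is
# recreated (empty) for every request; in a persistent worker it lives on,
# shared and racy across requests.
seen = {}

def handle(user, request_id):
    seen[user] = request_id     # harmless once, a slow leak forever
    return "hello %s, you are request %s of %d users seen" % (
        user, request_id, len(seen))
```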
- A simple and easy programming model for handling web requests in
general. My view is that inspecting an environment and printing
things out is quite simple and easy to deal with and thus the
more you depart from this, the more difficult your environment
is. A 'hello world' web app equivalent should not be very many
more lines of code than a plain command line one.
(Similarly, getting the POST parameters should be either dirt simple or essentially automatic.)

One important aspect of this is that the programming model should look exactly as if you're handling the request in the main web server. If the main web server forwards requests to the simple web app environment with HTTP, you must hide the seams involved here (cf). This basically means running your own custom protocol on top of HTTP to forward all of the information that HTTP will overwrite, then restoring it before people's code sees it.
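(Here is one way that seam-hiding could look, sketched as WSGI middleware purely for illustration; the X-Seam-* header names are invented. The idea is that the front-end server copies the fields that plain HTTP proxying overwrites into private headers and this layer restores them before any of the person's code runs.)

```python
# Sketch of restoring forwarded request information in WSGI middleware.
# The X-Seam-* header names are invented; the front-end server would set
# them with the values that plain HTTP proxying otherwise overwrites.
def restore_original_request(app):
    def middleware(environ, start_response):
        if "HTTP_X_SEAM_REMOTE_ADDR" in environ:
            environ["REMOTE_ADDR"] = environ.pop("HTTP_X_SEAM_REMOTE_ADDR")
        if "HTTP_X_SEAM_HOST" in environ:
            environ["HTTP_HOST"] = environ.pop("HTTP_X_SEAM_HOST")
        if "HTTP_X_SEAM_SCHEME" in environ:
            environ["wsgi.url_scheme"] = environ.pop("HTTP_X_SEAM_SCHEME")
        # From here on, the person's code sees the request as if it had been
        # handled directly in the main web server.
        return app(environ, start_response)
    return middleware
```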
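(And as a reminder of the baseline all of this is up against, here's roughly what I mean by 'inspecting an environment and printing things out': a hedged 'hello world' as a plain Python CGI script, where the stdlib cgi module is what makes getting GET and POST parameters dirt simple.)

```python
#!/usr/bin/env python3
# 'Hello world' as a plain CGI script: look at the environment, print.
import os
import cgi

form = cgi.FieldStorage()                 # query string and POST body alike
who = form.getfirst("name", "world")

print("Content-Type: text/plain")
print("")
print("hello, %s" % who)
print("you made a %s request to %s" % (os.environ.get("REQUEST_METHOD", "GET"),
                                       os.environ.get("SCRIPT_NAME", "")))
```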
The last point leads me to the view that URL routing should not be part of the basic layer of your simple web app environment to any particularly visible extent, since URL routing can get quite complex very fast. Ideally URL routing is part of the deployment process, not the coding process, in the same way that copying a CGI script or a PHP page to a particular place in the directory hierarchy does the 'URL routing' for them.
A simple web app environment can optionally provide more sophisticated features if they don't get in the way, and there's certainly a lot of long term benefit to doing so. But if you're aiming for a simple environment, start with the 'hello world' case and make sure it stays simple.
(Whether creating such a new simple web app environment is possible any more is an open question. It clearly requires significant integration with the main web server, and whether you can get that any more is also an open question. CGI and PHP are both relatively unique historical artifacts in Apache, after all. Maybe we're just stuck with no good general simple web app environments to compete with them.)