2011-08-21
The conflict between caching and tracking on the web
The web user privacy story of the recent past has been the news
about web tracking companies that are using ETag and Last-Modified
headers to covertly track users. In the process of thinking about
the issue and writing yesterday's entry,
I've come to the probably unsurprising conclusion that there is a
fundamental conflict between browser caching and avoiding tracking.
The attacks on ETag and Last-Modified are the tip of an iceberg.
Both of these headers are quite convenient for tracking because the
browser will directly store them and report them back to the server,
which means that you can encode a value into them and then recover it
later. But cache state itself is also stored information, and the very
nature of caching means that the browser has to reveal that information
to the server if the cache is going to do any good; at a minimum, a
resource that the browser doesn't re-fetch is one that it already has.
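To make the header case concrete, here is a minimal sketch of how a server could use ETag as a storage slot. It is an illustration under my own assumptions, not a description of what any particular tracking company does: the `respond` function and its `new_id` parameter are hypothetical names, and it ignores details such as weak validators and multiple tags in If-None-Match.

```python
import secrets


def respond(request_headers, new_id=lambda: secrets.token_hex(8)):
    """Handle one request for a hypothetical tracking resource.

    Returns (status, response_headers, visitor_id).
    """
    tag = request_headers.get("If-None-Match")
    if tag:
        # The browser is revalidating its cached copy, so it echoes back
        # the ETag we handed out earlier. The ID is simply read out of it.
        visitor_id = tag.strip('"')
        return 304, {"ETag": tag}, visitor_id

    # First visit (or an emptied cache): mint an ID and hide it in the ETag.
    # "no-cache" lets the browser store the response but makes it
    # revalidate on every use, which is what brings the ETag back to us.
    visitor_id = new_id()
    headers = {
        "ETag": f'"{visitor_id}"',
        "Cache-Control": "private, no-cache",
    }
    return 200, headers, visitor_id


# First request: no validator, so the server assigns an ID.
status1, headers1, id1 = respond({})
# Later request: the browser replays the stored ETag and is recognized.
status2, headers2, id2 = respond({"If-None-Match": headers1["ETag"]})
```

Nothing here involves cookies; the only state is what the browser's cache holds on to, which is why clearing cookies alone doesn't reset it.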
This leads directly to the conflict: the more effective the browser cache is, the easier it is to use the browser cache contents to track you. Conversely, all of the methods of making this tracking harder have the necessary effect of making your browser cache less effective. To make yourself completely untrackable, in theory you need to have no browser cache.
(In practice I think that what you really need to do is inject enough noise into the tracking process that it can't reliably tell people apart. However, this rapidly gets into an arms race between the two sides, with the tracking side storing and reading back more and more redundant information in order to defeat noise-injection measures such as browsers that drop random entries from their cache.)
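As a rough sketch of why I expect redundancy to win that arms race, consider a toy model where the tracker encodes each bit of an ID as the presence or absence of several cached resources. Everything here is a simplifying assumption for illustration: the function names are hypothetical, the cache is modeled as a plain set, and the model assumes the tracker can tell which of its resources are still cached (for example by seeing which ones the browser re-requests).

```python
import random


def encode(visitor_id, bits, copies):
    """Resources the tracker gets the browser to cache.

    Each 1 bit of the ID is stored as `copies` separate cache entries,
    named here by (bit_index, copy_index).
    """
    return {
        (i, c)
        for i in range(bits)
        if (visitor_id >> i) & 1
        for c in range(copies)
    }


def drop_randomly(cached, drop_prob, rng):
    """Model a browser that discards each cache entry with some probability."""
    return {entry for entry in cached if rng.random() >= drop_prob}


def decode(cached, bits):
    """Read the ID back: a bit is 1 if any one of its copies survived."""
    surviving_bits = {bit for bit, _ in cached}
    return sum(1 << i for i in range(bits) if i in surviving_bits)


rng = random.Random()
visitor_id = 0b1011001110001111
noisy = drop_randomly(encode(visitor_id, bits=16, copies=8), 0.5, rng)
recovered = decode(noisy, bits=16)
```

In this model, random drops can only turn a 1 bit into a 0, and only if every copy of that bit is lost. With a drop probability of p and k copies, a given 1 bit is lost with probability p^k, so the tracker can make errors as rare as it likes just by caching more copies, while the browser pays for every dropped entry with a less useful cache.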
Thus I'm very doubtful that technical countermeasures in browsers can defeat this sort of 'undeletable' tracking; the only technical countermeasure that I see being fully effective is to have no long-lived cache at all. This is only viable in some environments, so I don't expect browsers to make it a default.
(This doesn't mean that we're doomed; it means that we have to use non-technical solutions to the problem, like publicity, shaming, and so on.)
(I doubt that this is new to web privacy people.)