The conflict between caching and tracking on the web

August 21, 2011

The big recent story in web privacy has been the news that web tracking companies are using ETag and Last-Modified headers to covertly track users. In the process of thinking about the issue and writing yesterday's entry, I've come to the probably unsurprising conclusion that there is a fundamental conflict between browser caching and avoiding tracking.

The attacks on ETag and Last-Modified are the tip of an iceberg. Both of these headers are quite convenient for tracking because the browser will directly store them and report them back to the server, which means that you can encode a value into them and then recover it later. But cache state itself is also stored information, and the very nature of caching means that the browser has to reveal that state to the server (at a minimum through what it does and doesn't re-request) if the cache is going to do any good.
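(To make the ETag version concrete, here is a minimal sketch of the server side, assuming a tracker that simply uses a random identifier as the ETag value. The `handle_request` function and its return shape are made up for illustration; only the `ETag` and `If-None-Match` header names are real HTTP.)

```python
import uuid

def handle_request(request_headers):
    """Hypothetical tracker endpoint: returns (status, response_headers, visitor_id)."""
    etag = request_headers.get("If-None-Match")
    if etag:
        # The browser is revalidating its cached copy and has echoed back
        # the value we stored earlier, so we can recover the identifier.
        visitor_id = etag.strip('"')
        # Answering 304 keeps the cached entry (and the identifier) alive.
        return 304, {"ETag": etag}, visitor_id
    # First visit (or an emptied cache): mint a new identifier and hide it
    # in the ETag of the response.
    visitor_id = uuid.uuid4().hex
    return 200, {"ETag": '"%s"' % visitor_id}, visitor_id
```

The first request gets a fresh identifier; any later conditional request that carries `If-None-Match` hands the same identifier straight back, with no cookies involved.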

This leads directly to the conflict: the more effective the browser cache is, the easier it is to use the browser cache contents to track you. Conversely, every method of making this tracking harder necessarily makes your browser cache less effective. To make yourself completely untrackable, in theory you need to have no browser cache.
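(As an illustration of how cache contents alone can carry an identifier, here is a rough sketch that encodes one bit per cached resource. Everything in it is an assumption for the sake of the example: the `/probe/N.gif` URLs, the 16-bit identifier, and the simplification that reading the identifier back doesn't itself change the cache.)

```python
NBITS = 16  # size of the illustrative identifier

def probe_url(i):
    # Hypothetical URL scheme: one small cacheable resource per bit.
    return "/probe/%d.gif" % i

ALL_PROBES = {probe_url(i) for i in range(NBITS)}

def urls_to_plant(visitor_id):
    """First visit: have the browser fetch (and so cache) one URL per 1 bit."""
    return {probe_url(i) for i in range(NBITS) if (visitor_id >> i) & 1}

def recover_id(fetched_urls):
    """Later visit: the page references every probe URL, but the browser
    only asks the server for the ones it does not have cached. A probe
    that is NOT requested was cached, so its bit is 1."""
    visitor_id = 0
    for i in range(NBITS):
        if probe_url(i) not in fetched_urls:
            visitor_id |= 1 << i
    return visitor_id
```

Here the browser never sends a stored value back explicitly; the pattern of requests it skips is the value. Any cache that saves requests leaks this pattern.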

(In practice I think that what you really need to do is inject enough noise into the tracking process that it can't reliably tell people apart. However, this rapidly gets into an arms race between the two sides, with the tracking side storing and reading back more and more redundant information in order to defeat noise-injection measures such as browsers dropping random entries from their cache.)
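(A toy model of that arms race, under the assumption that the tracker stores each bit as several cache entries and the browser's countermeasure is to forget a random fraction of entries. The `(bit, copy)` entries and all of the function names are invented for this sketch; it is not a description of any real tracker or browser.)

```python
import random

def plant(visitor_id, nbits, copies):
    """Cache entries (bit, copy) planted for every 1 bit of visitor_id."""
    return {(i, c)
            for i in range(nbits) if (visitor_id >> i) & 1
            for c in range(copies)}

def drop_random(cache, fraction, rng):
    """Noise injection: the browser forgets roughly `fraction` of its entries."""
    return {entry for entry in cache if rng.random() >= fraction}

def recover(cache, nbits, copies):
    """A bit reads as 1 if any one of its copies survived in the cache."""
    visitor_id = 0
    for i in range(nbits):
        if any((i, c) in cache for c in range(copies)):
            visitor_id |= 1 << i
    return visitor_id
```

With a single copy per bit, every dropped entry flips a bit of the identifier. With several copies, a bit is lost only if all of its copies happen to be dropped, so the tracker can counter more aggressive dropping by planting more copies, and each round makes the cache less useful for its real job.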

Thus I'm very doubtful that technical countermeasures in browsers can defeat this sort of 'undeletable' tracking; the only technical countermeasure that I see being fully effective is to have no long-lived cache at all. This is only viable in some environments, so I don't expect browsers to make it a default.

(This doesn't mean that we're doomed; it means that we have to use non-technical solutions to the problem, like publicity, shaming, and so on.)

(I doubt that this is new to web privacy people.)

Last modified: Sun Aug 21 01:25:45 2011