Why browsers can't really change or validate Last-Modified

August 20, 2011

Quoting from Nik Cubrilovic's Persistant and Unblockable Cookies Using HTTP Headers (via Hacker News):

I will be filing a bug report with the open source browsers and requesting that the date is parsed properly. This won't completely solve the problem, since users can still be tracked by setting a unique datetime - but perhaps one of the more innovative browsers will come up with a solution where the time is rounded off to the nearest hour, and some basic sanity checking is done.

There are two issues here: validating Last-Modified and changing it. As it happens, I feel that changing Last-Modified is basically impossible for the browser to do in a way that is both safe and useful.

Let's set aside the server's view of Last-Modified for now, and talk about how modifying Last-Modified (L-M) affects caching if we assume a server that does time comparisons on L-M. First, it's effectively pointless for a browser to shift L-M backwards in time, since that guarantees that the server can never give you a 304 response; you're claiming that you only have something that's older than what the server has, so it must give you the current version. You might as well not cache the page at all. Second, it's clearly dangerous to shift L-M into the future (the further the shift, the more dangerous), because you'll miss any server updates made between now and that future point.
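To make both cases concrete, here is a minimal sketch of a server that does a time comparison on L-M. The function name, the timestamps, and the "round up to the next hour" browser policy are all made up for illustration; real servers and browsers may well behave differently.

```python
from datetime import datetime, timedelta, timezone

def conditional_get(resource_mtime, if_modified_since):
    """Hypothetical server: 304 if the client's date is at least as new
    as the resource's modification time, otherwise 200 with a full body."""
    if resource_mtime <= if_modified_since:
        return 304
    return 200

# Assumed modification time of the page when the browser first fetched it.
mtime = datetime(2011, 8, 20, 3, 24, 2, tzinfo=timezone.utc)

# Honest browser: sends back exactly what it was given.
honest = conditional_get(mtime, mtime)

# Browser shifts L-M backwards by an hour: the server always sees an
# "older" copy, so it can never answer 304.
backward = conditional_get(mtime, mtime - timedelta(hours=1))

# Browser rounds L-M up to the next hour (04:00:00). The page is then
# updated ten minutes after the original mtime, before that hour arrives.
rounded_up = mtime.replace(minute=0, second=0) + timedelta(hours=1)
updated_mtime = mtime + timedelta(minutes=10)
forward = conditional_get(updated_mtime, rounded_up)

print(honest, backward, forward)
```

In this sketch the backward shift always costs a full 200 response, and the forward shift gets a 304 for content that has actually changed.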

In theory you might think that it's safe to shift L-M forward provided that the new L-M time is still in the past. In practice I think that there are a number of realistic scenarios where this still causes you to miss server updates; for example, there might have been a server-side rolling deployment of a content update that has not yet gotten to the server that you use. The 'new' content has an old timestamp because it was initially deployed some time ago on the first server (and because the server is keeping timestamps in sync to promote caching).
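The rolling deployment case can be sketched the same way. Everything here is an assumption for illustration: the timestamps, the two-server setup, and a hypothetical browser policy of moving L-M forward to the last whole hour before the fetch (which is still in the past).

```python
from datetime import datetime, timezone

def server_says(resource_mtime, if_modified_since):
    """Hypothetical time-comparing server: 304 if the client's date is
    at least as new as the resource, otherwise 200."""
    return 304 if resource_mtime <= if_modified_since else 200

utc = timezone.utc
old_mtime = datetime(2011, 8, 19, 9, 0, tzinfo=utc)    # old content
new_mtime = datetime(2011, 8, 20, 1, 0, tzinfo=utc)    # new content, first deployed on server A
fetch_time = datetime(2011, 8, 20, 3, 24, tzinfo=utc)  # browser fetches from server B,
                                                       # which still has the old content

# Hypothetical browser policy: shift L-M forward to the last whole hour
# before the fetch. 03:00 is in the past, so this looks "safe".
shifted = fetch_time.replace(minute=0, second=0, microsecond=0)

# Later the rollout reaches server B. The new content keeps its original
# 01:00 timestamp because the servers keep timestamps in sync.
honest_result = server_says(new_mtime, old_mtime)   # browser kept the real L-M
shifted_result = server_says(new_mtime, shifted)    # browser sent the shifted L-M

print(honest_result, shifted_result)
```

With the real L-M the browser gets the new content; with the shifted one it is told that nothing has changed, even though the shifted time was never in the future.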

(Backing out of a deployment is one reason to avoid a time-based Last-Modified comparison in your server.)
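One reading of that aside, as a rough sketch: if a rollback restores the old content along with its old timestamp, a time-comparing server will tell an unmodified browser that its copy of the backed-out version is still current, while an exact-match server will not. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timezone

def time_based(resource_mtime, if_modified_since):
    """304 if the client's date is at least as new as the resource."""
    return 304 if resource_mtime <= if_modified_since else 200

def exact_match(resource_mtime, if_modified_since):
    """304 only if the client sends back exactly the current timestamp."""
    return 304 if resource_mtime == if_modified_since else 200

utc = timezone.utc
v1_mtime = datetime(2011, 8, 19, 9, 0, tzinfo=utc)  # original version
v2_mtime = datetime(2011, 8, 20, 1, 0, tzinfo=utc)  # the deployment that gets backed out

# The browser cached v2. The server then goes back to v1, assumed here
# to come back with its original timestamp.
after_rollback_time_based = time_based(v1_mtime, v2_mtime)
after_rollback_exact = exact_match(v1_mtime, v2_mtime)

print(after_rollback_time_based, after_rollback_exact)
```

Note that with an exact-match server, any browser change to L-M at all means the browser never sees a 304.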

This scenario may seem unusual. But the problem with making general browser changes that modify cache behavior is that they must be correct in general, not just for 'usual' situations, because someday some of your users will hit an unusual situation. And showing out-of-date content to users because you lied to the web server is a pretty bad sin.

The problem with validating Last-Modified headers is a pragmatic one. It's virtually guaranteed that today, there are plenty of websites and web applications that serve up Last-Modified timestamps in formats that are not quite correctly formed and RFC-compliant (for all I know, DWiki is one of them; I'm not sure I paid careful attention to that bit of the RFC when writing the code). This means that you have three choices: you can ignore non-RFC dates entirely, which means that you cache less; you can try to be increasingly generous in your date parsing so that you accept common RFC violations, which is a lot of work; or you can not validate the Last-Modified value at all, treating it as a magic cookie. It should be no wonder that the last option is relatively popular.
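As a rough illustration of the three choices, here is a sketch in Python. The function names and the regular expression are my own inventions, not anything a real browser does; the 'generous' version leans on the standard library's email date parser as a stand-in for hand-written forgiving parsing. Each function returns the string the browser would remember and later echo back in If-Modified-Since, or None if it would not keep a validator.

```python
import re
from email.utils import parsedate_to_datetime

# Format-only check for the date form that HTTP prefers, for example
# "Sat, 20 Aug 2011 03:24:02 GMT". It does not check that the day exists.
PREFERRED_HTTP_DATE = re.compile(
    r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} "
    r"\d{2}:\d{2}:\d{2} GMT$"
)

def strict(value):
    """Choice 1: keep the validator only if it is in the preferred format."""
    return value if PREFERRED_HTTP_DATE.match(value) else None

def generous(value):
    """Choice 2: keep anything a forgiving parser can read as a date."""
    try:
        parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    return value

def opaque(value):
    """Choice 3: no validation at all; the value is a magic cookie."""
    return value

for header in ("Sat, 20 Aug 2011 03:24:02 GMT", "not a date"):
    print(repr(header), strict(header), generous(header), opaque(header))
```

The opaque version is one line and never loses a cache entry, which goes some way to explaining its popularity. It is also the version that will faithfully echo back whatever unique string a server chooses to hand out.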

(I admit that I would like to see browsers reject clearly impossible things, like the example that Nik Cubrilovic shows. I'm just not sure it's all that easy or reliable for a computer to tell 'clearly impossible' from a merely badly formatted date.)


Comments on this page:

From 173.61.157.91 at 2011-08-21 13:10:31:

Or you can simply test your Last-Modified validation algorithm against every server out there.

You might consider this impractical, but remember that widely used browsers are widely used, so after covering everything the developers could find, they could release a version that reported unparseable strings back home. Two or three cycles of that, and I think you're done. Also, at least one browser maker has a truly massive index of http requests/responses to test against and is known to be very good at parallelizing computations.

Written on 20 August 2011.

Last modified: Sat Aug 20 03:24:02 2011