Why browsers can't really change or validate Last-Modified
Quoting from Nik Cubrilovic's Persistant and Unblockable Cookies Using HTTP Headers (via Hacker News):
I will be filing a bug report with the open source browsers and requesting that the date is parsed properly. This won't completely solve the problem, since users can still be tracked by setting a unique datetime - but perhaps one of the more innovative browsers will come up with a solution where the time is rounded off to the nearest hour, and some basic sanity checking is done.
There are two issues here: validating Last-Modified and changing it. As it happens, I feel that changing Last-Modified is basically impossible for the browser to do in a way that is both safe and useful.
Let's set aside the server's view of Last-Modified for now, and talk about how modifying Last-Modified affects caching if we assume a server that does time comparisons on L-M. First, it's effectively pointless for a browser to shift L-M backwards in time, since it guarantees that the server can never give you a 304 response; you're claiming that you only have something that's older than what the server has, so it must give you the current version. You might as well not cache the page at all. Second, it's clearly dangerous to shift L-M into the future (the further the shift the more dangerous), because you'll miss any server updates made between now and that future point.
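To make the two cases concrete, here is a minimal sketch of a server that does a time comparison on the timestamp the browser sends back (in the If-Modified-Since request header). The function names and the specific dates are hypothetical, chosen purely for illustration; a real server does far more than this.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def server_status(resource_mtime, if_modified_since):
    """Hypothetical server that does a time comparison on If-Modified-Since."""
    client_time = parsedate_to_datetime(if_modified_since)
    # 200 (send the page) only if the resource is newer than what the client claims.
    return 200 if resource_mtime > client_time else 304


def http_date(dt):
    """Format an aware UTC datetime as an HTTP date string."""
    return format_datetime(dt, usegmt=True)


# Illustrative timestamp for the copy the browser actually cached.
true_lm = datetime(2011, 8, 1, 12, 0, 0, tzinfo=timezone.utc)

# Honest browser, unchanged page: the cache works.
honest = server_status(true_lm, http_date(true_lm))

# L-M shifted one hour backwards, unchanged page: always a full response.
backward = server_status(true_lm, http_date(true_lm - timedelta(hours=1)))

# L-M shifted two hours forwards, page updated one hour after the real L-M:
# the server thinks the browser already has the update.
updated_mtime = true_lm + timedelta(hours=1)
forward = server_status(updated_mtime, http_date(true_lm + timedelta(hours=2)))

print(honest, backward, forward)  # 304 200 304
```

The backward shift turns every revalidation into a full fetch, and the forward shift produces a 304 for content the browser has never seen.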
In theory you might think that it's safe to shift L-M forward provided that the new L-M time is still in the past. In practice I think that there are a number of realistic scenarios where this still causes you to miss server updates; for example, there might have been a server-side rolling deployment of a content update that has not yet reached the server that you use. The 'new' content has an old timestamp because it was initially deployed some time ago on the first server (and because the servers are keeping timestamps in sync to promote caching).
(Backing out of a deployment is one reason to avoid a time-based Last-Modified comparison in your server.)
This scenario may seem unusual. But the problem with making general browser changes that modify cache behavior is that they must be correct in general, not just for 'usual' situations, because someday some of your users will hit an unusual situation. And showing out-of-date content to users because you lied to the web server is a pretty bad sin.
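The rolling deployment scenario can be sketched with the 'round to the nearest hour' idea from the quote. Everything here is an assumption made for illustration: the rounding function, the comparison the server does, and all of the timestamps.

```python
from datetime import datetime, timedelta, timezone


def round_to_nearest_hour(dt):
    """Hypothetical browser-side rounding of a Last-Modified time."""
    floor = dt.replace(minute=0, second=0, microsecond=0)
    if dt - floor >= timedelta(minutes=30):
        return floor + timedelta(hours=1)
    return floor


def server_says_not_modified(resource_mtime, client_time):
    """Hypothetical time-comparing server: a 304 if nothing is newer."""
    return resource_mtime <= client_time


utc = timezone.utc
# The old copy the browser fetched from its (not yet updated) server.
cached_lm = datetime(2011, 8, 1, 12, 40, tzinfo=utc)
# New content, first deployed elsewhere at 12:50; the timestamp is kept
# in sync when it later reaches the browser's server.
new_content_mtime = datetime(2011, 8, 1, 12, 50, tzinfo=utc)
# When the browser revalidates; its server has the new content by now.
now = datetime(2011, 8, 1, 14, 0, tzinfo=utc)

shifted = round_to_nearest_hour(cached_lm)  # 13:00, forward but still in the past

missed = server_says_not_modified(new_content_mtime, shifted)    # True: update missed
honest = server_says_not_modified(new_content_mtime, cached_lm)  # False: update seen
```

The shifted time is safely before 'now', yet it has jumped over the timestamp of content that the browser never received.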
The problem with validating Last-Modified headers is a pragmatic one. It's virtually guaranteed that today, there are plenty of websites and web applications that serve up Last-Modified timestamps in formats that are not quite correctly formed and RFC-compliant (for all I know, DWiki is one of them; I'm not sure I paid careful attention to that bit of the RFC when writing the code). This means that you have three choices: you can ignore non-RFC dates entirely, which means that you cache less, you can try to be increasingly generous in your date parsing so that you accept common RFC violations, which is a lot of work, or you can not validate the Last-Modified value at all, treating it as a
magic cookie. It should be no wonder that the last option is relatively popular.
(I admit that I would like to see browsers reject clearly impossible things, like the example that Nik Cubrilovic shows. I'm just not sure it's all that easy or reliable for a computer to tell 'clearly impossible' from a merely badly formatted date.)
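As a rough illustration of the three choices, here is a sketch of what each policy might look like. The policy function names and the sample header values are hypothetical (the 'tracking' value in particular is made up, not taken from Nik Cubrilovic's article), and the strict format string covers only the preferred HTTP date form.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# The preferred HTTP date form, e.g. "Mon, 01 Aug 2011 12:00:00 GMT".
RFC1123 = "%a, %d %b %Y %H:%M:%S GMT"


def strict_policy(value):
    """Choice 1: keep the validator only if it is a strictly formatted date."""
    try:
        datetime.strptime(value, RFC1123)
        return value
    except ValueError:
        return None  # ignore it, which means caching less


def lenient_policy(value):
    """Choice 2: accept anything a generous date parser can make sense of."""
    try:
        parsedate_to_datetime(value)
        return value
    except (TypeError, ValueError):
        return None


def opaque_policy(value):
    """Choice 3: don't look inside; echo it back as a magic cookie."""
    return value


well_formed = "Mon, 01 Aug 2011 12:00:00 GMT"
sloppy = "Mon, 1 Aug 2011 12:00:00 +0000"  # plausible but not the preferred form
tracking = "user-12345"                    # made-up non-date value

for policy in (strict_policy, lenient_policy, opaque_policy):
    print(policy.__name__, [policy(v) is not None for v in (well_formed, sloppy, tracking)])
```

The strict policy throws away the sloppy but harmless date, the lenient one depends entirely on how much work went into the parser, and the opaque one accepts everything, including the tracking value.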