The problem with If-Modified-Since as a timestamp

October 29, 2005

The If-Modified-Since header in HTTP requests is used as one way of doing 'conditional GET' requests, where the web server can give you a nice bandwidth-saving '304' response if the URL hasn't changed since the version you already have. In theory the header has an arbitrary timestamp and the server will 304 the request unless the page has changed since that time.
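As a rough illustration of the "real timestamp" reading, here is a minimal Python sketch of the check a server would make. The function name and its arguments are hypothetical, invented for this example; it assumes the server already knows a trustworthy last-modified time for the page, which is exactly the assumption the rest of this entry questions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified_by_timestamp(if_modified_since, last_modified):
    """Return True if a 304 would be sent under the 'real timestamp' reading.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable header: play it safe and send the full page.
        return False
    if since.tzinfo is None:
        # Assumption for this sketch: treat zone-less dates as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    return last_modified.replace(microsecond=0) <= since
```

Any header time at or after the page's last-modified time produces a 304 here, whether or not the server ever handed out that value.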

In practice, the HTTP RFC strongly recommends that clients treat the value as a magic cookie, just repeating the value the server last told them. Unfortunately, not everyone does this. This is bad, because it is very difficult for the server to use If-Modified-Since as a real timestamp.

To accept If-Modified-Since as a timestamp, your last modified time has to go forward any time a change is made. Or, to put it another way, you have to guarantee that your last modified times never go backwards, no matter how the change happened. This sounds nice in theory but is extraordinarily difficult in practice, even for static web pages.

For example, here's a bunch of things you have to consider for static pages on Apache on Unix:

  • did I rename the old 'page.bak' to 'page.html'?
  • did I rename a higher-level directory to flip to another version of an entire section?
  • did I change rules in a .htaccess file that affects the page?
  • did I modify the web server's configuration files, ditto?

(Even if Apache made all these checks, it still couldn't guarantee that I hadn't just had to flip a backup server with a not yet fully up to date copy of the website into production when my main server exploded.)
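The first item in that list is easy to demonstrate. The following Python sketch (the function name and the timestamps are made up for illustration) creates a 'page.html' and an older 'page.bak' in a scratch directory, renames the backup over the live page, and reports the modification time before and after. The rename changes what the URL serves, but the file keeps its own, older modification time.

```python
import os
import tempfile


def mtime_after_rename_demo():
    """Return (mtime_before, mtime_after) for page.html around a rename."""
    with tempfile.TemporaryDirectory() as d:
        page = os.path.join(d, "page.html")
        backup = os.path.join(d, "page.bak")
        for path, text in ((page, "new version\n"), (backup, "old version\n")):
            with open(path, "w") as f:
                f.write(text)
        # Arbitrary epoch seconds: the backup was last edited exactly
        # one day before the live page.
        os.utime(page, (1130551620, 1130551620))
        os.utime(backup, (1130465220, 1130465220))
        before = os.stat(page).st_mtime
        # Put the old version back; the rename does not touch its mtime.
        os.replace(backup, page)
        after = os.stat(page).st_mtime
        return before, after
```

A client that sends the 'before' time as its If-Modified-Since would get a 304 from any server doing a plain timestamp comparison, even though the page content just changed.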

Dynamic web pages have it worse, in part for the reasons in 'Pitfalls in generating Last-Modified'; they have more moving pieces and the moving pieces don't always come with change times attached.

Doing this right requires the application generating dynamic pages to find out any time anything affecting those pages changes. There are 'total control' applications like that out there, but not very many, as such total control has significant costs (for a start, everything has to go through the application).

DWiki is honest about not being able to make its own timestamps always run forward, and thus requires strict If-Modified-Since matching as a minimum. To do otherwise would run the risk of erroneous 304s, which in my opinion are much more serious than some extra bandwidth used.
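In outline, strict matching treats the header as an opaque cookie and only compares it for equality. This is a minimal sketch of the idea, not DWiki's actual code; the function name and arguments are hypothetical.

```python
def not_modified_strict(if_modified_since, last_modified_header):
    """Return True only if the client echoed back our exact Last-Modified.

    Both arguments are raw header strings; the request header may be None.
    No date parsing is done, so a 'later' timestamp never matches.
    """
    if not if_modified_since:
        return False
    return if_modified_since.strip() == last_modified_header.strip()
```

With this check, a client that invents its own timestamp always gets the full page, which costs bandwidth but can never produce a wrong 304 because of timestamps that ran backwards.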

(People with more constrained bandwidth may feel otherwise. But in that case they should fix their clients to use ETag.)


Last modified: Sat Oct 29 02:07:00 2005