Pitfalls in generating Last-Modified:

June 14, 2005

Every HTTP reply from a web server can include a Last-Modified: header, which theoretically tells interested parties when the web page was last modified. This is really something that works best when the web server is just sending out static files; when it is generating dynamic content, like DWiki does, things get interesting.

The major use of Last-Modified: is to decide when a browser already has a current copy of the web page and doesn't need to fetch it again. Thus, for dynamic pages built from many pieces, the Last-Modified: time needs to be the most recent modification time of any of the pieces. Then when any of the pieces that make up a page is updated, changing the page's appearance, the page's Last-Modified: time will change and the browser will fetch a new copy.

This means DWiki can't just use the page's modification time (which is what gets shown in the 'Last Modified:' line at the bottom of most CSpace pages). DWiki pages are built from a cascade of templates and pieces, so as it builds a web page DWiki keeps track of the most recent modification time of all the files involved; change one template, and the updated time is automatically propagated through the system.
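The bookkeeping described above might look roughly like the following sketch. It is only an illustration, not DWiki's actual code; the `MtimeTracker` class and its method names are hypothetical.

```python
import os
from email.utils import formatdate


class MtimeTracker:
    """Remember the newest modification time among everything used for a page."""

    def __init__(self):
        self.latest = None

    def note_file(self, path):
        # Hypothetical hook: call this for every template or page file read.
        self.note_time(os.stat(path).st_mtime)

    def note_time(self, mtime):
        # Keep only the most recent timestamp seen so far.
        if self.latest is None or mtime > self.latest:
            self.latest = mtime

    def header_value(self):
        # HTTP dates are always expressed in GMT; None means nothing was noted.
        if self.latest is None:
            return None
        return formatdate(self.latest, usegmt=True)
```

Because every file read funnels through the tracker, editing a shared template moves the Last-Modified: value of every page that uses it without any per-page work.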

Or it would if there weren't some complications.

Authentication Soup

Being logged in to a DWiki, and who you're logged in as, not just can but will change the appearance of pages. It's not just big things, like being able to see a page's contents; it's everything from DWiki saying 'Welcome, <whoever>' in the top right corner down to whether you get a login form or a logout button. So if you log in or out and then refresh pages in your browser, the pages had better change to look right for your current status; otherwise users start wondering if their login or logout actually worked.

In order to support Last-Modified: with authentication, DWiki would have to somehow arrange to track the last time you logged in or out of the DWiki. While this is theoretically possible, it would be a bunch of work and would involve trying to send a cookie to every visiting browser (and I refuse to do the latter).

Instead DWiki mostly just punts when authentication is enabled; regular DWiki pages get served without any Last-Modified: header. Fortunately modern browsers have another, better header called ETag: that they can use instead of Last-Modified: to see if they need to refresh a page.
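For readers unfamiliar with how that check works: the browser sends back the entity tag it has in an If-None-Match: request header, and the server answers 304 Not Modified if it matches the tag of the current response. A minimal sketch of the server-side comparison follows; the function name is hypothetical, and splitting on commas is a simplification that assumes no tag contains a comma.

```python
def is_not_modified(if_none_match, current_etag):
    """Return True if the client's cached copy matches the current entity tag."""
    if if_none_match is None:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        # "*" matches any current representation.
        return True

    def strip_weak(tag):
        # If-None-Match uses weak comparison, so the W/ prefix is ignored.
        return tag[2:] if tag.startswith("W/") else tag

    return strip_weak(current_etag) in {strip_weak(tag) for tag in candidates}
```

When this returns True, the server can send a 304 response with no body instead of the full page.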

Page List Soup

The other complication is easy to state: what's the modification time of a list of files?

Lists of files come up in several places in DWiki, most importantly when generating Atom syndication feeds. Atom feeds also complicate life because of two factors:

  • the Atom feed format requires some kind of 'most recently updated' timestamp.
  • the ETag: header's value is typically some identifying hash of the HTTP response's contents, so if the contents keep changing (because one generates a 'right now' timestamp as the most recently updated time in an Atom feed), the ETag: header will keep changing and everything will keep re-fetching Atom feeds and pages even when nothing has changed.
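The interaction between those two factors can be shown with a small sketch. It assumes the ETag is a quoted SHA-1 hex digest of the response body, which is one common choice and not necessarily what DWiki does; `make_etag` and `feed_body` are hypothetical names, and the feed text is a stand-in for real Atom output.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Assumption: a quoted hex digest of the body; the hash choice is arbitrary.
    return '"%s"' % hashlib.sha1(body).hexdigest()


def feed_body(updated: str) -> bytes:
    # Stand-in for a real Atom feed; only the <updated> element varies here.
    return ("<feed><updated>%s</updated></feed>" % updated).encode("utf-8")


# The same entries with a stable 'updated' time give the same ETag twice.
stable_a = make_etag(feed_body("2005-06-14T00:50:46Z"))
stable_b = make_etag(feed_body("2005-06-14T00:50:46Z"))

# A 'right now' timestamp one second later yields a different ETag,
# even though none of the entries changed.
drifting = make_etag(feed_body("2005-06-14T00:50:47Z"))
```

Here `stable_a` equals `stable_b` while `drifting` differs from both, which is why the feed's 'most recently updated' time has to be derived from the content and not from the clock.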

(Also, the RSS/Atom feed reader I use doesn't use ETag:, only Last-Modified:, so I have been trying to support Last-Modified: in my Atom feeds.)

The simple approach is to make the Last-Modified: value be the modification time of the most recently modified file in the list. Unfortunately this doesn't change when files are added to or removed from the middle of the list, which makes it useless for most of DWiki's purposes.
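A toy example of that blind spot, using made-up file names and timestamps in place of real files:

```python
def newest_mtime(files):
    """files maps a file name to its modification time in seconds."""
    return max(files.values())


before = {"a": 100, "b": 200, "c": 300}

# Removing a file that is not the newest leaves the maximum untouched.
after_removal = {"a": 100, "c": 300}

# So does adding a file whose timestamp is older than the newest one,
# for example a file moved into place with its old timestamp preserved.
after_backdated_add = {"a": 100, "b": 200, "b2": 150, "c": 300}
```

All three lists report 300 as their newest modification time, so a client relying on Last-Modified: would see no change between them.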

At the moment DWiki folds in the modification times of all the directories it scans when looking at files during Atom feed generation (thereby missing any directories that currently have no files in them at all). At other times it just punts.
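The directory approach works because adding or removing a file updates the modification time of the directory that contains it. A rough sketch of the idea, assuming a plain directory tree walked with `os.walk`; the `list_mtime` function is hypothetical and is not DWiki's code:

```python
import os


def list_mtime(root):
    """Newest mtime among the files under root and the directories holding them."""
    latest = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        if not filenames:
            # Mirrors the gap noted above: directories without files are skipped.
            continue
        # Fold in the directory itself, so additions and removals show up.
        latest = max(latest, os.stat(dirpath).st_mtime)
        for name in filenames:
            path = os.path.join(dirpath, name)
            latest = max(latest, os.stat(path).st_mtime)
    return latest
```

Removing a file from a directory now changes the result, as long as at least one file remains there for the walk to notice.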

Summary For Client Authors

If you're thinking of writing a feed reader client or a web browser, I have this to say: please just use the ETag: header. Since it's typically some hash value of the HTTP response's data, it's easy to generate and accurate about whether or not the response is the same. Last-Modified: is essentially an approximation in everything except relatively simple situations or programs that go to obsessive amounts of work.

Written on 14 June 2005.

Last modified: Tue Jun 14 00:50:46 2005