Pitfalls in generating Last-Modified:
Every HTTP reply from a web server can include a Last-Modified:
header, which theoretically tells interested parties when the web page
was last modified. This is really something that works best when the
web server is just sending out static files; when it is generating
dynamic content, like DWiki does, things get interesting.
The major use of Last-Modified: is to decide when a browser already has
a current copy of the web page and doesn't need to fetch it again. Thus,
with dynamic pages built from many pieces the Last-Modified: time needs
to be the most recent modification time for all of the pieces. Then
when any of the pieces that make a page are updated, changing the page's
appearance, the page's Last-Modified: time will change and the browser
will fetch a new copy.
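Here is a minimal sketch of the server side of this check. It assumes a Python handler that already knows the page's newest modification time as a Unix timestamp. The browser sends its cached copy's date back in an If-Modified-Since: request header, and if the page is no newer than that, the server can answer 304 Not Modified without resending the body. The function name is hypothetical; this is not DWiki's code.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def client_copy_is_current(last_modified: float, if_modified_since: Optional[str]) -> bool:
    """Decide whether a conditional GET can be answered with 304 Not Modified.

    last_modified is the page's newest modification time (Unix timestamp);
    if_modified_since is the raw If-Modified-Since request header, if any.
    (Hypothetical helper for illustration.)
    """
    if not if_modified_since:
        return False  # the client has no cached copy to validate
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: play safe and send the full page
    if client_time.tzinfo is None:
        # A date with a "-0000" zone parses as naive; treat it as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so compare whole seconds.
    return int(last_modified) <= int(client_time.timestamp())
```

Usage: `client_copy_is_current(1000000000, "Sun, 09 Sep 2001 01:46:40 GMT")` is true, so the server could skip regenerating the body.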
This means DWiki can't just use the page's modification time (which
is what gets shown in the 'Last Modified:' line at the bottom of most
CSpace pages). DWiki pages are built from a cascade of templates and
pieces, so as it builds a web page DWiki keeps track of the most recent
modification time of all the files involved; change one template, and
the updated time is automatically propagated through the system.
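The bookkeeping can be sketched as a small accumulator. `ModTracker` and its methods are hypothetical names, not DWiki's actual API. Each template or page file used while rendering reports its modification time, and the newest one becomes the Last-Modified: value.

```python
import os
from email.utils import formatdate


class ModTracker:
    """Accumulate the newest mtime of every file used to build one page.

    (Hypothetical helper for illustration, not DWiki's real code.)
    """

    def __init__(self) -> None:
        self.newest = 0.0

    def note_time(self, mtime: float) -> None:
        # Keep only the most recent modification time seen so far.
        if mtime > self.newest:
            self.newest = mtime

    def note_file(self, path: str) -> None:
        # Call this for every template or page file the renderer reads.
        self.note_time(os.stat(path).st_mtime)

    def last_modified(self) -> str:
        # formatdate(..., usegmt=True) yields an HTTP-style date ending in "GMT".
        return formatdate(self.newest, usegmt=True)
```

Because every file read goes through the same tracker, touching any one template raises `newest` for every page that uses it.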
Or it would if there weren't some complications.
Authentication Soup
Being logged in to a DWiki, and who you're logged in as, not just can
but will change the appearance of pages. It's not just big things, like
being able to see a page's contents; it's everything from DWiki saying
'Welcome, <whoever>' in the top right corner down to whether you get a
login form or a logout button. So if you log in or out and then refresh
pages in your browser, the pages better change to look right for your
current status; otherwise users start wondering if their login or logout
actually worked.
In order to support Last-Modified: with authentication, DWiki would have
to somehow arrange to track the last time you logged in or out of the
DWiki. While this is theoretically possible, it would be a bunch of work
and would involve trying to send a cookie to every visiting browser (and
I refuse to do the latter).
Instead DWiki just mostly punts when authentication is enabled; regular
DWiki pages get served without any Last-Modified: header. Fortunately
modern browsers have another, better header called ETag: that they
can use instead of Last-Modified: to see if they need to refresh a page.
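The header logic just described can be sketched roughly as follows, assuming a Python server that has the finished page body in hand. The names `cache_headers` and `etag_matches` are hypothetical, and SHA-1 is an arbitrary choice of hash. The sketch always emits an ETag: computed from the exact response bytes and drops Last-Modified: whenever authentication is enabled.

```python
import hashlib
from email.utils import formatdate
from typing import Dict, Optional


def cache_headers(body: bytes, newest_mtime: float, auth_enabled: bool) -> Dict[str, str]:
    """Build validator headers for a rendered page (hypothetical helper)."""
    # ETag values must be quoted; hashing the exact bytes means any change
    # in the page, including login-dependent bits, changes the ETag.
    headers = {"ETag": '"%s"' % hashlib.sha1(body).hexdigest()}
    if not auth_enabled:
        # Only send Last-Modified when the page can't depend on who is logged in.
        headers["Last-Modified"] = formatdate(newest_mtime, usegmt=True)
    return headers


def etag_matches(etag: str, if_none_match: Optional[str]) -> bool:
    """Weak comparison of our ETag against an If-None-Match request header.

    Assumes entity tags contain no commas, which holds for hex digests.
    """
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            candidate = candidate[2:]  # weak comparison ignores the W/ prefix
        if candidate == etag:
            return True
    return False
```

A logged-in and a logged-out view of the same page produce different bytes, so they get different ETags. The server never has to know when anyone logged in or out.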
Page List Soup
The other complication is easy to state: what's the modification
time of a list of files?
Lists of files come up in several places in DWiki, most importantly
when generating Atom syndication feeds. Atom feeds also complicate
life because of two factors:
- the Atom feed format requires some kind of 'most recently updated'
timestamp.
- the ETag: header's value is some identifying hash of the HTTP
response's contents, so if the contents keep changing (because
one generates a 'right now' timestamp as the most recently updated
time in an Atom feed), the ETag: header will keep changing and
everything will keep re-fetching Atom feeds and pages even when
nothing has changed.
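One way around the changing-timestamp problem is to derive the feed-level 'updated' time from the entries themselves rather than from the clock. The helper names in this sketch are hypothetical, and the output is a deliberately stripped-down feed; a real Atom feed also needs id, title, and author elements. The point is only that rendering the same entries twice yields identical bytes, and so an identical hash-based ETag:.

```python
from datetime import datetime, timezone
from typing import List, Tuple
from xml.sax.saxutils import escape


def rfc3339(ts: float) -> str:
    # Atom timestamps use RFC 3339; render in UTC with a trailing "Z".
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def atom_updated(entries: List[Tuple[str, float]]) -> str:
    """Feed-level <updated>: newest entry mtime, never the current time."""
    return rfc3339(max((mtime for _, mtime in entries), default=0.0))


def render_feed(entries: List[Tuple[str, float]]) -> bytes:
    """Render a minimal (hypothetical, incomplete) Atom feed from (title, mtime) pairs."""
    lines = ['<feed xmlns="http://www.w3.org/2005/Atom">',
             "  <updated>%s</updated>" % atom_updated(entries)]
    for title, mtime in entries:
        lines.append("  <entry><title>%s</title><updated>%s</updated></entry>"
                     % (escape(title), rfc3339(mtime)))
    lines.append("</feed>")
    return "\n".join(lines).encode("utf-8")
```

Since no call to the clock is involved, repeated requests for an unchanged feed hash to the same ETag:.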
(Also, the RSS/Atom feed reader I use doesn't use ETag:, only
Last-Modified:, so I have been trying to support Last-Modified:
in my Atom feeds.)
The simple approach is to make the Last-Modified: value be the
modification time of the most recently modified file in the list.
Unfortunately this doesn't change when files are added to or removed from
the middle of the list, which makes it useless for most of DWiki's
purposes.
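The failure is easy to see with made-up file names and timestamps. The maximum stays untouched in two cases: when a page is removed from the middle of the list, and when a page is added that carries an older mtime (say, a file restored from a backup with its timestamp preserved).

```python
from typing import Dict


def newest_mtime(mtimes: Dict[str, float]) -> float:
    """The 'simple approach': newest modification time in a list of files."""
    return max(mtimes.values(), default=0.0)


# Hypothetical page names mapped to hypothetical mtimes.
pages = {"old": 100.0, "middle": 200.0, "newest": 300.0}
before = newest_mtime(pages)

del pages["middle"]        # a page disappears from the middle of the list
after_removal = newest_mtime(pages)

pages["restored"] = 150.0  # a page is added that keeps an older mtime
after_addition = newest_mtime(pages)

# All three values are 300.0, so Last-Modified never changes even though
# the list of pages (and thus the rendered feed) did.
```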
At the moment DWiki folds in the modification times of all the
directories it scans when looking at files during Atom feed generation
(thereby missing directories that currently have no files in them at
all). At other times it just punts.
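Folding in directory modification times works because a directory's mtime changes whenever an entry is added to, removed from, or renamed within it. Here is a minimal sketch with a hypothetical function name and a flag that mirrors the gap noted above. By default, directories that hold no files directly are skipped, just as DWiki currently misses them; setting `include_fileless_dirs=True` folds those in as well.

```python
import os


def tree_newest_mtime(root: str, include_fileless_dirs: bool = False) -> float:
    """Newest mtime over all files under root and the directories holding them.

    (Hypothetical helper for illustration, not DWiki's real code.)
    """
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        if not filenames and not include_fileless_dirs:
            # Mirrors the limitation described above: a directory with no
            # files in it contributes nothing, so its mtime is missed.
            continue
        # The directory's own mtime catches files added, removed, or renamed.
        newest = max(newest, os.stat(dirpath).st_mtime)
        for name in filenames:
            newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return newest
```

This costs one stat() per file and directory on every feed generation, which is part of the 'obsessive amounts of work' price of an accurate Last-Modified:.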
Summary For Client Authors
If you're thinking of writing a feed reader client or a web browser, I
have this to say: please just use the ETag: header. Since it's some
hash value of the HTTP response's data, it's easy to generate and always
accurate about whether or not the response is the same. Last-Modified:
is essentially an approximation in everything except relatively simple
situations or programs that go to obsessive amounts of work.