Putting a pleasant Python surprise to use
Although I've been programming in Python for a few years now, it keeps surprising me with little bits and pieces. Here's a neat Python language feature that I recently used for the first time (discovered originally through Bram Cohen's LiveJournal).
A common programming pattern is 'search for something to work on, but stop if you don't find anything'. In Python one might write it something like this (taken more or less from DWiki's source):
    found = False
    for dir in utils.walk_to_root(curdir):
        page = dir.child("__readme")
        if page.exists():
            found = True
            break
    if not found:
        return ''
    # Go on to use the __readme file we found in some directory.
Python allows you to put 'else' conditions on loops (both for and
while loops); the else condition is executed if the loop ran to completion
instead of being break'd from. This lets us simplify this pattern down
to:
    for dir in utils.walk_to_root(curdir):
        page = dir.child("__readme")
        if page.exists():
            break
    else:
        return ''
If there's no
__readme file to be found from the current directory
up to the root, we just return nothing; otherwise, we'll process it.
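For anyone who wants to try the construct outside of DWiki, here is a minimal self-contained sketch of the same search. It uses pathlib instead of DWiki's own utilities; the function name find_readme and its default argument are made up for illustration and are not DWiki's API.

```python
from pathlib import Path


def find_readme(start, name="__readme"):
    """Return the nearest file called `name` at or above `start`, or None.

    Hypothetical stand-in for the DWiki loop, built on pathlib.
    """
    for directory in [start, *start.parents]:
        page = directory / name
        if page.exists():
            break
    else:
        # Reached only when the loop ran out of directories without a break.
        return None
    # Go on to use the file we found in some directory.
    return page
```

The else clause belongs to the for, not to the if, so it runs exactly once, and only when no directory had the file.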
This DWiki code is the first occasion I've had to use this feature since
I discovered it, and I'm pleased to finally have been able to.
(As you can now see, not all the entries in this blog are going to be long and meandering.)
Pitfalls in generating Last-Modified:
Every HTTP reply from a web server can include a Last-Modified:
header, which theoretically tells interested parties when the web page
was last modified. This is really something that works best when the
web server is just sending out static files; when it is generating
dynamic content, like DWiki does, things get interesting.
The major use of Last-Modified: is to decide when a browser already has a current copy of the web page and doesn't need to fetch it again. Thus, with dynamic pages built from many pieces the Last-Modified: time needs to be the most recent modification time for all of the pieces. Then when any of the pieces that make a page are updated, changing the page's appearance, the page's Last-Modified: time will change and the browser will fetch a new copy.
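As a rough sketch of that rule (not DWiki's actual code), the header value for a composite page is just the newest timestamp among its pieces, formatted as an HTTP date. The function name last_modified_header is hypothetical, and it assumes the caller has already collected the pieces' modification times as Unix timestamps.

```python
from email.utils import formatdate


def last_modified_header(piece_mtimes):
    """Return a Last-Modified value for a page built from several pieces.

    piece_mtimes is an iterable of Unix timestamps, one per template or
    file that went into the page (an assumption of this sketch).
    """
    newest = max(piece_mtimes)
    # usegmt=True produces the 'GMT' suffix that HTTP dates use.
    return formatdate(newest, usegmt=True)
```

Touching any one piece pushes the maximum forward, so the header changes even when the page's own file does not.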
This means DWiki can't just use the page's modification time (which is what gets shown in the 'Last Modified:' line at the bottom of most CSpace pages). DWiki pages are built from a cascade of templates and pieces, so as it builds a web page DWiki keeps track of the most recent modification time of all the files involved; change one template, and the updated time is automatically propagated through the system.
Or it would if there weren't some complications.
Being logged in to a DWiki, and who you're logged in as, not only can but will change the appearance of pages. It's not just big things, like being able to see a page's contents; it's everything from DWiki saying 'Welcome, <whoever>' in the top right corner down to whether you get a login form or a logout button. So if you log in or out and then refresh pages in your browser, the pages had better change to look right for your current status; otherwise users start wondering if their login or logout actually worked.
In order to support Last-Modified: with authentication, DWiki would have to somehow arrange to track the last time you logged in or out of the DWiki. While this is theoretically possible, it would be a bunch of work and would involve trying to send a cookie to every visiting browser (and I refuse to do the latter).
Instead DWiki just mostly punts when authentication is enabled; regular
DWiki pages get served without any Last-Modified: header. Fortunately
modern browsers have another, better header called
ETag: that they
can use instead of Last-Modified: to see if they need to refresh a page.
Page List Soup
The other complication is easy to state: what's the modification time of a list of files?
Lists of files come up in several places in DWiki, most importantly when generating Atom syndication feeds. Atom feeds also complicate life because of two factors:
- the Atom feed format requires some kind of 'most recently updated' timestamp.
- the ETag: header's value is some identifying hash of the HTTP response's contents, so if the contents keep changing (because one generates a 'right now' timestamp as the most recently updated time in an Atom feed), the ETag: header will keep changing and everything will keep re-fetching Atom feeds and pages even when nothing has changed.
(Also, the RSS/Atom feed reader I use doesn't use ETag:, only Last-Modified:, so I have been trying to support Last-Modified: in my Atom feeds.)
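To make the interaction between those two factors concrete, here is a small sketch. Both helpers are hypothetical: make_etag assumes the hash is a SHA-1 of the body (any stable hash would do), and render_feed is a toy stand-in for a feed, not real Atom output.

```python
import hashlib


def make_etag(body):
    """Quoted hex digest of the response body (hypothetical helper)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def render_feed(entries, updated):
    """Toy stand-in for an Atom feed: an updated stamp plus the entries."""
    lines = ["<feed>", f"<updated>{updated}</updated>"]
    lines += [f"<entry>{e}</entry>" for e in entries]
    lines.append("</feed>")
    return "\n".join(lines).encode("utf-8")


entries = ["one", "two"]
# Same entries, but the 'updated' stamp is generated at request time.
etag_at_10_00 = make_etag(render_feed(entries, "2005-06-01T10:00:00Z"))
etag_at_10_05 = make_etag(render_feed(entries, "2005-06-01T10:05:00Z"))
```

The two values differ although no entry changed, which is why the feed's timestamp has to be derived from the content rather than from the clock.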
The simple approach is to make the Last-Modified: value be the modification time of the most recently modified file in the list. Unfortunately this doesn't change when files are added to or removed from the middle of the list, which makes it useless for most of DWiki's purposes.
At the moment DWiki folds in the modification times of all the directories it scans when looking at files during Atom feed generation (thereby missing directories that currently have no files in them at all). At other times it just punts.
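The difference between the two approaches can be sketched like this. These functions are illustrations, not DWiki's code; they assume the list is a sequence of file paths and that the filesystem updates a directory's modification time when entries are added or removed.

```python
import os


def newest_file_mtime(paths):
    """The simple approach: newest modification time among the files."""
    return max(os.stat(p).st_mtime for p in paths)


def list_mtime(paths):
    """Also fold in the mtime of every directory that holds a listed file.

    Directories with no listed files are never looked at, which is the
    same blind spot described above.
    """
    stamps = [os.stat(p).st_mtime for p in paths]
    dirs = {os.path.dirname(p) or "." for p in paths}
    stamps.extend(os.stat(d).st_mtime for d in dirs)
    return max(stamps, default=0.0)
```

Removing an old file from the middle of the list leaves newest_file_mtime unchanged, but it updates the containing directory's modification time, so list_mtime moves forward.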
Summary For Client Authors
If you're thinking of writing a feed reader client or a web browser, I
have this to say: please just use the
ETag: header. Since it's some
hash value of the HTTP response's data, it's easy to generate and always
accurate about whether or not the response is the same.
Last-Modified: is essentially an approximation in everything except relatively simple
situations or programs that go to obsessive amounts of work.
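On the client side, this mostly comes down to which validator you send back when re-checking a cached copy. A minimal sketch, assuming the client keeps the previous response's validators in a dict with 'etag' and 'last_modified' keys (that layout and the function name are made up here):

```python
def revalidation_headers(cached):
    """Pick request headers for re-checking a cached response.

    Prefers the ETag validator and falls back to the date only when no
    ETag was stored.
    """
    if cached.get("etag"):
        return {"If-None-Match": cached["etag"]}
    if cached.get("last_modified"):
        return {"If-Modified-Since": cached["last_modified"]}
    # Nothing to validate against, so do a plain unconditional fetch.
    return {}
```

A server that recognises the validator answers 304 Not Modified with no body, and the client keeps using its cached copy.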