How not to generate If-Modified-Since headers for conditional GETs
Recently I looked through my syndication feed stats (as I periodically
do) and noticed that the Tiny Tiny RSS program was both responsible
for quite a lot of feed fetching and also didn't seem to ever be
successfully doing conditional GETs. Most
things in this situation aren't even attempting conditional GETs,
but investigation showed that Tiny Tiny RSS was consistently sending
an If-Modified-Since
header with times that were generally just a
bit after the actual Last-Modified
timestamp of the syndication
feed. For good reasons I require
strict equality of If-Modified-Since
values, so this ensured that
Tiny Tiny RSS never made a successful conditional GET.
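As a rough illustration of what a strict equality check means on the server side, here is a minimal sketch. The function name and the sample timestamp are hypothetical; this is not the code my server actually runs, only the general shape of the check.

```python
from typing import Optional


def strict_ims_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 only if the client echoed our Last-Modified value exactly.

    Both arguments are raw header strings; no date parsing is done, so a
    timestamp that is merely 'later' than ours never matches.
    """
    if if_modified_since is not None and if_modified_since.strip() == last_modified:
        return 304
    return 200


# Hypothetical Last-Modified value for the feed.
feed_last_modified = "Tue, 07 Jan 2014 05:12:33 GMT"

# A client that echoes the value gets a 304; one that sends a slightly
# later time of its own invention gets a full 200 response every time.
echoed = strict_ims_status(feed_last_modified, feed_last_modified)
invented = strict_ims_status(feed_last_modified, "Tue, 07 Jan 2014 05:12:40 GMT")
```

Under this check, a client whose timestamps are always "just a bit after" the real one never sees a 304.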
Since I was curious, I got a copy of the current Tiny Tiny RSS code and
dug into it to see where this weird If-Modified-Since
value was coming
from and if there was anything I could do about it. The answer was worse
than I was expecting; it turns out that the I-M-S timestamp that Tiny
Tiny RSS sends has absolutely nothing to do with the Last-Modified
value that I sent it. Where it comes from is that whenever Tiny Tiny
RSS adds a new entry from a feed to its database it records the (local)
time at which it did this, then the most recent such entry timestamp
becomes the If-Modified-Since
value that Tiny Tiny RSS sends during
feed requests.
(You can see this in update_rss_feed
in include/rssfuncs.php in the
TT RSS source. Technically the time recorded for new entries is when TT
RSS started processing the updated feed, not the moment it added the
database record for a new entry.)
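For contrast, the conventional approach is to store whatever Last-Modified string the server sent and echo it back unchanged. The following is a minimal sketch under that assumption; the function names are hypothetical and are not taken from the TT RSS source, and the response headers are assumed to be a plain dict keyed by the canonical header name.

```python
from typing import Dict, Optional


def conditional_headers(stored_last_modified: Optional[str]) -> Dict[str, str]:
    """Build the extra request headers for a conditional GET.

    stored_last_modified is the exact Last-Modified string from the last
    200 response for this feed, or None if the server never sent one.
    """
    if stored_last_modified is None:
        # No validator from the server, so make a plain unconditional GET.
        return {}
    # Echo the server's own value verbatim; never substitute a local clock.
    return {"If-Modified-Since": stored_last_modified}


def remember_validator(status: int, response_headers: Dict[str, str],
                       previous: Optional[str]) -> Optional[str]:
    """Return the Last-Modified value to store after a fetch."""
    if status == 200:
        # New content: keep the new validator, or forget the old one if
        # the server did not send a Last-Modified header this time.
        return response_headers.get("Last-Modified")
    # 304 or an error: the stored validator is still the right one.
    return previous
```

The point of the design is that the client never interprets the timestamp at all. It is treated as an opaque token that belongs to the server.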
This is an absolutely terrible scheme, almost as bad as simply generating random timestamps. There is a cascade of things that can go wrong with it:
- It implicitly assumes that the clocks on the server and the client
  are in sync, since If-Modified-Since must be in the server's time
  yet the timestamp is generated from client time.
- Tiny Tiny RSS loses if a feed publishes a new entry, TT RSS pulls the
  feed, and then the feed publishes a second entry before TT RSS
  finishes processing the first new entry. TT RSS's 'entry added'
  timestamp and thus the If-Modified-Since timestamp will be after
  the revised feed's date, so a server that compares timestamps will
  304 further requests. TT RSS will only pick up the second entry when
  a third entry is published or the feed is otherwise modified so that
  its Last-Modified date moves forward enough.
- If the feed deletes or modifies an entry and properly updates its
  overall Last-Modified timestamp as a result of this, Tiny Tiny RSS
  will issue what are effectively unconditional GETs until the feed
  publishes a completely new entry (since the last time that TT RSS
  saw a new entry will be before the feed's new Last-Modified time).
There are probably other flaws that I'm not thinking of.
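The second failure in the list above can be walked through with a small simulation. This is a sketch with made-up times, and the server function is a hypothetical stand-in for any server that compares If-Modified-Since against Last-Modified instead of requiring equality.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def comparing_server_status(last_modified: datetime, if_modified_since: str) -> int:
    """Hypothetical server that answers 304 if the feed is not newer than I-M-S."""
    ims = parsedate_to_datetime(if_modified_since)
    return 304 if last_modified <= ims else 200


# Made-up timeline, all in UTC.
t0 = datetime(2014, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
first_entry_published = t0
client_fetches_feed = t0 + timedelta(seconds=10)      # sees only the first entry
second_entry_published = t0 + timedelta(seconds=11)   # feed changes after the fetch
client_entry_timestamp = t0 + timedelta(seconds=12)   # client's local bookkeeping time

# The client sends its own bookkeeping time, not the server's Last-Modified.
ims_header = format_datetime(client_entry_timestamp, usegmt=True)

# The feed's Last-Modified is now the second entry's time, which is earlier
# than the client's invented timestamp, so the server says 'not modified'.
status = comparing_server_status(second_entry_published, ims_header)
```

Here `status` comes out as 304 even though the client has never seen the second entry, and it stays that way until the feed's Last-Modified moves past the client's timestamp.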
(I don't think it's a specification violation to send an
If-Modified-Since
header if you never got a Last-Modified
header,
but if it is that's another flaw in this scheme, since Tiny Tiny RSS
will totally do that.)
This scheme's sole virtue is that on a server which uses timestamp
comparisons for If-Modified-Since
(instead of equality checks) it will
sometimes succeed in getting 304 Not Modified responses. Some of these
responses will even be correct, and when they aren't, it's not the
server's fault.