Wandering Thoughts archives


How not to generate If-Modified-Since headers for conditional GETs

Recently I looked through my syndication feed stats (as I periodically do) and noticed that the Tiny Tiny RSS program was both responsible for quite a lot of feed fetching and also didn't seem to ever be successfully doing conditional GETs. Most things in this situation aren't even attempting conditional GETs, but investigation showed that Tiny Tiny RSS was consistently sending an If-Modified-Since header with times that were generally just a bit after the actual Last-Modified timestamp of the syndication feed. For good reasons I require strict equality of If-Modified-Since values, so this ensured that Tiny Tiny RSS never made a successful conditional GET.
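
To make 'strict equality' concrete, here is a minimal sketch of that kind of server-side check. It is not the code this site actually runs; the function name and the example dates are made up for illustration.

```python
def is_not_modified(if_modified_since, last_modified):
    """Return True if the request may be answered with 304 Not Modified.

    Strict equality: the client must echo back exactly the Last-Modified
    string it was given. No date parsing or ordering comparison is done.
    """
    if if_modified_since is None or last_modified is None:
        return False
    return if_modified_since.strip() == last_modified.strip()


# Hypothetical values, purely for illustration.
feed_last_modified = "Tue, 04 Mar 2014 06:10:00 GMT"

# A client that echoes the server's value gets a 304.
echoed = is_not_modified("Tue, 04 Mar 2014 06:10:00 GMT", feed_last_modified)

# A client that sends a time 'just a bit after' never does.
slightly_later = is_not_modified("Tue, 04 Mar 2014 06:10:42 GMT", feed_last_modified)
```

Under this check a timestamp that is merely close to the real Last-Modified value is as useless as no header at all.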

Since I was curious, I got a copy of the current Tiny Tiny RSS code and dug into it to see where this weird If-Modified-Since value was coming from and if there was anything I could do about it. The answer was worse than I was expecting; it turns out that the I-M-S timestamp that Tiny Tiny RSS sends has absolutely nothing to do with the Last-Modified value that I sent it. Instead, whenever Tiny Tiny RSS adds a new entry from a feed to its database, it records the (local) time at which it did this, and the most recent such entry timestamp becomes the If-Modified-Since value that Tiny Tiny RSS sends during feed requests.

(You can see this in update_rss_feed in include/rssfuncs.php in the TT RSS source. Technically the time recorded for new entries is when TT RSS started processing the updated feed, not the moment it added the database record for a new entry.)
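
For contrast, here is a rough sketch of the approach that does work with any server: remember the Last-Modified string the server sent and echo it back unchanged. This is not TT RSS code; the FeedValidators class and its method names are hypothetical, and response headers are assumed to arrive as a plain dict with canonical header names.

```python
class FeedValidators:
    """Remember the Last-Modified value a server sent, per feed URL.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._last_modified = {}

    def remember(self, url, response_headers):
        # Store the server's Last-Modified string verbatim. The local clock
        # is never consulted, so clock skew cannot matter.
        value = response_headers.get("Last-Modified")
        if value is None:
            # No validator was offered, so there is nothing to echo next time.
            self._last_modified.pop(url, None)
        else:
            self._last_modified[url] = value

    def request_headers(self, url):
        # Only send If-Modified-Since if the server gave us a value to echo.
        value = self._last_modified.get(url)
        if value is None:
            return {}
        return {"If-Modified-Since": value}
```

Because the value is treated as an opaque string, this satisfies servers that require strict equality as well as servers that compare timestamps.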

This is an absolutely terrible scheme, almost as bad as simply generating random timestamps. There is a cascade of things that can go wrong with it:

  • It implicitly assumes that the clocks on the server and the client are in sync, since If-Modified-Since must be in the server's time yet the timestamp is generated from client time.

  • Tiny Tiny RSS loses if a feed publishes a new entry, TT RSS pulls the feed, and then the feed publishes a second entry before TT RSS finishes processing the first new entry. TT RSS's 'entry added' timestamp and thus the If-Modified-Since timestamp will be after the revised feed's date, so a server that does timestamp comparisons will 304 further requests. TT RSS will only pick up the second entry when a third entry is published or the feed is otherwise modified so that its Last-Modified date moves forward enough.

  • If the feed deletes or modifies an entry and properly updates its overall Last-Modified timestamp as a result of this, Tiny Tiny RSS will issue what are effectively unconditional GETs until the feed publishes a completely new entry (since the last time that TT RSS saw a new entry will be before the feed's new Last-Modified time).
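
The second and third cases above can be walked through with a small simulation. This is only a sketch: the times are invented, the two clocks are assumed to be perfectly in sync (which is the best case for this scheme), and both server functions are hypothetical stand-ins for a timestamp-comparing server and a strict-equality one.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(moment):
    # Format an aware UTC datetime as an HTTP date string.
    return format_datetime(moment, usegmt=True)


def comparing_server_sends_304(if_modified_since, last_modified):
    # Timestamp comparison: 304 unless the feed changed after the claimed time.
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)


def strict_server_sends_304(if_modified_since, last_modified):
    # Strict equality: 304 only if the client echoes the exact value.
    return if_modified_since == last_modified


# Invented timeline. The first entry is published at `start` and the client
# fetches the feed shortly afterwards.
start = datetime(2014, 3, 4, 6, 0, 0, tzinfo=timezone.utc)
second_entry_published = start + timedelta(seconds=11)  # after the fetch
entry_recorded_locally = start + timedelta(seconds=12)  # client's local timestamp

# The client's next request uses its own local timestamp as If-Modified-Since.
ims = http_date(entry_recorded_locally)
feed_last_modified = http_date(second_entry_published)

# Second case: the comparing server answers 304, so the second entry is missed.
second_entry_missed = comparing_server_sends_304(ims, feed_last_modified)

# A strict server never matches this value, so every fetch is a full one.
strict_match = strict_server_sends_304(ims, feed_last_modified)

# Third case: the feed is edited an hour later without adding a new entry.
edited_last_modified = http_date(start + timedelta(hours=1))
edit_gets_304 = comparing_server_sends_304(ims, edited_last_modified)
```

Here second_entry_missed comes out True, while strict_match and edit_gets_304 both come out False. The last result repeats on every poll, because the client's timestamp only moves forward when it sees a new entry.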

There are probably other flaws that I'm not thinking of.

(I don't think it's a specification violation to send an If-Modified-Since header if you never got a Last-Modified header, but if it is that's another flaw in this scheme, since Tiny Tiny RSS will totally do that.)

This scheme's sole virtue is that on a server which uses timestamp comparisons for If-Modified-Since (instead of equality checks) it will sometimes succeed in getting 304 Not Modified responses. Some of these responses will even be correct, and when they aren't, it's not the server's fault.

web/IfModifiedSinceHowNot written at 02:19:46

