The case of the very old If-Modified-Since HTTP header

June 6, 2021

Every so often I look at the top IP sources for Wandering Thoughts. Recently, I noticed that one relatively active IP was there because it was fetching my Atom syndication feed every few minutes, and on top of that it was always getting an HTTP 200 reply with the full feed. Usually my assumption is that these requests aren't using HTTP conditional GET at all, but I keep looking because I might find something like the Tiny Tiny RSS problem (which I can theoretically fix). To my surprise, something a bit interesting was happening.

This feed fetcher was sending an If-Modified-Since HTTP header, but it had a rather striking value of 'Wed, 01 Jan 1800 00:00:00 GMT'. Naturally this doesn't match any Last-Modified value my feed has ever provided, and it wouldn't help if I used a time-based comparison, since every syndication feed in the world has changed since 1800.
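To make the two comparison styles concrete, here is a minimal Python sketch of how a server might decide whether an If-Modified-Since value earns a 304. The function names and the sample Last-Modified value are made up for illustration; this is not the code that actually serves the feed.

```python
from email.utils import parsedate_to_datetime


def not_modified_exact(if_modified_since, last_modified):
    """Strict check: only a verbatim echo of our Last-Modified gets a 304."""
    if if_modified_since is None:
        return False
    return if_modified_since.strip() == last_modified


def not_modified_by_time(if_modified_since, last_modified):
    """Time-based check: a 304 if the resource is no newer than the client's date."""
    if if_modified_since is None:
        return False
    try:
        client_time = parsedate_to_datetime(if_modified_since)
        return parsedate_to_datetime(last_modified) <= client_time
    except (TypeError, ValueError):
        # Unparseable (or incomparable) dates are treated as "no validator".
        return False


# Hypothetical Last-Modified value, used only for this example.
feed_last_modified = "Sun, 06 Jun 2021 04:33:07 GMT"
from_1800 = "Wed, 01 Jan 1800 00:00:00 GMT"

print(not_modified_exact(from_1800, feed_last_modified))
print(not_modified_by_time(from_1800, feed_last_modified))
```

Under either check the 1800 value fails, so the fetcher gets the full feed every time.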

Any time I see a very old timestamp like this, I suspect that there's code that has been handed an un-set zero value instead of an actual time (here, a Last-Modified time). Syndication feed fetchers are perhaps especially prone to this; they start out with no Last-Modified time when they fetch a feed for the first time, and then if they ever fail to parse a Last-Modified time properly they might write back an unset value. However, 1800 is a somewhat unusual zero value for time; I'm more used to Unix timestamps, where the zero value is January 1st 1970 GMT.
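As an illustration of how an unset value can leak into a header, here is a small Python sketch of a hypothetical fetcher that stores Last-Modified as Unix seconds, with 0 meaning "never seen". Both function names are invented for this example, and it shows the familiar 1970 zero value, not whatever produces 1800.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(unix_seconds):
    """Format Unix seconds as an HTTP date string in GMT."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def naive_headers(stored_last_modified):
    """Buggy pattern: an unset (zero) time is sent as if it were real."""
    return {"If-Modified-Since": http_date(stored_last_modified)}


def guarded_headers(stored_last_modified):
    """Safer pattern: send no validator at all when nothing is stored."""
    if not stored_last_modified:
        return {}
    return {"If-Modified-Since": http_date(stored_last_modified)}


print(naive_headers(0))    # {'If-Modified-Since': 'Thu, 01 Jan 1970 00:00:00 GMT'}
print(guarded_headers(0))  # {}
```

The guard is the important part: a fetcher with no stored time should make a plain unconditional request instead of inventing a date.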

This feed fetcher identifies itself as 'NextCloud-News/1.0'. If that is the NextCloud News application, it's written in PHP and is probably setting If-Modified-Since here using a PHP DateTime (or maybe it uses feed-io; I don't actually know PHP, so I'm just grep'ing the codebase). I can't readily find any documentation on what the zero value for a DateTime is, or if it's even possible to wind up with one. None of MySQL, PostgreSQL, or SQLite appears to use 01 Jan 1800 as a zero value either. So on the whole I'm still lost.

(In passing I'll note that this user-agent value is not all that useful. To be useful, it should include the actual version number of the NextCloud-News release (they're up to 15.x, with 16.0.0 coming soon) and some URL for it, so I can be confident I have identified the right NextCloud News thing.)

PS: If this is a NextCloud-News code issue, correcting it would be nice (and please don't treat Last-Modified as a timestamp), but it would be better to use ETag and If-None-Match.
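For what treating validators as opaque values could look like on the fetcher side, here is a minimal Python sketch. The helper names and the sample ETag and Last-Modified values are assumptions for illustration; this is not NextCloud News or feed-io code.

```python
def remember_validators(response_headers):
    """Save the server's validators verbatim, without parsing them."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return {
        "etag": lowered.get("etag"),
        "last_modified": lowered.get("last-modified"),
    }


def build_conditional_headers(saved):
    """Echo back only the validators we actually have stored."""
    headers = {}
    if saved.get("etag"):
        headers["If-None-Match"] = saved["etag"]
    if saved.get("last_modified"):
        headers["If-Modified-Since"] = saved["last_modified"]
    return headers


# The first fetch has nothing stored, so it is a plain unconditional GET.
print(build_conditional_headers({}))

# Hypothetical response headers from that first fetch.
saved = remember_validators({
    "ETag": '"abc123"',
    "Last-Modified": "Sun, 06 Jun 2021 04:33:07 GMT",
})
print(build_conditional_headers(saved))
```

Because the strings are stored and sent back unchanged, there is no date parsing step that can fail and leave a zero value behind.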

(This elaborates on a Twitter thread.)


Last modified: Sun Jun 6 00:33:07 2021