The case of the very old If-Modified-Since HTTP header
Every so often I look at the top IP sources for Wandering Thoughts. Recently, I noticed that one relatively active IP was there because it was fetching my Atom syndication feed every few minutes, and on top of that it was always getting an HTTP 200 reply with the full feed. Usually my assumption is that these requests aren't using HTTP conditional GET at all, but I keep looking because I might find something like the Tiny Tiny RSS problem (which I can theoretically fix). To my surprise, something a bit interesting was happening.
This feed fetcher was sending an If-Modified-Since HTTP header, but it had a rather striking value of 'Wed, 01 Jan 1800 00:00:00 GMT'. Naturally this doesn't match any Last-Modified value my feed has ever provided, and it wouldn't help if I used a time-based comparison, since all syndication feeds in the world have been changed since 1800.
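To make the two comparison styles concrete, here is a minimal Python sketch. It is not the code that serves this feed; the function names and the idea of returning a bare "send a 304?" boolean are assumptions made purely for illustration. It shows that the 1800 value fails both an exact-match check and a time-based check against any plausible Last-Modified.

```python
from email.utils import parsedate_to_datetime

# The header value this fetcher sent (quoted in the text above).
OBSERVED_IMS = "Wed, 01 Jan 1800 00:00:00 GMT"


def exact_match_304(if_modified_since, last_modified):
    """Answer 304 only if the client echoed our Last-Modified verbatim."""
    return if_modified_since is not None and if_modified_since == last_modified


def time_based_304(if_modified_since, last_modified):
    """Answer 304 if the feed has not changed since the client's time."""
    if if_modified_since is None:
        return False
    try:
        client_time = parsedate_to_datetime(if_modified_since)
        feed_time = parsedate_to_datetime(last_modified)
        return feed_time <= client_time
    except (TypeError, ValueError):
        # Unparseable or non-comparable dates: fall back to a full 200 reply.
        return False
```

With a hypothetical Last-Modified such as 'Tue, 05 Mar 2024 10:00:00 GMT', both functions return False for the 1800 header, so the fetcher gets the full feed every time under either policy.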
Any time I see a very old timestamp like this, I suspect that there's code that has been handed an unset zero value instead of an actual time (here, a Last-Modified time). Syndication feed fetchers are perhaps especially prone to this; they start out with no Last-Modified time when they fetch a feed for the first time, and then if they ever fail to parse a Last-Modified time properly they might write back an unset value. However, 1800 is a somewhat unusual zero value for time; I'm more used to Unix timestamps, where the zero value is January 1st 1970 GMT.
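As a quick check on the zero-value theory, this Python sketch formats a Unix timestamp of zero as an HTTP date and compares it with the header that was actually sent. The helper names (`http_date`, `matches_unix_zero`) are made up for this example; the only inputs are the observed header value and the Unix epoch.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# The header value this fetcher sent (quoted in the text above).
OBSERVED_IMS = "Wed, 01 Jan 1800 00:00:00 GMT"

# A Unix timestamp of zero, as a timezone-aware UTC datetime.
UNIX_ZERO = datetime.fromtimestamp(0, tz=timezone.utc)

# The date in the observed header, built directly rather than parsed.
YEAR_1800 = datetime(1800, 1, 1, tzinfo=timezone.utc)


def http_date(dt):
    """Format a UTC datetime the way HTTP date headers are written."""
    return format_datetime(dt, usegmt=True)


def matches_unix_zero(header_value):
    """True if a header value is exactly the Unix zero time."""
    return header_value == http_date(UNIX_ZERO)
```

Here `http_date(UNIX_ZERO)` gives 'Thu, 01 Jan 1970 00:00:00 GMT', which is not what the fetcher sent, while `http_date(YEAR_1800)` reproduces the observed header exactly, weekday included. So the value is at least a correctly formatted date for January 1st 1800, not a mangled Unix zero.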
This feed fetcher identifies itself as 'NextCloud-News/1.0'. If that is this NextCloud application, it's written in PHP and is probably setting If-Modified-Since here using a PHP DateTime (or maybe it uses feed-io, I don't actually know PHP so I'm just grep'ing the codebase). I can't readily find any documentation on what the zero value for a DateTime is, or if it's even possible to wind up with one. Neither MySQL, PostgreSQL, nor SQLite appear to use 01 Jan 1800 as a zero value either. So on the whole I'm still lost.
(In passing I'll note that this user-agent value is not all that useful. To be useful, it should include the actual version number of the NextCloud-News release (they're up to 15.x, with 16.0.0 coming soon) and some URL for it, so I can be confident I have identified the right NextCloud News thing.)
PS: If this is a NextCloud-News code issue, correcting it would be
nice (and please don't treat Last-Modified as a timestamp), but it would be better to use
(This elaborates on a Twitter thread.)