2021-06-06
TLS certificates have at least two internal representations of time
TLS certificates famously have a validity period, expressed as 'not before' and 'not after' times. These times can have a broad range, and there are some TLS Certificate Authority root certificates that already have 'not after' times relatively far in the future (as I mentioned here). All TLS certificates, including CA root certificates, are encoded in ASN.1. Recently I was both generating long-lived certificates and peering very closely into them in an attempt to figure out why my new certificates weren't working, and in the process of doing so I discovered that ASN.1 has at least two representations of time and what representation a TLS certificate uses depends on the specific time.
Most TLS certificates you will encounter today encode time in what
'openssl asn1parse
' calls a UTCTIME. If you have a
TLS certificate with a sufficiently far in the future time, it will
instead be represented as what OpenSSL calls a GENERALIZEDTIME. Somewhat to my
surprise, both of these turn out to be strings under the covers and
the reason that TLS switches from one to the other isn't what I
thought it was. I'll start by showing the encoding for a not before
and a not after date (and time) for a certificate I generated:
UTCTIME :210531194026Z GENERALIZEDTIME :20610521194026Z
This certificate is valid from 2021-05-31 19:40 UTC to 2061-05-21 19:40 UTC. The Z says this is in UTC, the '194026' is 19:40:26, and the '0531' and '0521' are the month and day. The difference between the two time formats is at the front; the UTCTIME starts with '21' while the other starts with '2061'.
When I started looking into the details of this, I assume that the choice between one or the other form was because of the year 2038 problem. This is not the case, since UTCTIME is not represented as any sort of Unix epoch timestamp and has no such limits. Instead, UTCTIME's limitation is that it only uses a two-digit year. As covered in RFC 5280, if the two year digits are 00 to 49, the year is 20yy, and for 50 to 99, it is 19yy. This means that a UTCTIME can only represent times up to the end of 2049. The certificate I generated is valid past that, so it must use the more general version.
In theory, all code that deals with TLS certificates should be able to deal with both forms of time. This is a low level concern that the ASN.1 parsing library should normally hide from programs, and both forms have been valid since RFC 2459 from 1999. In practice, I suspect that there's almost no use of the second time format in certificates today, so I suspect that there's at least some software that mishandles them. For general use, we have years to go before this starts to be an issue (starting with CA root certificates that push their expiry date into 2050 and beyond).
For our own use, I think I'm going to limit certificate validity to no later than 2049. The more cautious approach is to assume that there's a Unix timestamp somewhere in the chain of processing things and stick to times that don't go beyond the year 2038 boundary.
(I think that these are the only two ASN.1 time representations that are considered valid in TLS certificates on the Internet, but I haven't carefully gone through the RFCs and other sources of information to be sure. So I'm being cautious and saying that TLS certificates have 'at least' two representations of time.)
The case of the very old If-Modified-Since
HTTP header
Every so often I look at the top IP sources for Wandering Thoughts. Recently, I noticed that one relatively active IP was there because it was fetching my Atom syndication feed every few minutes, and on top of that it was always getting a HTTP 200 reply with the full feed. Usually my assumption is that these requests aren't using HTTP conditional GET at all, but I keep looking because I might find something like the Tiny Tiny RSS problem (which I can theoretically fix Tiny Tiny RSS). To my surprise, something a bit interesting is happening.
This feed fetcher was sending an If-Modified-Since
HTTP header,
but it had a rather striking value of 'Wed, 01 Jan 1800 00:00:00
GMT'. Naturally this doesn't match any Last-Modified
value my
feed has ever provided, and it wouldn't help if I used a time based
comparison since all syndication feeds in the world have been
changed since 1800.
Any time I see a very old timestamp like this, I suspect that there's code that has been handed an un-set zero value instead of an actual time (here, a Last-Modified time). Syndication feed fetchers are perhaps especially prone to this; they start out with no Last-Modified time when they fetch a feed for the first time, and then if they ever fail to parse a Last-Modified time properly they might write back an unset value. However, 1800 is a somewhat unusual zero value for time; I'm more used to Unix timestamps, where the zero value is January 1st 1970 GMT.
This feed fetcher identifies itself as 'NextCloud-News/1.0'. If that is this NextCloud application (also), it's written in PHP and is probably setting If-Modified-Since here using a PHP DateTime (or maybe it uses feed-io, I don't actually know PHP so I'm just grep'ing the codebase). I can't readily find any documentation on what the zero value for a DateTime is, or if it's even possible to wind up with one. Neither MySQL, PostgreSQL, nor SQLite appear to use 01 Jan 1800 as a zero value either. So on the whole I'm still lost.
(In passing I'll note that this user-agent value is not all that useful. To be useful, it should include the actual version number of the NextCloud-News release (they're up to 15.x, with 16.0.0 coming soon) and some URL for it, so I can be confident I have identified the right NextCloud News thing.)
PS: If this is a NextCloud-News code issue, correcting it would be
nice (and please don't treat Last-Modified as a timestamp), but it would be better to use ETag
and If-None-Match
.
(This elaborates on a Twitter thread.)