Wandering Thoughts archives

2020-06-07

A Go time package gotcha with parsing time strings that use named time zones

Go has a generally well regarded time package. One of the things it can do is parse a string representation of a time based on a specification of the time format, using time.Parse(); for example, to parse times like "Sat Mar 7 11:06:39 PST 2015" or "Sat, 07 Mar 2015 11:06:39 -0800" (which are in Unix date format and 'RFC 1123 Z' format respectively). As usual, these parsed time.Time values have a location, ie a time zone. However, if you're dealing with time strings with named time zones, like 'PST', this parsing has a very large catch. This catch is sort of spelled out in the official documentation, but not quite completely clearly:

When parsing a time with a zone abbreviation like MST, if the zone abbreviation has a defined offset in the current location, then that offset is used. The zone abbreviation "UTC" is recognized as UTC regardless of location. If the zone abbreviation is unknown, Parse records the time as being in a fabricated location with the given zone abbreviation and a zero offset.

MST is a widely known zone abbreviation, so you might think that it will always have 'a defined offset in the current location'. This is not so. If your current location doesn't ever use 'MST' as a zone abbreviation, then it's not considered 'a defined offset' and you get a time that claims it is in 'MST' but that has a 0 offset from UTC. This is not a correctly parsed time as any human being would understand it. Go is making up an offset in order to not report an error.

What Go means by 'a defined offset in the current location' is that the zone abbreviation is one that your current location itself uses; for example, you can use 'EST' and 'EDT' if you're in Eastern time. This means that Go will parse a time string containing a named time zone differently depending on your local time zone. If you parse a string that uses 'MST' as its time zone and you are in Mountain time, you will get one time.Time value; if you are in Eastern time (or your server is set to UTC), you will get a completely different time.Time value.

(This implies that if you write out a time string using a named time zone, change your time zone (either personally or server wide), and then parse the time string again, you will get a different time. One way to change your personal time zone is to move a file containing time strings from one server to another.)

This also means that it very much matters whether the source of the time string is using named time zones or numeric time zone offsets. The choice between 'RFC 1123' time format (using named time zones) and 'RFC 1123 Z' format (using numeric values) will give you two strings for what is theoretically the same time, but Go will often parse them as very different times. Only time formats using numeric time zone offsets are safe to use with Go (and even then there is a catch when later formatting them).

My personal opinion is that this is a serious bug in Go's time parsing. If a named time zone offset is given and Go cannot safely determine its actual zone offset, the parse should fail with an error. Turning "Sat Mar 7 11:06:39 PST 2015" into March 7 11:06:39 UTC 2015 is not correct behavior; instead it is actively dangerous. If this means that too many time strings fail to parse, then Go time parsing needs to get smarter about looking up popular named time zone offsets, or it should provide a 'parse liberally' function with the current behavior of time.Parse().

Another consequence of this behavior is that a time.Time time zone that is printed as 'EST' is not always 'EST' (and the same for any named time offset). Sometimes it is 'EST (-0500)' and sometimes it is 'EST (+0000)', ie 'UTC but we are claiming that it is called EST'. In my opinion, Go should also stop doing this. If it is going to accept 'EST' but treat it as UTC, it should actually set the location to UTC so that people are not fooled by two times that appear equal because they format to the same output but are in fact not equal.

(To Go's credit, the default string format for time.Time values, as shown in fmt's %v format, does show both the time zone name and the numerical offset. This gives you odd but honest output like '2015-03-07 11:06:39 +0000 MST'. But if you format with just the named time zone, you can have two times that format the same but don't compare equal.)

(This entire issue was brought to my attention by James Antill's comments on my entry about how time.Time values have locations.)

programming/GoTimeParsingTZIssue written at 22:16:08

