ntpdate has a surprising restriction on what it will sync to

February 25, 2019

One of the things that we monitor these days is the health of the three local OpenBSD NTP servers that all of our machines get time from. For various reasons we do this using ntpdate, partly because that's what we synchronize time with these days. Last Wednesday, we started getting alerts that ntpdate couldn't synchronize to them, first for one server and then for a second of the three.

When I started investigating I expected to find that the OpenBSD NTP daemon had fallen over or lost its upstream synchronization. But ntpctl said things were fine, and in fact other machines here that run chrony were still happily synchronized to the machines that ntpdate no longer liked. Eventually I found the -d flag to ntpdate and used it in the hopes that something useful would be reported. And indeed there was useful and important information in the debugging output:

; ntpdate -d -q hickory
[...] Server dropped: Server has gone too long without sync
reference time: e016b89a.2da10fff Tue, Feb 19 2019 12:17:14.178
[...] ntpdate[...]: no server suitable for synchronization found 

(This was happening at about 12:30 on February 20th.)

The 'reference time' that ntpdate is reporting here is the time at which the NTP server last set or updated its clock (see RFC 5905, page 22). It turns out that ntpdate has an undocumented internal limit on how long it will let servers go without updating their clocks. If a server has gone 24 hours or more without an update, ntpdate refuses to trust it any more and, in debugging mode, reports the 'too long without sync' message we see here. As far as I can tell this isn't required by the NTP protocol; it's just something that ntpdate does. It also turned out to be the problem with our OpenBSD servers.
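To make the arithmetic concrete, here is a minimal Python sketch (not ntpdate's own code) that decodes the hex reference time ntpdate printed and applies the 24-hour cutoff described above. It assumes an era-0 NTP timestamp (32 bits of seconds since 1900-01-01 UTC, then a 32-bit binary fraction) and assumes the check is simply "now minus reference time is at least 24 hours". The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800
# The staleness limit observed in ntpdate's behavior (assumed to be exactly 24h).
MAX_AGE = timedelta(hours=24)


def ntp_to_datetime(ts: str) -> datetime:
    """Convert an era-0 'seconds.fraction' hex NTP timestamp to a UTC datetime."""
    sec_hex, frac_hex = ts.split(".")
    seconds = int(sec_hex, 16) - NTP_UNIX_OFFSET
    fraction = int(frac_hex, 16) / 2**32  # fraction is in units of 1/2^32 s
    return datetime.fromtimestamp(seconds + fraction, tz=timezone.utc)


def too_long_without_sync(reftime: datetime, now: datetime) -> bool:
    """Hypothetical version of the check: reject servers stale for >= 24h."""
    return now - reftime >= MAX_AGE


if __name__ == "__main__":
    ref = ntp_to_datetime("e016b89a.2da10fff")
    print(ref.isoformat())
    # 12:30 on February 20th local time, assuming local time is UTC-5.
    now = datetime(2019, 2, 20, 17, 30, tzinfo=timezone.utc)
    # The age is just over 24 hours, so the check reports True.
    print(now - ref, too_long_without_sync(ref, now))
```

Decoding e016b89a.2da10fff gives 17:17:14.178 UTC on February 19th, which is the 12:17:14 that ntpdate printed once you account for a UTC-5 local time zone. By 12:30 local time the next day, the server was about 24 hours and 13 minutes past its last clock update.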

How often a server updates its local clock seems to depend on the NTP daemon (and perhaps the system). Our Ubuntu machines running chrony update their clocks quite frequently, at most about every 18 minutes. Our assortment of machines running the standard ntpd do so somewhat less often, with a maximum interval of about 36 minutes (on both CentOS 7 and OmniOS). Our three OpenBSD machines, running OpenBSD's NTP daemon, have so far gone at most about three and a half hours between clock updates over the past two days (which is how long we've been tracking this). Evidently they can go much longer under the right circumstances, though, since all three of them reached roughly 24 hours.

In the end, the two OpenBSD servers that had gone over 24 hours updated their clocks on their own before I could restart their NTP daemons. I restarted ntpd on the third when it was about ten minutes away from hitting 24 hours; maybe it would have updated its clock at the last moment, or maybe slightly afterward, or maybe not at all. I didn't feel like taking the chance.

(This elaborates on a tweet of mine.)

PS: Our OpenBSD NTP servers do other things as well as NTP; in fact, NTP is not their primary purpose. It's just handy to run an NTP daemon on highly stable machines that are already central components of our environment, since we need to run some NTP daemons somewhere.

Written on 25 February 2019.
Last modified: Mon Feb 25 22:02:52 2019