ntpdate has a surprising restriction on what it will sync to
One of the things that we monitor these days is the health of our
three local OpenBSD NTP servers that all of our machines get time
from. For various reasons we do it using ntpdate, partly because
that's what we synchronize time with these days.
Last Wednesday, we started getting alerts that ntpdate could not
synchronize to them; first to one server, and then expanding to a
second one of the three.
When I started investigating I expected to find that the OpenBSD
NTP daemon had fallen over or lost its upstream synchronization.
Instead ntpctl said things were fine, and in fact other machines
here that run chrony were still happily synchronized to the machines
that ntpdate no longer liked. Eventually I found the -d flag to
ntpdate and used it in the hopes that something useful would be
reported. And indeed there was useful and important information in
the debugging output:
    ; ntpdate -d -q hickory
    [...]
    220.127.116.11: Server dropped: Server has gone too long without sync
    [...]
    reference time: e016b89a.2da10fff Tue, Feb 19 2019 12:17:14.178
    [...]
    [...]
    ntpdate[...]: no server suitable for synchronization found
(This was happening at about 12:30 on February 20th.)
The 'reference time' that ntpdate is reporting here is the time
that the NTP server last set or updated its clock (see RFC 5905,
page 22). Ntpdate turns out to have an undocumented internal limit
on how long it will let servers go without updating their clocks;
if a server has gone 24 hours or more, ntpdate refuses to trust it
any more and reports the 'too long without sync' message in debugging
mode, as we see here. As far as I can tell this isn't required by
the NTP protocol; it's just something that ntpdate does, and it
turned out to be our issue with our OpenBSD servers.
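To make the arithmetic concrete, here is a minimal Python sketch that decodes the hexadecimal reference time from the debug output above and applies a 24-hour cutoff. The function names are mine, and the cutoff is only the behaviour described here, not something copied from ntpdate's source. The hex value decodes to 17:17:14 UTC, so the 12:17:14 that ntpdate printed implies a local zone of UTC-5, which is what I assume when converting 'about 12:30' to UTC.

```python
from datetime import datetime, timedelta, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800

# Assumption: the limit is "24 hours or more", as described in the text.
MAX_AGE = timedelta(hours=24)


def parse_reference_time(hexstamp: str) -> datetime:
    """Convert 'ssssssss.ffffffff' (hex seconds and fraction) to a UTC datetime.

    This assumes NTP era 0, which is fine for timestamps before 2036.
    """
    sec_hex, frac_hex = hexstamp.split(".")
    seconds = int(sec_hex, 16) - NTP_UNIX_OFFSET
    fraction = int(frac_hex, 16) / 2**32
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole + timedelta(seconds=fraction)


def too_long_without_sync(reference: datetime, now: datetime) -> bool:
    """True if the server's last clock update is 24 hours old or older."""
    return now - reference >= MAX_AGE


ref = parse_reference_time("e016b89a.2da10fff")
# About 12:30 local time on February 20th, taken as 17:30 UTC (UTC-5).
now = datetime(2019, 2, 20, 17, 30, tzinfo=timezone.utc)
print(ref.isoformat(), too_long_without_sync(ref, now))
```

With these inputs the reference time is a bit over 24 hours old, which is consistent with ntpdate dropping the server.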
How frequently a server updates its local clock seems to depend on the NTP daemon (and perhaps the system). Our Ubuntu machines running chrony update their server clocks quite frequently, going roughly 18 minutes at most between updates. Our assortment of machines running standard NTPD do it somewhat less frequently, with a maximum of about 36 minutes between updates (on both CentOS 7 and OmniOS). Our three OpenBSD machines, running the OpenBSD NTP daemon, have so far gone a maximum of about three and a half hours between clock updates over the past two days (which is how long we've been tracking this). Evidently they can go slower under the right circumstances, though, since all three of them got to more than 24 hours (more or less).
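One way to track this sort of thing is to ask each server for its reference timestamp directly and compute how old it is. The following Python sketch is not what we actually run; it is a bare-bones SNTP-style query where the function names, the timeout, and the IPv4-only socket are all my own assumptions. The packet layout is the 48-byte header from RFC 5905.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800

# RFC 5905 header: flags, stratum, poll, precision, root delay,
# root dispersion, reference ID, then four 64-bit timestamps
# (reference, origin, receive, transmit). 48 bytes in total.
NTP_HEADER = struct.Struct("!BBbbII4sQQQQ")


def reference_unix_time(packet: bytes) -> float:
    """Extract the reference timestamp from an NTP packet as Unix time."""
    if len(packet) < NTP_HEADER.size:
        raise ValueError("short NTP packet")
    fields = NTP_HEADER.unpack(packet[:NTP_HEADER.size])
    ref = fields[7]  # 32 bits of seconds, 32 bits of fraction
    return (ref >> 32) - NTP_UNIX_OFFSET + (ref & 0xFFFFFFFF) / 2**32


def query_reference_age(host: str, timeout: float = 5.0) -> float:
    """Return how many seconds ago the server last updated its clock.

    Hypothetical helper: IPv4 only, one attempt, no validation of the reply
    beyond its length. The age is measured against this machine's clock.
    """
    # First byte 0x23: leap indicator 0, version 4, mode 3 (client).
    request = b"\x23" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        reply, _ = sock.recvfrom(512)
    return time.time() - reference_unix_time(reply)
```

Calling something like query_reference_age("hickory") every few minutes and recording the largest value seen would give the sort of maximum update intervals quoted above, and alerting well before the age reaches 24 hours would give some warning before ntpdate starts rejecting the server.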
In the end, the two OpenBSD servers that had gone over 24 hours
updated their clocks on their own before I could restart their NTP
daemons. I restarted ntpd on the third when it was about ten
minutes away from hitting 24 hours; maybe it would have updated its
clock at the last moment, or maybe slightly afterward, or maybe not
at all. I didn't feel like taking the chance.
(This elaborates on a tweet of mine.)
PS: Our OpenBSD NTP servers do other things as well as NTP; in fact, NTP is not their primary purpose. It's just handy to run an NTP daemon on highly stable machines that are already central components of our environment, since we need to run some NTP daemons somewhere.