I think you should mostly not run NTP daemons on your machines

November 16, 2017

In my entry on switching from ntpd to chrony, I mentioned that we don't have many machines that run full time NTP daemons. In reaction, Sotiris Tsimbonis asked in his comment:

You mean you don't have many machines that run full time NTP daemons and service others as a time source, right?

How do you keep time synchronized in your systems if not by running ntpd? an ntpdate cronjob?

This brings up a heretical position of mine.

I'm a professed time maven. Not only do I run NTP daemons on my workstations, but I tinker with their configuration and server lists and enjoy checking in on their NTP synchronization status (it's fun in various ways, honest). Despite all of my enthusiasm for NTP and good time, I think that you should not run NTP daemons on your servers, especially in anything resembling a common default configuration, unless you have special needs and know what you're doing. Instead you should have almost all of your machines set their time from a trusted upstream source on boot and every so often afterward (once an hour is often convenient). This is what we do, and not just because it's easier.

In most situations, the most important thing for server time is that all of your servers are pretty close to each other. It is better that they all be wrong together than some of them be right and others be wrong, and if a server is out of sync you want it to be corrected right away rather than be slowly guided back to correct time. And you want this to happen reliably, without needing monitoring and remediation.

(If you think you're going to monitor and remediate time issues across your server fleet, ask yourself what you'll do if you detect an out of sync server. If the answer is 'reset its time', then you might as well automate that.)

An NTP daemon is usually not the best way to achieve this. NTP daemons are normally biased toward being cautious about trusting upstream time sources and prefer to change the system clock slowly, without abrupt jumps; this famously leads to various problems if your system winds up with its clock significantly out (some NTP daemons have historically given up entirely in that case). Even once you've configured your NTP daemon to not have these problems, you still need to worry about what happens if the daemon dies or stops doing anything.

(The normal biases of NTP daemons make sense in an environment where you're talking to a random collection of time sources outside of your control, some of which may be broken or even vaguely malicious.)

Modern servers in good operating condition in a decent environment don't have their time drift very much over the course of an hour (our typical adjustment is under a millisecond). Cron is reliable (and if it dies you have bigger problems than time synchronization) and it's straightforward to write a little script that force-sets the server's time from a local NTP server (your OS may already come with one). If you're worried about the NTP server being a single point of failure, run two or three. You're still going to want to monitor the health and time synchronization of your NTP server (or servers), but at least you only have a few of them.
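(To make concrete what such a little script does, here is a minimal sketch in Python of a one-shot SNTP query followed by a forced step of the clock. In practice you would probably just run ntpdate or whatever your OS provides; this only illustrates the steps. The function names are mine, any server name you pass in is your own choice, and actually setting the clock needs root on a Unix system.)

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Return a 48-byte SNTP client request (LI=0, version 4, mode 3)."""
    return b"\x23" + 47 * b"\x00"


def _ntp_to_unix(seconds, fraction):
    """Convert a 32.32 fixed-point NTP timestamp to Unix seconds."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(reply, t1, t4):
    """Estimate how far the local clock is from the server's clock.

    t1 is the local time when the request was sent and t4 the local time
    when the reply arrived; t2 and t3 are the server's receive and
    transmit timestamps from the reply.
    """
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    words = struct.unpack("!12I", reply[:48])
    t2 = _ntp_to_unix(words[8], words[9])
    t3 = _ntp_to_unix(words[10], words[11])
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server, timeout=5.0):
    """Ask one NTP server for the time and return the estimated offset."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(build_request(), (server, 123))
        reply, _ = sock.recvfrom(512)
        t4 = time.time()
    return clock_offset(reply, t1, t4)


def sync_once(server):
    """Force-set the system clock from the server, however far off it is.

    Assumption: running as root on a Unix system, where
    time.clock_settime() is available.
    """
    offset = query_offset(server)
    time.clock_settime(time.CLOCK_REALTIME, time.time() + offset)
    return offset
```

(A cron job would call sync_once() with the name of your local NTP server once an hour. Note that the server name is resolved afresh on every run, and that there is deliberately no sanity check on the size of the offset; the whole point is to jump to the upstream time.)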

There are situations where you need better time than this and you understand why (and how it has to be better). That's when you turn to running an NTP daemon on every server involved (among other things, like carefully considering where you're ultimately getting your NTP time from). Not before then.

Comments on this page:

Heretic! :-) You miss the joy of having ntp offset graphs for all your systems :-)

A cronjob every hour is a fine option, but we've got a couple of cases where we wanted more.

Our mail system comprises about 50 servers in various roles interacting with each other and other internet mail servers. We don't want them to be pretty close with each other but a big offset away from what other mail servers believe is the correct time. We want each and every one of them to be ntp synced when talking to any other server. We want all Received: lines to be correct in the mail headers.

Knowing that all our servers are ntp synced gives me the pleasure of suggesting "Use telnet mailgate.forthnet.gr 25 to find the time difference between us." when I contact someone else regarding, say, an abuse incident, to calculate timezone differences in logs.

We've also got some systems running on a Microsoft Hyper-V cluster which is terrible beyond any imagination at providing time sync services to Linux systems. hwclock and system time drifted apart more than 3 minutes in less than 60 seconds. A cronjob running ntpdate every minute on those systems couldn't keep up with the actual drift on the system. ntpd is the only option to save those poor VMs running there (with tinker panic 0).

Jumping back in time is something many daemons don't handle well. They should, but they don't. Until recently, many Go programs couldn't handle that because Go didn't expose the monotonic clock. Internally, Go was using it, but any external use of a clock to compute a delta could lead to a negative value, an edge case rarely tested. This could lead to an incorrect stat or to a crash. See the Cloudflare bug.
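(As an illustration of the negative-delta problem, here is a minimal sketch in Python rather than Go. Measuring an interval with time.monotonic() is safe when the wall clock is stepped, because that clock is documented as unable to go backwards and as unaffected by system clock updates; the same measurement done with time.time() can come out negative if the clock is set back in between. The helper name timed is only for illustration.)

```python
import time


def timed(func):
    """Run func() and return (result, elapsed seconds).

    The interval is measured on the monotonic clock, so it stays
    non-negative even if the system clock is stepped backwards
    while func() is running.
    """
    start = time.monotonic()
    result = func()
    elapsed = time.monotonic() - start
    return result, elapsed
```

(Wall-clock time from time.time() is still what you want for timestamps in logs; the monotonic clock is only for durations.)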

Moreover, the Linux kernel keeps track of whether time is synchronized with a time daemon. This is exposed to userland and some programs may wait for the time to be trustworthy before doing something. The only example off the top of my head is hwclock, which is not that important (it won't save time if the kernel says time is not synchronized).

From at 2017-11-16 07:50:24:

Instead you should have almost all of your machines set their time from a trusted upstream source on boot and every so often afterward (once an hour is often convenient).

Indeed, and you're pretty much describing an SNTP client such as systemd-timesyncd.

It is better that they all be wrong together than some of them be right and others be wrong

But hardware clocks can vary wildly in accuracy so I don’t see how relying on those helps you keep your fleet closely synchronised. Also, if a machine can be wrong and out of step with the fleet because it fails to sync from the time server, why would this apply to ntpd but not a periodic ntpdate?

Basically it seems to me like you’re arguing against the typical default configuration of NTP dæmons of smearing clock changes over time and trying to be intelligent about the server pool, rather than against running an NTP dæmon per se.

I suppose the argument from your point of view is that once you have turned these features off, an NTP dæmon isn’t doing much else than a periodic ntpdate. But without making that point explicitly, you’ll get people like me and my fellow commenters quibbling with you. 😊

By cks at 2017-11-16 22:29:56:

If you're doing periodic synchronization, the big question is not how accurate the system is over the long term but how accurate it is between those synchronizations; that drift is your maximum deviation from true time (and twice it is your worst case deviation from each other). My belief and evidence so far is that drift is low for reasonably decent servers in our environment, and probably for reasonably decent servers in general.
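(The arithmetic here is simple enough to sketch in a few lines of Python. This assumes a constant drift rate expressed in parts per million, which is a simplification, and the function name is just for illustration.)

```python
def worst_case_deviation(drift_ppm, sync_interval_seconds):
    """Return (max deviation from true time, max deviation between servers).

    A clock drifting at drift_ppm parts per million gains or loses
    drift_ppm microseconds per second, so the error accumulated just
    before the next synchronization is drift * interval. Two servers
    drifting in opposite directions can differ by twice that.
    """
    from_true_time = drift_ppm * 1e-6 * sync_interval_seconds
    between_servers = 2 * from_true_time
    return from_true_time, between_servers
```

(For example, with a hypothetical drift of 10 ppm and hourly synchronization, worst_case_deviation(10, 3600) gives about 0.036 seconds from true time and about 0.072 seconds between two servers. An adjustment of under a millisecond per hour, as mentioned in the entry, corresponds to a drift of under roughly 0.28 ppm.)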

The traditional 'out of sync' problem with ntpd is not that a machine's time is off because it failed to sync to the time server, but that it's failed to sync with the time server because its initial time is so far off that ntpd didn't consider any time sources as valid. Ntpdate fixes this because you can tell ntpdate 'sync the time to upstream no matter how far off it is'. You can probably tell your NTP daemon to do this too, with the right configuration, but it's likely not the default and it may not be spelled out explicitly in the documentation.

(Traditionally an NTP daemon sees its job as 'maintain good time', which includes distrusting bad sources, while ntpdate and friends have the job of 'set the time to that of an upstream source'.)

If you have to carefully build an NTP daemon configuration to override a collection of defaults, you're going through a lot of effort to get more or less what you can get from ntpdate for free (well, for a little script in /etc/cron.hourly). And you still have the issue of what happens if the NTP daemon dies or stops working right, which can happen.

(There are also potential issues with a daemon if you, say, change the DNS for your time servers so they have different IPs. Will the running NTP daemons automatically re-check DNS and get the new IPs, or will some of them carry on with the old ones and perhaps stop being able to talk to anything? With a cron job, you know that ntpdate is re-resolving the hostnames every time it runs.)

By Anon at 2017-11-25 17:32:45:

What about shutting up bad clients? It's a lot less likely that you can persistently stop an errant SNTP client which is doing its thing from cron. To this day home routers still come hardcoded with particular NTP server names (rather than using a CNAME pool for the router vendor).

Also, VMs are notorious for having clocks that drift, so it is desirable to discipline them regularly.

By cks at 2017-11-25 17:51:00:

Here I'm only considering the case where all the machines involved are on your own network(s) and under your control. So if there's an errant client banging away from cron, you can go and fix it (or perhaps block it at some internal firewall).

I will admit that VMs are a hard problem, especially if you're running OSes that aren't carefully set up to take their time from the VM hypervisor (which stock ones won't be, because hardware clocks are traditionally not accurate enough).

By Anon at 2017-11-26 05:59:25:

Ah, understood. There's one more side issue: some one-shot time synchronization setups have a habit of looking up the time each time the DHCP lease is renewed (I'm looking at you, systemd-timesyncd). This can lead to unnecessarily frequent clock synchronizations when connecting to DHCP servers with tiny lease times (e.g. five minutes), but there's definitely a convenience to such setups (they correct time the moment the network is up)...

Written on 16 November 2017.


Last modified: Thu Nov 16 01:03:41 2017