We're broadly switching to synchronizing time with systemd's timesyncd
Every so often, simply writing an entry causes me to take a closer look
at something I hadn't paid much attention to before. I recently wrote
a series of entries on my switch from ntpd to chrony on my desktops and why we don't run NTP daemons but instead
synchronize time through a cron entry.
Our hourly crontab script for time synchronization dates back to at
least 2008 and perhaps as early as 2006 and our first Ubuntu 6.06
installs; we've been carrying it forward ever since without thinking
about it very much. In particular, we carried it forward into our
standard 16.04 installs. When we did this,
we didn't really pay attention to the fact that 16.04 is different
here, because 16.04 is systemd-based and includes systemd's timesyncd
time synchronization system. Ubuntu installed and activated
systemd-timesyncd (with a stock setup that got time from
ntp.ubuntu.com), we installed our hourly crontab script, and nothing
exploded, so we didn't really pay attention to any of this.
Writing those entries caused me to start actually noticing
systemd-timesyncd and paying some attention to it, and in the process
I noticed that it was actually running and synchronizing the time
on our servers. (This kind of invalidates my casual claim here that
our servers were typically less than a millisecond out in an hour,
since that was based on reports and assumed that there was no other
time synchronization going on.) Coincidentally, one of my co-workers had also had timesyncd
come to his attention recently for reasons outside of the scope of
this entry. With timesyncd temporarily in our awareness, my
co-workers and I talked over the whole issue and decided that doing
time synchronization the official 16.04 systemd way made the most sense.
(Part of it is that we're likely to run into this issue on all future Linuxes we deal with, because systemd is everywhere. CentOS 7 appears to be just a bit too old to have timesyncd, but a future CentOS 8 very likely will, and of course Ubuntu 18.04 will and so on. We could fight city hall, but at a certain point it's less effort to go with the flow.)
In other words, we're switching over to officially using systemd-timesyncd. We were passively using it before without really realizing it, since we didn't disable timesyncd, but now we're actively configuring it to use our local time servers instead of Ubuntu's and we're disabling and removing our hourly cron job. I guess we're now running NTP daemons on all our servers after all; not because we need them for any of the reasons I listed, but just because it's the easiest way.
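As a concrete sketch, the timesyncd configuration involved is small; the server names below are placeholders, not our actual time servers:

```
# /etc/systemd/timesyncd.conf -- the server names here are placeholders
[Time]
NTP=tick.example.com tock.example.com
FallbackNTP=ntp.ubuntu.com
```

After editing this, 'systemctl restart systemd-timesyncd' makes it take effect, and 'timedatectl status' will report whether the clock is currently synchronized.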
(At the moment we're also using
/etc/default/ntpdate (from the Ubuntu
ntpdate package) to force an initial synchronization at boot time,
or technically when the interface comes up. We'll probably keep doing
this unless timesyncd picks up good explicit support for initially
force-setting the system time; when our machines boot and get on the
network, we want them to immediately jump their time to whatever we
currently think it is.)
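(For what it's worth, the stock /etc/default/ntpdate that we modify looks roughly like this, going from my memory of the Ubuntu ntpdate package; treat the exact contents as approximate:

```
# /etc/default/ntpdate -- roughly the stock Ubuntu version
# Set to "yes" to take NTP server names from /etc/ntp.conf if it exists.
NTPDATE_USE_NTP_CONF=yes
# Otherwise use these servers; we point this at our local time servers.
NTPSERVERS="ntp.ubuntu.com"
# Extra options to pass to ntpdate.
NTPOPTIONS=""
```

As I recall, the package's if-up.d hook is what runs ntpdate when an interface comes up, which is what gives us the boot-time jump.)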
The cost of memory access across a NUMA machine can (probably) matter
We recently had an interesting performance issue reported to us by a researcher here. We have a number of compute machines, none of them terribly recent; some of them are general access and some of them can be booked for exclusive usage. The researcher had a single-core job (I believe using R) that used 50 GB or more of RAM. They first did some computing on a general-access compute server with Xeon E5-2680s and 96 GB of RAM, then booked one of our other servers with Xeon X6550s and 256 GB of RAM to do more work on (possibly work that consumed significantly more RAM). Unfortunately they discovered that the server they'd booked was massively slower for their job, despite having much more memory.
We don't know for sure what was going on, but our leading theory is NUMA memory access effects because the two servers have significantly different NUMA memory hierarchies. In fact they are the two example servers from my entry on getting NUMA information from Linux. The general access server had two sockets for 48 GB of RAM per socket, while the bookable compute server with 256 GB of RAM had eight sockets and so only 32 GB of RAM per socket. To add to the pain, the high-memory server also appears to have a higher relative cost for access to the memory of almost all of the other sockets. So on the 256 GB machine, memory access was likely going to other NUMA nodes significantly more frequently and then being slower to boot.
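To make the arithmetic concrete, here's a toy model of why more sockets means more remote memory traffic; this is a sketch with illustrative numbers, not measurements from our machines:

```python
# Toy model of NUMA memory spread and access cost. All specific
# numbers here are illustrative, not measured on our servers.

def per_socket_gb(total_gb, sockets):
    """RAM local to each socket's node, assuming an even split."""
    return total_gb / sockets

def remote_fraction(sockets):
    """With a working set spread evenly across nodes, the fraction
    of accesses that land on some other socket's memory."""
    return 1 - 1 / sockets

def avg_relative_cost(sockets, local=10, remote=21):
    """Expected access cost in SLIT-style units (10 = local);
    the remote figure of 21 is a made-up example value."""
    return local / sockets + remote * (1 - 1 / sockets)

# The two-socket 96 GB machine vs the eight-socket 256 GB machine:
print(per_socket_gb(96, 2), per_socket_gb(256, 8))   # 48.0 32.0
print(remote_fraction(2), remote_fraction(8))        # 0.5 0.875
print(avg_relative_cost(2), avg_relative_cost(8))    # 15.5 19.625
```

The 10-versus-21 figures mimic the ACPI SLIT-style distance numbers that 'numactl --hardware' reports (where 10 is local memory); the high-memory server's actual distance matrix is different, and real access patterns are rarely uniformly spread, but the direction of the effect is the same.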
Having said that, I just investigated and there's another difference; the 96 GB machine has DDR3 1600 MHz RAM, while the 256 GB machine has DDR3 RAM at 1333 MHz (yes, they're old machines). This may well have contributed to any RAM-related slowdown and makes me glad that I checked; I don't usually even consider RAM module speeds, but if we think there's a RAM-related performance issue it's another thing to consider.
I found the whole experience to be interesting because it pointed out a blind spot in my usual thinking. Before the issue came up, I just assumed that a machine with more memory and more CPUs would be better, and if it wasn't better it would be because of CPU issues (here they're apparently generally comparable). That NUMA layout (and perhaps RAM speed) made the 'big' machine substantially worse was a surprise. I'm going to have to remember this for the future.
PS: The good news is that we had another two-socket E5-2680 machine with 256 GB that the researcher could use, and I believe they're happy with its performance. And with 128 GB of RAM per socket, they can fit even quite large R processes into a single socket's memory.