Unsurprisingly, the clock in your server's IPMI drifts over time
Most servers these days have an IPMI, including more or less all of ours. One of the standard IPMI functions is a 'system event log', which stores 'events' with timestamps, so the IPMI needs to have a clock. Some of our IPMIs are both connected to a network and support maintaining their time through NTP, but most of our IPMIs are either disconnected or at least don't support NTP (as far as I know). This means that their clocks are what's called 'free running'. In completely unsurprising news, these clocks drift.
My standard way to read the IPMI time is with 'ipmitool sel time get', which provides the time with a one second granularity. Because we were recently dealing with an incident where device logs didn't have accurate time and weren't entirely helpful as a result, I decided to add fetching the IPMI time to our system for collecting IPMI sensor information and recording it in Prometheus. This gives us some ability to track and monitor IPMI clock drift over time.
(Because the precision of the IPMI time you get is limited, small differences between accurate time and the IPMI's time are expected. Having to collect this information through a shell script makes it a bit less precise, too.)
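As a rough illustration of this sort of collection (not our actual script, which is a shell script), here is a minimal Python sketch. It assumes that 'ipmitool sel time get' prints the time as MM/DD/YYYY HH:MM:SS in the host's local timezone, which may not hold for every ipmitool version or BMC, and that the host's own clock is accurate. The metric name ipmi_clock_drift_seconds and all of the function names are made up for the example.

```python
import subprocess
from datetime import datetime

# Assumed output format of `ipmitool sel time get`. Some ipmitool versions
# may print something different (for example with a timezone suffix), so
# check what yours prints before relying on this.
SEL_TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_sel_time(text: str) -> datetime:
    """Parse the first two whitespace-separated fields as a naive datetime."""
    fields = text.split()
    return datetime.strptime(" ".join(fields[:2]), SEL_TIME_FORMAT)


def drift_seconds(ipmi_time: datetime, reference: datetime) -> float:
    """Positive means the IPMI clock is ahead of the reference clock."""
    return (ipmi_time - reference).total_seconds()


def read_ipmi_drift() -> float:
    """Run ipmitool and compare its answer against the host clock.

    Assumes the host clock is accurate (e.g. NTP-synced) and that the
    IPMI reports time in the host's local timezone.
    """
    before = datetime.now()
    out = subprocess.run(
        ["ipmitool", "sel", "time", "get"],
        capture_output=True, text=True, check=True,
    ).stdout
    after = datetime.now()
    # Compare against the midpoint of the call, since talking to the IPMI
    # can take a noticeable fraction of a second.
    midpoint = before + (after - before) / 2
    return drift_seconds(parse_sel_time(out), midpoint)


def format_metric(drift: float) -> str:
    """Render one line of Prometheus text format (hypothetical metric name)."""
    return f"ipmi_clock_drift_seconds {drift:.1f}"
```

Calling print(format_metric(read_ipmi_drift())) on a machine with ipmitool installed would emit one metric line. Since the IPMI only reports whole seconds, differences of under a second or so in either direction are noise, not drift.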
Looking at the metrics suggests that our worst IPMI clocks gain or lose around a second a day, while the best ones haven't drifted appreciably over the week or so of data we have so far. If I can resist the temptation to reset IPMI clocks when they drift too far, I should be able to get better data by letting things sit for a few weeks.
(The other approach would be to create a once a day cron job that resets the IPMI time if it's drifted too far out of sync.)
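A minimal sketch of what such a cron job's logic could look like, in Python. This is an illustration rather than something we run. It assumes you already have a measured drift in seconds from somewhere, that 'ipmitool sel time set now' works on your IPMIs (ipmitool documents the 'now' form, but I haven't verified it against every BMC), and the five second threshold is an arbitrary number picked for the example.

```python
import subprocess

# Hypothetical threshold: how far the IPMI clock may be off, in seconds,
# before we bother resetting it. Pick whatever suits your environment.
MAX_DRIFT_SECONDS = 5.0


def needs_reset(drift: float, max_drift: float = MAX_DRIFT_SECONDS) -> bool:
    """True if the clock is off by more than max_drift in either direction."""
    return abs(drift) > max_drift


def reset_ipmi_clock() -> None:
    """Set the IPMI SEL clock from the host clock (assumed to be accurate)."""
    subprocess.run(["ipmitool", "sel", "time", "set", "now"], check=True)


def maybe_reset(drift: float) -> bool:
    """Reset the IPMI clock only if needed; report whether we did."""
    if needs_reset(drift):
        reset_ipmi_clock()
        return True
    return False
```

A daily cron entry would run a small wrapper that measures the drift and passes it to maybe_reset(). Only resetting past a threshold avoids touching the IPMI clock every day for differences that are within the one second measurement noise.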
The corollary of this is that any isolated clock is likely to drift at least as much as this. If you have network switches or smart PDUs or in general any stand-alone device with logs, maybe check and reset the clock every so often.