Some stuff on 'time since boot' timestamps
From today on Twitter:
@standaloneSA: Is it just me, or does it seem silly that the #NetFlow timestamp field for the flow references "ms since the router booted". Seems obtuse.
@thatcks: @standaloneSA It's probably easy to implement in the router and it creates an absolute ordering w/o worries about time going backwards.
In the Twitter way, this is a little bit cryptic so I'm going to elaborate on my guess here.
Suppose that routers were supposed to generate an absolute timestamp for their events instead of this relative one, for example UTC in milliseconds. This would create two problems.
First, routers would somehow need to know or acquire the correct UTC time (with millisecond resolution) and then maintain it. This is to some degree a solved problem but it adds complexity to the router. It also leads to the second problem, because a router is unlikely to boot with the correct UTC time (down to the millisecond).
The second problem is that the moment you have a system generating an absolute timestamp you need to deal with the certainty that the correct time, as the system sees it, will jump around. The router will boot will some idea of the UTC time but it's quite likely to be a bit off (remember that we're calling for millisecond accuracy here), then over time it will converge on the correct UTC time. As it does so, its version of UTC time may go forward abruptly, go backwards abruptly, or go forward more slowly than UTC time is really advancing. Backwards time jumps screw up event ordering completely, and all of the options screw up the true relative time between events; if you have two events timestamped UTC1 and UTC2, you actually have only a weak idea how long it is between them.
The valuable property that milliseconds since boot has is that it is a clear monotonic timestamp. It only ever goes forward and it goes forward at what should be a very constant rate, which means that it creates a clear order of events and a clear duration between any two events (well, for events from the same stream of monotonic timestamps). Monotonic timestamps are not a substitute for absolute time but neither is absolute time a substitute for monotonic timestamps; you really need both, which means that you need a map between them.
There are two possible places to build such a map: each device can do its own or it can be done in a central aggregator. I believe that the right answer is to do it in the central aggregator because this means that you have only a single version of absolute time, the aggregator's view (each device, aggregator included, may have a slightly different view of the current 'correct' absolute time for the reasons outlined above). Using only a single version of absolute time means that you have a single coherent map of all of the monotonic timestamps to (some) absolute time.
(Of course you need devices that generate monotonic timestamps to tell you when they reset their timestamps, eg when they boot.)
My impression is that using elapsed time since boot is actually common in a number of environments. For example, Linux kernel messages are usually reported this way these days (which has its own issues if you're trying to work backwards to roughly when in absolute time something happened).