Wandering Thoughts archives

2020-12-15

How to make Grafana properly display a Unix timestamp

There are many things in Prometheus that generate a 'time' value that is in seconds since the Unix epoch of January 1st 1970 00:00 UTC. One such example, relevant to yesterday's entry on the time when Prometheus metrics happened, is the result of the Prometheus timestamp() function. When you're working with such times, it's common to want to display them in a Grafana table (perhaps a multi-value table), and one natural way to display them is simply as, well, the date and time they represent.

Sadly, Grafana makes this unnecessarily annoying, to the extent that every time I've needed to do this I've had to do Internet searches to find out how (again). If you edit a Grafana table, you can go to Column Styles and set the Type and Unit; the Type offers a 'Date' option, and for the Number type the Unit has a 'Date & Time' section that offers options like 'Datetime local'. If you set either of these options on your Unix epoch timestamp, you will get a displayed result that is generally some time in January of 1970, and you will be unhappy and angry at Grafana.

The secret is that Grafana expects Unix timestamps in milliseconds, not seconds. Prometheus and many other things generate timestamps in seconds, not milliseconds; if you just pass these timestamps straight to Grafana and have it display them as-is, they wind up being interpreted as millisecond-based times that are way too small. To make things work right, you must multiply all Unix timestamps by 1000 before having Grafana display them. When you're using data from Prometheus, this has to be done in the Prometheus query itself in the Grafana dashboard, which will leave you with mysterious '* 1000' bits in your queries.
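
As a sketch of what those '* 1000' bits look like, here is the 'most recent time a metric existed' query from the entry on when metric points happened, with the conversion added. The vpn_users metric and its user label are simply the ones used in that entry; substitute your own metric.

```promql
# timestamp() gives seconds since the Unix epoch; Grafana's date units want milliseconds.
max_over_time( timestamp( vpn_users{user="fred"} )[4w:] ) * 1000
```

With the multiplication in place, a result such as 1608000000 (seconds) becomes 1608000000000 (milliseconds), which Grafana displays as a date in December 2020 instead of January 1970.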

Once you're turning your Unix timestamps from seconds to milliseconds, either the 'Date' column type or one of the 'Date & Time' units will display your timestamp. Right now I'm using the 'Datetime local' option of the latter because it seems less magical than assuming that Grafana's 'Date' type will always do what I want it to.

Sadly, this particular peculiar decision is now baked into Grafana in order to retain backward compatibility with current dashboards and panels. Grafana would have to introduce a new 'Date (seconds)' or 'Datetime local (seconds)' to do it right, and I hope that they do someday. Alternatively, they could introduce unit conversions in panel transformations.

(This is one of those entries that I write for myself so that I can find this information the next time I need it, because I'm sure I'm going to need it again. Grafana's behavior here is so counterintuitive and so opaque it's impressive. They don't even hint at the underlying units in the names, where they could do things like 'Datetime local (ms)'.)

sysadmin/GrafanUnixEpochTime written at 21:07:42

In Prometheus, it's hard to work with when metric points happened

I generally like Prometheus and PromQL, its query language (which you have to do basically everything through). But there is one frustrating exception that I routinely run into, and that is pretty much anything to do with when a metric point happened.

PromQL is great about letting you deal with the value of metrics. But the value is only half of a metric, because each sample (what I usually call a metric point) is a value plus a timestamp. Prometheus stores the timestamp as well as the value, but it has relatively little support for accessing and manipulating the timestamps, much less than it does for the values. However, there are plenty of times when the timestamp is at least as interesting as the value. One example is finding out the most recent time that a metric existed, which is useful for all sorts of reasons.

When I started writing this entry (and when I wrote those tweets), I was going to say that Prometheus makes it unnecessarily hard and should fix the problem. But the more I think about it, the more I feel that's only true for simple cases. In more complex cases you need to conditionally select metric points by their value, and at that point Prometheus has no simple solution. In the current state of Prometheus, any 'evaluation over time' that needs something more sophisticated than matching labels must use Prometheus subqueries, and that holds whether it's using the value of the metric point or the time it happened at.

Prometheus makes the simple case slightly harder because timestamp() is a function that works on an instant vector, which means that anything using it over a range of time must be written as a subquery. But that only hurts you with relatively simple queries, like finding the most recent time a metric existed:

max_over_time( timestamp( vpn_users{user="fred"} )[4w:] )

Even without this you'd have to put the '[4w]' somewhere (although having it as a subquery may be slightly less efficient in some cases).

Now suppose that you want to know the most recent time that pings failed for each of your Blackbox ICMP probes. This requires a conditional check, but the query can be almost the same as before through the magic of Prometheus operators being set operations:

max_over_time( timestamp( probe_success{probe="icmp"} == 0 )[24h:] )

What makes some cases of dealing with values over time in PromQL simpler (for example working out how frequently pings fail) isn't any feature in PromQL itself, it's that those metric values are carefully arranged so that you can do useful things with aggregation over time. Timestamps are fundamentally not amenable to this sort of manipulation, but they're far from alone; many metrics have values that you need to filter or process.
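
To illustrate the kind of arrangement this means: Blackbox's probe_success is 1 for a successful probe and 0 for a failed one, so averaging it over time gives the fraction of probes that succeeded. A minimal sketch of the ping failure rate over the past day, assuming the same probe="icmp" label as in the query above:

```promql
# Fraction of ICMP probes that failed in the last 24 hours (0 means none, 1 means all).
1 - avg_over_time( probe_success{probe="icmp"}[24h] )
```

No subquery or conditional check is needed here, but only because the 0 and 1 values happen to average into something meaningful.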

(You can get the earliest time instead of the latest time with min_over_time. Other aggregation over time operators are probably not all that useful with timestamps.)
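
For instance, a sketch of the earliest ping failure in the past day for each probe, which is just the failure query from above with min_over_time swapped in (the probe="icmp" label is again the one assumed there):

```promql
# Earliest time in the last 24 hours at which each ICMP probe failed.
min_over_time( timestamp( probe_success{probe="icmp"} == 0 )[24h:] )
```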

sysadmin/PrometheusMetricTimeHard written at 00:12:22

