Chris's Wiki :: blog/sysadmin/PrometheusMissingMetricsWish Commentshttps://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusMissingMetricsWish?atomcommentsDWiki2021-04-01T05:58:49ZRecent comments in Chris's Wiki :: blog/sysadmin/PrometheusMissingMetricsWish.By roidelapluie on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:50910cdf2b9cf181895837c8707b55bee126e61froidelapluie<div class="wikitext"><p>Regarding your request about changes_up/changes_down: while we do not have changes_up, resets() is the equivalent of changes_down().</p>
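<p>A minimal Python sketch of that equivalence, meant as an illustration rather than Prometheus code. The helpers work on a plain list of sample values and ignore NaN handling. <code>changes</code> and <code>resets</code> mirror the PromQL functions of the same name; <code>changes_up</code> is the hypothetical function from the request and does not exist in PromQL.</p>

```python
from typing import Sequence


def changes(values: Sequence[float]) -> int:
    """Count how often the value differs between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


def resets(values: Sequence[float]) -> int:
    """Count decreases between consecutive samples (acts as a 'changes_down')."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


def changes_up(values: Sequence[float]) -> int:
    """Hypothetical counterpart: count increases between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur > prev)


# Invented sample values: two increases and two decreases.
samples = [3, 3, 5, 2, 2, 7, 1]
print(changes(samples), resets(samples), changes_up(samples))
```

<p>In this sketch every change is either an increase or a decrease, so <code>changes(v) - resets(v)</code> gives the same number as <code>changes_up(v)</code>.</p>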
</div>2021-04-01T05:58:49ZBy roidelapluie on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:5df12ddaa855958a04dba42e44b9b6c8a5395ba5roidelapluie<div class="wikitext"><p>Yeah, in this case it seems the "correct" answer is with subqueries, as you expected.</p>
<pre>
changes((ALERTS_FOR_STATE and ignoring (alertstate) ALERTS{alertstate="firing"})[1h:])+1
</pre>
</div>2021-03-31T22:42:54ZBy Chris Siebenmann on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:d18dca29df4f46d7ef5e782f1154ded0fa222b03Chris Siebenmann<div class="wikitext"><p>This is a better attempt than I expected, but unfortunately it doesn't
work either. This will tell you how many alerts started to trigger over a
time interval, but it won't tell you how many actually fired because the
<code>ALERTS_FOR_STATE</code> metric doesn't have a label for whether the alert
is pending or firing.</p>
<p>I hadn't realized that <code>changes()</code> worked quite this way for time series
with gaps in them, and it's a useful thing to know. It's possible this
will change what sort of metrics I generate for some things, since it's
clearly useful to have different values for anything I want to count.</p>
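<p>A rough Python sketch of why distinct values matter here. It assumes that a range selection hands <code>changes()</code> only the samples that exist in the window, so a gap is visible only through the values on either side of it. The sample data and the helper names are invented for illustration; the sketch also assumes that <code>ALERTS_FOR_STATE</code> carries the activation time as its value, while <code>ALERTS</code> is always 1.</p>

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def changes(values: List[float]) -> int:
    """Count how often the value differs between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


def count_activations(samples: List[Sample], start: int, end: int) -> int:
    """Emulate changes(metric[range]) + 1 over the samples present in the window."""
    window = [value for ts, value in samples if start <= ts <= end]
    if not window:
        return 0  # no samples at all: the query would return nothing
    return changes(window) + 1


# Three separate activations (at t=1000, 1300, 1900) with gaps in between.
# Each activation has its start time as the sample value.
for_state = [(1000, 1000), (1060, 1000), (1300, 1300), (1360, 1300), (1900, 1900)]
# The same series shape, but with a constant value of 1.
constant = [(ts, 1) for ts, _ in for_state]

print(count_activations(for_state, 0, 3600))  # distinct values: gaps are counted
print(count_activations(constant, 0, 3600))   # constant value: gaps are invisible
```

<p>With the constant-valued series the gaps leave no trace, which is why the count only comes out right when each occurrence gets its own value.</p>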
</div>2021-03-31T14:54:59ZBy roidelapluie on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:4346a74792cfc0e0618dcb88464d0e5d45a6ffb3roidelapluie<div class="wikitext"><p>Hello,</p>
<p>However, I think it's worth finding a solution. It feels frankly embarrassing that Prometheus currently cannot answer basic questions like 'how many times did this alert fire over this time interval'.</p>
<p>You can answer this question with the following query:</p>
<blockquote><p>changes(ALERTS_FOR_STATE[1h])+1</p>
</blockquote>
</div>2021-03-31T14:28:52ZBy dozzie on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:4fec117d25d87c9edf5690bb252298531224675adozzie<div class="wikitext"><p>Of course, logs and metrics can be quite similar in a bunch of aspects,
otherwise it wouldn't be as easy to shoehorn one into storage for the other.
Yet, they are still separate data types, and unless the storage was prepared
specifically for the given type, it will only work with it so-so, as you
noticed yourself.</p>
</div>2021-03-15T15:48:07ZBy Chris Siebenmann on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:c98471d42a84d4377199ed4f46d81e15388d190bChris Siebenmann<div class="wikitext"><p>I picked alerts because they show up in Prometheus' metrics storage,
primarily as the ALERTS metric. In addition, many of the questions you want
to ask about them are what I consider metrics-like instead of logs-like.</p>
<p>(This is true of a lot of logs in general; we ask a lot of 'how many
times' and 'what different &lt;X&gt;s do we see in' and so on questions about
many different sorts of logs.)</p>
</div>2021-03-14T22:10:45ZBy dozzie on /blog/sysadmin/PrometheusMissingMetricsWishtag:CSpace:blog/sysadmin/PrometheusMissingMetricsWish:5d2c868e2a5349c1a1c94cbe635b92b31215c459dozzie<div class="wikitext"><blockquote><p>You can probably get averages over time, but it's at least pretty difficult to get something as simple as a count of how many times an alert fired within a given time interval.</p>
</blockquote>
<p>Because it's a different type of data. Alerts firing are not metrics, but
events (just like logs), and Prometheus is a metrics storage. It's really not
a surprise it's ill-suited for other types of data.</p>
</div>2021-03-14T17:52:20Z