2020-10-31
A gotcha with combining single-label and multi-label Prometheus metrics
Suppose that you have two metrics for roughly the same thing, but one metric is unlabeled and the other metric has meaningful labels to distinguish sub-categories. For example, suppose that you have a count of the spam messages rejected by one anti-spam system, which is not broken down by spam level, and then a count of spam messages rejected by another system that does break them down by spam level. Now you want to present a dashboard panel that displays the combined number of spam messages rejected over a time range. So you write the obvious-looking PromQL query:
increase( pmx_rejects[$__range] ) + sum( increase( rspamd_rejects[$__range] ) )
(Here pmx_rejects is a single metric and rspamd_rejects is labeled with the spam confidence level, hence the sum().)
When you put this into your dashboard panel, your dashboard panel is surprisingly blank and you are sad, and perhaps puzzled (as I was).
What is going on here is that sum() normally returns a label-less result because it aggregates multiple time series together, while increase() preserves labels since it doesn't aggregate. This means that the labels don't match between the two sides of the +, which means you get no results since PromQL math operators on vectors are filters (just like boolean operators).
The straightforward way to deal with this is to get rid of the labels from the plain increase() side, and the simple way to do this is to use sum() on it even though there is only one time series:
sum( increase( pmx_rejects[$__range] ) ) + sum( increase( rspamd_rejects[$__range] ) )
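To make the label matching concrete, here is a small Python toy model of the two queries above. It is only a sketch of the matching rule, not how Prometheus is implemented; the label names (instance, job, level) and the sample values are made up for illustration, and the helper names vec, add and total are hypothetical.

```python
# Toy model of PromQL one-to-one vector matching; not Prometheus code.
# An instant vector is modelled as {frozenset of (label, value) pairs: sample value}.

def vec(*series):
    """Build a toy instant vector from (labels_dict, value) pairs."""
    return {frozenset(labels.items()): value for labels, value in series}

def add(left, right):
    """Add two toy vectors, keeping only series whose label sets match exactly."""
    return {labels: left[labels] + right[labels] for labels in left if labels in right}

def total(vector):
    """Mimic sum(): collapse everything into one series with no labels."""
    return {frozenset(): sum(vector.values())} if vector else {}

# Hypothetical label names and values, chosen only for illustration.
pmx = vec(({"instance": "mx1", "job": "pmx"}, 40.0))
rspamd = vec(
    ({"instance": "mx1", "job": "rspamd", "level": "high"}, 25.0),
    ({"instance": "mx1", "job": "rspamd", "level": "low"}, 5.0),
)

mismatched = add(pmx, total(rspamd))      # labelled + label-less: nothing matches
matched = add(total(pmx), total(rspamd))  # label-less on both sides
print(mismatched)  # {}
print(matched)     # {frozenset(): 70.0}
```

The first result is empty for the same reason the dashboard panel is blank: the left side still carries its labels and the right side has none, so no pair of series lines up.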
As it happens, there is another way to write this, but I strongly recommend against it, because it's a hack and I think it is probably not officially supported by the PromQL specification (such as it is). The alternate way is to replace the '+' with '+ on(nosuchlabel)'. What I think the on() does is throw out all labels except the nonexistent label, and then Prometheus concludes that because both sides have no remaining labels they can be added together.
(Prometheus will probably never change this behavior just because it would be a language incompatibility and would probably bite people, even if it was never explicitly officially documented as working. Before I tried it, my expectation for '+ on(nosuchlabel)' was that it would discard time series without the label, which in this case would discard all time series. The current behavior does make sense if you think about it, but it is somewhat of a trap.)
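The behavior described for '+ on(nosuchlabel)' can be sketched with a Python toy model. This is an illustration of the matching rule as described above, not Prometheus code; the helper names match_key and add_on, the label names and the values are all hypothetical, and the model assumes each side has only one series per match key, as in the query here.

```python
# Toy model of '+ on(...)' between two single-series vectors; not Prometheus code.

def match_key(labels, on):
    """Reduce a label set to just the labels named in `on`.
    A label the series lacks contributes nothing, rather than excluding the series."""
    return frozenset((name, value) for name, value in labels.items() if name in on)

def add_on(left, right, on):
    """Add series from two lists of (labels, value) whose reduced label sets are equal.
    Assumes at most one series per key on each side, as in the query above."""
    right_by_key = {match_key(labels, on): value for labels, value in right}
    result = []
    for labels, value in left:
        key = match_key(labels, on)
        if key in right_by_key:
            result.append((dict(key), value + right_by_key[key]))
    return result

# Hypothetical labels and values, for illustration only.
pmx = [({"instance": "mx1", "job": "pmx"}, 40.0)]
rspamd_total = [({}, 30.0)]  # stands in for sum( increase( rspamd_rejects[...] ) )

result = add_on(pmx, rspamd_total, on={"nosuchlabel"})
print(result)  # [({}, 70.0)]
```

Since neither side has a nosuchlabel label, both reduce to the empty label set and are treated as matching, which is the opposite of the "discard series without the label" behavior one might expect.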
Some settings you want to make to CyberPower's UPS Powerpanel daemon
I have a CyberPower UPS and a while back I installed their PowerPanel software to talk to it, in large part to get various status information in an automated way. The other day I discovered that it has some undesirable default settings. So here are some notes on things that you will almost certainly want to change in /etc/pwrstatd.conf if you're using PowerPanel too.
As I discovered, the daemon has at least two undesirable behaviors. It will power off your Linux system one minute after a power failure and then program the UPS to shut down all power ten minutes later, even if line power comes back in the meantime (if line power has come back, the UPS at least turns power back on ten seconds later). To disable shutdowns on power failure, the options I am using are:
powerfail-active = no
powerfail-shutdown = no
I suspect that the latter option is sufficient by itself, but the provided script doesn't do very much and I went for overkill. I think these can be set with 'pwrstat -pwrfail -shutdown off' and 'pwrstat -pwrfail -active off', but I edited the configuration file directly (partly because I was already looking at it to find things). There are similar settings for what to do when the battery gets low.
Programming the UPS to power off may only happen if you allow shutdowns on power failure (or general scripts), but I believe that you can specifically turn it off with some additional options:
# The UPS will turn power off when this time
# is expired.
shutdown-sustain = 0
turn-ups-off = no
The former may possibly be set with 'pwrstat -pwrfail -duration 0', but I'm not sure; the pwrstat help text is not clear and I'm not inclined to experiment. I don't think turn-ups-off can be changed through pwrstat. According to the help text comments in pwrstatd.conf, it looks like the equivalent of shutdown-sustain for low battery is the runtime-threshold setting, which also controls how little remaining runtime there is before your system triggers the script and starts shutting down.
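If you would rather check the file than trust your editing, a short Python sketch along these lines can report which of the settings above differ from what you want. It assumes the file consists of plain 'key = value' lines with '#' comments, as in the fragments quoted here; the function names are hypothetical and the sample contents are made up rather than copied from a real pwrstatd.conf.

```python
# The settings this entry changes, with the values used above.
WANTED = {
    "powerfail-active": "no",
    "powerfail-shutdown": "no",
    "shutdown-sustain": "0",
    "turn-ups-off": "no",
}

def read_settings(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def differences(settings):
    """Return {key: (found, wanted)} for every setting that differs or is missing."""
    return {
        key: (settings.get(key), wanted)
        for key, wanted in WANTED.items()
        if settings.get(key) != wanted
    }

# Made-up sample contents; in practice you would read /etc/pwrstatd.conf instead.
sample = """\
# The UPS will turn power off when this time
# is expired.
shutdown-sustain = 600
powerfail-active = no
powerfail-shutdown = no
turn-ups-off = yes
"""
print(differences(read_settings(sample)))
```

An empty result means all four settings already have the values used in this entry; a missing setting shows up with None as the found value.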
My view on automated shutdowns is that if I'm in front of the computer during a power failure, I'm perfectly competent to shut it down when I determine that I need or want to do so. If I'm not in front of the computer, it can quietly run down the battery on its own in the hopes that the outage ends before the battery is exhausted.
As a side note, if you actually have an automated shutdown of your system when the UPS is running low, it's worth thinking about how the system is going to come back up when power returns. PowerPanel's 'have the UPS turn off power (then turn it back on)' has the advantage that it will restart machines that have been shut down, no matter what else happens, provided only that your BIOS is set to always power up when AC power appears (instead of being set to 'last state, whatever that was').