Wandering Thoughts archives


A gotcha with combining single-label and multi-label Prometheus metrics

Suppose that you have two metrics for roughly the same thing, but one metric is unlabeled and the other metric has meaningful labels to distinguish sub-categories. For example, suppose that you have a count of the spam messages rejected by one anti-spam system, which is not broken down by spam level, and then a count of spam messages rejected by another system that does break them down by spam level. Now you want to present a dashboard panel that displays the combined number of spam messages rejected over a time range. So you write the obvious-looking PromQL query:

increase( pmx_rejects[$__range] ) +
   sum( increase( rspamd_rejects[$__range] ) )

(Here pmx_rejects is a single metric and rspamd_rejects is labeled with the spam confidence level, hence the sum().)

When you put this into your dashboard panel, your dashboard panel is surprisingly blank and you are sad, and perhaps puzzled (as I was).

What is going on here is that sum() (without a 'by' clause) returns a label-less result because it aggregates multiple time series together, while increase() preserves the time series' labels (everything except the metric name) since it doesn't aggregate. This means that the labels don't match between the two sides of the +, and you get no results, because PromQL math operators between two vectors are filters (just like boolean operators): a result is only produced for time series whose label sets match on both sides.
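To make the matching rule concrete, here is a toy Python model of what '+' does between two instant vectors. It is a simplification of my own, not Prometheus's code: a vector is a dict from a label set to a value, the metric name is assumed to have already been dropped, and the job and instance labels and the sample values are made up for illustration.

```python
# Toy model of PromQL one-to-one vector matching for '+' (a simplification,
# not Prometheus's implementation). A vector maps a label set, stored as a
# frozenset of (name, value) pairs, to a sample value.


def labels(**kv: str) -> frozenset:
    """Build an immutable, hashable label set."""
    return frozenset(kv.items())


def add_vectors(left: dict, right: dict) -> dict:
    """Add two vectors, keeping only series whose full label sets match."""
    result = {}
    for lset, lval in left.items():
        if lset in right:  # series without a match are silently dropped
            result[lset] = lval + right[lset]
    return result


# Hypothetical increase() output: it keeps labels such as job and instance.
pmx = {labels(job="pmx", instance="mail:9100"): 120.0}
# Hypothetical sum() output: one series with an empty label set.
rspamd_sum = {labels(): 45.0}

print(add_vectors(pmx, rspamd_sum))  # {} (an empty, blank panel)
```

The two label sets differ, so nothing matches and the result is empty rather than an error.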

The straightforward way to deal with this is to get rid of the labels from the plain increase() side, and the simple way to do this is to use sum() on it even though there is only one time series:

sum( increase( pmx_rejects[$__range] ) ) +
sum( increase( rspamd_rejects[$__range] ) )
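Extending the same toy model (again my own simplification, with hypothetical labels such as 'level' and made-up values), the fix works because sum() without a 'by' clause collapses every input series into one series with an empty label set. Once both sides have been summed, both label sets are empty and therefore match.

```python
# Toy model (not Prometheus's code): a vector maps a label set, stored as a
# frozenset of (name, value) pairs, to a sample value.


def labels(**kv: str) -> frozenset:
    """Build an immutable, hashable label set."""
    return frozenset(kv.items())


def sum_all(vector: dict) -> dict:
    """Model sum() with no 'by' clause: all series collapse into one label-less series."""
    if not vector:
        return {}  # sum() over no series yields no series, not a zero
    return {labels(): sum(vector.values())}


def add_vectors(left: dict, right: dict) -> dict:
    """Add two vectors, keeping only series whose full label sets match."""
    return {ls: v + right[ls] for ls, v in left.items() if ls in right}


# Hypothetical series; rspamd_rejects has one series per spam level.
pmx = {labels(job="pmx", instance="mail:9100"): 120.0}
rspamd = {
    labels(job="rspamd", level="high"): 30.0,
    labels(job="rspamd", level="low"): 15.0,
}

total = add_vectors(sum_all(pmx), sum_all(rspamd))
print(total)  # {frozenset(): 165.0}
```

Summing the single pmx_rejects series changes nothing about its value; it only strips its labels so that the match succeeds.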

As it happens, there is another way to write this, but I strongly don't recommend it, because it's a hack and I think it is probably not officially supported by the PromQL specification (such as it is). The alternate way is to replace the '+' with '+ on(nosuchlabel)'. What I think the on() does is throw out all labels except the nonexistent label, and then Prometheus concludes that because both sides have no remaining labels, they can be added together.
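In the same kind of toy model, on(...) changes what gets compared: only the listed labels form the matching key, and, as I understand Prometheus's behavior, a label that a series doesn't have counts as having the empty value. With on(nosuchlabel), both sides therefore end up with the same key. This sketch is my own simplification; it keys its result by the matching key, and like one-to-one matching it refuses a side where two series share a key. The series and values are hypothetical.

```python
# Toy model of '+ on(...)' one-to-one matching (a simplification, not
# Prometheus's implementation).


def labels(**kv: str) -> frozenset:
    """Build an immutable, hashable label set."""
    return frozenset(kv.items())


def signature(lset: frozenset, on_labels: list) -> tuple:
    """Matching key: values of only the on() labels; a missing label counts as ''."""
    d = dict(lset)
    return tuple(d.get(name, "") for name in on_labels)


def index_by_signature(vector: dict, on_labels: list) -> dict:
    """Re-key a vector by matching signature, rejecting duplicate signatures."""
    out = {}
    for lset, val in vector.items():
        sig = signature(lset, on_labels)
        if sig in out:
            # One-to-one matching needs unique signatures on each side.
            raise ValueError(f"duplicate series for signature {sig}")
        out[sig] = val
    return out


def add_on(left: dict, right: dict, on_labels: list) -> dict:
    """Model 'left + on(on_labels) right', keyed by matching signature."""
    lidx = index_by_signature(left, on_labels)
    ridx = index_by_signature(right, on_labels)
    return {sig: lidx[sig] + ridx[sig] for sig in lidx if sig in ridx}


pmx = {labels(job="pmx", instance="mail:9100"): 120.0}
rspamd_sum = {labels(): 45.0}

print(add_on(pmx, rspamd_sum, ["nosuchlabel"]))  # {('',): 165.0}
```

The model also hints at why this is fragile: if pmx_rejects ever had a second series, both would share the same empty key and add_on would raise, and I would expect real one-to-one matching to reject that case too.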

(Prometheus will probably never change this behavior just because it would be a language incompatibility and would probably bite people, even if it was never explicitly officially documented as working. Before I tried it, my expectation for '+ on(nosuchlabel)' was that it would discard time series without the label, which in this case would discard all time series. The current behavior does make sense if you think about it, but it is somewhat of a trap.)

sysadmin/PrometheusSingleMultiLabelMixing written at 22:30:38

Some settings you want to make to CyberPower's UPS PowerPanel daemon

I have a CyberPower UPS and a while back I installed their PowerPanel software to talk to it, in large part to get various status information in an automated way. The other day I discovered that it has some undesirable default settings. So here are some notes on things that you will almost certainly want to change in /etc/pwrstatd.conf if you're using PowerPanel too.

As I discovered, the daemon has at least two undesirable behaviors. It will power off your Linux system one minute after a power failure and then program the UPS to shut down all power ten minutes later, even if line power comes back in the meantime (if line power has come back, it at least turns back on ten seconds later). To disable shutdowns on power failure, the options I am using are:

powerfail-active = no
powerfail-shutdown = no

I suspect that the latter option is sufficient by itself, but the provided script doesn't do very much and I went for overkill. I think these can be set with 'pwrstat -pwrfail -shutdown off' and 'pwrstat -pwrfail -active off', but I edited the configuration file directly (partly because I was already looking at it to find things). There are similar settings for what to do when the battery gets low.

Programming the UPS to power off may only happen if you allow shutdowns on power failure (or general scripts), but I believe that you can specifically turn it off with some additional options:

# The UPS will turn power off when this time
# is expired.
shutdown-sustain = 0

turn-ups-off = no

The former may possibly be set with 'pwrstat -pwrfail -duration 0', but I'm not sure; the pwrstat help text is not clear and I'm not inclined to experiment. I don't think turn-ups-off can be changed through pwrstat. According to the help text comments in pwrstatd.conf, it looks like the equivalent of shutdown-sustain for low battery is the runtime-threshold setting, which also controls how little remaining runtime there is before your system triggers the script and starts shutting down.
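If you want to check a machine's configuration, a small Python sketch like the following can report which of the four settings above are missing or different. It assumes that pwrstatd.conf consists of simple 'key = value' lines with '#' comments, which is what the excerpts above look like but is an assumption rather than something from CyberPower's documentation; the function names are my own.

```python
# Sketch of a checker for the pwrstatd.conf settings discussed above.
# ASSUMPTION: the file uses 'key = value' lines and '#' starts a comment.

# The settings discussed above; any other keys in the file are ignored.
WANTED = {
    "powerfail-active": "no",
    "powerfail-shutdown": "no",
    "shutdown-sustain": "0",
    "turn-ups-off": "no",
}


def parse_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check(settings: dict) -> list:
    """Return (key, actual, wanted) for each wanted setting that is missing or differs."""
    return [(k, settings.get(k), v) for k, v in WANTED.items() if settings.get(k) != v]


def report(path: str) -> list:
    """Read a config file and describe each setting that needs changing."""
    with open(path) as f:
        problems = check(parse_conf(f.read()))
    return [f"{key}: currently {actual!r}, want {wanted!r}" for key, actual, wanted in problems]
```

For example, 'for line in report("/etc/pwrstatd.conf"): print(line)' prints nothing if everything is already set as described; it only reads the file and leaves any editing to you.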

My view on automated shutdowns is that if I'm in front of the computer during a power failure, I'm perfectly competent to shut it down when I determine that I need or want to do so. If I'm not in front of the computer, it can quietly run down the battery on its own in the hopes that the outage ends before the battery is exhausted.

As a side note, if you actually have an automated shutdown of your system when the UPS is running low, it's worth thinking about how the system is going to come back up when power returns. PowerPanel's 'have the UPS turn off power (then turn it back on)' has the advantage that it will restart machines that have been shut down, no matter what else happens, provided only that your BIOS is set to always power up when AC power appears (instead of being set to 'last state, whatever that was').

linux/CyberPowerPowerpanelSettings written at 01:14:57


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.