Remembering that Prometheus expressions act as filters
In conventional languages, comparisons like '>' and other boolean operations like 'and' give you implicit or explicit boolean results. Sometimes this is a pseudo-boolean result; in Python if you say 'A and B', you famously get either A itself (if A is false) or the value of B as the end result (instead of True or False). However, PromQL doesn't work this way. As I keep having to remember over and over, in Prometheus, comparisons and other boolean operators are filters.
In PromQL, when you write 'some_metric > 10', what happens is that first Prometheus generates a full instant vector for some_metric, with all of the metric points and their labels and their values, and then it filters out any metric point in the instant vector where the value isn't larger than 10. What you have left is a smaller instant vector, but all of the values of the metric points in it are their original ones.
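To make the filtering behavior concrete, here is a minimal Python sketch of it. This is a toy model, not how Prometheus is implemented: an instant vector is represented as a dict from label sets to values, and the 'host' label and sample values are invented for illustration.

```python
# Toy model of an instant vector: each label set maps to one sample value.
# The "host" label and the numbers are invented for illustration.
some_metric = {
    frozenset({("host", "a")}): 4.0,
    frozenset({("host", "b")}): 15.0,
    frozenset({("host", "c")}): 42.0,
}


def greater_than(vector, threshold):
    """Mimic 'vector > threshold': drop non-matching points, keep values as-is."""
    return {labels: value for labels, value in vector.items() if value > threshold}


filtered = greater_than(some_metric, 10)
for labels, value in filtered.items():
    print(dict(labels), value)
```

Only the points for hosts b and c survive, and they still carry their original values of 15.0 and 42.0 instead of anything like True.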
The same thing happens with 'and'. When you write 'some_metric and other_metric', the other_metric is used only as a filter; metric points from some_metric are only included in the result set if there is the same set of labels in the other_metric instant vector. This means that the values of other_metric are irrelevant and do not propagate into the result.
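The same kind of toy Python model can sketch 'and'. As before, this is an illustration under simplifying assumptions: label sets are modeled without the metric name, and the 'host' label and the values are made up.

```python
# Toy instant vectors: label set -> value. Labels and values are invented.
some_metric = {
    frozenset({("host", "a")}): 1.0,
    frozenset({("host", "b")}): 2.0,
}
other_metric = {
    frozenset({("host", "b")}): 500.0,
    frozenset({("host", "c")}): 600.0,
}


def vector_and(left, right):
    """Mimic 'left and right': keep left points whose label set is also in right."""
    return {labels: value for labels, value in left.items() if labels in right}


result = vector_and(some_metric, other_metric)
print(result)
```

The result holds only the host b point, with the value 2.0 from some_metric; the 500.0 from other_metric never appears. Swapping the arguments would keep the same label set but carry 500.0 instead.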
The large scale effect of this is that the values that tend to propagate through your rule expression are whatever started out as the first metric you looked at (or whatever arithmetic you perform on them). Sometimes, especially in alert rules, this can bias you toward putting one condition in front of the other. For instance, suppose that you want to trigger an alert when the one-minute load average is above 20 and the five-minute load average is above 5, and you write the alert rule as:
expr: (node_load5 > 5) and (node_load1 > 20)
The value available in the alert rule and your alert messages is the value of node_load5, not node_load1, because node_load5 is what you started out the rule with. If you find the value of node_load1 more useful in your alert messages, you'll want to flip the order of these two clauses around.
As the PromQL documentation covers, you can turn comparison operations from filters into pseudo-booleans by using 'bool', as in 'some_metric > bool 10'. As far as I know, there is no way to do this with 'and', which always functions as a filter, although you can at least select what labels have to match (or what labels to ignore).
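For contrast with the filtering form, here is a rough Python sketch of what 'bool' changes, again using a toy dict model with invented labels and values. Every point stays in the result and its value becomes 1 or 0.

```python
# Toy instant vector: label set -> value. Labels and values are invented.
some_metric = {
    frozenset({("host", "a")}): 4.0,
    frozenset({("host", "b")}): 15.0,
}


def greater_than_bool(vector, threshold):
    """Mimic 'vector > bool threshold': keep every point, value becomes 1 or 0."""
    return {
        labels: (1.0 if value > threshold else 0.0)
        for labels, value in vector.items()
    }


flags = greater_than_bool(some_metric, 10)
print(flags)
```

Here nothing is filtered out: host a maps to 0.0 and host b maps to 1.0, and the original values of 4.0 and 15.0 are gone.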
PS: For some reason I keep forgetting that 'and', 'or', and 'unless' can use 'on' and 'ignoring' to select what labels you care about. What you can't do with them, though, is propagate some labels from the right side into the result; if you need that, you have to use 'group_left' or 'group_right' and figure out how to re-frame your operation so that it involves a comparison, because 'and' and company don't work with grouping.
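A small Python sketch of 'and on (...)' may help keep this straight. It is a toy model under the same assumptions as before, and the 'host', 'device', and 'owner' labels are hypothetical.

```python
# Toy instant vectors: label set -> value. All labels and values are invented.
some_metric = {
    frozenset({("host", "a"), ("device", "eth0")}): 3.0,
    frozenset({("host", "b"), ("device", "eth0")}): 7.0,
}
other_metric = {
    frozenset({("host", "b"), ("owner", "ops")}): 1.0,
}


def vector_and_on(left, right, on):
    """Mimic 'left and on (...) right': match only on the listed label names."""

    def key(labels):
        return frozenset((name, value) for name, value in labels if name in on)

    right_keys = {key(labels) for labels in right}
    return {
        labels: value for labels, value in left.items() if key(labels) in right_keys
    }


result = vector_and_on(some_metric, other_metric, on={"host"})
print(result)
```

Matching on 'host' alone lets the host b point through, where matching on full label sets would find nothing. The surviving point keeps the left side's labels and value; the right side's 'owner' label does not show up anywhere in the result.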
(I was going to confidently write an entry echoing something that I said on the Prometheus users mailing list recently, but when I checked the documentation and performed some tests, it turned out I was wrong about an important aspect of it. So this entry is rather smaller in scope, and is written mostly to get this straight in my head since I keep forgetting the details of it.)
WireGuard was pleasantly easy to get working behind a NAT (or several)
Normally, my home machine is directly connected to the public Internet by its DSL connection. However, every so often this DSL connection falls over, and these days my backup method of Internet connectivity is that I tether my home machine through my phone. This tethering gives me an indirect Internet connection; my desktop is on a little private network provided by my phone and then my phone NAT's my outgoing traffic. Probably my cellular provider adds another level of NAT as well, and certainly the public IP address that all of my traffic appears from can hop around between random IPs and random networks.
Most of the time this works well enough for basic web browsing and even SSH sessions, but it has two problems when I'm connecting to things at work. The first is that my public IP address can change even while I have an SSH connection present (but perhaps not active enough), which naturally breaks the SSH connection. The second is that I only have 'outside' access to our servers; I can only SSH to or otherwise access machines that are accessible from the Internet, which excludes most of the interesting and important ones.
Up until recently I've just lived with this, because the whole issue just doesn't come up often enough to get me to do anything about it. Then this morning my home DSL connection died at a fairly inopportune time, when I was scheduled to do something from home that involved both access to internal machines and things that very much shouldn't risk having my SSH sessions cut off in mid-flight (and that I couldn't feasibly do from within a screen session, because it involved multiple windows). I emailed a co-worker to have them take over, which they fortunately were able to do, and then I decided to spend a little time to see if I could get my normal WireGuard tunnel up and running over my tethered and NAT'd phone connection, instead of its usual DSL setup. If I could bring up my WireGuard tunnel, I'd have both a stable IP for SSH sessions and access to our internal systems even when I had to use my fallback Internet option.
(I won't necessarily have uninterrupted SSH sessions, because if my phone changes public IPs there will be a pause as WireGuard re-connects and so on. But at least I'll have the chance to have sessions continue afterward, instead of being intrinsically broken.)
Well, the good news is that my WireGuard setup basically just worked as-is when I brought it up behind however many layers of NAT'ing are going on. The actual WireGuard configuration needed no changes and I only had to do some minor tinkering with my setup for policy-based routing (and one of the issues was my own fault). It was sufficiently easy that now I feel a bit silly for having not tried it before now.
(Things would not have been so easy if I'd decided to restrict what IP addresses could talk to WireGuard on my work machine, as I once considered doing.)
This is of course how WireGuard is supposed to work. Provided that you can pass its UDP traffic in both directions (which fortunately seems to work through the NAT'ing involved in my case), WireGuard doesn't care where your traffic comes from if it has the right keys, and your server will automatically update its idea of what (external) IP your client has right now when it gets new traffic, which makes everything work out.
(WireGuard is actually symmetric; either end will update its idea of the other end's IP when it gets appropriate traffic. It's just that under most circumstances your server end rarely changes its outgoing IP.)
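The endpoint updating described above can be sketched as a toy Python model. This is not WireGuard's actual code: the Peer class, the handle_packet function, and the 'authenticated' flag (standing in for all of the real cryptographic checks) are assumptions made up for illustration, and the addresses are documentation-only IPs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP address, UDP port)


@dataclass
class Peer:
    public_key: str
    endpoint: Optional[Address] = None  # where replies to this peer get sent


def handle_packet(
    peers: Dict[str, Peer], sender_key: str, source: Address, authenticated: bool
) -> bool:
    """Accept a packet and remember where it came from, if it checks out."""
    peer = peers.get(sender_key)
    if peer is None or not authenticated:
        return False  # unknown key or bad packet: endpoint is left alone
    peer.endpoint = source  # roam to whatever address the traffic came from
    return True


peers = {"client-key": Peer("client-key")}
handle_packet(peers, "client-key", ("198.51.100.7", 51820), authenticated=True)
# The phone's NAT hands out a new public IP and port; the next valid packet wins.
handle_packet(peers, "client-key", ("203.0.113.9", 40000), authenticated=True)
# A packet that fails authentication does not move the endpoint.
handle_packet(peers, "client-key", ("192.0.2.1", 1234), authenticated=False)
print(peers["client-key"].endpoint)
```

In this model the server only needs the client's key in advance, not its address, which is why an address picked by several layers of NAT is not a problem as long as the UDP packets get through.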
I knew that in theory all of this should work, but it's still nice to have it actually work out in practice, especially in a situation with at least one level of NAT going on. I'm actually a little bit amazed that it does work through all of the NAT magic going on, especially since WireGuard is just UDP packets flying back and forth instead of a TCP connection (which any NAT had better be able to handle).
On a side note, although I did everything by hand this morning, in theory I could automate all of this through dhclient hook scripts, which I'm already using to manage my resolv.conf (as covered in this entry). Of course this brings up a little issue, because if the WireGuard tunnel is up and working I actually want to use my regular resolv.conf instead of the one I switch to when I'm tethering (without WireGuard). Probably I'm going to defer all of this until the next time my DSL connection goes down.