Wandering Thoughts archives

2019-04-13

Remembering that Prometheus expressions act as filters

In conventional languages, comparisons like '>' and other boolean operations like 'and' give you implicit or explicit boolean results. Sometimes this is a pseudo-boolean result; in Python if you say 'A and B', you famously get either False or the value of B as the end result (instead of True). However, PromQL doesn't work this way. As I keep having to remember over and over, in Prometheus, comparisons and other boolean operators are filters.

In PromQL, when you write 'some_metric > 10', what happens is that first Prometheus generates a full instant vector for some_metric, with all of the metric points and their labels and their values, and then it filters out any metric point in the instant vector where the value isn't larger than 10. What you have left is a smaller instant vector, but all of the values of the metric points in it are their original ones.

The same thing happens with 'and'. When you write 'some_metric and other_metric', the other_metric is used only as a filter; metric points from some_metric are only included in the result set if there is the same set of labels in the other_metric instant vector. This means that the values of other_metric are irrelevant and do not propagate into the result.

The large scale effect of this is that the values that tend to propagate through your rule expression are whatever started out as the first metric you looked at (or whatever arithmetic you perform on them). Sometimes, especially in alert rules, this can bias you toward putting one condition in front of the other. For instance, suppose that you want to trigger an alert when the one-minute load average is above 20 and the five-minute load average is above 5, and you write the alert rule as:

expr: (node_load5 > 5) and (node_load1 > 20)

The value available in the alert rule and your alert messages is the value of node_load5, not node_load1, because node_load5 is what you started out the rule with. If you find the value of node_load1 more useful in your alert messages, you'll want to flip the order of these two clauses around.

As the PromQL documentation covers, you can turn comparison operations from filters into pseudo-booleans by using 'bool', as in 'some_metric > bool 10'. As far as I know, there is no way to do this with 'and', which always functions as a filter, although you can at least select what labels have to match (or what labels to ignore).

PS: For some reason I keep forgetting that 'and', 'or', and 'unless' can use 'on' and 'ignoring' to select what labels you care about. What you can't do with them, though, is propagate some labels from the right side into the result; if you need that, you have to use 'group_left' or 'group_right' and figure out how to re-frame your operation so that it involves a comparison, since 'and' and company don't work with grouping.

(I was going to confidently write an entry echoing something that I said on the Prometheus users mailing list recently, but when when I checked the documentation and performed some tests, it turned out I was wrong about an important aspect of it. So this entry is rather smaller in scope, and is written mostly to get this straight in my head since I keep forgetting the details of it.)

sysadmin/PrometheusExpressionsFilter written at 23:59:31; Add Comment

WireGuard was pleasantly easy to get working behind a NAT (or several)

Normally, my home machine is directly connected to the public Internet by its DSL connection. However, every so often this DSL connection falls over, and these days my backup method of Internet connectivity is that I tether my home machine through my phone. This tethering gives me an indirect Internet connection; my desktop is on a little private network provided by my phone and then my phone NAT's my outgoing traffic. Probably my cellular provider adds another level of NAT as well, and certainly the public IP address that all of my traffic appears from can hop around between random IPs and random networks.

Most of the time this works well enough for basic web browsing and even SSH sessions, but it has two problems when I'm connecting to things at work. The first is that my public IP address can change even while I have a SSH connection present (but perhaps not active enough), which naturally breaks the SSH connection. The second is that I only have 'outside' access to our servers; I can only SSH to or otherwise access machines that are accessible from the Internet, which excludes most of the interesting and important ones.

Up until recently I've just lived with this, because the whole issue just doesn't come up often enough to get me to do anything about it. Then this morning my home DSL connection died at a fairly inopportune time, when I was scheduled to do something from home that involved both access to internal machines and things that very much shouldn't risk having my SSH sessions cut off in mid-flight (and that I couldn't feasibly do from within a screen session, because it involved multiple windows). I emailed a co-worker to have them take over, which they fortunately were able to do, and then I decided to spend a little time to see if I could get my normal WireGuard tunnel up and running over my tethered and NAT'd phone connection, instead of its usual DSL setup. If I could bring up my WireGuard tunnel, I'd have both a stable IP for SSH sessions and access to our internal systems even when I had to use my fallback Internet option.

(I won't necessarily have uninterrupted SSH sessions, because if my phone changed public IPs there will be a pause as WireGuard re-connected and so on. But at least I'll have the chance to have sessions continue afterward, instead of being intrinsically broken.)

Well, the good news is that my WireGuard setup basically just worked as-is when I brought it up behind however many layers of NAT'ing are going on. The actual WireGuard configuration needed no changes and I only had to do some minor tinkering with my setup for policy-based routing (and one of the issues was my own fault). It was sufficiently easy that now I feel a bit silly for having not tried it before now.

(Things would not have been so easy if I'd decided to restrict what IP addresses could talk to WireGuard on my work machine, as I once considered doing.)

This is of course how WireGuard is supposed to work. Provided that you can pass its UDP traffic in both ways (which fortunately seems to work through the NAT'ing involved in my case), WireGuard doesn't care where your traffic comes from if it has the right keys, and your server will automatically update its idea of what (external) IP your client has right now when it gets new traffic, which makes everything work out.

(WireGuard is actually symmetric; either end will update its idea of the other end's IP when it gets appropriate traffic. It's just that under most circumstances your server end rarely changes its outgoing IP.)

I knew that in theory all of this should work, but it's still nice to have it actually work out in practice, especially in a situation with at least one level of NAT going on. I'm actually a little bit amazed that it does work through all of the NAT magic going on, especially since WireGuard is just UDP packets flying back and forth instead of a TCP connection (which any NAT had better be able to handle).

On a side note, although I did everything by hand this morning, in theory I could automate all of this through dhclient hook scripts, which I'm already using to manage my resolv.conf (as covered in this entry). Of course this brings up a little issue, because if the WireGuard tunnel is up and working I actually want to use my regular resolv.conf instead of the one I switch to when I'm tethering (without WireGuard). Probably I'm going to defer all of this until the next time my DSL connection goes down.

linux/WireGuardBehindNAT written at 00:16:23; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.