== Counting the number of distinct labels in a Prometheus metric

Suppose, [[not hypothetically SelectingUsefulMetrics]], that you're collecting [[Prometheus https://prometheus.io/]] metrics on your several VPN servers, including a per user count of sessions on each server. The resulting metric looks like this:

.pn prewrap on

> vpn_user_sessions{ user="cks", server="vpn1", ... } 1
> vpn_user_sessions{ user="fred", server="vpn1", ... } 1
> vpn_user_sessions{ user="cks", server="vpn2", ... } 1

We would like to know how many different users are currently connected across our entire collection of VPN servers. As we see here, the same user may be connected to multiple VPN servers for whatever reason, including that different devices prefer to use different VPN software (such as L2TP or OpenVPN). In Prometheus terms, we want to count the number of distinct label values in ((vpn_user_sessions)) for the '_user_' label, which I will shorten to *the number of distinct labels*.

To do this, our first step is to somehow reduce this down to something with one metric point per user, with no other labels. Throwing away labels is done with the '_by (...)_' modifier to [[PromQL aggregation operators https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators]]. For our purposes we can use any of the straightforward operators such as _sum_, _min_, or _max_; I'll use _sum_. Using '_sum(...) by (user)_' will produce a series like this:

> { user="cks" } 2
> { user="fred" } 1

Having generated this new vector, we simply count how many elements are in it with _count()_. The final expression is:

> count( sum( vpn_user_sessions ) by (user) )

This will give us the number of different users that are connected right now.

Next, suppose that we want to know how many different users have used our VPNs over some span of time, such as the past day.
To do this in the most straightforward way, we'll start by aggregating our time span down to something that has an element (with a full set of labels) if the user was connected to a particular VPN server at some point in the time span. Since we don't care about the values, we can use any reasonable [[``_over_time'' function https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_time]], such as 'min':

> min_over_time( vpn_user_sessions[24h] )

(The choice of aggregation to use is relatively arbitrary; we're using it to sweep up all of the different sets of labels that have appeared in the last 24 hours, not for its output value. Min does this and is simple to compute.)

This gives us an instant vector that we can then process in the same way as we did with ((vpn_user_sessions)) when we generated our number of currently connected users; we aggregate it to get rid of all labels other than '_user_', and then we count how many distinct elements we have. The resulting query is:

> count(
>   sum(
>     min_over_time( vpn_user_sessions[24h] )
>   ) by (user)
> )

This is not the only way to create a query that does this, but it's the simplest and probably also the best performing.

(I initially wrote a 'how many different users over time' query that didn't produce correct numbers, which I didn't realize until I tested it, and then my next attempt used a subquery and some brute force. It wasn't until I sat down to systematically work out what I wanted and how to get there that I came up with these current versions. This is a valuable learning experience; whenever I'm faced with a complex PromQL query situation, I shouldn't just guess, I should tackle the problem systematically, building up the solution in steps and verifying each one interactively.)

PS: It's possible that this trick is either well known or obvious, but if so I couldn't find it in my initial Internet searches before I started flailing around writing my own queries.
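
For anyone who wants to check the logic of the two queries without a Prometheus server to hand, here is a toy Python model of each step. It is only a sketch of what the aggregations do to label sets, not how Prometheus implements them. The function names, the sample data, the extra user "barney", and the representation of a scrape as a list of (labels, value) pairs are all assumptions made up for illustration.

```python
from collections import defaultdict

def sum_by(samples, label):
    """Toy model of 'sum(...) by (label)': one element per distinct label value."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)

def count_distinct(samples, label):
    """Toy model of 'count( sum(...) by (label) )'."""
    return len(sum_by(samples, label))

def min_over_time(scrapes):
    """Toy model of 'min_over_time(metric[range])'.

    'scrapes' is a list of scrapes inside the range. The result has one
    element for every full label set seen in any scrape, which is the
    property the over-time query relies on.
    """
    mins = {}
    for scrape in scrapes:
        for labels, value in scrape:
            key = tuple(sorted(labels.items()))
            mins[key] = min(value, mins.get(key, value))
    return [(dict(key), value) for key, value in mins.items()]

# Hypothetical data: the current scrape matches the example metric above.
now = [
    ({"user": "cks", "server": "vpn1"}, 1),
    ({"user": "fred", "server": "vpn1"}, 1),
    ({"user": "cks", "server": "vpn2"}, 1),
]
# A made-up earlier scrape in which a third user was connected.
earlier = [
    ({"user": "barney", "server": "vpn2"}, 1),
    ({"user": "cks", "server": "vpn1"}, 1),
]

current_users = count_distinct(now, "user")
users_over_range = count_distinct(min_over_time([earlier, now]), "user")
print(current_users, users_over_range)
```

In this model the intermediate per-user sums come out as 2 for "cks" and 1 for "fred", matching the vector shown earlier, and the user who has since disconnected is still counted in the over-time version because their label set survives the ((min_over_time)) step.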