Wandering Thoughts archives


Make sure to keep useful labels in your Prometheus alert rules

Suppose, not entirely hypothetically, that you have some metrics that are broken out across categories, but what you care about is the total across all of them. For example, you're monitoring some OpenBSD firewalls and you care about the total number of PF states, but your metrics break them down by protocol (this information is available in 'pfctl -ss' output). So your alert rule is going to be something like:

- alert: TooManyStates
  expr: sum( pfctl_protocol_entries ) by (server) > 80000

Congratulations, you may have just aimed a gun at your own foot. If you have additional labels on that pfctl_protocol_entries metric that you may want to use in the alert that will result from this (perhaps the datacenter or some other metadata), you've just lost them. When you said 'sum(...) by (server)', Prometheus faithfully did what you said; it summed everything by the server and as part of that threw away all other labels, because you told it all that mattered was the 'server' label.

There are two ways around this. The obvious, simple way that you may reach for in your haste to fix this issue is to add the additional metadata label or labels that you care about to the 'by()' expression, so you have, eg, 'sum(...) by (server, datacenter)'. The problem with this is that you're playing whack-a-mole, having to add each additional label to the list of labels as you remember them (or discover problems because they're missing). The better way is to be explicit about what you want to ignore:

sum( pfctl_protocol_entries ) without (proto)

This will automatically pass through all other labels, including ones that you add in six months from now as part of a metrics reorganization (long after you forgot that 'sum(..) by (...)' special case in one of your alert rules).

After this experience, I've come to think that doing aggregation using 'by (...)' in your alert rules (or recording rules) is potentially dangerous and ought to at least be scrutinized carefully and probably commented. Sometimes there are good reasons for it where you want to narrow down to a known set of common labels or the like, but otherwise it is a potential trap even if it works for your setup today.

sysadmin/PrometheusKeepLabelsAlerts written at 23:51:33

Sorting out Go's 'for ... = range ..' and when it copies things

I recently read Some tricks and tips for using for range in GoLang, where it said, somewhat in passing:

[...] As explained earlier, when the loop begins, it will copy the original array to a new one and loop through the elements, hence when appending elements to the original array, the copied array actually doesn't change.

My eyebrows went up because I'd forgotten this little bit of Go, and I promptly scuttled off to the official specification to read and understand the details. So here are some notes, because the issues behind this turn out to be more interesting than I expected.

Let's start with the basic form, which is 'for ... := range a { ... }'. The expression to the right of the range is called the range expression. The specification says (emphasis mine):

The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.

Obviously if the range expression is a function call, the function call must be made (once) and its return value is then what the loop ranges over. However, in Go even evaluating an expression that's a single variable produces a copy of the value of that variable (in the abstract; in the concrete the compiler may optimize this out). So when you write 'for a, b := range c', Go (nominally) evaluates c and uses the resulting copy of c's current value.

(Among other consequences, this means that assigning a different value to c itself inside the loop doesn't change what the loop does; c's value is frozen at the start, when it's evaluated.)
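Here is a small sketch of both points, with names (makeItems, rangeOnce, visitedAfterReassign) that I've made up purely for illustration. It assumes nothing beyond the standard library: the first function counts how often a range expression that is a function call gets called, and the second reassigns the ranged-over variable inside the loop.

```go
package main

import "fmt"

// calls counts how often makeItems runs. Both names are hypothetical.
var calls int

func makeItems() []int {
	calls++
	return []int{1, 2, 3}
}

// rangeOnce ranges over a function call and reports how many times the
// function was called and how many iterations ran.
func rangeOnce() (int, int) {
	calls = 0
	iterations := 0
	for range makeItems() {
		iterations++
	}
	return calls, iterations
}

// visitedAfterReassign ranges over c and assigns a different slice to c
// during the loop. It returns the values the loop actually saw.
func visitedAfterReassign() []int {
	c := []int{1, 2, 3}
	var seen []int
	for _, v := range c {
		// This changes the variable c, not the value that range
		// already evaluated when the loop started.
		c = []int{100, 200}
		seen = append(seen, v)
	}
	return seen
}

func main() {
	nCalls, nIter := rangeOnce()
	fmt.Println(nCalls, nIter)          // prints 1 3
	fmt.Println(visitedAfterReassign()) // prints [1 2 3]
}
```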

As the additional bit of the specification explains, this doesn't happen if you use at most one iteration value and you're ranging over one of the small number of things where len(x) is a constant (the rules for this are somewhat legalistic). If you use two iteration variables, you always evaluate the range expression and make a copy, which is another reason for Go to prefer the single variable version (to go with nudging you to not copy actual values unless necessary).
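One way to see the exception in action is to range over a nil pointer to an array with only the index variable. This is a sketch under my reading of the specification, and countIndexes is a made-up name: since len(p) is the constant 3, the range expression is never evaluated and the nil pointer is never followed.

```go
package main

import "fmt"

// countIndexes ranges over a nil pointer to an array using only the
// index variable. len(p) is a constant (3), so p itself is not
// evaluated and not dereferenced.
func countIndexes() (count int, indexSum int) {
	var p *[3]int // deliberately left nil for this illustration
	for i := range p {
		count++
		indexSum += i
	}
	return count, indexSum
}

func main() {
	count, indexSum := countIndexes()
	fmt.Println(count, indexSum) // prints 3 3
}
```

Adding a second iteration variable here would require reading elements through the nil pointer, so I would not expect that version to run to completion.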

However, things get tricky if you use pointers. Here:

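package main

import "fmt"

func main() {
    a := [5]int{1, 2, 3, 4, 5}
    // range copies the array, so v never sees the mutation
    for _, v := range a {
        a[3] = 10
        fmt.Println("Pass 1:", v)
    }
    // reset our mutation
    a[3] = 4
    // loop via pointer:
    b := &a
    for _, v := range b {
        b[3] = 10
        fmt.Println("Pass 2:", v)
    }
}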

In the second loop, what gets copied when the range expression is evaluated is the pointer, not the array it points to (note that b is not a slice, it's a pointer to an array). Go's implicit dereferencing of pointers means that the code for the two loops looks exactly the same, although they behave differently. The first loop iterates over a copy of the array and so prints the original values, 1 through 5, despite the mutation inside the loop. The second loop iterates over the real array through the pointer, so 'a[3]' has already been changed to 10 by the time it is printed.

On the one hand, this may be confusing. On the other hand, this provides a way to effectively sidestep all sorts of range expression copying (if you don't want it); all you have to do is pointerize your range expression, and almost nothing will care about the difference. Fortunately often you don't care about the copying to begin with, because making copies of strings, slices, and maps doesn't require copying the underlying data. The only thing that you can range over that's expensive to copy is an actual array, and directly using actual arrays in Go is relatively rare (especially when using real arrays can cause interesting errors).

If you do a 'copying' range over anything other than a real array (which is copied) or a string (which is immutable), you can still mutate the underlying data inside your range loop in a way that later iterations will, or at least may, see. You probably don't want to do this.

(This is the consequence of ranging over slices and maps not making a copy of the underlying data. Because your range copies the slice itself, shrinking or enlarging the original slice won't change the number of iterations. You can potentially change the number of iterations of a map inside of the loop, though.)
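For slices, the difference between the copied slice value and the shared underlying data can be sketched like this. The function name rangeWhileGrowing is invented for the example, and the starting slice is chosen so that the append has to allocate a new backing array.

```go
package main

import "fmt"

// rangeWhileGrowing mutates an element of a slice and then appends to
// the slice, all from inside a range loop over that slice.
func rangeWhileGrowing() (iterations int, seen []int, finalLen int) {
	s := []int{1, 2, 3}
	for i, v := range s {
		if i == 0 {
			// Writes through to the backing array that the loop's
			// copy of the slice also points at.
			s[2] = 30
			// Grows s, but the loop keeps using the slice value it
			// evaluated at the start, which still has length 3.
			s = append(s, 4)
		}
		iterations++
		seen = append(seen, v)
	}
	return iterations, seen, len(s)
}

func main() {
	iterations, seen, finalLen := rangeWhileGrowing()
	fmt.Println(iterations, seen, finalLen) // prints 3 [1 2 30] 4
}
```

The loop runs three times even though s ends up with four elements, but the third iteration does see the element that was changed during the first one.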

Probably I don't need to care about this range copying, at least from an efficiency perspective (I had better remember its other consequences). My Go code (and Go in general) only very rarely uses fixed size arrays, which are the only expensive thing to copy. Copying slices and maps is pretty close to free, and those are usually what I range over (apart from channels, which I consider a special case).

programming/GoRangeCopying written at 00:55:56
