Wandering Thoughts archives

2019-04-18

One reason ed(1) was a good editor back in the days of V7 Unix

It is common to describe ed(1) as being line oriented, as opposed to screen oriented editors like vi. This is completely accurate but it is perhaps not a complete enough description for today, because ed is line oriented in a way that is now uncommon. After all, you could say that your shell is line oriented too, and very few people use shells that work and feel the same way ed does.

The surface difference between most people's shells and ed is that most people's shells have some version of cursor based interactive editing. The deeper difference is that this requires the shell to run in character by character TTY input mode, also called raw mode. By contrast, ed runs in what Unix usually calls cooked mode, where it reads whole lines from the kernel and the kernel handles things like backspace. All of ed's commands are designed so that they work in this line focused way (including being terminated by the end of the line), and as a whole ed's interface makes this whole line input approach natural. In fact I think ed makes it so natural that it's hard to think of things as being any other way. Ed was designed for line at a time input, not just to not be screen oriented.

(This was carefully preserved in UofT ed's very clever zap command, which let you modify a line by writing out the modifications on a new line beneath the original.)

This input mode difference is not very important today, but in the days of V7 and serial terminals it made a real difference. In cooked mode, V7 ran very little code when you entered each character; almost everything was deferred until it could be processed in bulk by the kernel, and then handed to ed all in a single line which ed could also process all at once. A version of ed that tried to work in raw mode would have been much more resource intensive, even if it still operated on single lines at a time.

(If you want to imagine such a version of ed, think about how a typical readline-enabled Unix shell can move back and forth through your command history while only displaying a single line. Now augment that sort of interface with a way of issuing vi-like bulk editing commands.)

This is part of why I feel that ed(1) was once a good editor (cf). Ed is carefully adapted for the environment of early Unixes, which ran on small and slow machines with limited memory (which led to ed not holding the file it's editing in memory). Part of that adaptation is being an editor that worked with the system, not against it, and on V7 Unix that meant working in cooked mode instead of raw mode.

(Vi appeared on more powerful, more capable machines; I believe it was first written when BSD Unix was running on Vaxes.)

Update: I'm wrong in part about how V7 ed works; see the comment from frankg. V7 ed runs in cooked mode but it reads input from the kernel a character at a time, instead of in large blocks.

unix/EdDesignedForCookedInput written at 23:25:56; Add Comment

A pattern for dealing with missing metrics in Prometheus in simple cases

Previously, I mentioned that Prometheus expressions are filters, which is part of Prometheus having a generally set-oriented view of the world. One of the consequences of this view is that you can quite often have expressions that give you a null result when you really want the result to be 0.

For example, let's suppose that you want a Grafana dashboard that includes a box that tells you how many Prometheus alerts are currently firing. When this happens, Prometheus exposes an ALERTS metric for each active alert, so on the surface you would count these up with:

count( ALERTS{alertstate="firing"} )

Then one day you don't have any firing alerts and your dashboard's box says 'N/A' or 'null' instead of the '0' that you want. This happens because 'ALERTS{alertstate="firing"}' matches nothing, so the result is a null set, and count() of a null set is a null result (or, technically, a null set).

The official recommended practice is to not have any metrics and metric label values that come and go; all of your metrics and label sets should be as constant as possible. As you can tell with the official Prometheus ALERTS metric, not even Prometheus itself actually fully follows this, so we need a way to deal with it.

My preferred way of dealing with this is to use 'or vector(0)' to make sure that I'm never dealing with a null set. The easiest thing to use this with is sum():

sum( ALERTS{alertstate="firing"} or vector(0) )

Using sum() has the useful property that the extra vector(0) element has no effect on the result. You can often use sum() instead of count() because many sporadic metrics have the value of '1' when they're present; it's the accepted way of creating what is essentially a boolean 'I am here' metric such as ALERTS.

If you're filtering for a specific value or value range, you can still use sum() instead of count() by using bool on the comparison:

sum( node_load1 > bool 10 or vector(0) )

If you're counting a value within a range, be careful where you put the bool; it needs to go on the last comparison. Eg:

sum( node_load1 > 5 < bool 10 or vector(0) )

If you have to use count() for more complicated reasons, the obvious approach is to subtract 1 from the result.

Unfortunately this approach starts breaking down rapidly when you want to do something more complicated. It's possible to compute a bare average over time using a subquery:

avg_over_time( (sum( ALERTS{alertstate="firing"} or vector(0) ))[6h:] )

(Averages over time of metrics that are 0 or 1, like up, are the classical way of figuring out things like 'what percentage of the time is my service down'.)

However I don't know how to do this if you want something like an average over time by alert name or by hostname. In both cases, even alerts that were present some of the time were not present all of the time, and they can't be filled in with 'vector(0)' because the labels don't match (and can't be made to match). Nor do I know of a good way to get the divisor for a manual averaging. Perhaps you would want to do an unnecessary subquery so you can exactly control the step and thus the divisor. This would be something like:

sum_over_time( (sum( ALERTS{alertstate="firing"} ) by (alertname))[6h:1m] ) / (6*60)

Experimentation suggests that this provides plausible results, at least. Hopefully it's not too inefficient. In Grafana, you need to write the subquerry as '[$__range:1m]' but the division as '($__range_s / 60)', because the Grafana template variable $__range includes the time units.

(See also Existential issues with metrics.)

sysadmin/PrometheusMissingMetricsPattern written at 00:39:58; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.