Wandering Thoughts archives

2021-08-13

Some of my views on using YAML for human-written configuration files

Over on Twitter, I said something:

Hot take: YAML isn't a configuration language or a configuration language format, it's a serialization format. Is de-serializing some data structures the best way to configure a program? Maybe not. (Probably not. Mostly not.)

Like programming languages, all configuration systems communicate with both the computer and other people. But most are designed only for the computer to consume, not to be clear when people read them. De-serializing your live data structures is an extreme example of this.

(I've said the second bit before and I'm sure I'll say it again.)

There are some configurations that are simple enough that I think YAML works okay; I'd say that these are pretty much ones that have sections with 'key = value' settings (although there are simpler, more readable formats for this, like TOML). Once your configuration goes beyond that into more complicated data structures, you start to have issues. Of course you can de-serialize to intermediate data structures that your program then further interprets to create your actual configuration, but then you have an additional problem:

What YAML does is provide a straightforward format for simple data, which is mostly deserialized into some data structures of yours. YAML is opaque and relatively hostile to any structure beyond that; you get to embed it in YAML strings or in structural relationships between YAML elements.

There are plenty of programs with complex configuration needs. If you use YAML for a program like this, you get at least one of two bad results: either you're using YAML to transport strings that the program will really interpret much more deeply later, or you have to attempt to program your program through YAML structural relationships between magic keys, like Prometheus label rewrite rules.

As a string transport mechanism, YAML does mean you don't have to write a file level parser (but you're still going to be parsing your strings). But you pay a high price for that, especially in typical environments where YAML error reporting is bad and things like line numbers aren't passed through to your code, and file level parsers aren't particularly difficult to write anyway. And in the name of avoiding writing a decent file level parser, you're sticking the people who have to deal with your configuration file with YAML's whitespace issues, YAML's general complexity, and the fact that editing strings embedded in YAML is not a particularly great experience.

If you attempt to configure some things through structural relationships between (YAML) elements, congratulations, you've just created a custom configuration language that is really terrible and verbose, and probably has bad error reporting if people make mistakes (or no error reporting at all). People did this before in XML and it wasn't any better then.

Using a good custom designed configuration file format instead of trying to shove things through the narrow pipe of YAML means that you have one integrated syntax that can be designed to be more readable, more expressive, and much easier to write. It will probably be easier to provide good error messages about problems (both syntax and semantics), ones that point directly to the line and say specifically what the problem is.
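As a rough illustration that a file level parser for a simple format isn't much work, here is a minimal Go sketch for a hypothetical format of '[section]' headers, 'key = value' lines, and '#' comments. The format and the names (Setting, parseConfig) are made up for this example; the point is that every error can name the exact line, and each Setting keeps its line number so later semantic checks can report it too.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Setting is one 'key = value' line, remembering where it came from
// so that later semantic checks can also report line numbers.
type Setting struct {
	Section string
	Key     string
	Value   string
	Line    int
}

// parseConfig parses a hypothetical format of '[section]' headers,
// 'key = value' settings, blank lines, and '#' comment lines.
// Every error names the offending line.
func parseConfig(src string) ([]Setting, error) {
	var settings []Setting
	section := ""
	sc := bufio.NewScanner(strings.NewReader(src))
	for lineno := 1; sc.Scan(); lineno++ {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			// Blank lines and comments are skipped.
		case strings.HasPrefix(line, "["):
			name := strings.TrimSpace(strings.TrimSuffix(line[1:], "]"))
			if !strings.HasSuffix(line, "]") || name == "" {
				return nil, fmt.Errorf("line %d: malformed section header %q", lineno, line)
			}
			section = name
		default:
			key, value, ok := strings.Cut(line, "=")
			key = strings.TrimSpace(key)
			if !ok || key == "" {
				return nil, fmt.Errorf("line %d: expected 'key = value', got %q", lineno, line)
			}
			if section == "" {
				return nil, fmt.Errorf("line %d: setting %q is outside any [section]", lineno, key)
			}
			settings = append(settings, Setting{Section: section, Key: key, Value: strings.TrimSpace(value), Line: lineno})
		}
	}
	return settings, sc.Err()
}

func main() {
	cfg := "[server]\nport = 8080\nhost localhost\n"
	if _, err := parseConfig(cfg); err != nil {
		fmt.Println(err)
	}
}
```

Run as is, main reports `line 3: expected 'key = value', got "host localhost"`. A real configuration language would need more syntax than this, but the error reporting approach stays the same.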

PS: If you have a complex configuration, there's no way to get out of writing some sort of parser unless you go to the extreme of making people hand-write your AST in YAML elements. Either you have to parse those embedded strings (where much of the complexity is) or you have to interpret and validate the combination of YAML fields and structures, or both.

(Forcing people to hand-write ASTs for you is such a terrible idea that I hope no program actually does this.)

programming/YAMLAndConfigurationFiles written at 17:22:11

Prometheus alerts and the idea of "deadbands" (or maybe hysteresis) (with an implementation)

In a comment on my entry on maybe avoiding flapping alerts, antiphase brought up the concept of a deadband, although what I'm going to talk about might be considered hysteresis instead. Put informally, the idea is that you have a different threshold for turning an alert on and for turning it off. For example, you could trigger an alert when some_metric went over a value of 1000, but not turn the alert off until the value fell below 900. This band where the alert won't turn on but will stay on if it's already on de-flaps the alert by effectively requiring much larger swings in the metric value to re-trigger it; it can't be triggered repeatedly by a small oscillation around 1000.
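The on/off logic of a deadband is a tiny state machine. Here's a Go sketch of it using the 1000 and 900 thresholds from the example; the deadband type and its update method are hypothetical names, not anything that exists in Prometheus.

```go
package main

import "fmt"

// deadband tracks an on/off state with separate thresholds: it turns
// on when the value goes above onAbove and only turns off again once
// the value falls below offBelow (offBelow must be less than onAbove).
type deadband struct {
	onAbove, offBelow float64
	on                bool
}

// update feeds in one new observation and returns the new state.
func (d *deadband) update(v float64) bool {
	if d.on {
		// Already on: stay on until we drop below the lower threshold.
		d.on = v >= d.offBelow
	} else {
		// Off: only the higher threshold can turn us on.
		d.on = v > d.onAbove
	}
	return d.on
}

func main() {
	values := []float64{990, 1010, 995, 1005, 950, 890, 1001}
	d := &deadband{onAbove: 1000, offBelow: 900}
	for _, v := range values {
		fmt.Printf("%6.0f  plain=%-5v deadband=%v\n", v, v > 1000, d.update(v))
	}
}
```

For these values, a plain 'above 1000' test flips on and off with every swing, while the deadband version goes on at 1010, stays on through 950, and only goes off at 890.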

Prometheus has no native support for this. Alert rule expressions are either 'true' (ie, yielding a value) or they aren't. If they're true, the alert is firing; if they're not, the alert isn't. There's no separate alert rule expression for when to stop triggering an alert. But since Prometheus exposes a metric for whether an alert is firing, we can (in theory) write our own deadband expression.

The following is untested (except for syntax), but I think it would generally look like this (the '|' makes expr a YAML literal block string, so the PromQL expression can span multiple lines):

- alert: SomeAlert
  expr: |
    some_metric > 1000
    or
    ( some_metric >= 900
        and ignoring(alertname, alertstate)
      ALERTS{alertname="SomeAlert", alertstate="firing"} )

If you add extra labels to your SomeAlert alert in the alert rule, you'll need to add them to the ignoring().

The first simple expression is our initial trigger, that some_metric is above 1000. The first bit of our parenthesized expression is our setting for not turning off the alert (ie, for continuing it), which is that some_metric is 900 or higher. Then the whole 'and ignoring(...) ALERTS{...}' portion of the expression is the simple condition of 'is the alert currently firing'. So the alert should be on if either the metric is above the initial trigger level, or the alert is currently on and the metric hasn't yet fallen below our cut-off value.

This alert rule can usefully be used with a 'for' limitation, which would make it not trigger until some_metric had been above 1000 for however long. If you use a 'for', you probably really want to make sure you restrict the ALERTS match to a firing alert. Otherwise what you have is an alert that will trigger if at one point some_metric goes above 1000 and then doesn't fall below 900 for the 'for' duration. (Of course, you might want such an alert.)
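Why the firing restriction matters can be seen in a toy simulation. This is a simplified Go model, not how Prometheus is implemented: each step is one rule evaluation, 'for' is counted in evaluations (forSteps), and the ALERTS series that the expression sees is assumed to reflect the state from the previous evaluation. The names simulate, forSteps, and matchPending are hypothetical.

```go
package main

import "fmt"

// state is a simplified version of a Prometheus alert's state.
type state int

const (
	inactive state = iota
	pending
	firing
)

func (s state) String() string {
	return [...]string{"inactive", "pending", "firing"}[s]
}

// simulate runs the deadband rule (on above 1000, held while >= 900)
// once per value and returns the alert's state after each evaluation.
// forSteps plays the role of 'for', counted in evaluations; matchPending
// says whether the ALERTS match also accepts pending alerts.
func simulate(values []float64, forSteps int, matchPending bool) []state {
	out := make([]state, 0, len(values))
	st := inactive
	activeAt := 0
	for i, v := range values {
		// Simplifying assumption: the ALERTS series the expression sees
		// reflects the state from the previous evaluation.
		alertSeen := st == firing || (matchPending && st == pending)
		exprTrue := v > 1000 || (v >= 900 && alertSeen)
		if !exprTrue {
			st = inactive
		} else {
			if st == inactive {
				st, activeAt = pending, i
			}
			if i-activeAt >= forSteps {
				st = firing
			}
		}
		out = append(out, st)
	}
	return out
}

func main() {
	// One brief spike above 1000, then values that stay in the deadband.
	values := []float64{1010, 950, 950, 950, 950}
	fmt.Println("firing only:      ", simulate(values, 3, false))
	fmt.Println("firing or pending:", simulate(values, 3, true))
}
```

In this model, matching only firing alerts gives [pending inactive inactive inactive inactive], while also matching pending alerts gives [pending pending pending firing firing], which is the 'above 1000 once, then never below 900 for the for duration' alert.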

While this works (I think), I'm relatively sure that this is being too clever and complicated. Probably you want to try to use other approaches to de-flapping alerts, ones that are simpler and easier to understand. But if someday I absolutely have to do this, at least I've worked it out now.

sysadmin/PrometheusAlertsDeadband written at 01:02:26

Learning that Vim has insert mode keystrokes that do special things

I use Vim a fair bit, but most of the time I'm merely doing ordinary text entry, predominantly in insert mode. At the same time, I am not the world's best typist (my Delete key gets a good workout). One of my long-standing Vim experiences is that I will be typing along, happily entering text, and then I will do something and suddenly I will have a jumble of unwanted text and text changes.

(This is different from the classical Vi experience where you fumble what you're typing in command mode and all sorts of things happen.)

For a long time, I assumed that I had probably accidentally escaped into command mode and triggered the classical Vi mistake of typing random things in command mode. However, recently I was reading A Vim Guide for Adept Users (one of my hobbies is reading Vim guides), and hit the section on Useful Keystrokes in Insert Mode. A little light went on in my mind.

I've always known that Vim responds to some control keys and key sequences in insert mode, and in fact one of the reasons I use Vim instead of Vi is that I want Delete in insert mode to back up past the start of the line. However, I hadn't previously known that Vim had such a significant collection of text modification keystrokes in insert mode. The two keystrokes that seem most likely to be responsible for various of my mistakes are Ctrl-a (which inserts whatever text you last inserted, which may be a lot) and Ctrl-@ (which does the same and then escapes to command mode on the spot, where my continued typing will cause even more damage). Ctrl-a is relatively easy to hit, too.

The ins-special-keys section of the insert mode documentation has the full list. Some of them seem potentially useful, especially Ctrl-t and Ctrl-d, which add or remove one 'shiftwidth' of indentation at the start of the current line.

PS: My unintended text alteration adventures are probably not helped by my habit of escaping to command mode periodically to do various fidgets, like writing the file or reflowing the paragraph. Command mode has all sorts of dangerous characters that can cause lots of havoc, including '.' and the number keys, and there are a number of ways to accidentally start entering a multi-character sequence that will trap and reinterpret the rest of what you think you're typing as commands.

unix/VimHasInsertModeKeystrokes written at 00:19:10

Go keeps surprising me with its careful design and specification

When I started writing my entry on why it matters that map values are unaddressable in Go, I expected to end it with a remark to the effect that I didn't understand why the Go authors had put this restriction in the specification but they probably had good reasons. But by the time I finished writing the entry, I had realized the language semantics problem of allowing 'm["nosuchkey"]' to be addressable. Then later when I looked up how Go maps store their values (and keys) I saw how allowing you to take the address of a map value probably wouldn't do what you wanted in natural Go.
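Here's a small Go illustration of the map value rules. You can't write m["a"].n++ when the map's values are structs, reading a missing key gives you a zero value that isn't stored anywhere in the map (one way to see why taking its address would be odd), and the usual workarounds are to copy, modify, and store back, or to use a map of pointers. The counter type and bump function are hypothetical.

```go
package main

import "fmt"

// counter is a hypothetical struct type used as a map value.
type counter struct{ n int }

// bump increments m[key].n. Writing m[key].n++ directly would not
// compile, because map values are not addressable, so this copies the
// value out, modifies the copy, and stores it back.
func bump(m map[string]counter, key string) {
	c := m[key] // a missing key gives the zero value
	c.n++
	m[key] = c
}

func main() {
	m := map[string]counter{"a": {n: 1}}
	bump(m, "a")

	// Reading a missing key yields a zero value that isn't stored in
	// the map, so there is no map slot that an address could refer to.
	z := m["nosuchkey"]
	_, present := m["nosuchkey"]
	fmt.Println(m["a"].n, z.n, present, len(m)) // 2 0 false 1

	// With a map of pointers, the structs live outside the map and the
	// pointers can be used to modify them in place.
	mp := map[string]*counter{"a": {n: 1}}
	mp["a"].n++
	fmt.Println(mp["a"].n) // 2
}
```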

I've had this experience more than once, where I've been surprised by how quietly careful Go's design and specification is. There are various technical areas of the Go specification that have had what seemed like arcane restrictions or rules, but when I've thought more deeply about them I've come up with reasonably good reasons for the rules to exist.

(Sometimes these are small ones, like how arbitrary precision constants affect cross compilation. Even things like always requiring delimited if blocks have reasons.)
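A small Go illustration of arbitrary precision constants: the Go specification says constant expressions are always evaluated exactly, and compilers must represent integer constants with at least 256 bits, so intermediate values can exceed every machine integer type. In this sketch the value of small is 4 whether you compile on or for a 32-bit or 64-bit machine; the constant names are arbitrary.

```go
package main

import "fmt"

// huge doesn't fit in any Go integer type, but as an untyped constant
// the compiler holds it exactly (the spec requires at least 256 bits).
const huge = 1 << 100

// small is computed exactly at compile time from huge, and the result
// does fit in an int.
const small = huge >> 98

func main() {
	fmt.Println(small) // 4
	// fmt.Println(huge) would not compile: converting the untyped
	// constant to int for the call overflows.
}
```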

On the one hand, this shouldn't be surprising in general. The designers of Go were quite experienced, knew what they were doing, and spent a fair amount of time working on it. Given that, it's very likely that everything in the Go specification was carefully considered and has a solid reason behind it, even if it's not immediately obvious.

On the other hand, this is not necessarily the usual experience with languages, especially languages that haven't gone through a formal (and somewhat adversarial) specification process. Solid language specifications are genuinely hard to create and you don't see them very often.

PS: This isn't to say that Go's design and specification is flawless, even apart from features it simply doesn't have. I haven't gone looking for flaws, but they probably exist and people have probably written about them.

programming/GoCarefulDesign written at 00:01:21

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.