Some of my views on using YAML for human-written configuration files

August 13, 2021

Over on Twitter, I said something:

Hot take: YAML isn't a configuration language or a configuration language format, it's a serialization format. Is de-serializing some data structures the best way to configure a program? Maybe not. (Probably not. Mostly not.)

Like programming languages, all configuration systems communicate with both the computer and other people. But most are designed only for the computer to consume, not to be clear when people read it. De-serializing your live data structures is an extreme example of this.

(I've said the second bit before and I'm sure I'll say it again. See also.)

There are some configurations that are simple enough that I think YAML works okay; I'd say that these are pretty much ones that have sections with 'key = value' settings (but there are simpler, more readable formats for this, like TOML). Once you go beyond that to having your configuration in more complicated data structures, you start to have issues. Of course you can de-serialize to initial data formats that are then further interpreted by your program to create your actual configuration, but then you have an additional problem:

What YAML does is provide a straightforward format for simple data. It's mostly used to deserialize into some data structures of yours. YAML is opaque and relatively hostile to any structure beyond that; you get to embed it in YAML strings and structural relationships.

There are plenty of programs with complex configuration needs. If you use YAML for a program like this, you get at least one of two bad results; either you're using YAML to transport strings that will really be interpreted much more deeply later by the program, or you have to attempt to program your program through YAML structural relationships between magic keys, like Prometheus label rewrite rules.

As a string transport mechanism, YAML does mean you don't have to write a file level parser (but you're still going to be parsing your strings). But you pay a high price for that, especially in typical environments with bad YAML error reporting and bad YAML passthrough of things like line numbers, and file level parsers are not particularly difficult to write. And in the name of avoiding writing a decent file level parser, you're sticking the people who have to deal with your configuration file with YAML's whitespace issues, YAML's general complexity, and the fact that editing strings embedded in YAML is not a particularly great experience.
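To make the "you're still going to be parsing your strings" point concrete, here is a minimal Python sketch. It assumes the YAML loader has already handed back plain dicts and strings (the `loaded` value stands in for that), and the key names and the duration syntax are made up for illustration. Note that by the time the string is parsed, the best the error can usually do is name a key path, not a line.

```python
import re

# Stand-in for what a YAML loader would hand back: plain dicts and strings.
# The keys and values here are hypothetical.
loaded = {
    "checks": {
        "disk": {"interval": "5m", "timeout": "30s"},
        "mail": {"interval": "ten minutes"},
    }
}

_UNITS = {"s": 1, "m": 60, "h": 3600}
_DURATION = re.compile(r"(\d+)([smh])")


def parse_duration(text, where):
    """Turn a string like '5m' into seconds, or raise naming the key path."""
    m = _DURATION.fullmatch(text)
    if m is None:
        # No line number is available here; the key path is the best we can do.
        raise ValueError(f"{where}: bad duration {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def check_intervals(config):
    """Collect interval seconds per check, plus any error messages."""
    seconds, errors = {}, []
    for name, settings in config["checks"].items():
        where = f"checks.{name}.interval"
        try:
            seconds[name] = parse_duration(settings["interval"], where)
        except ValueError as e:
            errors.append(str(e))
    return seconds, errors
```

Calling `check_intervals(loaded)` accepts the `disk` interval and reports the `mail` one by key path only, which is the sort of error message people editing the file end up having to decode.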

If you attempt to configure some things through structural relationships between (YAML) elements, congratulations, you've just created a custom configuration language that is really terrible and verbose, and probably has bad error reporting if people make mistakes (or no error reporting at all). People did this before in XML and it wasn't any better then.
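A rough Python sketch of what "structural relationships between elements" turns into on the program's side. The rule shape below (`action`, `source`, `regex`, `target`, `replacement`) is hypothetical, only loosely modelled on label rewriting, and is not any real program's schema. The point is that which keys are required, and which are silently meaningless, depends on the value of another key, and the data format can check none of that for you.

```python
import re

# Hypothetical rule shape; which keys matter depends on the value of "action".
REQUIRED = {
    "replace": {"source", "regex", "target", "replacement"},
    "drop": {"source", "regex"},
}


def validate(rule):
    """Return a list of problems with one rule (empty if it looks usable)."""
    action = rule.get("action")
    if action not in REQUIRED:
        return [f"unknown action {action!r}"]
    needed = REQUIRED[action]
    present = set(rule) - {"action"}
    problems = [f"action {action!r} needs key {k!r}" for k in sorted(needed - present)]
    problems += [
        f"key {k!r} is ignored by action {action!r}" for k in sorted(present - needed)
    ]
    return problems


def apply(rule, labels):
    """Apply one validated rule to a dict of labels; None means 'dropped'."""
    m = re.fullmatch(rule["regex"], labels.get(rule["source"], ""))
    if m is None:
        return labels  # rule does not match, labels pass through unchanged
    if rule["action"] == "drop":
        return None
    return {**labels, rule["target"]: m.expand(rule["replacement"])}
```

Even this toy version needs its own validation pass and its own error messages, and a misspelled key (say `acton`) surfaces only as "unknown action None", with nothing to say where in the file it came from.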

Using a good custom designed configuration file format instead of trying to shove things through the narrow pipe of YAML means that you have one integrated syntax that can be designed to be more readable, more expressive, and much easier to write. It will probably be easier to provide good error messages about problems (both syntax and semantics), ones that point directly to the line and say specifically what the problem is.
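As a small illustration of error messages that point directly to the line, here is a minimal sketch of a file level parser in Python. The format it accepts (`[section]` headers, `key = value` settings, `#` comments) is an assumption picked for the example, as are the names `parse_config` and `ConfigError`; a real configuration language would have more to it, but the line number bookkeeping stays about this simple.

```python
class ConfigError(Exception):
    """A problem in the configuration file, reported with file and line."""


def parse_config(text, filename="<config>"):
    """Parse '[section]' headers and 'key = value' lines into nested dicts."""
    config, section = {}, None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        where = f"{filename}:{lineno}"
        if line.startswith("["):
            name = line[1:-1].strip()
            if not line.endswith("]") or not name:
                raise ConfigError(f"{where}: malformed section header {line!r}")
            section = config.setdefault(name, {})
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ConfigError(f"{where}: expected 'key = value', got {line!r}")
        if section is None:
            raise ConfigError(f"{where}: setting {key!r} appears before any [section]")
        if key in section:
            raise ConfigError(f"{where}: duplicate setting {key!r}")
        section[key] = value.strip()
    return config
```

Because the parser sees every line itself, each complaint can carry the file name and line number along with a specific description of what was expected.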

PS: If you have a complex configuration, there's no way to get out of writing some sort of parser unless you go to the extreme of making people hand-write your AST in YAML elements. Either you have to parse those embedded strings (where much of the complexity is) or you have to interpret and validate the combination of YAML fields and structures, or both.

(Forcing people to hand-write ASTs for you is such a terrible idea that I hope no program actually does this.)


Comments on this page:

From 67.173.23.98 at 2021-08-13 19:14:06:

What do you think about jsonnet or cuelang?

From 69.165.150.105 at 2021-08-13 19:56:26:

I've always been partial to ISC/BIND-style configuration files. A lot of semicolons, but otherwise the stanzas are handy and you don't have to worry about whitespace.

By Carl at 2021-08-13 20:36:27:

Quoting my own tweet on the subject:

It's fashionable to hate XML because it was used in a lot of places it was a bad fit in the 00s, but at least it's a pretty good document language.

YAML though is always a bad fit. If you want machine readable config, use JSON; human readable, use TOML. When does YAML ever fit?

https://twitter.com/carlmjohnson/status/1372224080749993988

Your critique is deeper than mine was though.

Forcing people to hand-write ASTs for you is such a terrible idea that I hope no program actually does this.

Emacs? 😉

Hey Chris, great article. My friend linked it to me.

I was wondering if you've heard of Dhall?

What do you think about using it as a standard for configuration?

https://dhall-lang.org/

Written on 13 August 2021.

Last modified: Fri Aug 13 17:22:11 2021