Some of my views on using YAML for human-written configuration files
Over on Twitter, I said something:
Hot take: YAML isn't a configuration language or a configuration language format, it's a serialization format. Is de-serializing some data structures the best way to configure a program? Maybe not. (Probably not. Mostly not.)
Like programming languages, all configuration systems communicate with both the computer and other people. But most are designed only for the computer to consume, not to be clear to the people who read them. De-serializing your live data structures is an extreme example of this.
There are some configurations that are simple enough that I think YAML works okay for them; I'd say that these are pretty much ones that have sections with 'key = value' settings (although there are simpler, more readable formats for this, like TOML). Once you go beyond that and your configuration lives in more complicated data structures, you start to have issues. Of course you can de-serialize to intermediate data structures that your program then further interprets to create your actual configuration, but then you have an additional problem.
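For the simple case, Python's standard-library `configparser` module gives a rough sense of how little machinery a sectioned 'key = value' file needs. It reads INI files rather than TOML (TOML support, via `tomllib`, only arrived in Python 3.11), and the section and key names below are made up for illustration.

```python
import configparser

# A hypothetical configuration of the simple kind: sections holding
# 'key = value' settings, with no deeper structure.
SAMPLE = """
[server]
listen = 0.0.0.0:8080
workers = 4

[logging]
level = info
"""

def load_settings(text: str) -> dict[str, dict[str, str]]:
    """Parse INI-style text into {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # sections() excludes the special DEFAULT section.
    return {name: dict(parser[name]) for name in parser.sections()}

if __name__ == "__main__":
    settings = load_settings(SAMPLE)
    print(settings["server"]["workers"])  # prints 4; values come back as strings
```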
What YAML does is provide a straightforward format for simple data, which is mostly deserialized into your program's data structures. YAML is opaque and relatively hostile to any structure beyond that; you get to embed that structure either in YAML strings or in structural relationships between YAML elements.
There are plenty of programs with complex configuration needs. If you use YAML for a program like this, you get at least one of two bad results: either you're using YAML to transport strings that your program will later interpret much more deeply, or you have to attempt to program your program through YAML structural relationships between magic keys, like Prometheus label rewrite rules.
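Here is a rough sketch of the second case in Python, assuming the YAML has already been loaded into plain lists and dicts. The rule shape is loosely modelled on Prometheus relabelling (`source_labels`, `regex`, `target_label`, `replacement`, `action`), but the semantics are heavily simplified and this is not Prometheus's implementation; `apply_rules` and `KNOWN_KEYS` are hypothetical. The point is that all of the meaning, and all of the validation, lives in an interpreter you still have to write.

```python
import re

# Hypothetical rules in the shape a YAML loader would hand back: a list of
# dicts whose magic keys only mean something to this interpreter.
RULES = [
    {"source_labels": ["job", "instance"], "regex": "node;(.*):9100",
     "target_label": "host", "replacement": "$1", "action": "replace"},
    {"source_labels": ["env"], "regex": "test", "action": "drop"},
]

KNOWN_KEYS = {"source_labels", "separator", "regex", "target_label",
              "replacement", "action"}

def apply_rules(rules, labels):
    """Apply rules to a dict of labels; return new labels, or None if dropped."""
    labels = dict(labels)
    for index, rule in enumerate(rules):
        # All validation is ours: YAML only promised us a dict.
        unknown = set(rule) - KNOWN_KEYS
        if unknown:
            raise ValueError(f"rule {index}: unknown keys {sorted(unknown)}")
        action = rule.get("action", "replace")
        sep = rule.get("separator", ";")
        # Join the source label values (missing labels count as "").
        value = sep.join(labels.get(n, "") for n in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if action == "replace":
            if "target_label" not in rule:
                raise ValueError(f"rule {index}: 'replace' needs target_label")
            if match:
                # Translate $1-style references into Python's \g<1> syntax.
                template = re.sub(r"\$(\d+)", r"\\g<\1>",
                                  rule.get("replacement", "$1"))
                labels[rule["target_label"]] = match.expand(template)
        elif action == "keep":
            if not match:
                return None
        elif action == "drop":
            if match:
                return None
        else:
            raise ValueError(f"rule {index}: unknown action {action!r}")
    return labels

print(apply_rules(RULES, {"job": "node", "instance": "web1:9100"}))
```

A misspelled key such as `source_label` is only caught because the sketch checks for unknown keys explicitly; YAML itself accepts it happily, and even then the best location the error can give is a rule index.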
As a string transport mechanism, YAML does mean you don't have to write a file-level parser (but you're still going to be parsing your strings). But you pay a high price for that, especially in the typical environments where YAML error reporting is bad and things like line numbers don't get passed through to your program. Meanwhile, file-level parsers are not particularly difficult to write. And in the name of avoiding writing a decent file-level parser, you're sticking people who have to deal with your configuration file with problems like YAML's whitespace issues, YAML's general complexity, and the fact that editing strings embedded in YAML is not a particularly great experience.
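To illustrate the string-transport case, here is a sketch with an invented mini-language for alert checks (`CONFIG`, `CHECK_RE`, and `parse_checks` are all hypothetical). `CONFIG` stands in for what a typical YAML loader returns; by that point the line numbers are gone, so when an embedded string fails to parse, the best the program can report is the key path.

```python
import re

# Stand-in for what a typical YAML loader returns for a hypothetical config
# whose real content lives in strings. Plain dicts carry no line numbers.
CONFIG = {
    "checks": {
        "disk": "every 5m alert if usage > 90",
        "load": "every 1m alert if load >> 4",  # a typo: '>>'
    }
}

# The invented mini-language: every <N><s|m|h> alert if <metric> <op> <number>
CHECK_RE = re.compile(
    r"every (?P<n>\d+)(?P<unit>[smh]) alert if "
    r"(?P<metric>\w+) (?P<op><=|>=|<|>|==) (?P<limit>\d+(?:\.\d+)?)"
)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def parse_checks(config):
    """Parse each embedded check string; collect errors rather than stopping."""
    parsed, errors = {}, []
    for name, text in config.get("checks", {}).items():
        m = CHECK_RE.fullmatch(text)
        if m is None:
            # The best location available is the key path, not a file line,
            # and a failed regex match can't easily say what went wrong.
            errors.append(f"checks.{name}: cannot parse {text!r}")
            continue
        parsed[name] = {
            "interval": int(m["n"]) * UNIT_SECONDS[m["unit"]],
            "metric": m["metric"],
            "op": m["op"],
            "limit": float(m["limit"]),
        }
    return parsed, errors

parsed, errors = parse_checks(CONFIG)
print(errors)
```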
If you attempt to configure some things through structural relationships between (YAML) elements, congratulations, you've just created a custom configuration language that is really terrible and verbose, and probably has bad error reporting if people make mistakes (or no error reporting at all). People did this before in XML and it wasn't any better then.
Using a good custom-designed configuration file format instead of trying to shove things through the narrow pipe of YAML gives you one integrated syntax that can be designed to be more readable, more expressive, and much easier to write. It will probably also be easier to provide good error messages about problems (both syntax and semantics), ones that point directly to the line and say specifically what the problem is.
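As a rough illustration that a file-level parser need not be hard, here is a sketch of a line-oriented format for the same kind of invented checks (the `check ... every ... alert if ...` syntax, `Check`, and `parse_config` are all hypothetical). Because it reads the file directly, every error, including the semantic one about duplicate names, can carry a file name and line number and say specifically what is wrong.

```python
import re
from dataclasses import dataclass

@dataclass
class Check:
    name: str
    interval: int  # seconds
    metric: str
    op: str
    limit: float
    line: int  # kept so later checks can point back to the source line

OPS = {"<", "<=", ">", ">=", "=="}
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
SHAPE = "check <name> every <interval> alert if <metric> <op> <number>"

def parse_config(text, filename="checks.conf"):
    """Parse the hypothetical format, one check per line:
        check <name> every <N><s|m|h> alert if <metric> <op> <number>
    Blank lines and lines starting with '#' are ignored.
    Returns (checks, errors); errors are 'file:line: message' strings."""
    checks, errors, seen = [], [], {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        where = f"{filename}:{lineno}"
        words = line.split()
        if (len(words) != 9 or words[0] != "check" or words[2] != "every"
                or words[4:6] != ["alert", "if"]):
            errors.append(f"{where}: expected '{SHAPE}'")
            continue
        _, name, _, interval, _, _, metric, op, limit = words
        m = re.fullmatch(r"(\d+)([smh])", interval)
        if m is None:
            errors.append(f"{where}: bad interval {interval!r}, want e.g. '30s', '5m' or '1h'")
            continue
        if op not in OPS:
            errors.append(f"{where}: unknown operator {op!r}, want one of {sorted(OPS)}")
            continue
        try:
            value = float(limit)
        except ValueError:
            errors.append(f"{where}: limit {limit!r} is not a number")
            continue
        # A semantic check: the message points at both relevant lines.
        if name in seen:
            errors.append(f"{where}: duplicate check {name!r}, first defined on line {seen[name]}")
            continue
        seen[name] = lineno
        checks.append(Check(name, int(m[1]) * UNIT_SECONDS[m[2]], metric, op, value, lineno))
    return checks, errors

SAMPLE = """\
# disk and load checks
check disk every 5m alert if usage > 90
check load every 1m alert if load >> 4
check disk every 10x alert if usage > 95
check disk every 1h alert if usage > 99
"""

checks, errors = parse_config(SAMPLE)
for message in errors:
    print(message)
```

Collecting errors instead of stopping at the first one is a deliberate choice in this sketch, so that people can fix several mistakes in one pass.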
PS: If you have a complex configuration, there's no way to get out of writing some sort of parser unless you go to the extreme of making people hand-write your AST in YAML elements. Either you have to parse those embedded strings (where much of the complexity is) or you have to interpret and validate the combination of YAML fields and structures, or both.
(Forcing people to hand-write ASTs for you is such a terrible idea that I hope no program actually does this.)
Written on 13 August 2021.