Wandering Thoughts archives

2010-02-28

Why XML is terrible for configuration files

There are a lot of things that get called 'configuration files', so I want to be specific: I mean the sort of configuration files that have three primary uses. They're written by people, used by programs, and later read by people who are trying to figure out what the programs are set up to do. These are the kind of configuration files that XML is terrible for.

(There's a whole ecology of 'configuration files' that are generated by one program and consumed by others, and are pretty much never touched by people. I don't care what format they're in, and there are perfectly sensible reasons to use XML for them.)

The reasons that XML is terrible for configuration files are right there in the description of what these sorts of configuration files are for. The only one of those three things that XML makes easy is being used by programs; XML is famously difficult and annoying to write by hand and equally hard to read (partly because it is so verbose; excess verbosity causes people to lose track of where they are). Common ways to structure data in XML files make this even worse because they tend to be designed for the convenience of programs, not to be comprehensible to people.
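As a rough illustration (the settings, values, and generic `<setting name=... value=...>` layout below are invented for this example, not taken from any real program), here is the same small configuration written as XML in a name/value style that is convenient for programs, and as an INI-style file, both loaded with Python's standard library. To the program the two are identical; to a person, only one of them is easy to scan.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical server settings, expressed once as XML and once as INI.
XML_CONFIG = """\
<configuration>
  <server>
    <setting name="listen-port" value="8080"/>
    <setting name="log-level" value="warning"/>
  </server>
</configuration>
"""

INI_CONFIG = """\
[server]
listen-port = 8080
log-level = warning
"""


def load_xml(text):
    """Return {section: {name: value}} from the hypothetical XML layout above."""
    root = ET.fromstring(text)
    result = {}
    for section in root:
        # Each child of <configuration> is a section; its <setting>
        # elements carry the key and value as attributes.
        result[section.tag] = {
            s.get("name"): s.get("value") for s in section.findall("setting")
        }
    return result


def load_ini(text):
    """Return {section: {name: value}} from an INI-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    # Both formats produce the same data structure for the program.
    print(load_xml(XML_CONFIG) == load_ini(INI_CONFIG))
```

The program can't tell the difference, which is exactly the point: the choice between the two only matters to the people who have to write and later read the file.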

General XML editors improve the situation somewhat, but I feel that they don't really do all that much to make it genuinely easy for people to write and read XML. This is doubly true if your XML format has data structuring issues.

XML is also prone to a particular disease, which is best illustrated by asking a question: is your XML format actually documented, in full detail, in a way that is at least as good as the Atom feed format specification or the description of your favorite program's regular configuration file? All too often the answer is that it is not, because people have the peculiar impression that using XML with verbose element and attribute names plus some sketchy documentation is sufficient.

(Please note that a DTD is not documentation. Try again.)

This issue cannot be solved by creating a nice user-friendly program to create and maintain the XML configuration file. If you do this, what you have really done is create a program without a real configuration file, one that is instead configured only through a GUI. And you still have the documentation problem; it's just that you now have to document the effects of the program instead of the configuration file.

(For bonus points, this configuration process is generally asynchronous so you can't immediately see the effects of your configuration changes.)

tech/XMLNotConfigurationFile written at 22:28:30

The dividing line between supporting code and forking it

Suppose that you want to provide support for a bunch of open source code but are not the primary author or maintainer. Inevitably you will have to patch bugs and add features on your own in some way (and perhaps fix the code to port it to your environment or whatever). This leads to a question: where is the dividing line between merely supporting the code and actually forking your own version of the code?

To me, the dividing line is whether you can get your changes accepted upstream and applied to the main codebase (assuming that you even try). If you can't get changes accepted upstream, you have to maintain your own version of the code, forward-porting all of your fixes and changes to the new versions of the main code that you choose to adopt (or perhaps porting changes from the main code into your version). Ergo, forking.

If you can get changes accepted upstream you have less divergence, especially over the long term. Instead of carrying changes that have to live forever and be perpetually forward-ported, you are instead simply developing and maintaining changes until they can be merged, which is part of normal development for many projects. (Of course, ideally you merge changes upstream as fast as possible; the faster you merge, the less work each of your changes is for you.)

The direct corollary is that your ability to merely support code instead of forking it depends on how willing the upstream is to accept your changes. A closed or mostly closed upstream forces you to effectively fork, whether you like it or not. And of course this presumes that you can find an upstream and that the upstream is alive.

(In real life many people doing 'support' for open source projects carry a mix of changes that are in the process of going upstream and changes that the upstream will never accept for various reasons. Note that a lot of people do this sort of support in the open source world, since this issue applies to everyone who packages programs for particular operating systems.)

tech/SupportingVsForking written at 00:16:16



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.