2010-03-27
Testing in the face of popen()
Suppose that you have a program that runs another program and uses its output; for example, you have a high level analysis program that runs a low level state gathering command (or several commands) to gather various information. Here is something that I have recently learned the hard way:
You really want to give your high level program an option to get this status output from a file instead of by running the commands.
Doing this saves you from having to recreate a specific scenario each time you want to test how your high level logic handles the situation. Instead, you reproduce each scenario once, save the output of the low level state gathering tools, and test your program offline. Speaking from personal experience, this avoids a lot of tedium and guarantees that you're re-testing exactly what you think you're re-testing, not a slightly different scenario.
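As a rough illustration of the idea, here is a minimal sketch in C, assuming a POSIX system with popen(). All of the names here (status_open, status_close, count_failed, and the "FAILED" marker) are made up for this example; the only point is that the analysis code reads from a FILE * and doesn't care whether it came from a live command or a saved file.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Where the status output comes from: a live command or a saved file. */
struct status_source {
    FILE *fp;
    int from_pipe;   /* nonzero if fp came from popen() */
};

/*
 * Open the status output. If saved_path is non-NULL, read previously
 * saved output from that file; otherwise run command through popen().
 * Returns 0 on success and -1 on failure.
 */
static int status_open(struct status_source *src, const char *command,
                       const char *saved_path)
{
    assert(src != NULL);
    if (saved_path != NULL) {
        src->fp = fopen(saved_path, "r");
        src->from_pipe = 0;
    } else {
        src->fp = popen(command, "r");
        src->from_pipe = 1;
    }
    return src->fp != NULL ? 0 : -1;
}

/* Close the source with the call that matches how it was opened. */
static int status_close(struct status_source *src)
{
    int rc = src->from_pipe ? pclose(src->fp) : fclose(src->fp);
    src->fp = NULL;
    return rc;
}

/*
 * Hypothetical stand-in for the high level logic: count the lines
 * that contain "FAILED". It only ever sees a FILE *. (It assumes that
 * lines fit in the buffer, which is good enough for a sketch.)
 */
static int count_failed(FILE *fp)
{
    char line[512];
    int n = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, "FAILED") != NULL)
            n++;
    }
    return n;
}
```

A command line option (say, a hypothetical '-f FILE') would then just decide whether saved_path is NULL or not. Capturing a scenario is a matter of running the low level command once and redirecting its output to a file.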
There are two things to watch out for with this approach. First, this only really works when you have the output of the low level tools nailed down; it's fairly pointless if you're still determining what information they have to output and what format it needs to be in. Second, you need to be sure that your low level tools really always produce the same information (and in the same format) on all of the systems you're going to run your high level program on.
(I have run into some cases where this wasn't so, even when I wrote the low level tools myself, because some systems just didn't provide some bits of information or reported the same scenario in different ways. Of course, this is unpleasant to find out at any point in development.)
2010-03-19
The problem with general purpose languages as configuration languages
Lately, one of the popular ways to configure programs is not to come up with a custom configuration language or format (and the associated parsers and so on) but to use a general purpose language for this purpose, generally augmented with as much of a DSL as you can wedge in. One example of this is Rake, but there are others; I have the impression that it's most popular when there may be actual logic that you need to express in the program configuration, as is the case for makefiles.
It's my view that having your program configured through writing things in a general purpose language is generally a bad mistake (at least once you consider the purpose of configuration systems). To put it simply, the problem with configuration systems is almost always that they lack clarity, not that they lack power. Using a general purpose language is adding power while subtracting clarity; now, in order to understand what the configuration is going to do you need to execute the configuration program in your head, on top of understanding the effects of the configuration options themselves. Overall, this means that you're solving the wrong problem. Even if your configuration system lacks both clarity and power, using a general purpose language solves only half of the problem while making the other half worse.
(It is a rare configuration system indeed that has clarity to spare but lacks power.)
It's easy to see why this approach to the problem is superficially attractive to programmers. As the old math joke goes, it reduces configuration systems to an already solved problem, making them into 'just' simple programs, and in a powerful language that someone else has already designed and implemented.
(I've observed that programmers tend to like hitting things with the programming hammer, as it were, and feel that programming can be the solution to all problems. Sometimes this works well; sometimes it produces things that only programmers like.)
I also suspect that attempts to twist the general purpose language into something DSL-like that is nominally human readable do more harm than good in the long run. The problem is that you're forcing people to work in two languages, not one, since now they have to know both the general purpose language and your DSL (instead of just the language and what the various subroutines you make available do). That the DSL is theoretically human readable doesn't help, because when you're dealing with a DSL you can't assume that you know what something does just because it's in pseudo-English (or pseudo-whatever).
2010-03-09
How not to design an API (in C): the enum ordering mistake
Suppose that you are creating an API in C and that you have a return
value that is just right for an enum; for example, it communicates
either 'all is okay' or some range of errors and exceptional conditions.
Here's how not to write this API:
typedef enum { ERROR_1, ERROR_2, ERROR_3, ALL_OK } error_t;
You don't want to do this, because sooner or later you're going to want
to add another error condition, ERROR_4, and the end result of
putting it after ALL_OK is going to look somewhere between ugly and
stupid.
The rule of thumb with enums and similar objects is that the fixed point goes at the start of the range. You are unlikely to have more than one 'all is fine' return code, so it is the fixed point and goes at the start.
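Applied to the example above, a sketch of the better layout might look like this. The api_status type name and the api_succeeded() helper are invented for illustration; the enumerator names are the ones from the example.

```c
#include <assert.h>

typedef enum {
    ALL_OK = 0,   /* the fixed point: always first, always 0 */
    ERROR_1,
    ERROR_2,
    ERROR_3
    /* A future ERROR_4 gets appended here; nothing above it moves. */
} api_status;

/* Record the invariant so that a careless reordering fails to compile. */
static_assert(ALL_OK == 0, "ALL_OK must keep the value 0");

/* Hypothetical helper: callers only need to know where the fixed point is. */
static int api_succeeded(api_status s)
{
    return s == ALL_OK;
}
```

Spelling out the '= 0' is redundant in C, since the first enumerator is 0 by default, but it makes the intent visible to the next person who edits the header.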
The extra special way not to design this API is to do this and then
just put ERROR_4 where it belongs, ie before ALL_OK. If you
do this, any number of people will throttle you because you have just
destroyed binary compatibility by renumbering ALL_OK's actual value.
Worse, the broken binary compatibility may be subtle, depending on where
and how people use the enum, since only one value has shifted.
(Admittedly this is only an issue in C and similar compiled languages
that turn enums into actual integers behind the scenes. In other
languages, this confusion can't happen; either ALL_OK is silently
renumbered in all code that's using it or ALL_OK is purely a symbol
with no numeric value attached to it as such.)
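To make the renumbering concrete, here is a small sketch that puts both versions of the enum side by side in one file. The v1_/V1_ and v2_/V2_ prefixes exist only so that both versions can coexist here, and the two functions are hypothetical stand-ins for a new library and an old, unrecompiled caller.

```c
#include <assert.h>

/* The original header, as old callers were compiled against it. */
enum v1_error { V1_ERROR_1, V1_ERROR_2, V1_ERROR_3, V1_ALL_OK };

/* The revised header, with ERROR_4 inserted before ALL_OK. */
enum v2_error { V2_ERROR_1, V2_ERROR_2, V2_ERROR_3, V2_ERROR_4, V2_ALL_OK };

/* ERROR_4 has taken over the integer that used to mean ALL_OK. */
static_assert((int)V2_ERROR_4 == (int)V1_ALL_OK,
              "ERROR_4 now has ALL_OK's old value");

/* Stand-in for the new library: what crosses the ABI is a plain integer. */
static int new_library_call(int fail_with_error_4)
{
    return fail_with_error_4 ? V2_ERROR_4 : V2_ALL_OK;
}

/* Stand-in for an old binary, which still compares against the old value. */
static int old_caller_thinks_ok(int raw)
{
    return raw == V1_ALL_OK;
}
```

With these definitions, old_caller_thinks_ok(new_library_call(1)) is true and old_caller_thinks_ok(new_library_call(0)) is false: the old caller treats the new error as success and real success as something unknown. ERROR_1 through ERROR_3 behave exactly as before, which is part of why the breakage can go unnoticed.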
You would think that people wouldn't do this. Sadly, I have just seen this mistake made in software from a major vendor, assuming that it was a mistake instead of a deliberate decision to subtly punish people who counted on binary compatibility when it wasn't documented.
(PS: if you want to punish these people, it is much more productive and direct to spectacularly break your ABI so that people can't help but notice. People are kind of slow to notice subtle problems and they may not even realize what's going on for some time.)