Using awk to check your script's configuration file

May 13, 2025

Suppose, not hypothetically, that you have a shell script with a relatively simple configuration file format that people can still accidentally get wrong. You'd like to check the configuration file for problems before you use it in the rest of your script, for example by using it with 'join' (where things like the wrong number or type of fields will be a problem). Recently on the Fediverse I shared how I was doing this with awk, so here's a slightly more elaborate and filled-out version:

errs=$(awk '
         $1 ~ "^#" { next }
         NF != 3 {
            printf " line %d: wrong number of fields\n", NR;
            next }
         [...]
         ' "$cfg_file"
       )

if [ -n "$errs" ]; then
   echo "$prog: Errors found in '$cfg_file'. Stopping." 1>&2
   echo "$errs" 1>&2
   exit 1
fi

(Here I've chosen to have awk's diagnostic messages indented by one space when the script prints them out, hence the space before 'line %d: ...'.)

The advantage of having awk simply print out the errors it detects and letting the script deal with them later is that you don't need to mess around with awk's exit status; your awk program can simply print what it finds and be done. Using awk for the syntax checks is handy because it lets you express a fair amount of logic and checks relatively simply (you can even check for duplicate entries and so on), and it also gives you line numbers for free.
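
To make the duplicate check concrete, here's a minimal sketch. It assumes a hypothetical three-field format where the first field is a key that must be unique. The sample file, its contents, and the array name 'seen' are made up for illustration. An awk associative array records the line where each key first appeared, so the complaint can point at both lines.

```shell
# Hypothetical sample configuration: the first field should be unique.
cfg_file=$(mktemp)
cat >"$cfg_file" <<'EOF'
# host   port  owner
alpha    22    cks
beta     80    ops
alpha    443   cks
EOF

errs=$(awk '
         $1 ~ "^#" { next }
         NF != 3 {
            printf " line %d: wrong number of fields\n", NR;
            next }
         # Remember the line where each key was first seen, and
         # complain about any later repeat of it.
         ($1 in seen) {
            printf " line %d: duplicate entry %s (first seen on line %d)\n", NR, $1, seen[$1];
            next }
         { seen[$1] = NR }
         ' "$cfg_file"
       )
rm -f "$cfg_file"

printf '%s\n' "$errs"
```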

One trick with using awk in this way is to progressively filter things in your checks, skipping any further processing of the current line with 'next'. We start out by skipping all comments, then we report and otherwise skip every line with the wrong number of fields, and every check after this can assume that we at least have the right number of fields, so it can confidently check what should be in each one. If the number of fields in a line is wrong, there's no point in complaining that one of them has the wrong sort of value, and an early check followed by 'next' is the simple way to skip the rest of that line's processing.
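
Here's a sketch of that progressive filtering with two field checks added after the field count check. The format (a name, a numeric port, and a yes/no field) is hypothetical and only there for illustration. Because every rule ends with 'next', each line produces at most one complaint.

```shell
# Hypothetical sample configuration: name, numeric port, yes/no flag.
cfg_file=$(mktemp)
cat >"$cfg_file" <<'EOF'
# name   port   enabled
alpha    22     yes
beta     http   yes
gamma    80
delta    443    maybe
EOF

errs=$(awk '
         # Every rule ends with "next", so each later rule only sees
         # lines that passed all of the earlier checks.
         $1 ~ "^#" { next }
         NF != 3 {
            printf " line %d: wrong number of fields\n", NR;
            next }
         # From here on, every line has exactly three fields.
         $2 !~ /^[0-9]+$/ {
            printf " line %d: port is not a number: %s\n", NR, $2;
            next }
         $3 != "yes" && $3 != "no" {
            printf " line %d: enabled must be yes or no: %s\n", NR, $3;
            next }
         ' "$cfg_file"
       )
rm -f "$cfg_file"

printf '%s\n' "$errs"
```

Note that 'gamma' is only reported for its field count; its (missing) third field never reaches the yes/no check.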

If you're also having awk process the configuration file later, you might be tempted to have it check for errors at the same time in an all-in-one awk program, but my view is that it's simpler to split the error checking from the processing. That way you don't have to worry about stopping the processing if you detect errors, or about intermingling processing logic with checking logic. You do have to make sure the two versions have the same handling of comments and so on, but in simple configuration file formats this is usually easy.
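
One way to keep the two passes agreeing about comments, offered here as an assumption rather than how the original script does it, is to keep the comment rule in a single shell variable and prepend it to both awk programs. The sample files, the second 'join' input, and the skip_comments variable are all hypothetical.

```shell
# Hypothetical sample files for the sketch.
cfg_file=$(mktemp)
hosts_file=$(mktemp)
cat >"$cfg_file" <<'EOF'
# host   port  owner
beta     80    ops
#alpha   8080  old
alpha    22    cks
EOF
printf '%s\n' 'alpha rack1' 'beta rack2' >"$hosts_file"

# The comment rule lives in one place, so the checking pass and the
# processing pass cannot disagree about what a comment is.
skip_comments='$1 ~ "^#" { next }'

# Pass 1: checking only.
errs=$(awk "$skip_comments"'
         NF != 3 {
            printf " line %d: wrong number of fields\n", NR;
            next }
         ' "$cfg_file")
if [ -n "$errs" ]; then
   echo "Errors found in '$cfg_file'. Stopping." 1>&2
   echo "$errs" 1>&2
   exit 1
fi

# Pass 2: processing only; it can assume the file is well formed.
# join needs both inputs sorted on the join field.
joined=$(awk "$skip_comments"'
         { print $1, $2, $3 }
         ' "$cfg_file" | sort -k1,1 | join - "$hosts_file")
rm -f "$cfg_file" "$hosts_file"

printf '%s\n' "$joined"
```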

(Speaking from personal experience, you don't want to use '$1 == "#"' as your comment definition, because then you can't just stick a '#' in front of an existing configuration file line to comment it out. Instead you have to remember to make it '# ', and someday you'll forget.)
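
A quick way to see the difference is to run both comment rules over the same hypothetical input, where one line has been commented out by sticking a '#' directly in front of it:

```shell
# Hypothetical sample: a normal comment, plus a line commented out by
# prefixing it with "#".
sample='# host port owner
#alpha 22 cks
beta 80 ops'

# Only a first field that is exactly "#" counts as a comment.
strict=$(printf '%s\n' "$sample" | awk '$1 == "#" { next } { print }')
# Any first field that starts with "#" counts as a comment.
prefix=$(printf '%s\n' "$sample" | awk '$1 ~ "^#" { next } { print }')

printf 'strict:\n%s\nprefix:\n%s\n' "$strict" "$prefix"
```

With the '$1 == "#"' rule, the commented-out '#alpha' line is still treated as a live entry.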

PS: If your awk program is big and complex enough, it might make more sense to use a here document to create a shell variable containing it, which will let you avoid certain sorts of annoying quoting problems, such as not being able to put a single quote anywhere inside a single-quoted awk program.
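
A minimal sketch of the here document approach, assuming a shell that handles here documents inside $(...) (current bash and dash do). The variable name 'checker' and the sample file are hypothetical. Because the here document's delimiter is quoted, the shell leaves the text alone, so '$1' reaches awk untouched and the message can contain an apostrophe without the '\'' dance a single-quoted program would need.

```shell
# Read the awk program into a shell variable. The quoted 'EOF'
# delimiter stops the shell from expanding anything inside it.
checker=$(cat <<'EOF'
$1 ~ "^#" { next }
NF != 3 {
   printf " line %d: doesn't have exactly 3 fields\n", NR
   next
}
EOF
)

# Hypothetical sample configuration with one short line.
cfg_file=$(mktemp)
printf '%s\n' '# name port owner' 'alpha 22 cks' 'beta 80' >"$cfg_file"

errs=$(awk "$checker" "$cfg_file")
rm -f "$cfg_file"

printf '%s\n' "$errs"
```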

Written on 13 May 2025.

Last modified: Tue May 13 22:39:35 2025