Using awk to check your script's configuration file

May 13, 2025

Suppose, not hypothetically, that you have a shell script with a relatively simple configuration file format that people can still accidentally get wrong. You'd like to check the configuration file for problems before you use it in the rest of your script, for example by using it with 'join' (where things like the wrong number or type of fields will be a problem). Recently on the Fediverse I shared how I was doing this with awk, so here's a slightly more elaborate and filled out version:

errs=$(awk '
         $1 ~ "^#" { next }
         NF != 3 {
            printf " line %d: wrong number of fields\n", NR;
            next }
         [...]
         ' "$cfg_file"
       )

if [ -n "$errs" ]; then
   echo "$prog: Errors found in '$cfg_file'. Stopping." 1>&2
   echo "$errs" 1>&2
   exit 1
fi

(Here I've chosen to have awk's diagnostic messages indented by one space when the script prints them out, hence the space before 'line %d: ...'.)

The advantage of having awk simply print out the errors it detects and letting the script deal with them later is that you don't need to mess around with awk's exit status; your awk program can simply print what it finds and be done. Using awk for the syntax checks is handy because it lets you express a fair amount of logic and checks relatively simply (you can even check for duplicate entries and so on), and it also gives you line numbers for free.

One trick with using awk this way is to progressively filter things in your checks (by skipping further processing of the current line with 'next'). We start out by skipping all comments, then we report (and otherwise skip) every line with the wrong number of fields, and then every check after that can assume it at least has the right number of fields, so it can confidently check what should be in each one. If the number of fields in a line is wrong, there's no point in complaining that one of them has the wrong sort of value, and an early check plus 'next' to skip the rest of the line's processing is the simple way to avoid that.
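
To make this concrete, later checks in the elided '[...]' section might look something like the following (the specific field rules here are invented for illustration, not my real format):

   $2 !~ /^[0-9]+$/ {
      printf " line %d: second field is not a number\n", NR;
      next }
   $3 ~ "," {
      printf " line %d: third field may not contain commas\n", NR;
      next }

Each of these can safely talk about $2 and $3 because any line that gets this far is known to have exactly three fields.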

If you're also having awk process the configuration file later you might be tempted to have it check for errors at the same time, in an all-in-one awk program, but my view is that it's simpler to split the error checking from the processing. That way you don't have to worry about stopping the processing if you detect errors or intermingle processing logic with checking logic. You do have to make sure the two versions have the same handling of comments and so on, but in simple configuration file formats this is usually easy.
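
As a sketch of what I mean (with the actual processing elided the same way as above), the processing pass starts with the identical comment rule:

   awk '
        $1 ~ "^#" { next }
        [...]
        ' "$cfg_file"

If the two passes ever disagree about what counts as a comment, a line can pass the checks under one interpretation and then be processed under the other.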

(Speaking from personal experience, you don't want to use '$1 == "#"' as your comment definition, because then you can't just stick a '#' in front of an existing configuration file line to comment it out. Instead you have to remember to make it '# ', and someday you'll forget.)
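
The difference is easy to see with a quick test (the '#host 10 x' line stands in for a commented-out configuration line and is made up here):

   $ printf '# a note\n#host 10 x\n' | awk '$1 == "#"'
   # a note
   $ printf '# a note\n#host 10 x\n' | awk '$1 ~ "^#"'
   # a note
   #host 10 x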

PS: If your awk program is big and complex enough, it might make more sense to use a here document to create a shell variable containing it, which will let you avoid certain sorts of annoying quoting problems.
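
A sketch of what that can look like, using just the preliminary checks from above:

awkprog=$(cat <<'EOF'
$1 ~ "^#" { next }
NF != 3 {
   printf " line %d: wrong number of fields\n", NR;
   next }
EOF
)
errs=$(awk "$awkprog" "$cfg_file")

Because the here document delimiter is quoted ('EOF'), the shell doesn't expand anything inside it, so the awk program can use single quotes, $1, and so on without any escaping.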


Comments on this page:

By Allan Wind at 2025-05-14 01:36:52:

While I appreciate awk like the next connoisseur of fine technology, why bother if you are embedding it in a shell script?

nf() {
	# report the number of arguments; the caller passes the line unquoted
	# so the shell's word splitting does the field counting
	echo $#
}

check() {
	nr=1
	while IFS= read -r line
	do
		case $line in
			\#*)
				# comment line, ignore
				;;
			*)
				[ $(nf $line) -eq 3 ] || echo " line $nr: wrong number of fields"
				;;
		esac
		nr=$(($nr+1))
	done <"$1"
}

errs=$(check input.txt)

If you split checking and parsing (an anti-pattern), you should cache the file content so it doesn't change between those two steps.

Hmm. $1 ~ "^#" { ... } feels like an I-need-to-think-about-why-it-was-written-like-that way of saying /^[ \t]*#/ { ... }.

(Or did you really mean /^#/ { ... } and didn’t realize that that wasn’t what you were saying? In your case I’m guessing that’s not the case – but if I saw this is in code of unknown provenance, I would have a question mark there.)

(As for shell, well, the fact that it is nowadays capable of text processing doesn’t mean it’s particularly good at it. To me one look at the example answers exactly why you would bother to write it in awk.)

By cks at 2025-05-14 09:59:59:

I should have made this clearer: in the real version, the exclusion of comments and the check for the right number of fields are only the trivial preliminary checks. The rest of the checks are more complicated, doing things like regular expression based field validation, checking for fields with incorrectly duplicated values, etc. They could be done in the shell with sufficient work but this is a Turing tarpit. Even the basic validation is more clearly and concisely done in awk, never mind the extended version.
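
For illustration, a duplicate check of that sort is about this much awk (the rule that the first field must be unique is invented here, not the real format):

   seen[$1]++ {
      printf " line %d: duplicate value '%s' in first field\n", NR, $1;
      next }

This works because awk array entries start out as 0, so the pattern is only true on the second and later occurrences of a value.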

As for the '$1 ~ "^#"' thing, one answer is that this saves me from remembering the syntax for whitespace in an awk regular expression, but the real answer is that I started with the simple '$1 == "#"' version and discovered that that was a mistake.
