Exploiting the Bourne shell to parse configuration files

November 10, 2008

Suppose that you have what I will call a 'directive style' configuration file, one that looks something like this:

fileserver fs1
exportto hostsa:hostsb
filesys /h/10 fs1/d100
filesys /h/11 fs1/d110

Further, suppose that you want to do things with this configuration file in a Bourne shell script, which requires parsing it to extract all of the information that you're interested in.

Normally this would be a pain, but it's recently struck me that you can exploit Bourne shell functions to do the parsing for you. Simply define a function for each directive that does something appropriate with its parameters, and then source the actual configuration file from your scripts (with the '.' operator). Bang, you're done, and you even get comments and line continuations for free.
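As a minimal sketch of the idea, using the sample file from above: the handler bodies and the variable names FILESERVER and EXPORTTO are made up for illustration, and the temporary file (created with mktemp, which is widely available but not strictly POSIX) just stands in for your real configuration file.

```shell
#!/bin/sh
# Hypothetical handlers, one per directive; the function names must
# match the directive names used in the configuration file.
fileserver() { FILESERVER="$1"; }
exportto()   { EXPORTTO="$1"; }
filesys()    { echo "filesystem $1 lives on $2"; }

# Write the sample configuration to a temporary file for the demo.
conf=$(mktemp)
cat >"$conf" <<'EOF'
# comments come for free
fileserver fs1
exportto hostsa:hostsb
filesys /h/10 fs1/d100
filesys /h/11 \
        fs1/d110
EOF

# Sourcing the file runs each line as a call to the matching function.
. "$conf"
rm -f "$conf"

echo "server: $FILESERVER, exported to: $EXPORTTO"
```

Note that a directive without a matching function shows up as a 'command not found' error from the shell, which may or may not be the error handling you want.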

(If you want to make this the official way that the configuration file is parsed, you can give people a fair amount of power with just things like shell variables to capture common pieces. You probably don't want to go all the way to encouraging them to use conditions and loops and so on, so that you can at least pretend that the configuration file can be parsed by something other than your Bourne shell scripts.)
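For instance, a configuration file could name the fileserver once in a shell variable and reuse it. This is only a sketch; the variable name fs and the LAST_MOUNT and LAST_STORE variables are invented for the example.

```shell
#!/bin/sh
# Hypothetical handler that just remembers the last entry it saw.
filesys() { LAST_MOUNT="$1"; LAST_STORE="$2"; }

conf=$(mktemp)
cat >"$conf" <<'EOF'
# A plain shell variable captures the common piece once.
fs=fs1
filesys /h/10 $fs/d100
filesys /h/11 $fs/d110
EOF

# The shell expands $fs before filesys ever sees its arguments.
. "$conf"
rm -f "$conf"

echo "$LAST_MOUNT is on $LAST_STORE"
```

One thing to keep in mind is that variables set in the configuration file land in the same namespace as the script's own variables, so a badly chosen name can clobber something.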

Of course, half of making this useful is figuring out how to represent and store the parsed data, since the Bourne shell is not exactly known for its data structures. But since your main shell script code has to use the information, you can always start from the form that code wants and work backwards to a representation. And if you don't need all of the information (and you probably don't), you can just discard most of it, which simplifies storing the rest.
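Here is one possible representation, assuming that all the main script wants is to loop over (mount point, backing store) pairs. The FILESYSTEMS variable and the one-record-per-line layout are my own choices for this sketch, and it falls apart if a path contains whitespace.

```shell
#!/bin/sh
# Assumed requirement: the main code only loops over filesystems,
# so the other directives are accepted and thrown away.
FILESYSTEMS=""
fileserver() { :; }          # ignored
exportto()   { :; }          # ignored
filesys() {
    # Append one "mount store" record per line.
    FILESYSTEMS="${FILESYSTEMS}$1 $2
"
}

conf=$(mktemp)
cat >"$conf" <<'EOF'
fileserver fs1
exportto hostsa:hostsb
filesys /h/10 fs1/d100
filesys /h/11 fs1/d110
EOF
. "$conf"
rm -f "$conf"

# The main code reads the records back with a plain while/read loop.
printf '%s' "$FILESYSTEMS" | while read -r mount store; do
    echo "would check $mount on $store"
done
```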

Disclaimer: this probably qualifies as an evil hack.


Comments on this page:

From 78.35.25.22 at 2008-11-10 06:22:58:

Evil hack? Not hardly.

It reminds me of the progression of a Lisp programmer:

  • The newbie realizes that the difference between code and data is trivial.
  • The expert realizes that all code is data.
  • The true master realizes that all data is code.

Aristotle Pagaltzis

From 208.44.121.252 at 2008-11-10 09:55:42:

Our config files are actually shell scripts that are included into the code with a prepended '.':

. /operations/bin/operations.config 

This also has the benefit that when a user is running a cron sequence by hand, they can have that file interpreted on the command line via the same mechanism.

The contents of that particular script are just variables that are used across the board in every script.

I've also gotten much more complex. My backup scripts, which are going to be replaced shortly by a software package, interpret config files to determine rsync options, so I can parse through an entire series of things to back up. Here's the config file syntax:

hostname:/path/to/data/on/source:/where/to/put/it:--rsync --flags

Example:
pri-fs1.int.domain:/mnt/operations/website/:/backup/nightly/website/:--delete --force -v

The code to parse this is a little ugly, but it does the job (I'm sure this will wrap):

cat $CONFIGFILE | grep -v \^\# | grep -v \^\$ | awk -F: '{print "echo \"Syncing " $2 " on " $1 "\"\ntime rsync -e ssh -az " $4 " " $1 ":" $2 " " $3 ";"}' > $BINDIR/fsync-$DATECHUNK.sh

Then the $BINDIR/fsync-$DATECHUNK.sh is executed.

Here's the whole script, with comments, so you can see how it works:


$ cat daily_filesync.sh  
#!/bin/bash                     
#                               
# daily_filesync.sh             
#                               
# This script reads in the config file, and produces
# a temporary script, which it calls then deletes, in
# order to sync data from remote servers. 
# 
# If you want to add / remove syncs, you should 
# probably edit $CONFIGFILE instead. 
# 
# Matt Simmons
# bandman@gmail.com

# Include all necessary variables
. include.sh 

# Perform the syncing

CONFIGFILE=$BINDIR/fsync/daily_filesync.conf

rm -f $NIGHTLY/AAA_CURRENT*
touch $NIGHTLY/AAA_CURRENT_IS_$DATECHUNK

# This next line may need some explanation. 
# We read in the config file, discard any commented
# lines, discard any empty lines, and assume everything
# else must be valid data. We massage that data into 
# commands using awk, write everything to a temp file, 
# run the file, then delete the file. The contents of
# the temp file end up looking like this: 
# time rsync -e ssh -az <options> host:/directory localdir;
#
# Sorry it's not more readable
# - MS
cat $CONFIGFILE | grep -v \^\# | grep -v \^\$ |
  awk -F: '{print "echo  \"Syncing " $2 " on " $1 "\"\n \
time rsync -e ssh -az " $4 " " $1 ":" $2 " " $3 ";"
           }' > $BINDIR/fsync-$DATECHUNK.sh

sh $BINDIR/fsync-$DATECHUNK.sh

rm -f $BINDIR/fsync-$DATECHUNK.sh
sh $BINDIR/backup-writer.sh

So there you have it. If a variable is referenced but looks like it's not initialized, it's in include.sh (which lives in the same directory). These are variables that I use throughout the backup process, things like $BINDIR, $NIGHTLY, and $DATECHUNK. Incidentally, a similar set of config/script exists for the $WEEKLY backup set, but that doesn't need to be the case. Another column in the config file could easily provide this functionality, along with a preceding '| grep ":weekly:"' in that complex parsing line.
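A rough sketch of that extra-column idea follows. The column position (fourth, pushing the rsync flags to fifth), the 'nightly' and 'weekly' labels, and the fs1.example host are all assumptions made for the example, and it prints the command rather than running it.

```shell
#!/bin/sh
# Assumed layout: host:/src:/dest:set:flags, where "set" is a new,
# hypothetical column naming the backup set (nightly or weekly).
conf=$(mktemp)
cat >"$conf" <<'EOF'
fs1.example:/srv/www/:/backup/nightly/www/:nightly:--delete
fs1.example:/srv/db/:/backup/weekly/db/:weekly:--delete -v
EOF

# Keep only the weekly entries, then build the rsync command lines.
weekly=$(grep -v '^#' "$conf" | grep -v '^$' | grep ':weekly:' |
  awk -F: '{ print "rsync -e ssh -az " $5 " " $1 ":" $2 " " $3 }')
rm -f "$conf"

printf '%s\n' "$weekly"
```

Since grep matches ':weekly:' anywhere on the line, a path that happens to contain that text would also be selected; testing the field inside awk would be stricter.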

Sorry for the overly long post. I hope this helps someone, or maybe shows them how not to do things ;-)

By cks at 2008-11-10 12:05:31:

(Honesty in administration note: I've used magic site admin powers to re-wrap the previous comment a bit so that it doesn't force a huge page width. You will need to fix the 'awk' line in the script before you use it as-is; hopefully how to fix it is obvious from the previous stuff in the comment.)
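One way to repair the wrapped awk line, as a sketch rather than the commenter's original: use two print statements instead of a single string split across lines. The sample entry is the one from the comment, the temporary file stands in for $CONFIGFILE, and the generated commands are printed instead of being written out and run.

```shell
#!/bin/sh
# Sample input in the comment's host:/src:/dest:flags format.
conf=$(mktemp)
cat >"$conf" <<'EOF'
# nightly website sync
pri-fs1.int.domain:/mnt/operations/website/:/backup/nightly/website/:--delete --force -v

EOF

# Two print statements replace the string that was split across lines.
generated=$(grep -v '^#' "$conf" | grep -v '^$' |
  awk -F: '{
    print "echo \"Syncing " $2 " on " $1 "\""
    print "time rsync -e ssh -az " $4 " " $1 ":" $2 " " $3 ";"
  }')
rm -f "$conf"

# Show the generated commands instead of running them.
printf '%s\n' "$generated"
```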

Written on 10 November 2008.

Last modified: Mon Nov 10 00:20:36 2008