Python programs as wrappers versus filters of other Unix programs

May 16, 2022

Sometimes I wind up in a situation, such as using smartctl's JSON output, where I want to use a Python program to process and transform the output from another Unix command. In a situation like this, there are two ways of structuring things. I can have the Python program run the other command as a subprocess, capture its output, and process it, or I can have a surrounding script run the other command and pipe its output to the Python program, with the Python program acting as a (Unix) filter. I've written programs in both approaches depending on the situation.
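As a minimal sketch of the two structures (not the code I actually use), the same processing function can sit behind either entry point. The function names are hypothetical, and the JSON field names (`model_name`, `smart_status`, `passed`) are assumptions about what smartctl's JSON output contains.

```python
import json
import subprocess


def summarize(data):
    """Reduce a smartctl-style JSON document to a one-line summary."""
    # Assumed field names; adjust to whatever the real output contains.
    model = data.get("model_name", "unknown")
    passed = data.get("smart_status", {}).get("passed")
    return f"{model}: {'ok' if passed else 'CHECK'}"


def as_wrapper(cmd):
    """Wrapper structure: run the other command ourselves and capture its output."""
    # check=False: a non-zero exit status does not necessarily mean there
    # is no usable output, so this sketch does not treat it as fatal.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return summarize(json.loads(result.stdout))


def as_filter(stream):
    """Filter structure: something else ran the command and piped it to us."""
    return summarize(json.load(stream))
```

As a wrapper, the program would call something like `as_wrapper(["smartctl", "--json", "-a", "/dev/sda"])`; as a filter, a surrounding script would run `smartctl --json -a /dev/sda | prog` and the program would call `as_filter(sys.stdin)`.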

This raises the question of what sort of situation makes me choose one option or the other. One reason for choosing the wrapper approach is that the result is easy to copy around; a Python wrapper is a single self-contained thing to copy to our systems, while a shell script that runs a Python filter is at least two things (and then the shell script has to know where to find the Python program). More generally, a Python wrapper program makes the whole thing feel like it has fewer moving parts (that it starts by running another Unix command is sort of an implementation detail that people don't have to think about).

(The self-contained nature of wrappers pushes me toward them for things that I expect to copy to systems only on an 'as needed' basis, instead of having them installed as part of system setup or the like.)

One reason I reach for the filter approach is if I have a certain amount of logic that's most easily expressed in a shell script, for example selecting what disks to report SMART data on and then iterating over them. Shell scripts make expanding file name glob patterns very easy; Python requires more work for this. I have to admit that how the idea evolved also plays a role; if I started out thinking I had a simple job of reformatting output that could be done entirely in a shell script, I'm most likely to write the Python as a filter that drops into it, rather than throw the shell script away and write a Python wrapper. Things that are clearly complex from the start are more likely to be a Python wrapper instead of a filter used by a shell script.
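For comparison, here is roughly what the 'more work' looks like on the Python side, as a sketch using the standard `glob` module. The function name and the example patterns are hypothetical.

```python
import glob


def select_disks(patterns):
    """Expand shell-style glob patterns into a sorted, de-duplicated list."""
    found = set()
    for pattern in patterns:
        # Unlike a typical shell, glob.glob() returns an empty list when
        # nothing matches instead of handing back the pattern itself.
        found.update(glob.glob(pattern))
    return sorted(found)
```

A wrapper would then iterate over something like `select_disks(["/dev/sd?", "/dev/nvme?n1"])`, where a shell script would just write `for d in /dev/sd? /dev/nvme?n1`.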

(The corollary of this is that if I'm running the other command once with more or less constant arguments, I'm much more likely to write a wrapper program instead of a filter.)

I believe that there are (third party) Python packages that are intended to make it easy to write shell script like things in Python (and I think I was even pointed at one once, although I can't find the reference now). In theory I could use these and native Python facilities to write more Python programs as wrappers; in practice, I'm probably going to take the path of least resistance and continue to do a variety of things as shell scripts with Python programs as filters.

I don't know if writing this entry is going to get me to be more systematic and conscious about making this choice between a wrapper and a filter, but I can hope so.

PS: Another aspect of the choice is that it feels easier (and better known) to adjust the settings of a shell script by changing commented environment variables at the top of the script than making the equivalent changes to global variables in the Python program. I suspect that this is mostly a cultural issue; if we were more into Python, it would probably feel completely natural to us to do this to Python programs (and we'd have lots of experience with it).
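A sketch of what the Python equivalent might look like: commented settings at the top of the program, here with an optional override from the environment. The setting names and the environment variable names are hypothetical.

```python
import os

# Settings: edit these, or override them from the environment.
# Glob pattern for the disks to report on.
DISK_PATTERN = "/dev/sd?"
# Seconds to wait for the other command before giving up.
TIMEOUT = 30


def load_settings(environ=os.environ):
    """Return the settings, letting environment variables override the defaults."""
    return {
        "disk_pattern": environ.get("SMART_DISK_PATTERN", DISK_PATTERN),
        "timeout": int(environ.get("SMART_TIMEOUT", TIMEOUT)),
    }
```

Taking the environment as a parameter is only there to make the function easy to try out with a plain dictionary.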

Comments on this page:

By John Wiersba at 2022-05-17 23:15:13:

Another option is a shell wrapper around a filter program embedded inside the shell script. A variety of scripting languages can more or less easily be used for this paradigm, such as perl or awk.

   smartctl --version | perl -lne 'print $1 if /build host: (.*)/'

This gives you the dual benefits of one-file deployment and a structured programming language for the parts where you need it.

In fact, depending on how you conceive of the problem, you might end up with several stages of filters which, over time, you can restructure in multiple ways as you see fit.

   GEN_DATA() { cat /etc/passwd; }
   ANOTHER_FILTER() { cat; }
   GEN_DATA | perl -pe '
     # ...
   ' | ANOTHER_FILTER | perl -ne '
     print if /root/
   ' | { [ "$1" != debug ] && cat || tee FILE; } | perl -pe '
     # ...
   '

Written on 16 May 2022.

Last modified: Mon May 16 22:10:52 2022