Capturing command output in a Bourne shell variable as a brute force option

November 27, 2019

Often, the natural form of generating and then processing something in the Bourne shell is as a pipeline:

smartctl -A /dev/$dsk | tr A-Z- a-z_ |
     fgrep -v ' unknown_' | awk '<process more>'

timeout 30s ssh somehost npppctl session brief |
    awk '<generate metrics>'

(Using awk is not necessarily recommended here, but it's the neutral default.)

However, there can be two problems with this. First, sometimes you want to process the command's output in several different ways, but you only want to run the command once (perhaps it's expensive). Second, sometimes you want to reliably detect that the initial command failed, or even skip all further steps entirely on failure, because you don't trust that the output it generates on failure won't confuse the rest of the pipeline and produce bad results.

The obvious solution to this is to write the output of the first command into a temporary file, which you can then process and re-process as many times as you want. You can also directly check the first command's exit status (and results), and only proceed if things look good. But the problem with temporary files is that they're kind of a pain to deal with. You have to find a place to put them, you have to name them, you have to deal with them securely if you're putting them in /tmp (or $TMPDIR more generally), you have to remove them afterward (including removing them on error), and so on. There is a lot of bureaucracy and overhead in dealing with temporary files and it's easy to either miss some case or be tempted into cutting corners.
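To make the bureaucracy concrete, here is a sketch of what the temporary file approach typically requires, using mktemp(1) and a trap for cleanup. The stand-in echo command and the specific processing steps are illustrative placeholders, not the real commands:

```shell
#!/bin/sh
# A sketch of the temporary-file approach, showing the overhead.
# mktemp gives us a securely created file under $TMPDIR (or /tmp).
tmpf="$(mktemp "${TMPDIR:-/tmp}/cmdout.XXXXXX")" || exit 1
# Make sure the file goes away on exit, including errors and interrupts.
trap 'rm -f "$tmpf"' EXIT INT TERM

# 'echo ...' stands in for the real (possibly expensive) command.
if echo "Temperature_Celsius 35" >"$tmpf"; then
    # Now we can process and re-process the captured output.
    lowered="$(tr A-Z a-z <"$tmpf")"
    value="$(awk '{print $2}' <"$tmpf")"
fi
```

Even this minimal version has to pick a location, create the file safely, and arrange cleanup on every exit path.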

Lately I've been leaning on the alternative and somewhat brute force option of just capturing the command's output in the shell script, putting it into a shell variable:

smartout="$(smartctl -A /dev/$dsk)"
if [ $? -ne 0 ]; then
    echo "smartctl failed on /dev/$dsk" 1>&2
    exit 1
fi
echo "$smartout" | tr A-Z- a-z_ | ....
echo "$smartout" | awk '<process again>'

(Checking for empty output is optional but probably recommended.)
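That empty-output check can be combined with the exit status check in one guard. A sketch, where 'true' stands in for a hypothetical command that succeeds but prints nothing:

```shell
#!/bin/sh
# Guard against both a failed command and one that (perhaps
# surprisingly) succeeded with no output. 'true' is a stand-in.
out="$(true)"
status=$?
if [ $status -ne 0 ] || [ -z "$out" ]; then
    # Failed or produced nothing; don't process further.
    handled="skipped"
else
    handled="processed"
fi
```

Here the command exits 0 but produces no output, so the guard takes the "skipped" branch.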

In the old Unix days when memory was scarce, this would have been horrifying (and potentially dangerous). Today, that's no longer really the case. Unless your commands generate a very large amount of output or something goes terribly wrong, you won't notice the impact of holding the entire output of your command in the shell's memory. In many cases the command will produce very modest amounts of output, on the order of a few Kb or a few tens of Kb, which is a tiny drop in the bucket of modern Bourne shell memory use.

(And if the command goes berserk and produces a giant amount of output, writing that to a file would probably have been equally much of a problem. If you hold it in the shell's memory, at least it automatically goes away if and when the shell dies.)

Capturing command output in a shell variable solves all of my problems here. Shell variables don't have any of the issues of temporary files, they let you directly see the exit status of the first command in what would otherwise be the pipeline, and you can repeatedly re-process them through different additional steps. I won't say it's entirely elegant, but it works and sometimes that (and simplicity) is my priority.
