Capturing command output in a Bourne shell variable as a brute force option

November 27, 2019

Often, the natural form of generating and then processing something in the Bourne shell is as a pipeline:

smartctl -A /dev/$dsk | tr A-Z- a-z_ |
     fgrep -v ' unknown_' | awk '<process more>'

timeout 30s ssh somehost npppctl session brief |
    awk '<generate metrics>'

(Using awk is not necessarily recommended here, but it's the neutral default.)

However, there can be two problems with this. First, sometimes you want to process the command's output in several different ways, but you only want to run the command once (perhaps it's expensive). Second, sometimes you want to reliably detect that the initial command failed, or even skip all further steps when it fails, because you don't trust its output on failure not to confuse the rest of the pipeline and produce bad results.
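To make the second problem concrete: in a plain pipeline, the exit status you get is the exit status of the last command, so a failure at the front of the pipeline is invisible by default:

false | awk '{ print }'
echo $?    # prints 0; the failure of 'false' is lost

(Bash and some other shells have 'set -o pipefail' to change this, but it's not a traditional Bourne shell feature.)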

The obvious solution to this is to write the output of the first command into a temporary file, which you can then process and re-process as many times as you want. You can also directly check the first command's exit status (and results), and only proceed if things look good. But the problem with temporary files is that they're kind of a pain to deal with. You have to find a place to put them, you have to name them, you have to deal with them securely if you're putting them in /tmp (or $TMPDIR more generally), you have to remove them afterward (including removing them on error), and so on. There is a lot of bureaucracy and overhead in dealing with temporary files and it's easy to either miss some case or be tempted into cutting corners.
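For concreteness, here is a sketch of what the temporary file version tends to look like. This assumes you have mktemp, which is common today but not part of the original Bourne shell environment:

tmpf=$(mktemp) || exit 1
# make sure the file is removed even on errors or interrupts
trap 'rm -f "$tmpf"' EXIT INT TERM
smartctl -A /dev/$dsk >"$tmpf" || exit 1
tr A-Z- a-z_ <"$tmpf" | ....
awk '<process again>' <"$tmpf"

Even this short version has to think about cleanup on every exit path, and it's easy to get some of it subtly wrong.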

Lately I've been leaning on the alternative and somewhat brute force option of just capturing the command's output directly in a shell variable:

smartout="$(smartctl -A /dev/$dsk)"
if [ $? -ne 0 ] ; then
   ....
fi
echo "$smartout" | tr A-Z- a-z_ | ....
echo "$smartout" | awk '<process again>'

(Checking for empty output is optional but probably recommended.)
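If you want that check, a minimal version of it might look like this:

if [ -z "$smartout" ]; then
    echo "no output from smartctl for $dsk" 1>&2
    exit 1
fi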

In the old Unix days when memory was scarce, this would have been horrifying (and potentially dangerous). Today, that's no longer really the case. Unless your commands generate a very large amount of output or something goes terribly wrong, you won't notice the impact of holding the entire output of your command in the shell's memory. In many cases the command will produce very modest amounts of output, on the order of a few KB or a few tens of KB, which is a tiny drop in the bucket of modern Bourne shell memory use.

(And if the command goes berserk and produces a giant amount of output, writing that to a file would probably have been equally much of a problem. If you hold it in the shell's memory, at least it automatically goes away if and when the shell dies.)

Capturing command output in a shell variable solves all of my problems here. Shell variables have none of the issues of temporary files, they let you directly see the exit status of the first command in what would otherwise be the pipeline, and you can re-process their contents through as many different further steps as you want. I won't say it's entirely elegant, but it works, and sometimes that (and simplicity) is my priority.


Comments on this page:

I wonder if you've heard of the directed graph shell? It extends the pipeline syntax to support arbitrary DAGs instead of just pipelines. https://www2.dmst.aueb.gr/dds/sw/dgsh/

I've never actually used it in production though; I just don't write enough shell scripts.

By John Wiersba at 2019-11-27 10:14:08:

I agree with the main sentiments in this post. One small improvement: you shouldn't use echo with unknown data, even if the data is generated by a trusted command. Use printf instead (see Why is printf better than echo?):

printf "%s\n" "$smartout"
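For instance, captured data that happens to look like an echo option can simply vanish, with the exact behavior varying from shell to shell:

flag='-n'
echo "$flag"             # in Bash this prints nothing at all
printf '%s\n' "$flag"    # this reliably prints '-n'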

As an example of this sort of thing, here's a function I recently wrote to capture the stdout output of a chatty command, but suppress the stderr output, unless the command returns a non-zero exit status.

# send cmd's stdout to stdout; send cmd's stderr to stderr only if cmd fails
stderr_only_if_fail() {
  # fd 3 is tied to the real stdout; inside the command substitution,
  # 2>&1 captures the command's stderr while >&3 sends its stdout
  # straight through, and 3>&- keeps fd 3 from leaking to the command
  { err_only=$( "$@" 2>&1 >&3 3>&- ); } 3>&1 && return 0
  # the command failed, so replay its captured stderr
  printf "%s\n" "$err_only" >&2
  return 1
}
# example usage:
data=$( stderr_only_if_fail chatty_cmd with "some args" ) || exit 1

By jagipson at 2019-12-02 08:59:36:

Possibly even better than printf or echo with a pipe is the Bash feature called Here Strings:

smartout="$(smartctl -A /dev/$dsk)"
if [ $? -ne 0 ] ; then
   ....
fi
<<<"$smartout" tr A-Z- a-z_ | ....
<<<"$smartout" awk '<process again>'

More information can be found in the Bash man page; search for "here strings".
