The sensible way to use Bourne shell 'here documents' in pipelines

April 18, 2018

I was recently considering a shell script where I might want to feed a Bourne shell 'here document' to a shell pipeline. This is certainly possible, and years ago I wrote an entry on the rules for combining things with here documents, where I carefully wrote down how to do this and the general rule involved. This time around, I realized that I wanted a much simpler and more straightforward approach, one that is obviously correct and is going to be clear to everyone: putting the production of the here document in a subshell.

(
cat <<EOF
your here document goes here
with as much as you want.
EOF
) | sed | whatever

This is not as neat and nominally elegant as taking advantage of the full power of the Bourne shell's arcane rules, and it's probably not as efficient (in at least some sh implementations, you may get an extra process), but I've come around to feeling that that doesn't matter. This may be the brute force solution, but what matters is that I can look at this code and immediately follow it, and I'm going to be able to do that in six months or a year when I come back to the script.

(Here documents are already kind of confusing as it stands without adding extra strangeness.)
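For comparison, the more compact form that the subshell version avoids looks roughly like the following. This is only a minimal sketch; the function name shout and the use of tr as the pipeline stage are made up for illustration.

```shell
# Illustrative only: 'shout' and the tr stage are invented for this sketch.
shout() {
    # The here document is attached to cat, and the pipeline simply
    # continues on the same line. The body starts on the next line.
    cat <<EOF | tr '[:lower:]' '[:upper:]'
your here document goes here
with as much as you want.
EOF
}

shout
```

It works, but you have to know that the here document body begins after the end of the whole line, not right after the `<<EOF`.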

Of course you can put multiple things inside the (...) subshell, such as several here documents that you output only conditionally (or chunks of always present static text mixed with text you have to make more decisions about). If you want to process the entire text you produce in some way, you might well generate it all inside the subshell for convenience.
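As a rough sketch of what that can look like, here is a subshell with one here document that is always output and one that is conditional. The report function, the VERBOSE variable, the default values, and the sed expression are all assumptions invented for this example.

```shell
# Hypothetical variables for this sketch, with made-up defaults.
HOST=${HOST:-apps0}
VERBOSE=${VERBOSE:-yes}

report() {
    (
    # Always-present text.
    cat <<EOF
Problem report for $HOST.
EOF
    # Text that is only sometimes included.
    if [ "$VERBOSE" = yes ]; then
        cat <<EOF
Extra details are included because VERBOSE is set to yes.
EOF
    fi
    ) | sed -e 's/^/> /'
}

report
```

Everything written inside the parentheses goes down the same pipe, so the sed sees it as a single stream.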

Perhaps you're wondering why you'd want to run a here document through a pipe to something. The case that frequently comes up for me is that I want to generate some text with variable substitution but I also want the text to flow naturally with natural line lengths, and the expansion will have variable length. Here, the natural way out is to use fmt:

(
cat <<EOF
My message to $NAME goes here.
It concerns $HOST, where $PROG
died unexpectedly.
EOF
) | fmt

Using fmt reflows the text regardless of how long the variables expand out to. Depending on the text I'm generating, I may be fine with reflowing all of it (which means that I can put all of the text inside the subshell), or I may have some fixed formatting that I don't want passed through fmt (so I have to have a mix of fmt'd subshells and regular text).
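A minimal sketch of the mixed case might look like this, with a fixed-format header that bypasses fmt followed by a paragraph that goes through it. The message function, the header layout, and the default values are assumptions made up for this example.

```shell
# Hypothetical values for this sketch.
NAME=${NAME:-"a user with quite a long name"}
HOST=${HOST:-apps0}
PROG=${PROG:-exampled}

message() {
    # Fixed formatting: written directly, not passed through fmt.
    cat <<EOF
Host:    $HOST
Program: $PROG

EOF
    # Free-flowing text: reflowed by fmt after variable expansion.
    (
    cat <<EOF
My message to $NAME goes here.
It concerns $HOST, where $PROG
died unexpectedly.
EOF
    ) | fmt
}

message
```

The header keeps its column alignment because only the second chunk is fed to fmt.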

Having written that out, I've just come to the obvious realization that for simple cases I can just directly use fmt with a here document:

fmt <<EOF
My message to $NAME goes here.
It concerns $HOST, where $PROG
died unexpectedly.
EOF

This doesn't work well if there are some paragraphs that I want to include only some of the time, though; then I should still be using a subshell.

(For whatever reason I apparently have a little blind spot about using here documents as direct input to programs, although there's no reason for it.)


Comments on this page:

From 78.58.206.110 at 2018-04-19 00:05:29:

If you just need grouping without a subshell, { ... } should work. It's called "brace_group" in sh grammar.
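A minimal sketch of that brace group form, assuming a made-up greeting function and sed expression, would be something like:

```shell
# Illustrative only: 'greeting' and the sed expression are invented here.
greeting() {
    # Braces group the commands without explicitly asking for a subshell.
    # The closing brace must come after a newline or a semicolon.
    {
    cat <<EOF
hello world
EOF
    } | sed -e 's/world/there/'
}

greeting
```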

From 193.219.181.219 at 2018-04-19 00:32:32:

...which, on second thought, doesn't actually reduce the number of shell processes much anyway if it's in a pipeline. (And some shells might optimize away the subshell, too.)

Also, zsh has a neat feature which lets you do (< file) or (<<< "text") or other kinds of redirections without invoking cat – the shell itself does the rest. It doesn't seem to be available in sh or bash however.

"This doesn't work well if there's some paragraphs that I want to include only some of the time, though; then I should still be using a subshell."

Wouldn’t you just interleave fmt and cat calls, each with smaller heredocs, in that case? And then put the whole thing in a subshell if necessary. I.e.,

(
    cat <<EOF
...
EOF

    fmt <<EOF
...
EOF
) | sed | whatever

"Having written that out, I've just come to the obvious realization that for simple cases I can just directly use fmt with a here document:"

I was going to object to the useless use of cat until I got to that part. 😊

Note that you can get rid of both the subshell and the cat by using read and heredocs for your multiline values and then using herestrings to pass them.

read -d '' message <<EOF
your here document goes here
with as much as you want.
EOF

<<< "$message" sed | whatever

(Since you can put redirections anywhere in a command, when I have a pipe, I like to move them to the front.)

This also has the advantage that you can give your literal a name… and the disadvantage that you must give it a name.

By John Wiersba at 2018-04-19 13:46:40:

You can also use shell functions to isolate the heredoc and make your pipeline more readable:

foo_data() {
  cat <<EOF
foo
EOF
}

foo_data | sed | fmt

That is nice too. It evaluates the heredoc every time it’s mentioned, and if you need that, it cleanly beats my suggestion. Note that it can also neatly be made into a one-liner:

foo_data() { cat <<EOF ; }
foo
EOF

Written on 18 April 2018.

Last modified: Wed Apr 18 23:05:30 2018