Accumulating a separated list in the Bourne shell

February 15, 2019

One of the things that comes up over and over again when formatting output is that you want to output a list of things with some separator between them but you don't want this separator to appear at the start or the end, or if there is only one item in the list. For instance, suppose that you are formatting URL parameters in a tiny little shell script and you may have one or more parameters. If you have more than one parameter, you need to separate them with '&'; if you have only one parameter, the web server may well be unhappy if you stick an '&' before or after it.

(Or not. Web servers are often very accepting of crazy things in URLs and URL parameters, but one shouldn't count on it. And it just looks irritating.)

The very brute force approach to this general problem in Bourne shells goes like this:

tot=""
for i in "$@"; do
  # ... (other per-item work elided) ...
  v="var-thing=$i"
  if [ -z "$tot" ]; then
    tot="$v"
  else
    tot="$tot&$v"
  fi
done

But this is five or six lines and involves some amount of repetition. It would be nice to do better, so when I had to deal with this recently I looked through the Dash manpage to see if shell substitutions or something else clever could help. With shell substitutions we can condense this a lot, but we can't get rid of all of the repetition:

tot="${tot:+$tot&}var-thing=$i"

It annoys me that tot is repeated in this. However, this is probably the best all-around option in normal Bourne shell.
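As a minimal, self-contained sketch, the one-line version can be wrapped in a function. The name build_query is hypothetical, and 'var-thing' is just the placeholder parameter name from the example above. '${tot:+$tot&}' expands to '$tot&' only when tot is set and non-empty, so the first item never gets a separator in front of it.

```shell
# build_query: hypothetical helper that joins its arguments as
# 'var-thing=ARG' URL parameters separated by '&'.
# (Real URL parameters would also need percent-encoding; this sketch ignores that.)
build_query() {
    tot=""
    for i in "$@"; do
        # ${tot:+$tot&} is empty while tot is empty and "$tot&" afterwards,
        # so '&' only ever appears between items.
        tot="${tot:+$tot&}var-thing=$i"
    done
    printf '%s\n' "$tot"
}

build_query a b c    # prints: var-thing=a&var-thing=b&var-thing=c
build_query only     # prints: var-thing=only
```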

Bash has arrays, but the manpage's documentation of them makes my head hurt, and using them results in Bash-specific scripts (or at least scripts specific to shells with support for arrays). I'm also not sure if there's any simple way of doing a 'join' operation that combines the array elements into a single string with a separator between them, which is the whole point of the exercise.

(But now I've read various web pages on Bash arrays so I feel like I know a little bit more about them. Also, on joining, see this Stackoverflow Q&A; it looks like there's no built-in support for it.)

In the process of writing this entry, I realized that there is an option that builds '$tot' with an extra separator and then uses POSIX prefix or suffix removal ('${var#pattern}' or '${var%pattern}') to strip the unwanted one. Let me show you what I mean:

tot=""
for i in "$@"; do
  # ... (other per-item work elided) ...
  tot="$tot&var-thing=$i"
done
# remove leading '&':
tot="${tot#&}"

This feels a little bit unclean, since we're adding on a separator that we don't want and then removing it later. Among other things, that seems like it could invite accidents where at some point we forget to remove that leading separator. As a result, I think that the version using '${var:+word}' substitution is the best option, and it's what I'm going to stick with.
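For what it's worth, the strip-afterwards trick is not limited to single-character separators, since '#' removes the shortest prefix matching a pattern and '%' the shortest matching suffix. A hedged sketch, with hypothetical function names and ', ' as an assumed separator:

```shell
# join_strip: hypothetical helper; always prepends the separator,
# then removes one leading copy of it.
join_strip() {
    tot=""
    for i in "$@"; do
        tot="$tot, $i"
    done
    # ${tot#, } removes the shortest prefix matching ', ', i.e. one separator.
    printf '%s\n' "${tot#, }"
}

# join_strip_suffix: mirror image; appends the separator after each item
# and strips the trailing copy.
join_strip_suffix() {
    tot=""
    for i in "$@"; do
        tot="$tot$i, "
    done
    # ${tot%, } removes the shortest suffix matching ', '.
    printf '%s\n' "${tot%, }"
}

join_strip a b c           # prints: a, b, c
join_strip_suffix a b c    # prints: a, b, c
```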


Comments on this page:

By Albert at 2019-02-16 04:22:52:

I've been doing it as follows since about forever:

sep=
tot=

for i in "$@"; do
  tot="${tot}${sep}${i}"
  sep="&"   # or whatever
done
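Albert's pattern generalizes into a reusable join that accepts any separator string, including multi-character ones. This is a sketch; join_with and its variable names are hypothetical, and it avoids the non-POSIX 'local', so its variables are global.

```shell
# join_with SEP ITEM...: hypothetical helper using the "sep starts empty and
# is set at the end of the loop body" technique. SEP may be any string.
join_with() {
    joiner=$1
    shift
    out=
    sep=
    for item in "$@"; do
        out="${out}${sep}${item}"
        # From the second item on, the separator goes in front.
        sep=$joiner
    done
    printf '%s\n' "$out"
}

join_with '&' a b c        # prints: a&b&c
join_with ' | ' one two    # prints: one | two
```

One difference worth knowing: this version keeps empty items ('join_with "&" "" b' gives '&b'), whereas a generic 'tot="${tot:+$tot&}$i"' loop would silently drop the separator after an empty first item.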

Another option (if the separator is a single character) is to fiddle with IFS, the input field separator, whose first character is also inserted between the positional parameters when "$*" is expanded, like this:

$ amper() { local IFS='&'; echo "$*"; }
$ amper a b c
a&b&c
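'local' isn't in POSIX, although dash and Bash both support it. A sketch of a strictly POSIX variant (the names amper_posix and comma are hypothetical) uses a subshell as the function body so the IFS change cannot leak out, and printf instead of echo so backslashes in items are not interpreted:

```shell
# amper_posix: the body is a subshell ( ... ) rather than { ...; },
# so changing IFS here does not affect the caller.
amper_posix() (
    IFS='&'
    # "$*" joins the positional parameters with the first character of IFS.
    printf '%s\n' "$*"
)

amper_posix a b c    # prints: a&b&c

# Only the first character of IFS is used, so a multi-character
# separator silently degrades to its first character.
comma() (
    IFS=', '
    printf '%s\n' "$*"
)

comma a b c          # prints: a,b,c
```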

Like most shelly things it has annoying limitations and is tricky to use, so by this point I am replacing the script with a better programming language :-)

By MihaiC at 2019-02-16 07:45:31:

I use 'paste' when this situation comes up. If the things you want to join are on individual lines, it works directly, for example:

seq 1 5 | paste -sd '&'

For array elements you can force them onto individual lines, for example:

for i in "$@" ; do echo "$i" ; done | paste -sd '&'

Unfortunately paste only supports single characters as delimiters, so if you need more you have to do some post-processing, for example:

grep_pattern="$( echo "$newline_delimited_list_of_patterns" | paste -sd '|' | sed 's/|/\\|/g' )"
grep "$grep_pattern" my_file.txt
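A slightly more portable sketch of the paste approach (join_paste is a hypothetical name): 'printf "%s\n" "$@"' prints one argument per line without a loop and, unlike echo, never treats '-n' or backslashes specially. The trailing '-' operand matters because POSIX paste requires a file operand; GNU paste reads standard input without one, but other implementations may not.

```shell
# join_paste SEP ITEM...: hypothetical wrapper around paste(1).
# SEP must be a single character, and not a backslash (paste treats
# backslash sequences in the -d list as escapes). Items must not
# contain newlines.
join_paste() {
    d=$1
    shift
    # '-' tells paste to read standard input.
    printf '%s\n' "$@" | paste -s -d "$d" -
}

join_paste '&' a b c    # prints: a&b&c
```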

If you are willing to define something like

chop-first () { printf '%s' "${1:1}" ; }

then you can say

tot=$( chop-first "$( printf '&var-thing=%s' "$@" )" )

This uses shell builtins only. If you don’t care about that, then of course you can just pipe the printf output to cut -c 2- and skip defining the extra function.
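Note that '${1:1}' is a Bash (and ksh/zsh) substring expansion that dash does not support, and POSIX only guarantees function names made of letters, digits and underscores. A sketch of a plain POSIX sh version (chop_first is a hypothetical rename) uses '${1#?}', where the '?' pattern matches exactly one character:

```shell
# chop_first: drop the first character of $1; POSIX-sh friendly.
chop_first() { printf '%s' "${1#?}"; }

set -- 'a b' c
# Quoting the inner $( ... ) keeps arguments containing spaces intact.
tot=$(chop_first "$(printf '&var-thing=%s' "$@")")
printf '%s\n' "$tot"    # prints: var-thing=a b&var-thing=c

# The external-command variant, using cut(1) instead of the function:
tot=$(printf '&var-thing=%s' "$@" | cut -c 2-)
```

One edge case: with no arguments at all, printf still applies its format once, so the result is 'var-thing=' rather than an empty string.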

(If you need to do something different for each parameter, then the simplest approach is Albert’s (a sep variable that starts out empty and gets set at the end of the loop body). I’ve used that in many languages, it’s my fallback when a problem makes using the environment’s standard join feature entirely too awkward. (You don’t even need mutable variables: the same technique works when looping by recursion, by having a passed-in separator, which is the empty string in the initial call but a constant in recursive calls. So this technique is truly universal.))
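The recursive form can be sketched even in sh. In this hypothetical join_rec, the first argument is the separator to emit before the next item: empty in the initial call, and the constant separator in every recursive call. Nothing is accumulated; each call prints its own separator and item.

```shell
# join_rec CUR SEP ITEM...: hypothetical recursive join.
join_rec() {
    if [ "$#" -le 2 ]; then
        return 0                    # no items left
    fi
    cur=$1 sep=$2 item=$3
    shift 3
    printf '%s%s' "$cur" "$item"    # separator (empty the first time), then item
    join_rec "$sep" "$sep" "$@"     # recursive calls always pass the real separator
}

join_rec '' '&' a b c; printf '\n'    # prints: a&b&c
```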

By P Kern at 2019-02-17 15:55:44:

Albert's solution has been in use here forever, also. Though mostly in awk scripts rather than shell scripts. One of those many handy tricks/methods enshrined in the scripts passed down from the wise men who used to roam our halls.
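For comparison, a sketch of the same trick in awk, called from sh (join_awk is a hypothetical name): an uninitialized awk variable is the empty string, so 'sep' contributes nothing before the first record and is set to '&' only afterwards.

```shell
# join_awk ITEM...: hypothetical helper; one item per input line,
# joined with '&' by awk.
join_awk() {
    printf '%s\n' "$@" |
        awk '{ out = out sep $0; sep = "&" } END { print out }'
}

join_awk a b c    # prints: a&b&c
```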

Written on 15 February 2019.
Last modified: Fri Feb 15 23:12:33 2019