A shell thing: globbing operators versus expansion operators

September 16, 2016

If you've been using a Unix shell for long, you may be familiar with the '[...]' wildcard, which can be used to match a character range (or a bunch of characters):

ls -lt logfile.[1-5].gz

If you've used Bash or a number of other shells for a while, you may also be familiar with '{..,...}':

touch afile.{one,two,three}

There is an inconvenient chasm here between these two very similar things. Wait, a chasm? Sure. Imagine that you want to create afile.1 through afile.5. Can you write this in a nice compact way as the following?

touch afile.[1-5]

The answer is no, and this is the chasm in action. You can use '[1-5]' to match logfile.1.gz through logfile.5.gz, but you can't use it to generate 1 through 5 for touch. Similarly, you can't use {...} on its own as a wildcard match; it names every alternative whether or not the file exists, eg:

ls -lt afile.{c,h,go,cpp,py,rb,el}

What is happening here is that modern shells have two sorts of operators, wildcard globbing operators and expansion operators. Expansion operators are simply text substitution and expansion, so 'x.{a,b,c}' expands out to 'x.a x.b x.c' regardless of what files currently exist. Wildcard globbing operators match filenames and only expand out to filenames that match; if nothing at all matches, it's either an error or the operator produces itself as literal text.
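To make the difference concrete, here is a small sketch for Bash. The scratch directory and the x.* file names are made up for illustration; only x.a is created, so the two operators give visibly different answers.

```shell
# Sketch for Bash; the scratch directory and file names are illustrative only.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch x.a                      # only x.a exists on disk

# Brace expansion is pure text generation: three words, whatever is on disk.
expanded=$(echo x.{a,b,c})
echo "$expanded"               # x.a x.b x.c

# A glob only produces the names that exist right now.
globbed=$(echo x.[abc])
echo "$globbed"                # x.a
```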

(In other words, if you do 'touch nosuchfile.*', you get a file called 'nosuchfile.*'. The operator producing itself is the standard behavior but some shells have an option to make a failed glob into an error.)
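A quick way to see the default behavior for yourself, assuming Bash with its nullglob and failglob options turned off (failglob being one example of the "make it an error" switch). The scratch directory is only there so the experiment doesn't litter your real files.

```shell
# Sketch for Bash; run in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
shopt -u nullglob failglob     # make sure we see the default glob behavior

touch nosuchfile.*             # nothing matches, so touch gets the literal pattern
created=$(ls)
echo "$created"                # nosuchfile.*
```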

The chasm between the two fundamentally exists because the shell can't read your mind about what you want. To return to my earlier example, if you write:

touch afile.[1-5]

and you already have a file called afile.1, do you actually want to update its timestamp and do nothing else, or do you want to create afile.2 through afile.5 as well? The shell can't tell, so it must pick one or the other. It is this decision that creates the distinction between wildcard globbing operators and expansion operators.
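Here is roughly what the two possible answers look like in Bash, as a sketch in a scratch directory (the directory and variable names are just for illustration). The glob takes the "only what exists" reading; spelling the names out with braces takes the "create them all" reading.

```shell
# Sketch for Bash; run in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch afile.1                  # the one file that already exists

touch afile.[1-5]              # glob: matches only afile.1, creates nothing new
after_glob=$(echo afile.*)
echo "$after_glob"             # afile.1

touch afile.{1,2,3,4,5}        # braces: generates all five names
after_brace=$(echo afile.*)
echo "$after_brace"            # afile.1 afile.2 afile.3 afile.4 afile.5
```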

(Globbing came first, by the way. Expansion operators got added later, although in the end the Bell Labs people decided that having a '{...}' feature was sufficiently useful that an equivalent was included in Tom Duff's rc.)

(This entry was sparked by Advancing in the Bash Shell, via John Arundel, which got me thinking about how '{..}' is kind of weird in the shell.)


Comments on this page:

By Ewen McNeill at 2016-09-16 02:42:17:

FWIW, you can use {...} as part of a wildcard match, both inside and outside the {...}, at least with bash on Linux and Mac OS X.

ewen@ashram:/tmp/test$ touch afile.{c,cpp,h,go,py,rb,el}
ewen@ashram:/tmp/test$ ls
afile.c        afile.el        afile.h         afile.rb
afile.cpp      afile.go        afile.py
ewen@ashram:/tmp/test$ ls -lt *file.{c*,h,go,py,rb,el}
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.c
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.cpp
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.el
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.go
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.h
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.py
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.rb
ewen@ashram:/tmp/test$ ls -lt *file.{c*,h,go,py,*b,el}
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.c
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.cpp
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.el
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.go
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.h
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.py
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.rb
ewen@ashram:/tmp/test$ ls -lt *file.{[ch]*,go,py}
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.c
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.cpp
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.go
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.h
-rw-r--r--  1 ewen  wheel  0 16 Sep 18:31 afile.py
ewen@ashram:/tmp/test$ 

At least in bash, globbing happens on each of the {...} expansions in turn. Obviously the files have to exist, otherwise the globbing will fail; but that's a globbing issue, not a {...} issue.

Ewen
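The two-step order described in the comment above can be seen in a small Bash sketch. The file names here are made up, and afile.zz deliberately does not exist: the braces still generate it, and since it contains no wildcard characters it is passed through untouched.

```shell
# Sketch for Bash; run in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch afile.c afile.cpp afile.h

# Step 1 (braces): afile.{c*,h,zz}  ->  afile.c*  afile.h  afile.zz
# Step 2 (globs):  each resulting word is matched against files on its own.
result=$(echo afile.{c*,h,zz})
echo "$result"                 # afile.c afile.cpp afile.h afile.zz
```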

By not an illuminati at 2016-09-16 03:02:29:

There is also a range expansion operator:

$ touch {1..5}
$ ls
1  2  3  4  5

Beyond-POSIX shells have equivalent operators across the glob matching and expansion categories.

The {1..5} range expansion equivalent of the [1-5] glob match has been mentioned.

Likewise there is a @(one|two|three) globbing equivalent for the {one,two,three} brace expansion – under shopt -s extglob in bash and setopt kshglob_ in zsh.

So the chasm is nowadays closed, provided you can target a sufficiently-featured shell. The complete lack of any symmetry between syntaxes is irritating, but that’s organic growth for ya…

And it thereby follows that the chasm is not fundamental. True, if you only have one syntax for each operation, the shell can’t offer both categories, because it would have to guess, and it can’t. But there’s no reason it can’t have two syntaxes for each operation, one in each category, allowing the user to indicate which is wanted, and thus offering both.
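As a sketch of the "two syntaxes, one per category" idea in Bash (assuming extglob is available; the scratch directory and file names are illustrative), the same three alternatives can be written either as an expansion or as a match:

```shell
# Sketch for Bash; run in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
shopt -s extglob               # must be on before the @(...) line is parsed

touch afile.one afile.three    # afile.two deliberately does not exist

# Expansion category: all three names, existing or not.
braces=$(echo afile.{one,two,three})
echo "$braces"                 # afile.one afile.two afile.three

# Globbing category: only the names that match existing files.
set -- afile.@(one|two|three)
pattern="$*"
echo "$pattern"                # afile.one afile.three
```

Note that extglob changes how Bash parses patterns, so the shopt line has to run before the shell reads the command that uses @(...).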

D’oh, that was supposed to be setopt ksh_glob.

Written on 16 September 2016.

Last modified: Fri Sep 16 01:05:15 2016