2013-04-25
How SuS probably requires the 'run at least once' xargs behavior
A commentator left a long comment on my entry about how xargs behaves
with no input, arguing that the Single Unix Specification for xargs
actually requires it to not run if standard input is empty. I think
it's more likely to be the other way around, so today I want to run
down why I think the SuS probably requires this annoying behavior.
There are two important sections of the SuS xargs specification here
and I'm going to quote both, bolding important bits:
The xargs utility shall construct a command line consisting of the utility and argument operands specified followed by as many arguments read in sequence from standard input as fit in length and number constraints specified by the options. The xargs utility shall then invoke the constructed command line and wait for its completion. This sequence shall be repeated until one of the following occurs:
- An end-of-file condition is detected on standard input.
[... other conditions elided ...]
[...] The utility named by utility shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found. [...]
Now we get to play the fun game of interpreting standards. The easiest place to play this game is the last sentence I quoted, which says both that the utility shall be executed at least once and that this happens until end-of-file is reached. If end-of-file is reached immediately, which takes precedence? In the style of reading standards that I've absorbed, explicit statements generally trump implications; that would mean that the explicit promise that the utility shall be executed at least once trumps the potential implication of not running it on immediate EOF.
The first paragraph as a whole offers a similar conflict. It is easy to
read it as a series of steps: first read in as many arguments as you can
that fit, then run the command, and only then check for exit conditions
and repeat if they are not met. You don't check for exit conditions
before you run the command once because that's not what the series
of steps tells you to do, and 'zero arguments' is not ruled out as a
valid number of arguments to read from standard input; ergo, xargs
runs the command line once even on immediate EOF. You can also read it
as a general description instead of a series of steps, with the 'this
sequence shall be repeated until ...' forming the framing procedure
around the specific two steps used to form and run each command line; in
this reading it's correct to run zero times if there is an immediate end
of file on standard input since the framing loop's exit condition has
been met.
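To make the difference between the two readings concrete, here is a minimal Python sketch of each. This is a toy model, not how any real xargs is written: the function names, the `batch_size` parameter (standing in for the "length and number constraints") and the `invoke` callback are all made up for illustration, and the arguments are passed as a list instead of being read from standard input.

```python
def run_as_steps(args, batch_size, invoke):
    """Reading 1: build a command line, invoke it, and only then
    check the exit conditions (hypothetical model)."""
    pending = list(args)
    while True:
        # Take as many arguments as fit; zero is a valid count here.
        batch, pending = pending[:batch_size], pending[batch_size:]
        invoke(batch)        # runs even when batch is empty
        if not pending:      # end-of-file noticed after the invocation
            break


def run_as_framed_loop(args, batch_size, invoke):
    """Reading 2: the 'repeated until' clause frames the two steps,
    so end-of-file is checked before each run (hypothetical model)."""
    pending = list(args)
    while pending:           # immediate end-of-file means zero runs
        batch, pending = pending[:batch_size], pending[batch_size:]
        invoke(batch)
```

The two functions make exactly the same invocations for any non-empty input. They differ only on empty input, where the first calls `invoke` once with no arguments and the second never calls it, which is the whole disagreement.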
If we read the first paragraph using an 'explicit trumps implicit' rule
then I think that we have to conclude that the paragraph is the set of
steps that xargs is intended to follow as it executes, because this is
exactly how the paragraph is written. This interpretation is reinforced
by the 'one or more times' language in the later paragraph.
None of this is unambiguous; the SuS specification never comes out and
says outright 'xargs runs once even if it reads no arguments'. But
given how much the usual extremely legalistic, 'every word and phrase
and ordering decision counts' approach to reading standards pushes us
towards the 'xargs runs once on EOF' interpretation, I think it's
probably what SuS actually requires.
(Note that none of this matters in practice. As covered in the first
entry, existing systems have no common behavior. The closest you can
get is to always specify -r so that xargs does not run the command when
it reads no arguments, which works on GNU findutils, sufficiently
recent FreeBSD, and OpenBSD.)
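If you happen to be driving xargs from a program instead of a shell pipeline, one way to sidestep the question is to never start xargs when there is nothing to feed it. The following is only a sketch under that assumption: the function name `xargs_if_any` and its `run` parameter are hypothetical, and it assumes the items contain no whitespace, quotes, or backslashes, since xargs would otherwise split or reinterpret them.

```python
import subprocess


def xargs_if_any(items, command, run=subprocess.run):
    """Pipe items to 'xargs command...' only if there is at least one.

    Hypothetical helper: it does not rely on -r, so the result on
    empty input does not depend on which xargs is installed.
    """
    items = [item for item in items if item]
    if not items:
        return None  # nothing to do; xargs is never started
    # One item per line; xargs splits its input on blanks and newlines.
    data = "\n".join(items) + "\n"
    return run(["xargs", *command], input=data, text=True)
```

For example, `xargs_if_any([], ["rm", "-f"])` returns `None` without running anything, while a non-empty list is handed to xargs as usual.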
PS: this is not the craziest thing in the SuS xargs specification. If
you care about xargs portability and want to be horrified, read the
description of -E carefully.
(Also, these crazy things are almost certainly not the fault of the SuS authors.)