One irritation in xargs's interface

March 30, 2013

Xargs is generally a nice command that more or less works right. Some people could criticize Unix for needing it so much (which is mostly a product of command line length limitations) and the need for -0 is a bit annoying, but on the whole it's good. But xargs has one little corner case that is really annoying; as a bonus, it's even non-portable in an irritating way.

Here it is, presented in illustrated form:

$ xargs echo does run </dev/null

Now the question: will this produce any output? In other words, does xargs run the command once even if there are no (extra) arguments to give to it? The answer is that it does in some but not all versions of xargs:

  • Solaris 10 runs echo once and has no option to disable this.
  • GNU findutils xargs (commonly used on Linux) normally runs echo once but can turn this off with -r aka --no-run-if-empty.
  • FreeBSD doesn't run echo and has no option to change this. Recent versions accept -r for compatibility with GNU xargs; old versions don't.
  • OpenBSD runs echo once but can turn this behavior off with -r.
  • Mac OS X doesn't run echo and has no -r argument.

Based on the current manpage, NetBSD xargs behaves the same as FreeBSD xargs (including accepting a do-nothing -r argument).
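
A quick way to see which camp a particular system falls into is to probe it directly. This is only a sketch: the function name xargs_runs_on_empty is made up for illustration, and it assumes nothing beyond xargs and echo being on your $PATH.

```shell
# Hypothetical helper: succeeds (exit 0) if the local xargs runs the
# command once even when it is given no input at all.
xargs_runs_on_empty() {
    # If xargs runs echo on empty input, echo prints "ran";
    # otherwise the command substitution is empty.
    [ -n "$(xargs echo ran </dev/null 2>/dev/null)" ]
}

if xargs_runs_on_empty; then
    echo "this xargs runs the command once on empty input"
else
    echo "this xargs skips the command on empty input"
fi
```

This only reports the default behavior; it says nothing about whether -r is accepted.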

The Single Unix Specification for xargs is rather ambiguous about what behavior is allowed or required; it certainly never definitively states things either way (and it has no -r argument). My close reading leads me to believe that SuS probably requires xargs to run echo once, but only by implication. This would match what I believe is historical behavior (as suggested by Solaris, which is very historical). I assume that at some point FreeBSD decided that this historical behavior was a bad idea and changed it.

My view is that (historical) xargs behavior is stupid and is a bear trap waiting to bite you in unusual situations. You almost never want to run the xargs command if there is nothing for it to operate on. In many situations and usages you'll get odd results if there is nothing to operate on; in extreme cases you may get dangerous explosions. This is an easy issue to overlook because almost everyone uses xargs in situations that do generate argument lists (especially when you're testing your command lines or scripts). In fact I suspect that many people using xargs on Linux, Solaris, and OpenBSD machines don't even know about this potential gotcha, which sort of proves my point.
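
Since -r is not available everywhere, a script that has to behave the same on all of these systems can avoid depending on the empty-input behavior at all by checking the input itself. What follows is a minimal sketch, not a polished tool: the name xargs_nonempty is hypothetical, it assumes mktemp exists (common, but not required by POSIX), and it assumes ordinary whitespace-separated input rather than NUL-separated input for -0.

```shell
# Hypothetical wrapper: run xargs only if stdin contains at least one
# non-blank character, so the command never runs with an empty argument list.
# Assumes mktemp is available and that input is whitespace-separated words.
xargs_nonempty() {
    _xn_tmp=$(mktemp) || return 1
    cat >"$_xn_tmp"                 # buffer stdin so it can be inspected
    _xn_status=0
    if grep -q '[^[:space:]]' "$_xn_tmp"; then
        xargs "$@" <"$_xn_tmp" || _xn_status=$?
    fi                              # otherwise: nothing to do, command is not run
    rm -f "$_xn_tmp"
    return "$_xn_status"
}

# With no input, echo is never invoked, whatever the local xargs would do.
xargs_nonempty echo does run </dev/null
# With input, this behaves like plain xargs.
printf 'a b\n' | xargs_nonempty echo got
```

The cost is that all of the input gets buffered in a temporary file before anything runs, which is fine for typical file lists but loses xargs's ability to start work while input is still arriving.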

(This entry is yet another illustration of how a simple entry idea can turn out much more interesting than I expected when I started writing it. Before I started actually checking systems I would have confidently told you that all versions of xargs would run echo once; I had no idea how tangled the actual situation was.)


Comments on this page:

From 84.163.44.45 at 2013-03-30 06:38:06:

Reading from /dev/null should always return EOF (POSIX standard).

No echo is correct!

See: http://en.wikipedia.org/wiki//dev/null

From 71.92.228.40 at 2013-03-30 10:24:42:

I would also be interested in knowing if the various versions, given an input which exactly filled their idea of how much they could process in a single run, did an extra run with no extra arguments.

I agree that given no input you should get no runs.

By cks at 2013-03-30 13:37:52:

@84.163.44.45: The question is not what reading from /dev/null does, the question is what xargs does in the face of no input. Reading from /dev/null is only the most convenient way of producing this in a demo. The practical answer is that it varies between Unixes, as my entry says. If you feel that the Single Unix Specification provides an unambiguous answer as to what xargs should do in this situation, I would be interested in your argument.

@71.92.228.40: It's an interesting question. I suspect, but have not thoroughly tested, that no version of xargs does an extra run with no arguments in that case. Unfortunately it's hard to do a thorough test (a quick one with a very low xargs -n value suggests that both Solaris 10 and GNU xargs get it right).
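
A quick test of this sort might look like the following sketch. Four input words with -n 2 exactly fill two runs, so a version of xargs that did a trailing run with no arguments would print a third line. GNU xargs prints two lines here; this particular command has not been checked on the other systems discussed.

```shell
# Four input words with -n 2 exactly fill two invocations of echo.
# An xargs that did a trailing run with no arguments would print a third line.
printf '1 2 3 4\n' | xargs -n 2 echo run
```

This only exercises the -n limit, not the command line size limit, which is the harder case to hit exactly.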

From 84.163.27.148 at 2013-04-24 10:38:34:

@cks:

As mandated by The Single Unix Specification for xargs (you linked to in your post) xargs' first course of action is to read from its standard input stream to actually construct the (first) argument list internally and only THEN use the constructed argument list for execvp'ing the specified utility for the first time ("The xargs utility shall THEN invoke the constructed command line and ..."). Reading from /dev/null right from the start, however, actually prevents reading from stdin at all; it's like invoking xargs with EOF as stdin, i.e. without any kind of stdin at all. So, in order to be POSIXly correct, xargs has to detect an end-of-file condition on its standard input right from the start, prior to invoking the specified utility, and immediately react to EOF on stdin by exiting. In this case the repeat sequence mentioned in the specification has to be set to 0 (zero).

For a non-existing (pre-exhausted) stdin argument list the utility specified for xargs should not be invoked at all.

x != 0 in xargs.

(See, for example, the code following /* No arguments since last exec. */ in http://opensource.apple.com/source/shell_cmds/shell_cmds-170/xargs/xargs.c ).

A single byte (such as a newline character), though, should at least trigger the invocation of utility once (which is not the case using FreeBSD / Mac OS X xargs; GNU parallel works).

xargs eats newlines!

echo | xargs echo does run

echo $'\n\n' | xargs echo does run

echo | parallel echo does run

echo $'\n\n' | parallel echo does run

By cks at 2013-04-25 01:18:40:

I believe that you're wrong about the implications of reading from /dev/null. /dev/null is no different from any other sort of standard input that presents an immediate end of file, and all of these are not at all the same as having no standard input. To put it one way, one is a zero-byte return from read() and the other is a read error of EBADF. In all of the former cases you can read from standard input but the result is empty.

The xargs eats blank lines case (on FreeBSD and Mac OS X) is because xargs is not doing what you think it is. The baseline xargs behavior is to read words from standard input, not lines, and words are non-blank (unless quoted). Blank lines are word-separating whitespace and so are ignored. By contrast, GNU parallel explicitly operates on lines of input.

(The easiest way to see that this is the case is to run 'echo a b | xargs -n 1 echo hi'. This will run the echo twice, not once.)
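
A slightly longer version of the same demonstration, as a sketch: the blank lines in the first input are just whitespace between words and produce no arguments of their own, while the quoted blank in the second input stays inside a single word.

```shell
# Three words spread over several lines, with blank lines in between.
# xargs splits on unquoted whitespace, so the blank lines contribute nothing
# and echo runs three times (once each for a, b, and c).
printf 'a b\n\n\nc\n' | xargs -n 1 echo hi

# A quoted blank is part of the word, so this is one argument, not two,
# and echo runs once.
printf '"a b"\n' | xargs -n 1 echo hi
```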

The 'what the SuS probably requires and why' portion of my reply got so long that I put it in a new entry, XargsZeroArgsIssueII. In addition, I can't see any sign in the text that SuS treats an immediate EOF specially or any differently from 'no argument words read from standard input for whatever reason'.

Written on 30 March 2013.

Last modified: Sat Mar 30 01:44:47 2013