Two xargs gotchas that you may not know about

May 1, 2013

I know, I've been harping on xargs a bit lately. But this stuff is important because most people's vague intuitions about how xargs behaves are actually wrong.

If you're like most people, you probably vaguely think that xargs operates on lines of input and the purpose of the GNU -0 extension to xargs (and find et al) is so that some joker putting a newline in a file name doesn't cause the world to blow up. Actually it's much worse than that.

The simple way to put this is that xargs doesn't operate on lines, it operates on words. Words are the same as lines only if your lines don't have any whitespace, backslashes, single quotes (') or double quotes ("), all of which xargs interprets specially: whitespace separates words, a backslash escapes the next character, and quotes group text into a single word. Oh, and blank lines are neither errors nor empty arguments under normal circumstances, they are simply word-separating whitespace. In short, newlines are only the beginning of the things that nasty people can put in their filenames to give you heartburn.

(Normally you don't see any of this because your input to xargs is well formed and simple.)
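
A quick way to see the word splitting is to feed xargs some deliberately awkward input. This is only a sketch: the file names are made up for illustration, and it assumes an xargs that supports the non-POSIX -0 option (GNU and the BSDs do).

```shell
# Hypothetical input: three names, one per line, plus a blank line.
input=$(printf 'plain.txt\ntwo words.txt\n\nback\\slash.txt')

# -n 1 runs echo once per argument, so each output line is one word
# as xargs saw it. The space splits the second name in two, the blank
# line disappears, and the backslash is consumed as an escape.
words=$(printf '%s\n' "$input" | xargs -n 1 echo)
printf '%s\n' "$words"

# An unbalanced quote is an error, not a literal character.
if ! printf "it's.txt\n" | xargs echo >/dev/null 2>&1; then
    quote_status=failed
else
    quote_status=ok
fi

# With -0, items end at NUL bytes and nothing else is special.
nul_words=$(printf '%s\0' 'two words.txt' "it's.txt" | xargs -0 -n 1 echo)
printf '%s\n' "$nul_words"
```

The first run prints four words for three names, while the -0 run prints each name intact on its own line.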

The other trap (as I alluded to) is the portable behavior of xargs if you don't give an explicit -E argument. If you don't, some versions of xargs will assume that a line with only an underscore (_) actually means the (logical) end of file and won't read any further input. It will probably surprise no one that Solaris 10 update 8 (that bastion of old times) behaves this way. Fortunately Linux, FreeBSD, and OpenBSD don't appear to do so.
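
The defensive version is to always pass -E yourself. POSIX says that an empty -E string disables the logical end-of-file feature, so the following sketch (with made-up input) should treat the underscore as an ordinary argument on any conforming xargs:

```shell
# A lone underscore line sits in the middle of this hypothetical input.
# Without -E the result depends on the implementation: an xargs that
# treats "_" as the logical end of file would stop before "after".
printf 'before\n_\nafter\n' | xargs echo

# -E '' turns the end-of-file string off, so "_" is just another word.
explicit=$(printf 'before\n_\nafter\n' | xargs -E '' echo)
printf '%s\n' "$explicit"
```

If the two commands print different things on your system, you have one of the xargs versions that defaults to the underscore.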

(One of the morals here is that sometimes GNU programs make important innovations, as I believe that xargs -0 and find ... -print0 came from GNU.)

Comments on this page:

From at 2013-05-02 08:48:15:

For the sake of completeness, it should be noted that "-print0" may be widely adopted, but it's not part of POSIX:

A feature of SVR4's find utility was the -exec primary's + terminator. This allowed filenames containing special characters (especially <newline> characters) to be grouped together without the problems that occur if such filenames are piped to xargs. Other implementations have added other ways to get around this problem, notably a -print0 primary that wrote filenames with a null byte terminator. This was considered here, but not adopted. Using a null terminator meant that any utility that was going to process find's -print0 output had to add a new option to parse the null terminators it would now be reading.

Similarly, "-0" is not mandated in POSIX's xargs.

Written on 01 May 2013.

Last modified: Wed May 1 23:48:39 2013