The limitations on find's -exec option and implementation convenience

January 31, 2021

In my entry on how find mostly doesn't need xargs nowadays, I noted that in '-exec ... {} +', the '{}' (for the filenames find was generating) had to come at the end. In a comment on that entry, an anonymous commentator noted that this didn't apply to the -exec version that runs a separate command for each filename; with it, you can put the substituted filename anywhere in the command. This appears to be not just a GNU find feature but a common one, and I think it's even required by the Single Unix Specification for find.

(The SUS specification of the -exec arguments only restricts the '+' form to having the '{}' immediately before it. Its specification for the ';' form just allows for a general argument list, and then the text description says a '{}' in the argument list is replaced by the current pathname. This is tricky, as seems usual for the SUS and POSIX.)

This difference between the two forms of -exec is interesting, and it probably exists because of implementation convenience for the '+' form. So let's start from the beginning. When you use either form of -exec, find runs the command via the exec() family of system calls (and library functions), which require a (C) array of the command and the arguments to be passed to it (ie, this is argv for the new command). The implementation of this for the single substitution case of '-exec ... ;' is straightforward: you create and pre-populate an argv array with the command and all of the -exec arguments, and you remember the index of the '{}' argument in it (if there is one; it's not required). Every time you actually run the command, you put the current pathname in that argv slot and you're done.
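One quick way to see the argv that find builds for the ';' form is to have it run printf, which reuses its format string for every argument it gets and so prints one bracketed line per argv element. This is a minimal sketch; it assumes mktemp -d is available (it is on Linux and macOS), and the 'before' and 'after' words are just placeholders.

```sh
# Scratch directory holding a single file, so the output is predictable.
dir=$(mktemp -d)
touch "$dir/one.txt"

# With ';', the '{}' may sit anywhere in the argument list; find puts the
# pathname into that one argv slot and passes the other arguments as-is.
find "$dir" -type f -exec printf '[%s]\n' before {} after \;

# Clean up afterwards with: rm -r "$dir"
```

This prints '[before]', then the full pathname of one.txt in brackets, then '[after]', one per line.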

In the restricted form of multiple substitutions, you can sort of do this too. You create an argv array of some size, populate the front of it with all of the fixed arguments, and then append each pathname to the end as an additional argument, keeping track of the total size of all of the arguments so that you can run the command before that total gets too big. Once you've run it, you reset your 'the next pathname goes here' index back to the starting position, just after the fixed arguments, and repeat.
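The same printf trick shows the '+' layout: the fixed arguments come first and the pathnames are appended after them. This is a sketch assuming a scratch directory from mktemp -d; the word 'fixed' is a placeholder, and the order in which find lists the two files isn't guaranteed.

```sh
# Scratch directory holding two files.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b"

# With '+', find puts the fixed arguments ('fixed' here) at the front of
# argv and appends pathnames after them. Two short names should fit well
# under the size limit, so printf runs once and '[fixed]' appears once.
find "$dir" -type f -exec printf '[%s]\n' fixed {} +

# For contrast, ';' runs printf once per file, so '[fixed]' appears twice.
find "$dir" -type f -exec printf '[%s]\n' fixed {} \;

# Clean up afterwards with: rm -r "$dir"
```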

However, if the '{}' could go anywhere, you'd need a more complicated implementation that divided the fixed arguments into two parts, one before the '{}' and one after it. You would fill the front of your argv with the 'before' fixed arguments, append pathnames as additional arguments until the total size hit your limit, and then append the 'after' fixed arguments (if any) before the exec(). This isn't much extra work, but it is some, and my theory is that it was just enough extra work to push the people implementing the SVR4 version (where this feature first appeared) to pick the restricted form, making their lives slightly more convenient and their code more bug free (since code you don't have to write is definitely bug free).
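You can see the restriction from the command line, along with a common workaround (a general shell idiom, not something from this entry) that gets you 'after' arguments anyway by handing the batch of pathnames to a shell. SUS says a '+' only ends -exec when it immediately follows a '{}' argument; GNU find, for example, then keeps looking for a ';' and fails. This is a sketch; the scratch directory and the word 'after' are placeholders.

```sh
dir=$(mktemp -d)
touch "$dir/a" "$dir/b"

# Something after '{}' in the '+' form isn't allowed. Here '+' doesn't
# directly follow '{}', so it isn't taken as the terminator and find fails.
find "$dir" -type f -exec printf '[%s]\n' {} after + ||
    echo "find rejected '{} after +'" >&2

# Workaround: let a shell supply the 'after' part. sh -c sets $0 to the
# first argument ('sh' here) and "$@" to the batch of pathnames, so
# 'after' is appended once per batch, after all the pathnames.
find "$dir" -type f -exec sh -c 'printf "[%s]\n" "$@" after' sh {} +

# Clean up afterwards with: rm -r "$dir"
```

The second command should print the two bracketed pathnames (in whatever order find visits them) followed by a single '[after]'.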

(I'm sure that this isn't the only area of Unix commands where you can see implementation convenience showing through, but find's contrast between the two versions of -exec is an unusually clear example.)


Comments on this page:

The ';' style also allows for several of the argument strings to be '{}', and all of them will be replaced by the pathname. I don't think I've ever used this for anything practical, though.
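A minimal illustration of several '{}' arguments in the ';' style, using printf to show each argv element on its own line; the scratch directory from mktemp -d is an assumption made for the demo.

```sh
dir=$(mktemp -d)
touch "$dir/one.txt"

# Every argument that is exactly '{}' gets the pathname, so this prints
# the same bracketed pathname twice.
find "$dir" -type f -exec printf '[%s]\n' {} {} \;

# Clean up afterwards with: rm -r "$dir"
```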

I think there are probably two reasons for the {}-must-be-last phrasing of the '+' form: the implementation convenience you describe, and to avoid (in at least some cases) usurping the executed utility's possible use of + as a single-character argument on its own command line, since + being special for find was a change from previous behaviour. (See also the last part of the Rationale section on that page.)

Rather to my surprise, the '+' text admits the possibility of implementations allowing more than one '{}' argument too. However in my very quick testing, neither macOS find nor GNU find (on CentOS 8) do this for the '+' style.

By Matt at 2021-02-01 22:48:58:

Don't forget you can have text alongside the {}, but as the same argument. For example:

find . -type f -name '*.jpg' -exec mv '{}' '{}.bak' \;

By John Marshall at 2021-02-02 03:27:42:

For the same ease-of-implementation reasons, SUS leaves it up to implementations whether arguments that aren't precisely '{}' with nothing extra are replaced. So you would need to test it on your platforms of interest before relying on this. But thanks for the reminder that both implementations I checked before do support {}.bak etc. with the ';' style.
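If you'd rather not depend on '{}.bak' being expanded, one portable approach (a common shell idiom, not something from the comments above) is to pass the pathname as a whole '{}' argument and build the new name inside a shell. A sketch, assuming a scratch directory from mktemp -d with one placeholder file:

```sh
dir=$(mktemp -d)
touch "$dir/photo.jpg"

# '{}' is a whole argument here, which SUS requires find to replace.
# sh -c sets $0 to 'sh' and $1 to the pathname; the shell adds '.bak'.
find "$dir" -type f -name '*.jpg' -exec sh -c 'mv "$1" "$1.bak"' sh {} \;
ls "$dir"

# Clean up afterwards with: rm -r "$dir"
```

The ls should show just photo.jpg.bak. With many files you could batch the work instead, using '-exec sh -c 'for f; do mv "$f" "$f.bak"; done' sh {} +', at the cost of a slightly harder to read command.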

Written on 31 January 2021.
Last modified: Sun Jan 31 22:35:00 2021