Wandering Thoughts archives

2021-01-31

The limitations on find's -exec option and implementation convenience

In my entry on how find mostly doesn't need xargs nowadays, I noted that in '-exec ... {} +', the '{}' (for the filenames find was generating) had to come at the end. In a comment on that entry, an anonymous commenter noted that this doesn't apply to the -exec version that runs a separate command for each filename; with it, you can put the substituted filename anywhere in the command. This appears to be not just a GNU find feature but a common one, and I think it's even required by the Single Unix Specification for find.

(The SUS specification of the -exec arguments only restricts the '+' form to having the '{}' immediately before it. Its specification for the ';' form just allows for a general argument list, and then the text description says a '{}' in the argument list is replaced by the current pathname. This is tricky, as seems usual for the SUS and POSIX.)
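To make the SUS grammar concrete, here's a minimal Python sketch of how a find-like program might recognize where an -exec primary ends. The names parse_exec and ExecSpec are hypothetical, and the rule it encodes (a '+' only ends the primary when it immediately follows a bare '{}'; any other '+' is an ordinary argument) is my reading of the SUS text, not code taken from any real find.

```python
from typing import NamedTuple


class ExecSpec(NamedTuple):
    args: list[str]  # utility name plus arguments, terminator removed
    batched: bool    # True for the '{} +' form, False for the ';' form
    consumed: int    # tokens used, including the terminator


def parse_exec(tokens: list[str]) -> ExecSpec:
    """Parse the tokens that follow '-exec' (hypothetical helper)."""
    for i, tok in enumerate(tokens):
        if tok == ";":
            # ';' form: '{}' may appear anywhere, so it stays in args.
            args = tokens[:i]
            batched = False
        elif tok == "+" and i > 0 and tokens[i - 1] == "{}":
            # '+' form: only a '+' right after a bare '{}' ends the primary.
            # Drop the '{}' as well, since pathnames always go at the end.
            args = tokens[:i - 1]
            batched = True
        else:
            # Anything else, including a stray '+', is an ordinary argument.
            continue
        # Simplification for this sketch: require a utility name.
        if not args:
            raise ValueError("-exec needs a utility name")
        return ExecSpec(args, batched, i + 1)
    raise ValueError("missing ';' or '{} +' after -exec")
```

For example, parse_exec(["echo", "+", ";"]) treats the '+' as a plain argument to echo, while parse_exec(["grep", "-l", "foo", "{}", "+"]) yields the batched form with the fixed arguments ["grep", "-l", "foo"].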

This difference between the two forms of -exec is interesting, and it probably exists because of implementation convenience for the '+' form. So let's start from the beginning. When you use any form of -exec, find runs the commands via the exec() family of system calls (and library functions), which take a (C) array of the command and its arguments (i.e., the argv for the new command). The implementation of this for the single substitution case of '-exec ... ;' is straightforward: you create and pre-populate an argv array of the command and all of the -exec arguments, and you remember the index of the '{}' parameter in it (if there is one; it's not required). Every time you actually run the command, you put the current pathname in the right argv slot and you're done.
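Here's a rough Python sketch of that single-substitution approach, with a list standing in for the C argv array. SingleExec is a hypothetical name; like the description, it only remembers one slot (the first bare '{}'), and run() is just the usual fork/exec/wait cycle using real os module calls.

```python
import os
from typing import Optional


class SingleExec:
    """Sketch of '-exec utility args... ;': one command run per pathname."""

    def __init__(self, args: list[str]) -> None:
        # Build the argv template once: the utility name plus fixed arguments.
        self.argv = list(args)
        # Remember the slot of the first bare '{}', if any; '{}' is optional.
        self.slot: Optional[int] = args.index("{}") if "{}" in args else None

    def argv_for(self, path: str) -> list[str]:
        # Drop the current pathname into the remembered slot. A C version
        # would overwrite argv[slot] in place; copying keeps this side-effect free.
        argv = list(self.argv)
        if self.slot is not None:
            argv[self.slot] = path
        return argv

    def run(self, path: str) -> int:
        # fork() + execvp() + waitpid(), one cycle per pathname.
        pid = os.fork()
        if pid == 0:
            try:
                os.execvp(self.argv[0], self.argv_for(path))
            finally:
                os._exit(127)  # only reached if execvp failed
        _, status = os.waitpid(pid, 0)
        return os.waitstatus_to_exitcode(status)
```

Since the slot can be anywhere, SingleExec(["echo", "found:", "{}", "!"]).run("notes.txt") would print "found: notes.txt !"; nothing about this approach cares where the '{}' sits.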

In the restricted form of multiple substitutions, you can sort of do this too. You create an argv array of some size, populate the front of it with all of the fixed arguments, and then append each pathname to the end as an additional argument, keeping track of the total size of all of the arguments so that you can execute the command before they get too big. After each execution, you reset your 'the next pathname goes here' index back to the starting position, just after the fixed arguments, and repeat.
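A minimal Python sketch of this batching, assuming a simplified size budget: each argument is counted as its encoded bytes plus a trailing NUL, and limit is a hypothetical parameter standing in for the real ARG_MAX accounting (which also has to allow for the environment and the pointer array). The helper names are mine, not from any find implementation.

```python
import os
from typing import Iterable, Iterator


def arg_bytes(arg: str) -> int:
    # Space one argument takes as a C string: encoded bytes plus the NUL.
    return len(os.fsencode(arg)) + 1


def batches_restricted(fixed: list[str], paths: Iterable[str],
                       limit: int) -> Iterator[list[str]]:
    """Sketch of '-exec utility args... {} +' with '{}' required last.

    'fixed' is the utility and its arguments without the trailing '{}';
    'limit' is a hypothetical byte budget standing in for ARG_MAX.
    """
    base = sum(arg_bytes(a) for a in fixed)
    argv, size = list(fixed), base  # fixed arguments fill the front
    for path in paths:
        need = arg_bytes(path)
        # Run what we have before this pathname would overflow the budget.
        # (A lone oversized pathname is still passed through; a real
        # implementation would have to report or otherwise handle it.)
        if len(argv) > len(fixed) and size + need > limit:
            yield argv
            # Reset 'next pathname goes here' to just after the fixed part.
            argv, size = list(fixed), base
        argv.append(path)
        size += need
    if len(argv) > len(fixed):
        yield argv
```

Each yielded list is what would be handed to os.execvp() in a forked child. On most Unix systems os.sysconf("SC_ARG_MAX") reports the kernel's overall limit, though a careful implementation would budget well below it.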

However, if the '{}' could go anywhere, you'd need a more complicated implementation that divides the fixed arguments into two parts, one before and one after the '{}'. You would fill the front of your argv with the 'before' fixed arguments, append pathnames as additional arguments until the total size hit your limit (which now has to leave room for the 'after' arguments), and then append the 'after' fixed arguments (if any) before the exec(). This isn't much extra work, but it is some, and my theory is that it was just enough to push the people implementing the SVR4 version (where this feature first appeared) to pick the restricted form to make their lives slightly more convenient and bug-free (since code you don't have to write is definitely bug-free).
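For comparison, here's a sketch of the hypothetical general form, where '{}' may sit anywhere before the '+'. SUS find doesn't accept this; batches_anywhere is an invented name, and the size budget is the same simplified bytes-plus-NUL count as a stand-in for ARG_MAX. The extra work is splitting the fixed arguments, reserving space for the 'after' part, and gluing the three pieces together for every batch.

```python
import os
from typing import Iterable, Iterator


def arg_bytes(arg: str) -> int:
    # Space one argument takes as a C string: encoded bytes plus the NUL.
    return len(os.fsencode(arg)) + 1


def batches_anywhere(args: list[str], paths: Iterable[str],
                     limit: int) -> Iterator[list[str]]:
    """Hypothetical '-exec ... {} ... +' where '{}' may sit anywhere.

    SUS find does not accept this; it only shows the extra bookkeeping.
    """
    i = args.index("{}")                  # ValueError if there is no '{}'
    before, after = args[:i], args[i + 1:]
    # The 'after' arguments must fit too, so reserve their space up front.
    base = sum(arg_bytes(a) for a in before + after)
    batch: list[str] = []
    size = base
    for path in paths:
        need = arg_bytes(path)
        if batch and size + need > limit:
            yield before + batch + after  # glue the three parts together
            batch, size = [], base
        batch.append(path)
        size += need
    if batch:
        yield before + batch + after
```

With args of ["cp", "{}", "/dest"], every batch ends with "/dest", which is exactly the shape the restricted '+' form can't express.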

(I'm sure that this isn't the only area of Unix commands where you can see implementation convenience showing through, but find's contrast between the two versions of -exec is an unusually clear example.)

unix/FindExecImplementationShows written at 22:35:00

