A handy diff argument handling feature that's actually very old

October 7, 2020

Some time ago I stumbled over a useful feature in the diff on our Linux machines (i.e., GNU diff), where 'diff exim4.conf /etc/exim4/' is the same as 'diff exim4.conf /etc/exim4/exim4.conf'. As a sysadmin, I routinely diff versions of configuration files to do things like verify that my intended new changes are actually the only changes, so this feature regularly saves me from having to repeat the file name. I was all set to write a Wandering Thoughts entry about how this was a handy GNU diff addition, even if it's not quite pure in the Unix way, and then I decided to check what the Unix standard had to say, just to be sure. To my surprise, the standard's manpage for diff explicitly requires this behavior. Then I looked at the history of diff and got another surprise.

The standard describes it in the "Operands" section, in the usual sort of standards language:

If only one of file1 and file2 is a directory, diff shall be applied to the non-directory file and the file contained in the directory file with a filename that is the same as the last component of the non-directory file.
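
As an illustration of the rule quoted above, here is a minimal Python sketch of how the two operands could be resolved into the pair of files that actually gets compared. This is not diff's real implementation; the function name resolve_diff_operands is made up for this example, and it ignores special cases such as '-' for standard input.

```python
import os


def resolve_diff_operands(file1, file2):
    """Return the two paths that would be compared under the quoted rule.

    Hypothetical helper for illustration only; not taken from any diff source.
    """
    dir1 = os.path.isdir(file1)
    dir2 = os.path.isdir(file2)
    if dir1 and not dir2:
        # file1 is the directory: use the file inside it that has the same
        # last path component as file2.
        return os.path.join(file1, os.path.basename(file2)), file2
    if dir2 and not dir1:
        # file2 is the directory: use the file inside it that has the same
        # last path component as file1.
        return file1, os.path.join(file2, os.path.basename(file1))
    # Two non-directories or two directories: this rule does not apply,
    # so the operands are returned unchanged.
    return file1, file2
```

Under this sketch, and assuming /etc/exim4/ exists as a directory, resolve_diff_operands('exim4.conf', '/etc/exim4/') gives the pair ('exim4.conf', '/etc/exim4/exim4.conf'), which matches the behavior described at the start of this entry.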

Once I looked, this diff behavior turned out to go back quite far in Unix history, much further than I thought. This behavior is first specifically mentioned in the V7 diff manpage:

If file1 (file2) is a directory, then a file in that directory whose file-name is the same as the file-name of file2 (file1) is used.

Diff itself seems to appear in V5 Unix (there's no diff manpage in the V4 manuals that tuhs.org has). However, the V5 and V6 manpages don't mention this behavior and the V6 diff source code doesn't seem to contain it on a casual look; it just directly opens the files you gave it and that's it.

(There are Unix V6 emulators online that run in your browser, and trying diff out in one of them suggests that this is how it really works. You can get some odd results, because you can actually read() directories in early Unixes.)

On the one hand, I'm amused and pleased that this handy feature of diff goes as far back as it does, all the way to V7. On the other hand, I wish that I'd noticed it earlier, since it's been there all this time.

(And this is a useful reminder to me that not all of the little nice convenience features found in modern Unix come from GNU.)

Comments on this page:

By karatinversion at 2020-10-07 05:48:26:

I did not know of this feature in diff; I have achieved almost the same reduction in typing with bash brace expansion:

diff {,/path/to/a/}file.ext

By Anon at 2020-12-19 10:35:44:

Do you know what the probability of getting bad data back from an SSD is?

It's well known that HDDs can silently corrupt and return "bad" data (as opposed to returning good data or a "can't read the data" message) and https://queue.acm.org/detail.cfm?id=1866298 has good references to studies investigating this. But what about SSDs? Studies seem to talk about the probability of the SSD dying entirely but it's hard to find information about the probability of an SSD returning corrupt data. It sounds like this is the justification Apple used to not put checksums in APFS (https://eclecticlight.co/2020/03/21/should-we-take-bit-rot-seriously/ )...

Written on 07 October 2020.


Last modified: Wed Oct 7 00:28:56 2020