Why GNU tools are sometimes not my favorite programs

June 22, 2009

Presented in the traditional illustrated form:

; cat a
root
cks
; cat b
root
cks
abc
; comm -13 a b >/dev/null
comm: file 2 is not in sorted order

If your comm doesn't do this, don't be surprised; this behavior is new in the very latest version of comm, from coreutils 7.2 (as installed on Fedora 11; Fedora 10 didn't have it). You can turn it off with the new --nocheck-order option, although the manpage contains scary warnings that doing so is not supported.
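For readers who have not met comm: it walks two sorted inputs in step, like the merge phase of a merge sort, and puts every line into one of three columns (only in file 1, only in file 2, in both). The -1, -2 and -3 options suppress the corresponding column, so `comm -13 a b` prints just the lines unique to b. The following Python is a simplified model of that merge, not GNU comm's code. The function names are hypothetical, it compares strings by code point (roughly comm under LC_ALL=C rather than your locale's collation), and it simply raises ValueError on unsorted input instead of reproducing exactly when GNU comm decides to complain.

```python
from typing import Iterator, List, Tuple


def comm_columns(lines1: List[str], lines2: List[str]) -> Iterator[Tuple[int, str]]:
    """Yield (column, line) pairs in the style of comm.

    Column 1: only in lines1; column 2: only in lines2; column 3: in both.
    Both inputs must already be sorted (by Python's code-point ordering).
    """
    # Simplified order check: reject any adjacent pair that is out of order.
    for name, lines in (("file 1", lines1), ("file 2", lines2)):
        for prev, cur in zip(lines, lines[1:]):
            if prev > cur:
                raise ValueError(f"{name} is not in sorted order")

    i = j = 0
    # Merge step: advance past the smaller line; equal lines pair up one for one.
    while i < len(lines1) and j < len(lines2):
        if lines1[i] < lines2[j]:
            yield 1, lines1[i]
            i += 1
        elif lines1[i] > lines2[j]:
            yield 2, lines2[j]
            j += 1
        else:
            yield 3, lines1[i]
            i += 1
            j += 1
    # Whatever is left in either input has no partner in the other.
    for line in lines1[i:]:
        yield 1, line
    for line in lines2[j:]:
        yield 2, line


def only_in_second(lines1: List[str], lines2: List[str]) -> List[str]:
    """Rough equivalent of `comm -13`: keep only column 2."""
    return [line for col, line in comm_columns(lines1, lines2) if col == 2]
```

With sorted versions of the two files, `only_in_second(["cks", "root"], ["abc", "cks", "root"])` returns `["abc"]`. Fed the transcript's files in their original order, the sketch raises ValueError, because in code-point order neither file is actually sorted.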

Congratulations, GNU coreutils maintainers. You have just broken any number of scripts that were using comm to find the differences between ordered files; all of these scripts now produce extra output on standard error, which is bad. Worse, fixing this will make the scripts unportable, since not even previous versions of GNU comm understand the new --nocheck-order option.

(Yes, yes, technically this behavior is allowed by the Single Unix Specification. But in real life that is beside the point: the true specification is not whatever the letter of the standards allows, but what everything actually does and what people write to.)

Also, this is utterly the wrong way to change behavior like this. The correct way is to first introduce the necessary command-line switches without making the warning the default, with a note that in X amount of time the default will change. Then several versions later you can start to think about changing the default, since people will have had a chance to add the new options to their scripts. (You will still fail, because people don't even look at perfectly working scripts, much less update them, but at least you will have made vague motions towards doing the right thing instead of being an asshole.)
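To make the staged approach concrete, here is a minimal Python sketch using argparse; the tool name `mycomm`, the constant and the function names are all hypothetical. Both switches exist from the first release, but a command line that uses neither keeps the old behavior, and flipping `CHECK_ORDER_BY_DEFAULT` is the separate, later step. The key design choice is a three-state option where `None` means the user said nothing, so the program can tell "never asked" apart from an explicit choice.

```python
import argparse
from typing import List, Optional

# Hypothetical phase-one setting: the new check exists but is off unless
# requested. A later release changes this to True, once scripts have had
# time to say explicitly which behavior they want.
CHECK_ORDER_BY_DEFAULT = False


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="mycomm")  # hypothetical tool name
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--check-order", dest="check_order", action="store_true",
                       help="complain if an input file is not sorted")
    group.add_argument("--nocheck-order", dest="check_order", action="store_false",
                       help="never complain about input order")
    # None records "neither switch given", distinct from an explicit choice.
    parser.set_defaults(check_order=None)
    return parser.parse_args(argv)


def should_check_order(opts: argparse.Namespace) -> bool:
    """Resolve the three-state option into the behavior actually used."""
    if opts.check_order is None:
        return CHECK_ORDER_BY_DEFAULT
    return opts.check_order
```

In this phase `should_check_order(parse_options([]))` is False, and a script that already passes `--nocheck-order` keeps getting False after the later flip, which is exactly what an updated script wants.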


Comments on this page:

From 203.59.102.239 at 2009-06-22 03:04:29:

This change was introduced a while ago:

commit 98a96822d9dac92de719fa340fe326e1fe0427fe
Author: Bo Borgerson <gigabo@gmail.com>
Date:   Sun Apr 20 21:24:16 2008 -0400

That was far too long ago to convince them to back it out, sadly.

From 65.172.155.230 at 2009-06-22 20:10:07:

I'm curious: why do you use comm instead of cmp there? I must admit I've never used comm, and can't even remember knowing about it.

By cks at 2009-06-22 21:07:54:

This is the brute-force solution to a situation where I have a list of items that usually grows by having things added to the end, and occasionally gets truncated. I want to show only the new items, in their natural order, so I can't use sort without getting very creative.

comm is useful in general when you want set operations. It is usually used strictly on sorted files, where it can give you the lines unique to either file and the lines common to both (that is, set differences and the set intersection). (My usage here is thus slightly tricky.)
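If the inputs are never going to be sorted, one alternative to pressing comm into service is an ordinary hash-set filter: remember every line of the old list, then keep the lines of the new list that were not seen, in their original order. This is a different technique from comm's merge, not what is used here, and the names `new_items` and `read_lines` are hypothetical (as is the assumption that the files are UTF-8). One difference to be aware of: it treats lines as a set, so a line present in the old list suppresses every copy of it in the new one, whereas comm pairs equal lines off one for one.

```python
from typing import Iterable, List


def new_items(old: Iterable[str], new: Iterable[str]) -> List[str]:
    """Return the lines of `new` that never appear in `old`, in `new`'s order.

    No sorting is needed, so this copes with a list that is mostly appended
    to and occasionally truncated.
    """
    seen = set(old)  # constant-time membership tests instead of a sorted merge
    return [line for line in new if line not in seen]


def read_lines(path: str) -> List[str]:
    """Read a file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

On the transcript's files, `new_items(read_lines("a"), read_lines("b"))` returns `["abc"]`. After a truncation, for example an old list of `["root", "cks", "abc"]` and a new one of `["abc", "def"]`, it returns `["def"]`.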

Written on 22 June 2009.
Last modified: Mon Jun 22 00:30:37 2009