Unix options conventions are just that, which makes them products of culture

August 9, 2020

Recently I wrote about my views on some conventions for Unix command line options, where I disagreed in part with what Chris Wellons considered Conventions for Command Line Options. Both Wellons and I have a lot of Unix experience, and the fact that we disagreed about parts of what should rightfully be a well-established core of Unix practice says some things about these conventions.

The first thing to note about Unix's conventions on options is that they've always been ad hoc and imperfectly adhered to, to the extent that they even existed in the first place. To start with, V7 Unix did not have getopt(3), so every V7 program did its own option parsing, and I'm certain that some of them behaved somewhat differently from each other. Several V7 programs broke these conventions completely; famously, dd doesn't even use conventional options that start with a '-' (it takes operands like if=file and bs=512), and while find has options that start with '-', they're actually more like GNU-style long options (whole words such as -name and -type).

(Wikipedia implies that getopt(3) first appeared in System III, and indeed here's the System III getopt(3) manpage, dating from 1980.)

The second thing is that both Wellons and I can go on about conventions all we want (and about what they should be), but the reality is that the 'conventions' that exist are defined by what programs actually do. If a lot of programs (or a popular option parsing library) behave in a particular way, in practice that is the convention regardless of what I think of it (or write). The corollary of this is that what people consider convention is in large part defined by how the programs they use behave. By its mere existence and popularity, GNU Getopt has defined a lot of the modern conventions for option handling; if you deviate from it, you will surprise people who expect your programs to behave like the other programs they use every day. Before most programs used GNU Getopt, getopt(3) played the same role and had the same effect on the conventions it enforced.

(New option parsing libraries tend not to break too much with the conventions that are current when they're written, but they may selectively change some of them, especially ones that are considered more obscure.)

Finally, I suspect that part of the difference between Wellons' view of these conventions and mine comes from when we each came into Unix. I started using Unix long enough ago that it was the era of classic getopt(3) rather than GNU Getopt (and long options), so the rules that getopt(3) enforced are the ones I wound up internalizing as the correct conventions. Someone who came into Unix later would have been exposed primarily to GNU Getopt's somewhat different behavior: it supports intermixing options and non-option arguments, long options are routine, and so on.

The corollary of this is that people who come into Unix today are learning the conventions as they stand now, including any inconsistencies between Unix programs that are increasingly written in different languages, with different argument parsing libraries, and so on. Some languages are sufficiently divergent that no one is going to mistake them for 'how Unix commands should behave' (I'm looking at you, Go), but others are close enough that people are likely to internalize parts of their behavior, even if only as expected divergences and differences, just as people remember find and its unusual behavior.
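The difference between classic getopt(3) and GNU Getopt can be seen with Python's standard getopt module, whose getopt() function is documented as working like non-GNU Unix getopt (it stops at the first non-option argument) and whose gnu_getopt() lets options and operands be intermixed the way GNU Getopt does. This is only a sketch: the -v and -n options and the file name are made up for illustration, and gnu_getopt() also honours the POSIXLY_CORRECT environment variable, which would make it stop early as well.

```python
import getopt

# Hypothetical command line: -v is a flag, -n takes a value, file.txt is an operand.
ARGV = ["-v", "file.txt", "-n", "3"]
SHORTOPTS = "vn:"  # the ':' after n means "-n requires an argument"


def classic(argv):
    """Classic getopt(3) style: stop scanning at the first non-option."""
    return getopt.getopt(argv, SHORTOPTS)


def gnu(argv):
    """GNU Getopt style: options may be intermixed with operands."""
    return getopt.gnu_getopt(argv, SHORTOPTS)


def gnu_stop_at_operand(argv):
    """A leading '+' asks gnu_getopt() for classic stop-at-first-operand scanning."""
    return getopt.gnu_getopt(argv, "+" + SHORTOPTS)


if __name__ == "__main__":
    for parse in (classic, gnu, gnu_stop_at_operand):
        opts, operands = parse(ARGV)
        print(f"{parse.__name__:20} options={opts} operands={operands}")
```

Here classic() returns ([('-v', '')], ['file.txt', '-n', '3']), treating '-n 3' as two more operands, while gnu() returns ([('-v', ''), ('-n', '3')], ['file.txt']). Someone who internalized one behavior will be surprised by the other.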


Comments on this page:

By Anonymous at 2020-08-10 10:38:34:

I've always found it slightly strange that the 'tar' command does not require the '-' for its options, so you can write things like 'tar xvf' and just omit it. I've managed to confuse some co-workers who 'corrected' me when they saw me omitting it, and who were then amazed when it actually worked that way. I guess that tar was written before the whole '-' prefix convention was in place, but I never bothered to verify that assumption.

By Todd at 2020-08-11 15:04:11:

There's extensive discussion of tar history and format in this blog post about container images. The author suggests that tar's lack of dashes for options is due to an attempt to maintain backward compatibility with what came before it, namely tp and tap. Under the "Genesis" heading on that page, he says:

tar first originated in Unix v7. Curiously, it was not the first archiving tool available for Unix. tar was a successor to tp (Unix v4), which itself was a successor to tap (Unix v1). As a complete aside, this appears to be the reason why tar accepts dash-less arguments (such as tar xvf). Unix v1 didn’t have dashed argument flags like -xvf (as far as I can tell from the man pages), and tar appears to have been backwards-compatible with tp (which was backwards-compatible with tap). Therefore the most likely reason why tar supports dash-less arguments is because some folks in the 70s wanted to be able to alias tap=tp tp=tar and it’s stuck ever since.
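The dash-less tar form is not just getopt with the '-' left off. As I understand the GNU tar manual's description of its "old option style", the first word is a bundle of option letters, and letters that take an argument (such as f) take their arguments from the following words, in the same order as the letters. A rough sketch of that rule; the function name and the set of argument-taking letters are hypothetical, and real tar implementations handle more cases than this:

```python
def parse_old_style(argv, letters_with_args):
    """Parse a tar-style dash-less option bundle (hypothetical sketch).

    argv[0] is a bundle of option letters such as "cvbf"; each letter listed in
    letters_with_args consumes the next remaining word as its argument.
    Returns (list of (letter, value) pairs, remaining operands).
    """
    if not argv:
        raise ValueError("missing option bundle")
    bundle, rest = argv[0], list(argv[1:])
    opts = []
    for letter in bundle:
        if letter in letters_with_args:
            if not rest:
                raise ValueError(f"option {letter!r} requires an argument")
            # Arguments are taken from the following words in letter order.
            opts.append((letter, rest.pop(0)))
        else:
            opts.append((letter, ""))
    return opts, rest


if __name__ == "__main__":
    print(parse_old_style(["xvf", "archive.tar", "file1"], "bf"))
    print(parse_old_style(["cvbf", "20", "out.tar", "dir"], "bf"))
```

By contrast, a getopt(3)-style parser given '-cvbf 20 out.tar' with b and f both taking arguments would take the letter f itself as the argument to -b and stop at '20'.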

By Edward Berner at 2020-08-12 00:44:28:

I checked some books for historical commentary about getopt and was surprised to not find much, but did find this bit from "The Unix Programming Environment" by Kernighan and Pike, Copyright 1984:

On page 14:

Specifying options by a minus sign and a single letter, such as -t or the combined -lt, is a common convention. In general, if a command accepts such optional arguments, they precede any filename arguments, but may otherwise appear in any order. But UNIX programs are capricious in their treatment of multiple options. For example, standard 7th Edition ls won’t accept ls -l -t as a synonym for ls -lt, while other programs require multiple options to be separated.

As you learn more, you will find that there is little regularity or system to optional arguments. Each command has its own idiosyncrasies, and its own choices of what letter means what (often different from the same function in other commands). This unpredictable behavior is disconcerting and is often cited as a major flaw of the system. Although the situation is improving -- new versions often have more uniformity -- all we can suggest is that you try to do better when you write your own programs, and in the meantime keep a copy of the manual handy.

And on page 179:

In Chapter 1 we remarked on the disorderly way that UNIX programs handle optional arguments. One reason, aside from a taste for anarchy, is that it’s obviously easy to write code to handle argument parsing for any variation. The function getopt(3) found on some systems is an attempt to rationalize the situation; you might investigate it before writing your own.
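The capriciousness Kernighan and Pike describe is what a shared parser removes: with getopt-style parsing, -lt and -l -t mean the same thing, as do -w80 and -w 80 for an option that takes a value. A small sketch using Python's getopt module, which follows getopt(3)'s conventions; the option letters are ls-like but hypothetical:

```python
import getopt

# Hypothetical ls-like options: -l and -t are flags, -w takes a value.
SHORTOPTS = "ltw:"


def parse(argv):
    """Parse argv with classic getopt rules; return (options, operands)."""
    return getopt.getopt(argv, SHORTOPTS)


if __name__ == "__main__":
    # Each pair of spellings should parse identically.
    pairs = [
        (["-lt", "dir"], ["-l", "-t", "dir"]),
        (["-ltw80", "dir"], ["-l", "-t", "-w", "80", "dir"]),
    ]
    for bundled, separate in pairs:
        assert parse(bundled) == parse(separate)
        print(bundled, "==", separate, "->", parse(bundled))
```

An unknown letter raises getopt.GetoptError instead of being silently accepted, which is the other half of what a shared parser makes uniform.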

By Anonymous at 2020-08-12 14:49:42:

Hi, Todd. I'm not sure if you'll read this, but thank you so much for posting the link to that article. I never knew the history of 'tar' was this complex. It was a very educational and enjoyable read. And although I knew about the existence of 'pax' (I read some of the POSIX standards for a while for job and interest reasons), I never investigated the reasons for its existence. And tp/tap/star/ustar were completely new to me. Thanks again!

By msi at 2020-08-19 08:01:02:

The second thing is that both Wellons and I can go on about conventions all we want (and what they should be), but the reality is that the 'conventions' that exist are defined by what programs actually do. If a lot of programs (or a popular option parsing library) behave in a particular way, in practice that is the convention regardless what I think of it (or write). The corollary of this is that what people consider convention is in large part defined by how the programs they use behave.

The critical question then is what to make of this situation when you're implementing a command-line interface. I'd choose following well-defined, reasonable standards first and going by popular convention second. In the Unix field, this could mean (and does for me): Follow the POSIX Utility Syntax Guidelines and, optionally, those parts of the GNU CLI standard that don't violate the former. E.g., providing a GNU-style long option for every POSIX-style option a program offers would be fine while offering some functionality solely through long options wouldn't. Looking at popular convention would then be helpful in deciding which combination of hyphen and alphanumeric character (not) to use for providing certain functionality.
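msi's suggestion, that every option have a POSIX-style short form with an optional GNU-style long alias and that nothing be reachable only through a long option, can be sketched with Python's getopt module, whose getopt() accepts long options alongside short ones while still stopping at the first operand. The option names and the ALIASES table are hypothetical; this is one way to express the rule, not something the POSIX guidelines or the GNU standards prescribe in code:

```python
import getopt

# Hypothetical program options: -v/--verbose (flag) and -o/--output FILE.
SHORTOPTS = "vo:"
LONGOPTS = ["verbose", "output="]  # a trailing '=' means "takes an argument"
# Every long option maps onto a short option, so none is long-only.
ALIASES = {"--verbose": "-v", "--output": "-o"}


def long_only_options():
    """Return long options that lack a short equivalent (should be empty)."""
    return {"--" + name.rstrip("=") for name in LONGOPTS} - set(ALIASES)


def parse(argv):
    """Parse argv, normalizing long options to their short spellings."""
    opts, operands = getopt.getopt(argv, SHORTOPTS, LONGOPTS)
    return [(ALIASES.get(opt, opt), value) for opt, value in opts], operands


if __name__ == "__main__":
    assert not long_only_options(), "every option needs a short form"
    print(parse(["-v", "-o", "out.txt", "a"]))
    print(parse(["--verbose", "--output=out.txt", "a"]))
```

Normalizing to the short spelling keeps the rest of the program from caring which form the user typed.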

By Anonymous at 2020-08-19 13:45:28:

E.g., providing a GNU-style long option for every POSIX-style option a program offers would be fine

GNU long options, by definition, break the POSIX standard. For GNU long options, '--' is the prefix to the long options. In POSIX, '--' defines the end of all options on that commandline, and any following arguments should be treated as operands, even if they begin with the '-' character.

GNU long options, by definition, break the POSIX standard.

They don't, actually. Here's the relevant portion from section A.12.2 of the Rationale volume:

The standard permits implementations to have extensions that violate the Utility Syntax Guidelines so long as when the utility is used in line with the forms defined by the standard it follows the Utility Syntax Guidelines. Thus, head -42 file and ls --help are permitted extensions. The intent is to allow extensions so long as the standard form is accepted and follows the guidelines.

By msi at 2020-08-19 18:44:28:

For GNU long options, '--' is the prefix to the long options. In POSIX, '--' defines the end of all options on that commandline, and any following arguments should be treated as operands, even if they begin with the '-' character.

Well, the POSIX Utility Syntax Guidelines define '--' as the end-of-options indicator whenever it is an argument of its own. That is not the case in a GNU-style long option because the '--' there is immediately followed by a keyword.

I should re-phrase that one sentence about POSIX and GNU from my initial comment, though: Follow the POSIX Utility Syntax Guidelines and, optionally, provide GNU extensions that POSIX permits, on top of that.
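msi's distinction is easy to check with a getopt-style parser: '--' as a whole argument ends option processing, while '--verbose' is a single word that merely starts with two dashes. A sketch with Python's getopt module; the -v/--verbose option is hypothetical:

```python
import getopt


def parse(argv):
    """Accept -v or its long alias --verbose; everything else is an operand."""
    return getopt.getopt(argv, "v", ["verbose"])


if __name__ == "__main__":
    # '--verbose' is an option; the bare '--' ends options, so the
    # '--verbose' and '-v' after it are ordinary operands.
    print(parse(["--verbose", "--", "--verbose", "-v"]))
    # The usual reason for '--': an operand that starts with a dash.
    print(parse(["-v", "--", "-strange-file-name"]))
```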

By Anonymous at 2020-08-20 07:24:40:

@msi:

Hrm. It looks like you are correct. So either my memory is failing me, or POSIX has changed the spec since the last time I looked at it (which admittedly was ages ago) in order to allow GNU longopts (which makes sense).

Written on 09 August 2020.

Last modified: Sun Aug 9 22:49:55 2020