Wandering Thoughts

2020-10-08

Sorting out what the Single Unix Specification is and covers

I've linked to the Single Unix Specification any number of times, for various versions of it (when I first linked to it, it was at issue 6, in 2006; it's now up to a 2018 edition). But I've never been quite clear what it covered and didn't cover, and how it related to POSIX and similar things. After yesterday's entry got me looking at the SuS site again, I decided to try to sort this out once and for all.

My primary resource on this is the Wikipedia page (the SuS FAQ claims to be updated recently but is clearly out of date in important respects). Also useful is the page of the Austin Common Standards Revision Group (also). The Wikipedia page has a helpful rundown of the history of the 'Single Unix Specification' and some things related to it.

As stated in various places, the core of the Single Unix Specification is POSIX, which is formally an IEEE standard and also an international ISO/IEC standard (IEEE 1003 and ISO/IEC 9945 respectively). POSIX incorporates some vintage of ANSI C by reference (I believe C99), since the Unix APIs it specifies are specified in C. The POSIX standard covers C library APIs, commands that are executed through the shell (which is also specified in POSIX), and, I believe, things like some file paths. As far as I can tell, the only other standard in the Single Unix Specification is CURSES, which is not part of POSIX.

(See e.g. Unix standards, the FAQ, and Wikipedia.)

This implies that if a Unix command or a non-Curses API is in the Single Unix Specification, it's also in POSIX. This matches what I've seen in the online Single Unix Specification that I keep linking to bits of; I've only ever noticed it talking about POSIX (aka IEEE 1003.1). For most purposes, then, I can just talk about 'POSIX' or 'Single Unix Specification' interchangeably, which is somewhat different than how I used to think it was.

(I originally thought that the SuS was a superset of POSIX that added significant extra commands and requirements that were not in POSIX. This appears to not be the case.)

Sidebar: Where my misunderstanding of SuS came from

How I thought the story went was that POSIX was a relatively minimal standard for 'Unix' that did not go far enough in practice, for various political reasons. This caused actual Unix vendors to get together and agree on an additional layer of things on top of POSIX that made up 'Unix in practice', creating the Single Unix Specification. Systems that were in no way Unix derived could be POSIX compliant if they tried (and so could be candidates for US government contracts that required 'POSIX', per the origins of POSIX as I learned them), but could not be Unix™, which was something that was defined by the Single Unix Specification.

Obviously this is not actually the case, or at least is not the case in modern versions of the SuS. This goes to show me, once again, the power of folklore (especially since I fell for it).

SingleUnixSpecificationWhat written at 00:38:03

2020-10-07

A handy diff argument handling feature that's actually very old

Some time ago I stumbled over a useful feature in the diff on our Linux machines (ie, GNU diff), where 'diff exim4.conf /etc/exim4/' is the same as 'diff exim4.conf /etc/exim4/exim4.conf'. As a sysadmin, I routinely diff versions of configuration files to do things like verify that my intended new changes are actually the only changes, so this feature routinely saves me from having to repeat the file name. I was all set to write a Wandering Thoughts entry about how this was a handy GNU diff addition, even if it's not quite pure in the Unix way, and then I decided to check what the Unix standard had to say, just to be sure. To my surprise, the standard's manpage for diff explicitly requires this behavior. Then I looked at the history of diff and got another surprise.

The standard describes it in the "Operands" section, in the usual sort of standards language:

If only one of file1 and file2 is a directory, diff shall be applied to the non-directory file and the file contained in the directory file with a filename that is the same as the last component of the non-directory file.
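To make the rule concrete, here's a minimal C sketch of the operand handling as the standard words it. This is not GNU diff's code; resolve_operand() and its helpers are hypothetical names of mine, and I'm assuming that stat() is a good enough way to decide what counts as 'a directory'.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* True if path exists and is a directory (following symlinks). */
static int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Last pathname component of a non-directory path: "a/b/c.conf" -> "c.conf". */
static const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

/* Hypothetical helper: what diff should really open for 'operand', given
   the other operand. If operand is a directory and other is not, the
   result is operand/<last component of other>; otherwise operand is used
   unchanged. Returns 0, or -1 if the result doesn't fit in out. */
static int resolve_operand(const char *operand, const char *other,
                           char *out, size_t outsz)
{
    assert(operand && other && out && outsz > 0);
    int n;
    if (is_directory(operand) && !is_directory(other)) {
        size_t len = strlen(operand);
        /* avoid "dir//name" when the directory was given as "dir/" */
        const char *sep = (len > 0 && operand[len - 1] == '/') ? "" : "/";
        n = snprintf(out, outsz, "%s%s%s", operand, sep, last_component(other));
    } else {
        n = snprintf(out, outsz, "%s", operand);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Calling it once for each operand (resolve_operand(file1, file2, ...) and then resolve_operand(file2, file1, ...)) turns 'exim4.conf /etc/exim4/' into 'exim4.conf /etc/exim4/exim4.conf', and leaves things alone when both or neither of the operands are directories.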

Once I looked, this diff behavior turned out to go back quite far in Unix history, much further than I thought. This behavior is first specifically mentioned in the V7 diff manpage:

If file1 (file2) is a directory, then a file in that directory whose file-name is the same as the file-name of file2 (file1) is used.

Diff itself seems to appear in V5 Unix (there's no diff manpage in the V4 manuals that tuhs.org has). However, the V5 and V6 manpages don't mention this behavior, and on a casual look the V6 diff source code doesn't seem to contain it; it just directly opens the files you gave it and that's it.

(There are Unix V6 emulators online that run in your browser, and trying diff out in one of them suggests that this is how it really works. You can get some odd results, because you can actually read() directories in early Unixes.)

On the one hand, I'm amused and pleased that this handy feature of diff goes as far back as it does, all the way to V7. On the other hand, I wish that I'd noticed it earlier, since it's been there all this time.

(And this is a useful reminder to me that not all of the little nice convenience features found in modern Unix come from GNU.)

DiffOldArgumentsFeature written at 00:28:56

2020-09-30

People still use newgrp (to my surprise)

One of the things that came out of me writing about why newgrp exists and casually mentioning that it was a relic (and that OpenBSD had gotten rid of it) is that a number of people left comments discussing how they actively use newgrp even today. This surprised me more than a bit, although in retrospect it probably shouldn't have.

There's long been an aphorism in software development that if you have a large and complicated program (such as a word processor), 80% of the people only use 20% of its features but they all use a slightly different 20% (and so if you make a simple version with 20% of the features, many more people will be uninterested in it than you expect). Unix is not a single program, but it is large and complicated and so it's subject to much the same dynamics. I may use only 20% of the programs on my Unix and you may use only 20% of the ones on yours, but our collections probably don't entirely overlap.

(Sometimes this surfaces in those 'share your top N most frequently used shell commands' threads, where you'll find a surprising diversity in all the answers.)

The corollary to this is that you can find some people (still) using almost any somewhat obscure Unix command with useful features, and of course that includes newgrp. This is true even if you solve the problems they use the command for in entirely different ways (and even if your way is probably objectively better; sometimes it's not worth the bother of learning new things). Among other things, people's use of Unix commands is influenced by their history with Unix, and everyone has taken a somewhat different path there.

(I can and do say various things about OpenBSD, but one of their positive sides is their anti-fossilization stance. OpenBSD will remove things, like newgrp, even if people are still using them. So will FreeBSD, as it turns out, since in a default FreeBSD installation newgrp is not setuid and as a result doesn't do anything unless you're root.)

Sidebar: What I use instead of newgrp

The two big uses for newgrp that came up in comments on the original entry were taking on newly added group permissions (when you've just been added to a group in /etc/group) without logging out and then back in again, and ensuring that new files get created in the right group.

The latter is the easy case. Any time that we actually care what group new files and subdirectories wind up in on our Linux systems, we make the directory setgid so that everything under it inherits whatever group it has. As I discovered before, on FreeBSD you don't have a choice, and this traditional BSD behavior actually makes plenty of sense.

(I haven't checked OpenBSD but I suspect it behaves the same way as FreeBSD.)

My less satisfying answer is for the 'just added to a new group' case. This doesn't come up for me very often, and when it does I do wind up logging out and then back in again. I may defer having to do a session restart for a while by various mechanisms like sshing to my machine again or using 'su --login', but eventually there are enough irritations that I want to do a session restart. For instance, things started from my window manager (including new terminal windows) won't be in the new group, so I'll have to keep doing whatever it is to access the group.

(It helps that I have gotten session restarts to be relatively low annoyance things, partly in self defense.)

NewgrpSurprisingUsage written at 23:29:20

How the Unix newgrp command behaved back in V7 Unix

In why newgrp exists (sort of), we discovered that Unix groups can have passwords, that there is a newgrp command that lets you change your (primary) group, and that it dates back to V6 Unix, much earlier than I expected. I also mentioned that how the V7 newgrp command behaved was interesting. V7 Unix source code (and manual pages and more) is online through tuhs.org, so we can actually read the V7 newgrp.c source code directly.

(We can also read the V6 source code, but it's much more complicated so I'm going to use V7. Besides, V7 is what everyone generally thinks of as the birth point of modern Unix; everything before V7 is at least a bit odd and different.)

How I expected newgrp to behave with password protected groups is that anyone could say 'newgrp GROUP', and if they knew the password they were put into the group without having to be listed as a (potential) member in /etc/group. This may be how your local Unix's newgrp command works today, but it is not how the V7 newgrp did. Let's start with an extract from the newgrp.1 manpage:

A password is demanded if the group has a password and the user himself does not.

When most users log in, they are members of the group named `other.' Newgrp is known to the shell, which executes it directly without a fork.

First, unless you were newgrping to the special group "other", you had to be listed in the group's /etc/group entry whether or not the group had a password. Then, as the manpage describes, a group password was only enforced if your own Unix account didn't have one. If your account did have a password, the group password was ignored. In other words, you could not use group passwords to give everyone potential access to a group if they knew its group password; the group password was only used to protect groups from unpassworded, open access logins (if you had any, and if you listed them in an additional group).
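Put as code, my reading of the V7 rules is something like the following C sketch. It models only what the manpage and the paragraph above describe, not the actual logic of newgrp.c (for instance, it ignores root entirely); the function and enum names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum newgrp_result { NEWGRP_DENY, NEWGRP_ALLOW, NEWGRP_ASK_PASSWORD };

/* Is user listed in a NULL-terminated /etc/group style member list? */
static bool is_member(const char *user, const char *const *members)
{
    for (; *members != NULL; members++)
        if (strcmp(*members, user) == 0)
            return true;
    return false;
}

/* Hypothetical model of the V7 decision:
   - except for the group "other", you must be listed as a member;
   - the group password is demanded only if the group has one and
     your own account has none. */
static enum newgrp_result v7_newgrp_check(const char *user, bool user_has_password,
                                          const char *group,
                                          const char *const *members,
                                          bool group_has_password)
{
    assert(user && group && members);
    if (strcmp(group, "other") != 0 && !is_member(user, members))
        return NEWGRP_DENY;
    if (group_has_password && !user_has_password)
        return NEWGRP_ASK_PASSWORD;
    return NEWGRP_ALLOW;
}
```

Note that knowing the group password never gets a non-member past the first check, which is the surprising part.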

When dealing with groups with passwords, the normal Linux version of newgrp retains the V7 behavior for password protected groups that you're listed as a member of, but also allows you to newgrp to any group with a group password if you know it (according to its manpage; I haven't tested it). The FreeBSD version I have access to specifically discourages use of group passwords and says that newgrp is usually installed without the setuid bit (cf); it claims to only ask for a password if you're not listed as a member of the group.

(I believe this means that in a default setup the FreeBSD newgrp can't be used to change yourself into a group that you've just been added to in /etc/group.)

You might wonder about the last sentence of the V7 manpage bit that I quoted. The reason for this particular hack is to not leave an extra shell process around that you probably don't want (and that would be using up extra resources on the small machines that V7 ran on). Conceptually, if you're running newgrp you generally want your current shell session to be switched to the new group. If the shell just ran newgrp as a regular command, it would switch groups for its own process and then exec /bin/sh to give you a new shell in the new group; it would be like you'd run sh in your login shell. If you repeatedly newgrp'd, you'd wind up with a whole stack of shells.

So instead the V7 Bourne shell specially recognizes newgrp (along with login) and doesn't fork to run it; instead it directly execs into newgrp, replacing your login shell with newgrp (which will then ideally replace it with another shell, and indeed the V7 newgrp tries to always wind up exec'ing a new shell).

(This magic is hiding in the combination of msg.c and xec.c; see the handling of SYSLOGIN.)
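Here is a hypothetical C sketch of the difference, not the V7 shell's actual code. Ordinary commands get a fork() and a wait, while newgrp and login are exec'd directly, so the shell process itself is replaced and nothing is left behind to return to. Using execvp() for the PATH search is my own simplification.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Commands that replace the shell instead of running as a child. */
static int exec_in_place(const char *cmd)
{
    return strcmp(cmd, "newgrp") == 0 || strcmp(cmd, "login") == 0;
}

/* Run one command the way a simple shell might. Returns the command's
   exit status, or -1 on failure. For newgrp and login it only returns
   if the exec itself failed. */
static int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    if (exec_in_place(argv[0])) {
        execvp(argv[0], argv);    /* replaces this shell; no extra process */
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* exec failed in the child */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```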

NewgrpV7Behavior written at 00:40:50

2020-09-29

Why the Unix newgrp command exists (sort of)

Recently in the Fediverse, I read this toot:

Did you know that #Unix groups have passwords? Apparently if you set one, you then have to use newgrp to log in to that group.

I have never seen anyone use unix group passwords.

(Via @mhoye.)

There are some things to say about this, but the first thing you might wonder is why the newgrp command exists at all. The best answer is that it's mostly a Unix historical relic (or, to put it another way, a fossil).

In basically all current Unixes, processes can be in multiple groups at once, often a lot of them. However this is a feature added in BSD; it wasn't the case in the original Research Unixes, including V7, and for a long time it wasn't the case in System V either. In those Unixes, you could be listed as a member of various groups in /etc/group, but a given process was only in one group at a time. The newgrp command was how you switched back and forth between groups.
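On a modern Unix you can see the multiple-groups model directly. This small C sketch (print_process_groups() is my own name) prints a process's real and effective group IDs plus its supplementary group list from getgroups(); on V7 there was no supplementary list at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Print this process's group IDs; returns the number of supplementary
   groups, or -1 on error. */
static int print_process_groups(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "real gid %ld, effective gid %ld\n",
            (long)getgid(), (long)getegid());

    int n = getgroups(0, NULL);          /* with a size of 0, just count */
    if (n < 0)
        return -1;
    gid_t *groups = malloc((n > 0 ? n : 1) * sizeof *groups);
    if (groups == NULL)
        return -1;
    n = getgroups(n, groups);
    if (n >= 0) {
        fprintf(out, "supplementary groups:");
        for (int i = 0; i < n; i++)
            fprintf(out, " %ld", (long)groups[i]);
        fputc('\n', out);
    }
    free(groups);
    return n;
}
```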

In general, newgrp worked in the way you'd expect, given Unix. It was a setuid root program that switched itself into the new group and then exec'd into your shell (after carefully dropping its setuid powers).
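The core of such a program is short. Here's a hypothetical C sketch of it (switch_group_and_exec() is an illustrative name, and a real newgrp first looks up the group and checks membership and passwords). The order matters: setgid() has to happen while the process still has root powers, before setuid() gives them up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch to newgid, drop setuid-root privileges, then replace this
   process with a shell. Only returns (with -1) if something failed. */
static int switch_group_and_exec(gid_t newgid, const char *shell)
{
    assert(shell != NULL);
    /* Needs root unless newgid is already our real or saved gid. */
    if (setgid(newgid) != 0)
        return -1;
    /* Give up root: back to the invoking user's real uid for good. */
    if (setuid(getuid()) != 0)
        return -1;
    execl(shell, shell, (char *)NULL);   /* never returns on success */
    return -1;
}
```

It leaves supplementary groups alone, which is fine for a V7-style system that doesn't have them.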

(The actual behavior of newgrp in V7 is an interesting topic, but that's for another entry.)

As far as I can tell from tuhs.org, a newgrp command appears in Research Unix V6, but it doesn't seem to be in V5. You could have written one, though, as there was a setgid() system call at least as far back as V4 (and V4 may be where the idea of groups was invented). Somewhat to my surprise, the existence of group passwords also dates back to V6 Unix.

(Before I started looking into this, I would have guessed that group passwords were added somewhere in the System III/System V line of AT&T Unix, as AT&T adopted it to 'production' usage.)

PS: I'm pleased to see that OpenBSD seems to have dropped the newgrp command at some point. Linux and FreeBSD both continue to have it, and I can't imagine that Illumos, Solaris, or any other surviving commercial Unixes have gotten rid of it either.

NewgrpCommandWhy written at 19:39:23

2020-09-13

I'm now a user of Vim, not classical Vi (partly because of windows)

In the past I've written entries (such as this one) where I said that I was pretty much a Vi user, not really a Vim user, because I almost entirely stuck to Vi features. In a comment on my entry on not using and exploring Vim features, rjc reinforced this, saying that I seemed to be using vi instead of vim (and that there was nothing wrong with this). For a long time I thought this way myself, but these days this is not true any more. These days I really want Vim, not classical Vi.

The clear break point where I became a Vim user instead of a Vi user was when I started internalizing and heavily using Vim's (multi-)window commands (also). I started this as far back as 2016 (as signalled by this entry), but it took a while before I really had the window commands sink in and habits regarding them become routine (like using 'vi -o' on most occasions when I'm editing multiple files). I'm not completely fluid with Vim windows and I certainly haven't mastered all the commands, but at this point I definitely don't want to go back to not having them available.

(In my old vi days, editing multiple files was always a pain point where I would start reaching for another editor. I just really want to see more than one file on a screen at once in my usual editing style. Sometimes I want to see more than one spot in a file at the same time, too, especially when coding.)

I also very much want Vim's unlimited undo and redo, instead of a limited size undo. There are a bunch of reasons for this, but one of them is certainly that the Vi command set makes it rather easy to accidentally do a second edit operation as you're twitching around before you realize that you actually want to undo the first one. This is especially the case if your edit operation was an accident (where you hit the wrong keys by mistake or didn't realize that you weren't in insert mode), or if you've developed the habit of reflexively reflowing your current paragraph any time you pause in writing.

(There are probably other vim features I've become accustomed to without realizing it or without realizing that they're Vim features, not basic Vi features. Everywhere I use 'vi', it's really Vim.)

Although I'm now unapologetically using vim, my vimrc continues to be pretty minimal and is mostly dedicated to turning things off and setting sensible (ie modern) defaults, instead of old vi defaults. I'm unlikely to ever try to turn my vim into a superintelligent editor for reasons beyond the scope of this entry.

(I do use one Vim plugin in some of my vim setups, Aristotle Pagaltzis' vim-buftabline. I would probably be more enthused about it if I edited lots of files at once in my vim sessions, but usually I don't edit more than a couple at once.)

VimNowAUser written at 23:51:47

2020-08-28

My divergence from 'proper' Vim by not using and exploring features

I've read a reasonable number of Vim tutorials and introductions by now, and one of the things that stands out is how some of what I do differs from what seems to be considered 'proper' Vim. The simple way to put it is that I use fewer of Vim's features than the tutorials often introduce. One of the best examples is something that I do all of the time, which is reflowing paragraphs.

The official proper Vim way to reflow paragraphs (based on tutorials I've read) is gq{motion}. Often the most flexible version is gqip or gqap (where 'ip' or 'ap' select the paragraph you're in). Assuming that various things are set correctly, this will magically reflow your paragraph, much as M-q does in Emacs (a command I'm accustomed to using there).

However, for various reasons I don't use this; instead I rely on the general purpose hammer of '!' and the (relatively) standard Unix fmt command. My conditioned reflex sequence of commands for formatting the paragraph I'm writing is 'ESC { !}fmt }', and in general I'll use '!}fmt' more or less reflexively.

At one level this is somewhere between a curiosity and a deliberate choice not to learn all of Vim and try to Vim golf everything in sight (a choice that I've written about before). At another level this is kind of a weakness. As an example, in writing this entry I discovered not just that the gq command could be made to use fmt, but also discovered or re-discovered the ip and ap motion modifiers, which might be useful periodically, including in my usual paragraph reflowing.

Or perhaps not, because now that I experiment with it, using ip instead of moving to the start of the paragraph causes the cursor to jump up to the start after the paragraph is reflowed. Using an explicit { command means that I'm (relatively) conscious that I'm actively moving before I reflow, instead of having the cursor jump. If Vim were Emacs, I probably wouldn't mind, but since Vim is Vim I think I may prefer the explicitness of my current approach.

(And on character golfing, using ip or ap saves no characters in this situation. To really golf, I would need to switch to gq.)

As before, I probably shouldn't be surprised. Vim's sets of commands and motions are now really quite large, and people generally pick and choose what they use out of large sets like that. I suspect that plenty of Vim users use only various subsets of them, subsets that would strike other Vim users as annoyingly inefficient or old-fashioned.

VimNotUsingFeatures written at 23:59:37

2020-08-17

Important parts of Unix's history happened before readline support was common

Unix and things that run on Unix have been around for a long time now. In particular, GNU Readline was first released in 1989 (as was Bash), which is long enough ago for it (or lookalikes) to become pretty much pervasive, especially in Unix shells. Today it's easy to think of readline support as something that's always been there. But of course this isn't the case. Unix in its modern form dates from V7 in 1979 and 4.2 BSD in 1983, so a lot of Unix was developed before readline and was to some degree shaped by the lack of it.

(This isn't to say that GNU Readline and Bash were the first sources of readline style editing, command completion, and so on; on Unix they go back at least as far as 1983, with tcsh. But tcsh wasn't pervasive for various reasons.)

One obvious thing that was shaped by the lack of readline was csh. Csh has a sophisticated set of operations on your command history that are invoked through special strings embedded in your command line. To quote the 4.2 BSD csh manpage:

History substitutions place words from previous command input as portions of new commands, making it easy to repeat commands, repeat arguments of a previous command in the current command, or fix spelling mistakes in the previous command with little typing and a high degree of confidence. History substitutions begin with the character `!' and may begin anywhere in the input stream (with the proviso that they do not nest).

The most well known history substitution for tcsh users is probably '!!', which repeats the previous command. Bash has a similar facility, cf, and even today the Bash manual calls out its similarity to csh's version. These days I suspect most people using Bash don't use Bash's history substitutions and just stick to readline stuff; it's generally more fluid and easy to deal with.
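As a toy illustration of how history substitution works on the text of the command line itself, here's a C sketch that handles only '!!'. It's my own simplification, not csh's or Bash's code: it ignores quoting, backslash escapes, and every other history form, and expand_bang_bang() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy line into out, replacing every "!!" with prev.
   Returns 0, or -1 if out is too small. */
static int expand_bang_bang(const char *line, const char *prev,
                            char *out, size_t outsz)
{
    assert(line && prev && out && outsz > 0);
    size_t used = 0, prevlen = strlen(prev);
    while (*line != '\0') {
        const char *piece = line;   /* default: copy one ordinary character */
        size_t len = 1;
        if (line[0] == '!' && line[1] == '!') {
            piece = prev;           /* "!!" becomes the whole previous command */
            len = prevlen;
            line += 2;
        } else {
            line += 1;
        }
        if (used + len >= outsz)    /* keep room for the trailing NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

So after running 'vi /etc/passwd', typing 'sudo !!' runs 'sudo vi /etc/passwd'.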

(This is an obvious observation, but at times it's easy to blur the old days of Unix together and lose track of how comparatively old some parts of it are. Or at least it is for me.)

PS: My impression is that the widespread availability of command and filename completion subtly shapes the kind of command names and file names that people use. When you don't have completion, it makes a lot of sense for names to be short and it doesn't matter if they're all jumbled together so that completion can't tell them apart. Famously, Unix loves short command names because they're short to type, which makes a lot of sense in a V7 environment.
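One way to see why this matters: without more typing, completion can fill in at most the longest common prefix of the candidate names. Here's a small C sketch of that computation (common_prefix_len() is a hypothetical name, and real shells layer a lot more on top).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest common prefix of n (n > 0) names. */
static size_t common_prefix_len(const char *const names[], size_t n)
{
    assert(names != NULL && n > 0);
    size_t len = strlen(names[0]);
    for (size_t i = 1; i < n; i++) {
        size_t j = 0;
        while (j < len && names[i][j] != '\0' && names[i][j] == names[0][j])
            j++;
        len = j;                /* the prefix can only shrink */
    }
    return len;
}
```

For mkdir, mkfifo, and mknod it returns 2, so completion gets you only as far as 'mk'.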

TimeBeforeReadline written at 00:08:46

2020-08-09

Unix options conventions are just that, which makes them products of culture

Recently I wrote about my views on some conventions for Unix command line options, where I disagreed in part with what Chris Wellons considered Conventions for Command Line Options. Both Wellons and I have a lot of Unix experience, and that we disagreed on parts of what rightfully should be a well established core of Unix in practice shows some things about them.

The first thing to note about Unix's conventions on options is that they've always been ad hoc and imperfectly adhered to, to the extent that they even existed in the first place. To start with, V7 Unix did not have getopt(3), so every V7 program did its own parsing of options and I'm certain that some of them had somewhat different behavior. Several V7 programs completely broke these conventions; famously dd doesn't even use conventional options that start with a '-', and while find has options that start with '-' they're actually more like GNU style long options.

(Wikipedia implies that getopt(3) first appeared in System III, and indeed here's the System III getopt(3) manpage, dating from 1980 (cf, also).)
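For a flavor of what 'doing its own parsing' meant, here's a C sketch of the sort of hand-written loop a pre-getopt program might have had. The flags and names are hypothetical, and it's deliberately just one of many slightly different possible versions; this one, for instance, rejects '--' as an unknown option and treats a lone '-' as an ordinary argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Flags for an imaginary command; the point is the loop. */
struct flags { bool l, r; };

/* Walk argv while the next argument starts with '-', treating each
   letter after it as a flag. Returns the index of the first non-option
   argument, or -1 for an unknown flag letter. */
static int parse_flags(int argc, char *argv[], struct flags *f)
{
    int i = 1;
    for (; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++) {
        for (const char *p = &argv[i][1]; *p != '\0'; p++) {
            switch (*p) {
            case 'l': f->l = true; break;
            case 'r': f->r = true; break;
            default:  return -1;
            }
        }
    }
    return i;
}
```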

The second thing is that both Wellons and I can go on about conventions all we want (and what they should be), but the reality is that the 'conventions' that exist are defined by what programs actually do. If a lot of programs (or a popular option parsing library) behave in a particular way, in practice that is the convention regardless of what I think of it (or write). The corollary of this is that what people consider convention is in large part defined by how the programs they use behave. By its mere existence and popularity, GNU Getopt has defined a lot of the modern conventions for options handling; if you deviate from it, you will surprise people who expect your programs to behave like the other programs they use every day. Before GNU Getopt was what most programs used, getopt(3) did the same thing and had the same effect for the conventions it enforced.

(New options parsing libraries tend not to break too much with the conventions that are current when they're written, but they can selectively change some of them, especially the ones considered more obscure.)

Finally, I suspect that part of the difference between Wellons' view of these conventions and mine is because of when we came into Unix. I started using Unix long enough ago that it was in the era of classic getopt(3) instead of GNU Getopt (and long options), so the rules that getopt(3) enforced were the ones that I wound up internalizing as the correct conventions. Someone who came into Unix later would have been primarily exposed to GNU Getopt's somewhat different behavior, with it supporting intermixed options and non option arguments, long options being routine, and so on.

The corollary of this is that people who come into Unix today are learning the conventions as they stand now, including any inconsistencies between Unix programs that are increasingly written in different languages, with different argument parsing libraries, and so on. Some languages are sufficiently divergent that no one is going to mistake them for 'how Unix commands should behave' (I'm looking at you, Go), but others are close enough that people are likely to internalize parts of their behavior, even if only as expected divergences and differences, just as people remember find and its unusual behavior.

UnixOptionsConventions written at 22:49:55

2020-08-05

My views on some conventions for Unix command line options

Recently I read Chris Wellons' Conventions for Command Line Options, which reviews these conventions. As it happens, I learned and have internalized a somewhat different version of how these conventions should be and how I expect programs to behave. I'm not going to mention things where my expectations agree with Wellons' presentation of the conventions, just where I differ.

On short options that accept an argument, Wellons says:

This technique is used to create another category, optional option arguments. The option’s argument can be optional [...]

There are no optional option arguments; an option either always takes an argument or it never does. This is how the traditional getopt(3) behaves, at least as far as I remember, and appears to be how the 4.3 BSD getopt(3) manpage documents it.

(It's also how POSIX getopt() is required to behave; see the discussion of how optarg is set.)
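The traditional getopt(3) option string can only say 'this option takes an argument' (a letter followed by ':') or 'it doesn't'; there's no syntax for 'maybe'. (GNU getopt adds '::' for optional arguments as an extension, which is exactly the sort of thing I don't expect.) Here's a small C sketch with hypothetical options -a and -b ARG.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Result of parsing "-a" (a plain flag) and "-b ARG". */
struct opts { int a; const char *b; };

/* "ab:" means -a never takes an argument and -b always does.
   Returns optind (the first non-option), or -1 on a parse error. */
static int parse_opts(int argc, char *argv[], struct opts *o)
{
    int c;
    optind = 1;     /* restart scanning so this can be called repeatedly */
    opterr = 0;     /* report errors by returning -1, not on stderr */
    while ((c = getopt(argc, argv, "ab:")) != -1) {
        switch (c) {
        case 'a': o->a = 1; break;
        case 'b': o->b = optarg; break;
        default:  return -1;   /* '?': unknown option or missing argument */
        }
    }
    return optind;
}
```

One consequence of 'always takes an argument' is that 'prog -b -a' sets b to '-a' instead of treating -a as an option, and a bare 'prog -b' is an error.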

Options can typically appear in any order — something parsers often achieve via permutation — but non-options typically follow options.

Non-options always follow options. By extension, the first non-option argument terminates scanning for options; any remaining '-...' things become arguments. Again this is how the 4.3 BSD getopt(3) is documented, and in fact that options and non-options can't be intermixed is mostly required by the getopt(3) API.

(How getopt(3) returns non-option arguments to you is that it gives you the index of the first one in argv. To support intermixed options and non-options, it would have to permute the order of entries in argv to move all options up to before all non-options. In modern C definitions of getopt(3), including the POSIX one, I believe this is forbidden because argv is declared as 'char * const argv[]', which makes its entries const.)
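Here's a C sketch of the traditional stop-at-the-first-non-option behavior (first_operand() is an illustrative name). The leading '+' in the option string is a GNU getopt extension that asks glibc for this behavior instead of permuting argv; I'm assuming it's harmless with other getopt(3) implementations, which already stop at the first operand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Count -a flags and return the index of the first operand,
   or -1 on an unknown option. argv is left in its original order. */
static int first_operand(int argc, char *argv[], int *aflags)
{
    int c;
    optind = 1;
    opterr = 0;
    *aflags = 0;
    while ((c = getopt(argc, argv, "+a")) != -1) {
        if (c != 'a')
            return -1;
        (*aflags)++;
    }
    return optind;      /* everything from here on is an argument */
}
```

With 'prog -a file -a', this returns 2 (pointing at 'file'), and the second '-a' is left as an ordinary argument. An explicit '--' also ends option scanning, with optind left pointing just past it.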

My strong cultural expectations for option handling only cover short options; while I have opinions about how long options should act, they're not as visceral as for short options. Just as with short options and for much the same pragmatic reasons, I don't believe in long options with optional option arguments; long options should either always take an argument or never do so. Behaving otherwise breaks my expectation that long options are just the same as short options except longer (and they can't be grouped, and there's that optional '=' thing for arguments).

(This difference between my views and Chris Wellons' views points out some general issues here, but that's for another entry.)

MyOptionsConventions written at 23:34:11
