Wandering Thoughts archives

2006-12-13

An example of Unix's slow fossilization

If you walk up to the console of some Linux machines that have their capslock turned on and try to log in, an interesting thing happens:

keyx login: CKS
PASSWORD:
[...]
CKS@KEYX:~$

(This works with Ubuntu 6.06, but not with Fedora Core 6.)

Why does this happen?

Once, long ago, there were terminals that only did upper case, and there were people who wanted to connect them to Unix systems. So Bell Labs put a very special hack into getty: if it saw a login name that was all in upper case, it assumed that you were using such a terminal, lower-cased the name, and set a special terminal mode where lower case was converted to upper case on output and upper case converted to lower case on input.

(Of course this doesn't work very well if your password has any actual upper-case characters. Or your username, or the names of your files, or any command options, or etc etc. People who needed this hack were presumably going to avoid all of that.)
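As a rough illustration of the heuristic described above, here is a minimal Python sketch. It is not getty's actual code; the function names (classify_login_name, map_input, map_output) are made up for this example, and real terminal drivers do the case mapping in the kernel rather than in the login program.

```python
def classify_login_name(name):
    """Return (effective_name, lcase_mode) following the description above."""
    letters = [c for c in name if c.isalpha()]
    all_upper = bool(letters) and all(c.isupper() for c in letters)
    if all_upper:
        # Assume an upper-case-only terminal: fold the name, enable case mapping.
        return name.lower(), True
    return name, False


def map_input(text, lcase_mode):
    """In the special mode, upper case typed at the terminal is lower-cased."""
    return text.lower() if lcase_mode else text


def map_output(text, lcase_mode):
    """In the special mode, lower case sent to the terminal is upper-cased."""
    return text.upper() if lcase_mode else text


name, lcase = classify_login_name("CKS")
print(name, lcase)                        # cks True
print(map_output("cks@keyx:~$", lcase))   # CKS@KEYX:~$
print(map_input("Secret1", lcase))        # secret1
```

The last line is the problem in miniature: once input is folded to lower case, a password with a real upper-case character can never be typed correctly.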

It has probably been at least twenty years since such a terminal was connected to a Unix system. In all that time, very few people removed this feature, and so it lurks on many systems to this day, including systems reimplemented from scratch, where people can't even claim that it was less work to leave old code alone than to remove it.

(To be fair, this seems to have been removed from the latest version of the Single Unix Specification. Also, FreeBSD and OpenBSD seem to not support 'stty lcase', although Linux does, which I find ironic.)

UnixFossilizationExample written at 22:03:53

2006-12-05

A small annoyance with Unix wildcards

Here's a small irritation with Unix wildcards: there's no generally recognized wildcard (or small set of wildcards) that matches all of the files in a directory, including dotfiles but excluding . and .., so that you could easily match all of a directory's real contents.

A plain * doesn't match dotfiles; * .* matches dotfiles, but includes . and .. too. About the best you can do is

* .??* .[^.]

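However, this blows up if there are no single-character dotfiles (a dot plus exactly one other character): in a Bourne-style shell the unmatched pattern is by default passed to the command literally instead of expanding to nothing. This is a general defect in any multi-wildcard scheme, of course; the more wildcards and the more obscure they are, the more likely you are to have one that doesn't match anything.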
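To make the matching rules concrete, here is a small Python simulation. It is a sketch under assumptions, not a real shell: shell_match and expand are hypothetical helpers, the directory listings are invented, and the bracket pattern is spelled [!.] because Python's fnmatch module uses ! for negation rather than ^.

```python
from fnmatch import fnmatchcase


def shell_match(names, pattern):
    """Match names roughly the way a traditional shell glob would.

    fnmatchcase() lets '*' and '?' match a leading dot, so the shell rule
    (a leading dot must be matched explicitly) is added by hand here.
    """
    matched = []
    for name in names:
        if name.startswith(".") and not pattern.startswith("."):
            continue
        if fnmatchcase(name, pattern):
            matched.append(name)
    return matched


def expand(names, patterns):
    """Expand several patterns; an unmatched pattern is kept literally."""
    words = []
    for pattern in patterns:
        words.extend(shell_match(names, pattern) or [pattern])
    return words


PATTERNS = ["*", ".??*", ".[!.]"]

with_short = [".", "..", ".a", ".profile", "..x", "notes"]
print(expand(with_short, PATTERNS))
# ['notes', '.profile', '..x', '.a']

without_short = [".", "..", ".profile", "notes"]
print(expand(without_short, PATTERNS))
# ['notes', '.profile', '.[!.]']

print(shell_match(with_short, ".*"))
# ['.', '..', '.a', '.profile', '..x']
```

The second result shows the failure: with no two-character dotfile present, the last pattern survives as a literal word that the command will then try to treat as a filename.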

I believe that some shells have a .* wildcard that doesn't include . and ..; on quick testing, zsh is one. (Bash and ksh seem almost able to do it but not quite, as far as I can tell.)

Interestingly, the behavior of * not matching files that start with . goes back at least as far as V5 Unix. (The V4 Unix sources have apparently been lost, and the V4 and V5 manpages are not precise enough to say whether the behavior existed in V4. If you are poking around this stuff at tuhs.org, it is helpful to know that in V6 and earlier shell globbing was done by an external program, /etc/glob; the V5 source is here.)

It's interesting to see that fairly complete shell wildcard support goes back very far in Unix history. Third Edition, released in February 1973 and the oldest Unix that tuhs.org has useful stuff for, already has *, ?, and [...] (including character ranges with -). (Since the V3 ls has a -a option to reveal dotfiles, I suspect that even V3 * wildcards didn't match dotfiles.)

Bonus trivia: the oldest ls options, present in V3, are -l, -t, -a, -s, and -d. V4 ls grew -r and -u, and after that you can go digging for yourself.

WildcardAnnoyance written at 23:49:09

2006-01-26

A Unix annoyance

Unix: where you have to escape command line arguments twice, once from the shell and once from the program.

I dearly love Unix, but I am forced to admit that every so often it has what I can only describe as 'robot logic'; things that are perfectly logical, but not to normal humans. For example, consider the fun of trying to remove a file called something like "-Brad Foo".

$ rm -Brad Foo
rm: invalid option -- B
$ rm "-Brad Foo"
rm: invalid option -- B
$ rm ./-Brad Foo
rm: cannot remove `./-Brad': No such file or directory
rm: cannot remove `Foo': No such file or directory

(and so on.)

Of course this is happening because the arguments are processed twice; once by the shell to determine what's a single argument and what gets split up, and then again by rm itself to find its options. So you have to escape the filename in the shell because it has a space in it, and that's entirely separate from how rm normally thinks anything starting with a dash is a switch, not a filename to be removed. (And programs can be wildly inconsistent in how to escape their arguments.)
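The two passes can be sketched in Python. This is a simplified model, not how any real shell or rm is implemented: shlex.split stands in for the shell's word splitting, and the hypothetical rm_view function only approximates option parsing by treating anything that starts with a dash as a switch until it sees the conventional -- end-of-options marker.

```python
import shlex


def rm_view(command_line):
    """Show the two passes: shell word splitting, then rm's option check."""
    argv = shlex.split(command_line)    # pass 1: roughly what the shell does
    report = []
    options_ended = False
    for arg in argv[1:]:                # pass 2: roughly what rm does
        if not options_ended and arg == "--":
            options_ended = True        # everything after this is a filename
            continue
        if not options_ended and arg.startswith("-") and arg != "-":
            kind = "option"
        else:
            kind = "file"
        report.append((arg, kind))
    return report


for line in ['rm -Brad Foo',
             'rm "-Brad Foo"',
             'rm ./-Brad Foo',
             'rm "./-Brad Foo"',
             'rm -- "-Brad Foo"']:
    print(line, "->", rm_view(line))
```

Only the last two command lines end up as a single argument that is also seen as a file, which is why removing this file takes both quoting for the shell and either a ./ prefix or -- for rm.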

I've used Unix sufficiently long and deeply that I can see and explain all of this in my sleep. It all makes perfect sense and it's completely consistent and logical (once you understand Unix's logic).

But let's be honest here: it's robot logic, not human logic.

(This entry is inspired by the travails of a perfectly technically adept Unix person in a similar situation, as recounted here. He had a simpler example that didn't need being escaped from the shell, just from rm.)

UnixAnnoyance written at 02:10:24

2006-01-16

The danger of specific errno values

One of the things I've learned the hard way in writing Unix programs is that you should almost never expect to know all of the errno values that system calls can return. In theory it's documented in the manpages, but in practice a manpage's list is never comprehensive. The safest thing to do is to assume that any system call can return almost any errno value.

The example I remember most vividly is that our SMTP server daemon used to carefully check the result of accept(). If there was an error and it was anything besides EAGAIN, the daemon decided that something was undoubtedly broken and therefore it should die. Unfortunately, it turns out that on some systems accept() can also return ECONNRESET, because people can initiate a TCP connection and then terminate it before the SMTP daemon gets a chance to accept() the new connection.

Until we caught this and fixed it, the result was that every so often (usually during periods of high load), the daemon would die mysteriously. Whoops.

This also points out another danger with checking specific errno values: they can change over time. Even code that was perfectly correct when it was written can develop problems as time passes and it runs on new systems, unless you recheck and revise it all the time.

(Our SMTP server daemon's code was probably perfectly correct for BSD systems of the early 1990s, for example.)
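Here is a sketch of the difference between the two approaches, written in Python for brevity. It is not our daemon's code: old_policy and tolerant_policy are hypothetical names, and the set of errno values treated as routine is an illustrative assumption rather than a complete list for any particular system.

```python
import errno

# Assumption: accept() failures this sketch treats as routine. The exact
# set is illustrative; the point is that anything else is not fatal either.
ROUTINE_ACCEPT_ERRNOS = {
    errno.EAGAIN,
    errno.EWOULDBLOCK,
    errno.EINTR,
    errno.ECONNABORTED,
    errno.ECONNRESET,
}


def old_policy(err):
    """The fragile approach: anything besides EAGAIN means something is broken."""
    return "retry" if err.errno == errno.EAGAIN else "die"


def tolerant_policy(err):
    """Assume accept() can fail with almost any errno; never die on one."""
    if err.errno in ROUTINE_ACCEPT_ERRNOS:
        return "retry"
    # Unknown value: report it so a human can look, but keep serving.
    return "log-and-retry"


reset = OSError(errno.ECONNRESET, "connection reset before accept()")
print(old_policy(reset))        # die
print(tolerant_policy(reset))   # retry

surprise = OSError(errno.EMFILE, "too many open files")
print(tolerant_policy(surprise))   # log-and-retry
```

The design choice is in the fallback branch: an unrecognized errno is logged instead of being treated as proof that the daemon is broken, so a value that shows up on a newer system produces a log message rather than a mysterious death.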

SpecificErrnoDanger written at 00:03:45
