Wandering Thoughts archives

2013-11-20

test is surprisingly smart

Via Hacker News I wound up reading Common shell script mistakes. When I read it, I initially thought that it contained well-intentioned but mistaken advice about test (aka '[ ... ]'). Then I actually checked what test's behavior is and got a bunch of surprises. It turns out that test is really quite smart, sometimes disturbingly so.

Here are two different versions of a test expression:

[ x"$var" = x"find" ] && echo yes
[ "$var" = "find" ] && echo yes

In theory, the reason the first version has an 'x' in front of both sides is to deal with the case where someone sets $var to something that is a valid test operator, like '-a' or '-x' or even '('; after all, '[ -a = find ]' doesn't look like a valid test expression. But if you actually check, it turns out that the second version works perfectly well too.
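
You can see this for yourself in any Bourne-style shell. With three arguments, test treats the middle '=' as a binary operator, so even operator-like values like '-a' or '(' are harmless:

$ var="-a"
$ [ "$var" = "find" ] && echo yes
$ var="("
$ [ "$var" = "find" ] && echo yes
$ var="find"
$ [ "$var" = "find" ] && echo yes
yes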

What's going on is that test is much smarter than you might think. Rather than simply processing its arguments left to right, it uses a much more complicated process of actually parsing its command line. When I started writing this entry I thought it was just modern versions that behaved this way, but in fact the behavior is much older than that; it goes all the way back to the V7 version of test, which actually implements a little recursive descent parser (in quite readable code). This behavior is even specified in the Single Unix Specification page for test where you can read the gory details for yourself (well, most of them).

(The exception is that the SUS version of test doesn't include -a for 'and' or -o for 'or'. This is an interesting exclusion, since it turns out they were actually in the V7 version of test, as its manpage shows.)

Note that this cleverness can break down in extreme situations. For example, '[ "$var1" -a "$var2" -a "$var3" ]' is potentially dangerous; consider what happens if $var2 is '-r'. And of course you still really want to use "..." to force things to be explicit empty arguments, because an outright missing argument can easily completely change the meaning of a test expression. Consider what happens to '[ -r $var ]' if $var is empty.

(It reduces to '[ -r ]', which is true because -r is not the empty string. You probably intended it to be false because a zero-length file name is considered unreadable.)
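
A transcript makes the difference concrete:

$ var=""
$ [ -r $var ] && echo readable
readable
$ [ -r "$var" ] && echo readable

The unquoted version word-splits the empty $var away entirely, leaving '[ -r ]', which is true; the quoted version actually passes an empty string to -r, which is false (and prints nothing here).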

TestIsQuiteSmart written at 23:05:59

The difference between no argument and an empty argument

Here is a little Bourne shell quiz. Supposing that $VAR is not defined, are the following two lines equivalent?

./acnt $VAR
./acnt "$VAR"

The answer is no. If the acnt script is basically 'echo "$#"', then the first one will print 0 and the second one will print 1; in other words, the first line called acnt with no arguments and the second one called acnt with one argument (that happens to be an empty string).
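
To make this concrete, here is what you would see with exactly that acnt script:

$ cat acnt
#!/bin/sh
echo "$#"
$ unset VAR
$ ./acnt $VAR
0
$ ./acnt "$VAR"
1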

Unix shells almost universally draw some sort of distinction between a variable expansion that results in no argument and an empty argument (although they can vary in how you force an empty argument). This is what we're seeing here; in the Bourne shell, using "..." forces there to always be a single argument regardless of what $VAR expands to or doesn't. Sometimes this is useful behavior, for example when it means that a program is invoked with exactly a specific number of arguments (and with certain things in certain argument positions) even if some things aren't there. Sometimes it's inconvenient, if what you really wanted was to quote $VAR but not necessarily pass acnt an empty argument if $VAR wound up unset. If you want this latter behavior, you need to use the more awkward form:

./acnt ${VAR:+"$VAR"}

(Support for this is required by the Single Unix Specification and is present in Solaris 10, so I think you're very likely to find it everywhere.)
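
A quick demonstration of the difference, using the same acnt script:

$ unset VAR
$ ./acnt ${VAR:+"$VAR"}
0
$ VAR=something
$ ./acnt ${VAR:+"$VAR"}
1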

Note that it's possible to notice the presence of empty arguments indirectly, in situations where they don't show up directly. For example:

$ echo a "$VAR" b
a  b

If you look carefully there is an extra space printed between a and b here; that is because echo is actually printing 'a', separator space, an empty string, another separator space, and then 'b'. Of course some programs are more obvious, even if the error message is a bit more mysterious:

$ cat "$VAR"
cat: : No such file or directory

(This entry is brought to you in the process of me discovering something interesting about modern versions of test, but that's another entry.)

EmptyArgumentVsNone written at 00:01:22

2013-11-16

Unix getopt versus Google's getopt variant and why Unix getopt wins

The longer title of this entry is 'why Google's version of getopt is right for Google but wrong for (almost) everyone else'.

One reaction to my rant about Go's getopt problem is to ask what the big problem is. Looked at in the right light, the rules implemented by Go's flag package (which are apparently more or less the Google standard for parsing flags) actually have a certain amount to recommend them because they're less ambiguous, more consistent, and less likely to surprise you. For example, consider this classic error: start with a command that takes a -t switch and a -f flag with an argument. One day someone in a hurry accidentally writes this as 'cmd -ft fred bob' (instead of '-tf'). If you're lucky this will fail immediately with an error; if you're unlucky it quietly succeeds but does something quite different from what you expected. Google flag parsing reduces the chances of this by forcing you to always separate flags (so you can't make this mistake just by transposing two characters).
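
To illustrate, here is a minimal sketch of such a command using the shell's built-in getopts; the command and its -t and -f flags are the hypothetical example from above ('t' is a plain switch, 'f' takes an argument, hence the ':' in 'tf:'):

#!/bin/sh
t_flag=no
while getopts tf: opt; do
	case $opt in
	t) t_flag=yes ;;
	f) f_arg=$OPTARG ;;
	*) exit 2 ;;
	esac
done
shift $((OPTIND - 1))
echo "t_flag=$t_flag f_arg=${f_arg-unset} args: $*"

Run as 'cmd -tf fred bob' this prints 't_flag=yes f_arg=fred args: bob', but the transposed 'cmd -ft fred bob' quietly prints 't_flag=no f_arg=t args: fred bob'; -f has swallowed the 't' as its argument and 'fred' has become an ordinary argument.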

In an environment where you are mostly or entirely using commands that parse flags this way, you get a number of benefits like this. I assume that this describes Google, which I suspect is heavily into internal tooling. But most environments are not like this at all; instead, commands using Go's flag package (or equivalents) are going to be the minority and the vast majority of commands you use will instead be either standard Unix commands or programs that use the same general argument parsing (partly because it's the default in almost everyone's standard library). In such environments the benefits that might come from Google flag parsing are dwarfed by the fact that it is just plain different from almost everything else you use. You will spend more time cursing because 'cmd -tf fred bob' gives you 'no such flag -tf' errors than you will ever likely save in the one (theoretical) time you type 'cmd -ft fred bob'.

(In theory you could also sort of solve the problem by always separating flag arguments even for programs that don't need this. But this is unlikely in practice since such programs are probably the majority of what you run and anyways, other bits of Go flag handling aren't at all compatible with standard Unix practice.)

In other words: inside Google, the de facto standard is whatever Google's tools do because you're probably mostly using them. Outside Google, the de facto standard is what the majority of programs do and that is standard Unix getopt (and extended GNU getopt). Deviations from de facto Unix getopt behavior cause the same problems that deviations from other de facto standards cause.

Now I'll admit that this is stating the case a bit too strongly. There are plenty of Unix programs that already deviate to a greater or lesser degree from standard Unix getopt. As with all interface standards a large part of what matters is how often people are going to use the deviant commands; the more frequently, the more you can get away with odd command behavior.

(You can also get away with more odd behavior if it's basically impossible to use your program casually. If an occasional user is going to have to re-read your documentation every time, well, you can (re)explain your odd command line behavior there.)

UnixVsGoogleGetopt written at 00:30:31

2013-11-06

Modern versions of Unix are more adjustable than they used to be

One of the slow changes in modern Unix over the past ten to fifteen years has been a significant increase in modularity and, with it, in how adjustable a number of core things are without major work. This has generally not been something that ordinary users notice, because it happens at the level of system-wide configuration.

Undoubtedly this all sounds abstract, so let's get concrete. The first example here is the relative pervasiveness of PAM. In the pre-PAM world, implementing additional password strength checks or special custom rules for who could su to who took non-trivial modifications to the source for passwd and su (or sudo). In the modern world both are simple PAM modules, as are things like taking special custom actions when a password is changed.
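
For example, restricting who can su is just a matter of stacking an extra module in the right PAM configuration file. Here is a hypothetical Linux-PAM /etc/pam.d/su fragment (module names and the exact file layout vary between systems):

# only members of group 'wheel' may even attempt to su
auth	required	pam_wheel.so group=wheel
auth	required	pam_unix.so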

My next example is nsswitch.conf. There was a day in the history of Unix when adding DNS lookups to programs required recompiling them against a library with a special version of gethostbyname() et al. These days, how any number of things get looked up is not merely something that you can configure but something you can control; if you want or need to, you can add a new sort of lookup yourself as an aftermarket, do-it-yourself addition. This can be used for clever hacks that don't require changing the system's programs in any particular way, just taking advantage of how they work (although there are limits imposed by this approach).
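
For example, on glibc-based systems an /etc/nsswitch.conf might look like this, where 'mymodule' is a hypothetical aftermarket lookup source that you would supply as a libnss_mymodule.so.2 shared library:

hosts:	files dns mymodule
passwd:	files
group:	files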

(Actually now that I'm writing this entry I'm not sure that there have been any major moves in this sort of core modularity beyond NSS and PAM. Although there certainly are more options for things like your cron daemon and your syslog daemon if you feel like doing wholesale replacement of programs.)

One of the things these changes do is reduce the need for operating system source, since they reduce your need for custom versions of operating system commands.

(Of course you can still wind up needing OS source in order to figure out how to write your PAM or NSS module.)

Sidebar: best practices have improved too

One of the practical increases in modularity has come from an increasing number of programs (such as many versions of cron) scanning directories instead of just reading a file. As we learned starting no later than BSD init versus System V init, a bunch of files in a directory is often easier to manage than a monolithic single file because you can have all sorts of people dropping files in and updating their own files without colliding with each other. Things like Linux package management have strongly encouraged this approach.
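
Cron is a convenient illustration: on systems with /etc/cron.d support, each package or person can own its own file instead of editing one shared crontab. A hypothetical /etc/cron.d/app-backup might be just this (note that /etc/cron.d entries take a user field, unlike per-user crontabs):

# minute hour day month weekday user command
30 2 * * *	root	/usr/local/sbin/app-backup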

UnixMoreAdjustable written at 01:30:03

