An interesting picky difference between Bourne shells

July 25, 2014

Today we ran into an interesting bug in one of our internal shell scripts. The script had worked for years on our Solaris 10 machines, but on a new OmniOS fileserver it suddenly reported an error:

script[77]: [: 232G: arithmetic syntax error

Cognoscenti of ksh error messages have probably already recognized this one and can tell me the exact problem. To show it to everyone else, here is line 77:

if [ "$qsize" -eq "none" ]; then
   ....

In a strict POSIX shell, this is an error because test's -eq operator is specifically for comparing integers, not strings. What we wanted was the = operator, which compares strings.
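As a minimal sketch of the corrected check: the describe_qsize function and its messages are made up for illustration and are not from our script; only the "$qsize" = "none" comparison reflects the actual fix.

```shell
# describe_qsize is a hypothetical helper, not part of the original script.
describe_qsize() {
    qsize=$1
    # '=' compares strings, so a value like 232G is never parsed as a number.
    if [ "$qsize" = "none" ]; then
        echo "no quota"
    else
        echo "quota is $qsize"
    fi
}

describe_qsize none
describe_qsize 232G
```

With = the comparison never attempts a numeric conversion, so it should behave the same way in bash, dash, and ksh93.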

What makes this error more interesting is that the script had been running for some time on the OmniOS fileserver without this error. However, until now the $qsize variable had always had the value 'none'. So why hadn't it failed earlier? After all, 'none' (on either side of the expression) is just as much not a number as '232G' is.

The answer is that shells differ in how picky they actually are about this. Bash, for example, always complains about such misuse of -eq; if either side is not a number you get an error saying 'integer expression expected' (as does Dash, with a slightly different error). But on OmniOS, /bin/sh is actually ksh93, and ksh93 has a slightly different behavior. Here:

$ [ "none" -eq "none" ] && echo yes
yes
$ [ "bogus" -eq "none" ] && echo yes
yes
$ [ "none" -eq 0 ] && echo yes
yes
$ [ "none" -eq "232G" ] && echo yes
/bin/sh: [: 232G: arithmetic syntax error

The OmniOS version of ksh93 clearly has some sort of heuristic about number conversions such that strings with no numbers are silently interpreted as '0'. Only invalid numbers (as opposed to things that aren't numbers at all) produce the 'arithmetic syntax error' message. Bash and dash are both more straightforward about things (as is the FreeBSD /bin/sh, which is derived from ash).

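Update: my description isn't actually what ksh93 is doing here. Per opk's comment, ksh93 evaluates the operands of -eq as arithmetic expressions, so it interprets none and bogus as variable names, and an unset variable has the value 0. '232G' fails because it is neither a valid number nor a valid variable name, which is why it gets the 'arithmetic syntax error'.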
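The same rules can be seen in ordinary $(( )) arithmetic expansion, which makes for a rough illustration that runs in shells other than ksh93. This is only an analogy under that assumption, not a claim about ksh93's internals, and arith_value is a helper name invented for this sketch.

```shell
# arith_value EXPR: print the value of EXPR under shell arithmetic,
# or "error" if the shell rejects the expression.
# (Hypothetical helper for illustration only.)
arith_value() {
    # Run in a subshell so a fatal arithmetic error cannot kill the caller.
    ( echo "$(( $1 ))" ) 2>/dev/null || echo "error"
}

unset none bogus
arith_value none     # a bare name is a variable reference; unset counts as 0
arith_value bogus    # same thing
arith_value 232G     # not a number and not a valid name

none=5
arith_value none     # now the name has a value, so that value is used
```

Seen this way, [ "none" -eq 0 ] being true in ksh93 is less a conversion heuristic and more a comparison of an unset variable against zero.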

Interestingly, the old Solaris 10 /bin/sh seems to basically be calling atoi() on the arguments for -eq; the first three examples work the same, the fourth is silently false, and '[ 232 -eq 232G ]' is true. This matches the 'let's just do it' simple philosophy of the original Bourne shell and test program and may be authentic original V7 behavior.
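To make the atoi()-style behavior concrete, here is a rough emulation in portable shell. It is a sketch under assumptions, not the Solaris code: atoi_like and atoi_eq are names invented here, and unlike the real atoi() this version ignores leading whitespace and signs.

```shell
# atoi_like STR: print the leading run of decimal digits in STR,
# or 0 if STR does not start with a digit. (Hypothetical helper.)
atoi_like() {
    s=$1
    # Strip everything from the first non-digit character onward.
    digits=${s%%[!0-9]*}
    echo "${digits:-0}"
}

# atoi_eq A B: succeed if A and B are equal after atoi-style conversion.
atoi_eq() {
    [ "$(atoi_like "$1")" -eq "$(atoi_like "$2")" ]
}

atoi_eq none none && echo yes
atoi_eq bogus none && echo yes
atoi_eq none 0 && echo yes
atoi_eq none 232G || echo "silently false"
atoi_eq 232 232G && echo yes
```

Under this model 'none' and 'bogus' both become 0 and '232G' becomes 232, which matches the results described above.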

(Technically this is a difference in test behavior, but test is a builtin in basically all Bourne shells these days. Sometimes the standalone test program in /bin or /usr/bin is actually a shell script to invoke the builtin.)
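Given how differently these shells treat non-numeric operands to -eq, one defensive option is to check that a value is a plain unsigned integer before comparing it numerically. This is a sketch of that idea, not something our script did; is_integer is a made-up helper name and it deliberately rejects signs and whitespace.

```shell
# is_integer STR: succeed only if STR is a non-empty string of decimal digits.
# (Hypothetical helper; signs and surrounding whitespace are rejected.)
is_integer() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

qsize="232G"
if is_integer "$qsize" && [ "$qsize" -eq 0 ]; then
    echo "zero"
else
    echo "not a plain integer, or not zero"
fi
```

Because the case pattern does the checking, test's -eq only ever sees operands that every shell discussed here agrees are numbers.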
