Wandering Thoughts archives

2014-10-06

Why it's sensible for large writes to pipes to block

Back in this entry I said that having large writes to pipes block instead of immediately returning with a short write was a sensible API decision. Today let's talk about why, by way of looking at how deciding the other way would make for a bad API.

Let's start with a question: in a typical Unix pipeline program like grep, what would be a sensible reaction to a large write returning a short write indicator? This is clearly not an error that should cause the program to abort (or even print a warning); instead it's a perfectly normal thing that happens when you're producing output faster than the other side of the pipe can consume it. For most programs, that means the only thing you can really do is pause until you can write more to the pipe. The conclusion is pretty straightforward: in a hypothetical world where such too-large pipe writes returned short write indicators instead of blocking, almost all programs would either wrap their writes in code that paused and retried them or arrange to set a special flag on the file descriptor to say 'block me until everything is written'. Either or both would probably wind up being part of stdio.
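
To make this concrete, here is a rough sketch in C of the retry loop that almost every program would need in that hypothetical world (write_all() is a made-up helper name, and the pausing is only stubbed out in a comment):

/* A sketch of the write-everything loop programs would wrap around
   write() if large pipe writes returned short instead of blocking.
   This also handles being interrupted by a signal. */
#include <unistd.h>
#include <errno.h>

ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted; just retry */
            return -1;          /* a real error */
        }
        done += n;
        /* On a short write we would pause here until the pipe
           could accept more data, eg with select() or poll(). */
    }
    return done;
}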

If everything is going to have code to work around or deal with something, this suggests that you are picking the wrong default. Thus having large writes to pipes block by default is the right API decision, because it means everyone can write simpler and less error-prone code at the user level.

(There are a number of reasons this is less error-prone, including both programs that don't usually expect to write to pipes (but you tell them to write to /dev/stdout) and programs that usually do small writes that don't block and so don't handle short writes, resulting in them silently not writing some amount of their output some of the time.)

There's actually a reason why this is not merely a sensible API but a good one, but that's going to require an additional entry rather than wedging it in here.

Sidebar: This story does not represent actual history

The description I've written above more or less requires that there is some way to wait for a file descriptor to become ready for IO, so that when your write is short you can find out when you can usefully write more. However there was no such mechanism in early Unixes; select() only appeared in UCB BSD (and poll() and friends are even later). This means that having nonblocking pipe writes in V7 Unix would have required an entire set of mechanisms that only appeared later, instead of just a 'little' behavior change.
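
As an illustration of the missing mechanism, here is a small sketch in C of what waiting to retry looks like once you do have select(); it assumes the pipe file descriptor has already been put into nonblocking mode:

/* Sketch: block until a (nonblocking) pipe fd can accept more data.
   This needs select(), which only arrived with UCB BSD. */
#include <sys/select.h>

int wait_writable(int fd)
{
    fd_set wfds;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    /* A NULL timeout means 'wait as long as it takes'. */
    return select(fd + 1, NULL, &wfds, NULL, NULL);
}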

(However I do suspect that the Bell Labs Unix people actively felt that pipe writes should block just like file writes blocked until complete, barring some error. Had they felt otherwise, the Unix API would likely have been set up somewhat differently and V7 might have had some equivalent of select().)

If you're wondering how V7 could possibly not have something like select(), note that V7 didn't have any networking (partly because networks were extremely new and experimental at the time). Without networking and the problems it brings, there's much less need (or use) for a select().

BlockingLargePipeWrites written at 01:03:58

2014-09-19

What I mean by passive versus active init systems

I have in the past talked about passive versus active init systems without quite defining what I meant by that, except sort of through context. Since this is a significant division between init systems that dictates a lot of other things, I've decided to fix that today.

Put simply, an active init system is one that actively tracks the status of services as part of its intrinsic features; a passive init system is one that does not. The minimum behavior of an active init system is that it knows what services have been activated and not later deactivated. Better active init systems know whether services are theoretically still active or whether they've failed on their own.

(Systemd, upstart, and Solaris's SMF are all active init systems. In general any 'event-based' init system that starts services in response to events will need to be active, because it needs to know which services have already been started and which ones haven't and thus are candidates for starting now. System V init's /etc/init.d scripts are a passive init system, although /etc/inittab is an active one. Most modern daemon supervision systems are active systems.)

One direct consequence is that an active init system essentially has to do all service starting and stopping itself, because this is what lets it maintain an accurate record of what services are active. You may run commands to do this, but they have to talk to the init system itself. By contrast, in a passive init system the commands you run to start and stop services can be and often are just shell scripts; this is the archetype of System V init.d scripts. You can even legitimately start and stop services entirely outside of the scripts, although things may get a bit confused.

(In the *BSDs things can be even simpler in that there may not be any scripts; you may just run the daemons directly. I know that OpenBSD tends to work this way but I'm not sure if FreeBSD restarts stuff quite that directly.)

An active init system is also usually more communicative with the outside world. Since it knows the state of services it's common for the init system to have a way to report this status to people who ask, and of course it has to have some way of being told either to start and stop services or at least that particular services have started and stopped. Passive init systems are much less talkative; System V init basically has 'change runlevel' and 'reread /etc/inittab' and that's about it as far as its communication goes (and it doesn't even directly tell you what the runlevel is; that's written to a file that you read).

Once you start down the road to an active init system, in practice you wind up wanting some way to track daemon processes so you can know if a service has died. Without this an active init system is basically flying blind, in that it knows what theoretically started okay but it doesn't necessarily know what's still running. This can be done by requiring cooperative processes that don't do things like detach themselves from their parents, or it can be done with various system-specific Unix extensions that track groups of processes even if they try to wander off on their own.

As we can see from this, active init systems are more complicated than passive ones. Generally, the more useful features they offer and the more general they are, the more complicated they will be. A passive init system can be done with shell scripts; an attractive active one requires some reasonably sophisticated C programming.

PS: An active init system that notices when services die can offer a feature where it will restart them for you. In practice most active init systems aren't set up to do this for most services for various reasons (that may or may not be good ones).
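
As a rough illustration, the core of such a supervise-and-restart loop might look like the following C sketch. It assumes a cooperative daemon that stays a direct child and it leaves out everything else a real init system must handle (supervise() is a made-up name):

/* Sketch: start a daemon, notice when it dies via waitpid(), and
   restart it. Assumes the daemon does not detach from its parent. */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>

void supervise(char *const argv[])
{
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {
            execv(argv[0], argv);   /* become the daemon */
            _exit(127);             /* exec failed */
        }
        if (pid < 0)
            return;                 /* fork failed; give up */
        /* The service is now known active; wait for it to die. */
        while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
            ;                       /* retry if interrupted */
        sleep(1);                   /* avoid a tight restart loop */
    }
}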

(This entry was partly sparked by reading parts of this mail thread that showed up in my Referer logs because it linked to some of my other entries.)

PassiveVsActiveInitSystems written at 00:57:11

2014-09-10

Does init actually need to do daemon supervision?

Sure, init has historically done some sort of daemon supervision (or at least starting and stopping them) and I listed it as one of init's jobs. But does it actually need to do this? This is really two questions and thus two answers.

Init itself, PID 1, clearly does not have to be the process that does daemon supervision. We have a clear proof of this in Solaris, where SMF moves daemon supervision to a separate set of processes. SMF is not a good init system but its failures are failures of execution, not of its fundamental design; it does work, it's just annoying.

Whether the init system as a whole needs to do daemon supervision is a much more philosophical question and thus harder to answer. However I believe that on the whole the init system is the right place for this. The pragmatics of why are simple: the init system is responsible for booting and shutting down the system, and doing this almost always requires starting or stopping at least some daemons in addition to more scripted steps like filesystem checks. This means that part of daemon supervision is at least quite tightly entwined with booting, what I called infrastructure daemons when I talked about init's jobs. And since your init system must handle infrastructure daemons it might as well handle all daemons.

(In theory you could define an API for communication between the init system and a separate daemon supervision system in order to handle this. In practice, until this API is generally adopted your init system is tightly coupled with whatever starts and stops infrastructure daemons for it, ie you won't be able to swap one infrastructure daemon supervision system for another and whichever one your init system needs might as well be considered part of the init system itself.)

I feel that the pragmatic argument is also the core of a more philosophical one. There is no clear break between infrastructure daemons and service daemons (and in fact what category a daemon falls into can vary from system to system), which makes it artificial to have two separate daemon supervision systems. If you want to split the job of an init system apart at all, the 'right' split is between the minimal job of PID 1 and the twin jobs of booting the system and supervising daemons.

(This whole thing was inspired by an earlier entry being linked to by this slashdot comment, and then a reply to said comment arguing that the role of init is separate from a daemon manager. As you can see, I don't believe that it is on Unix in practice.)

Sidebar: PID 1 and booting the system

This deserves its own entry to follow all of the threads, but the simple version for now: in a Unix system with (only) standard APIs, the only way to guarantee that a process winds up as PID 1 is for the kernel to start it as such. The easiest way to arrange for this is for said process to be the first process started so that PID 1 is the first unused PID. This naturally leads into PID 1 being responsible for booting the system, because if it wasn't the kernel would have to also start another process to do this (and there would have to be a decision about what the process is called and so on).

This story is increasingly false in modern Unix environments which do various amounts of magic setup before starting the final real init, but there you have it.

InitDaemonSupervision written at 01:58:15

2014-09-08

What an init system needs to do in the abstract

I've talked before about what init does historically, but that's not the same thing as what an init system actually needs to do, considered abstractly and divorced from the historical paths that got us here and still influence how we think about init systems. So, what does a modern init system in a modern Unix need to do?

At the abstract level, I think a modern init system has three jobs:

  1. Being the central process on the system. This is both the modest job of being PID 1 (inheriting parentless processes and reaping them when they die) and the larger, more important job of supervising and (re)starting any other components of the init system.

  2. Starting and stopping the system, and also transitioning it between system states like single user and multiuser. The latter half of this job has diminished in importance over the years; in practice most systems today almost never transition between runlevels or the equivalent except to boot or reboot.

    (At one point people tried to draw a runlevel distinction between 'multiuser without networking' and 'multiuser with networking' and maybe 'console text logins' and 'graphical logins with X running' but today those distinctions are mostly created by stopping and starting daemons, perhaps abstracted through high level labels for collections of daemons.)

  3. Supervising (daemon) processes to start, stop, and restart them on demand or need or whatever. This was once a sideline but has become the major practical activity of an init system and the reason people spend most of their time interacting with it. Today this encompasses both regular getty processes (which die and restart regularly) and a whole collection of daemons (which are often not expected to die and may not be restarted automatically if they do).

    You can split this job into two sorts of daemons, infrastructure processes that must be started in order for the core system to operate (and for other daemons to run sensibly) and service processes that ultimately just provide services to people using the machine. Service processes are often simpler to start, restart, and manage than infrastructure processes.

In practice modern Unixes often add a fourth job, that of managing the appearance and disappearance of devices. This job is not strictly part of init but it is inextricably intertwined with at least booting the system (and sometimes shutting it down) and in a dependency-based init system it will often strongly influence what jobs/processes can be started or must be stopped at any given time (eg you start network configuration when the network device appears, you start filesystem mounts when devices appear, and so on).

The first job mostly or entirely requires being PID 1; at a minimum your PID 1 has to inherit and reap orphans. Since stopping and starting daemons and processes in general is a large part of booting and rebooting, the second and third jobs are closely intertwined in practice although you could in theory split them apart and that might simplify each side. The fourth job is historically managed by separate tools but often talks with the init system as a whole because it's a core dependency of the second and third jobs.
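
The inherit-and-reap part of the first job really is modest; a minimal sketch of it in C is little more than a loop around wait():

/* Sketch: the reaping duty of PID 1. wait() collects any child,
   including orphans inherited from the rest of the system. */
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

void reap_forever(void)
{
    for (;;) {
        if (wait(NULL) < 0 && errno == ECHILD)
            sleep(1);   /* no children right now; a real init would
                           sit in its event loop instead */
    }
}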

(Booting and rebooting often involve two conceptually separate steps, in that first you check filesystems and do other initial system setup and then you start a whole bunch of daemons (in shutdown you stop a bunch of daemons and then tear down core OS bits). If you do this split, you might want to transfer responsibility for infrastructure daemons to the second job.)

The Unix world has multiple existence proofs that all of these roles do not have to be embedded in a single PID 1 process and program. In particular there is a long history of (better) daemon supervision tools that people can and do use as replacements for their native init system's tools for this (often just for service daemons), and as I've mentioned Solaris's SMF splits the second and third role out into a cascade of additional programs.

InitTheoreticalJobs written at 21:54:24

2014-09-05

Some uses for SIGSTOP and some cautions

If you ask, many people will tell you that Unix doesn't have a general mechanism for suspending processes and later resuming them. These people are correct in general, but sometimes you can cheat and get away with a good enough substitute. That substitute is SIGSTOP, which is at the core of job control. Although processes can catch and react to other job control signals, SIGSTOP is a non-blockable signal like SIGKILL (aka 'kill -9'). When a process is sent SIGSTOP, the kernel stops it on the spot and suspends it until the process gets a SIGCONT (more or less). You can thus pause processes and continue them by manually sending them SIGSTOP and SIGCONT as appropriate and desired.

(Since it's a regular signal, you can use a number of standard mechanisms to send SIGSTOP to an entire process group or all of a user's processes at once.)

There are any number of uses for this. Do you have too many processes banging away on the disk (or just think you might)? You can stop some of them for a while. Is a process saturating your limited network bandwidth? Pause it while you get a word in edgewise. And so on. This is more or less job control for relatively arbitrary user processes, as you might expect.
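
Mechanically this is as simple as it sounds. In the shell it's just 'kill -STOP' followed later by 'kill -CONT'; in C it's a couple of kill() calls, roughly like this sketch (pause_for() is a made-up name):

/* Sketch: stop a process (or, with a negative pid, a whole process
   group), then let it continue a while later. */
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

int pause_for(pid_t pid, unsigned int secs)
{
    if (kill(pid, SIGSTOP) < 0)     /* stopped on the spot */
        return -1;
    sleep(secs);                    /* do whatever in the meantime */
    return kill(pid, SIGCONT);      /* resumed (more or less) */
}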

Unfortunately there are some cautions and limitations attached to use of SIGSTOP on arbitrary processes. The first one is straightforward: if you SIGSTOP something that is talking to the network or to other processes, its connections may break if you leave it stopped too long. The other processes don't magically know that the first process has been suspended and so they should let it be, and many of them will have limits on how much data they'll queue up or how long they'll wait for responses and the like. Hit the limits and they'll assume something has gone wrong and cut your suspended process off.

(The good news is that it will be application processes that do this, and only if they go out of their way to have timeouts and other limits. The kernel is perfectly happy to leave things be for however long you want to wait before a SIGCONT.)

The other issue is that some processes will detect and react to one of their children being hit with a SIGSTOP. They may SIGCONT the child or they may kill the process outright; in either case it's probably not what you wanted to happen. Generally you're safest when the parent of the process you want to pause is something simple, like a shell script. In particular, init (PID 1) is historically somewhat touchy about SIGSTOP'd processes and may often either SIGCONT them or kill them rather than leave them be. This is especially likely if init inherits a SIGSTOP'd process because its original parent process died.

(This is actually relatively sensible behavior to avoid init having a slowly growing flock of orphaned SIGSTOP'd processes hanging around.)

These issues, especially the second, are why I say that SIGSTOP is not a general mechanism for suspending processes. It's a mechanism and on one level it always works, but the problem is the potential side effects and aftereffects. You can't just SIGSTOP an arbitrary process and be confident that it will still be there to be continued ten minutes later (much less over longer time intervals). Sometimes or often you'll get away with it but every so often you won't.

SIGSTOPUsesAndCautions written at 01:01:50

2014-08-27

The difference between Linux and FreeBSD boosters for me

A commentator on my entry about my cultural bad blood with FreeBSD said, in very small part:

I'm surprised that you didn't catch this same type of bad blood from the linux world. [...]

This is a good question and unfortunately my answers involve a certain amount of hand waving.

First off, I think I simply saw much less of the Linux elitism than I did of the FreeBSD elitism, partly because I wasn't looking in the places where it probably mostly occurred and partly because by the time I was looking at all, Linux was basically winning and so Linux people did less of it. To put it one way, I'm much more inclined towards the kind of places you found *BSD people in the early 00s than the kinds of places that were overrun by bright-eyed Linux idiots.

(I don't in general hold the enthusiasms of bright-eyed idiots of any stripe against a project. Bright-eyed idiots without enough experience to know better are everywhere and are perfectly capable of latching on to anything that catches their fancy.)

But I think a large part of it was that the Linux elitism I saw was both of a different sort than the *BSD elitism and also in large part so clearly uninformed and idiotic that it was hard to take seriously. To put it bluntly, the difference I can remember seeing between the informed people on both sides was that the Linux boosters mostly just argued that it was better while the *BSD boosters seemed to have a need to go further and slam Linux, Linux users, and Linux developers while wrapping themselves in the mantle of UCB BSD and Bell Labs Unix.

(Linux in part avoided this because it had no historical mantle to wrap itself in. Linux was so clearly ahistorical and hacked together (in a good sense) that no one could plausibly claim some magic to it deeper than 'we attracted the better developers'.)

I find arguments and boosterism about claimed technical superiority to be far less annoying and offputting than actively putting down other projects. While I'm sure there were Linux idiots putting down the *BSDs (because I can't imagine that there weren't), they were at most a fringe element of what I saw as the overall Linux culture. This was not true of FreeBSD and the *BSDs, where the extremely jaundiced views seemed to be part of the cultural mainline.

Or in short: I don't remember seeing as much Linux elitism as I saw *BSD elitism and the Linux boosterism I saw irritated me less in practice for various reasons.

(It's certainly possible that I was biased in my reactions to elitism on both sides. By the time I was even noticing the Linux versus *BSD feud we had already started to use Linux. I think before then I was basically ignoring the whole area of PC Unixes, feuds and all.)

LinuxElitismReactions written at 02:25:27

2014-08-15

A consequence of NFS locking and unlocking not necessarily being fast

A while back I wrote Cross-system NFS locking and unlocking is not necessarily fast, about one drawback of using NFS locks to communicate between processes on different machines. This drawback is that it may take some time for process B on machine 2 to find out that process A on machine 1 has unlocked the shared coordination file. This goes somewhat further than I realized at the time; back then I looked at cross-system lock activity, but it turns out that you can see long NFS lock release delays even when the two processes are on the same machine.

If you have process A and process B on the same machine, both contending for access to the same file via file locking, you can easily see significant delays between active process A releasing the lock and waiting process B being notified that it now has the lock. I don't know enough about the NLM protocol to know if the client or server kernels can do anything to make the process go faster, but there are some client/server combinations where this delay does happen.

(If the client's kernel is responsible for periodically retrying pending locking operations until they succeed, it certainly could be smart enough to notice that another process on the machine just released a lock on the file and so now might be a good time for another retry.)

This lock acquisition delay can have a pernicious spiraling effect on an overall system. Suppose, not entirely hypothetically, that what a bunch of processes on the same machine are locking is a shared log file. Normally a process spends very little time doing logging and most of their time doing other work. When they go to lock the log file to write a message, there's no contention, they get the lock, they basically immediately release the lock, and everyone goes on fine. But then you hit a lock collision, where processes A and B both want to write. A wins, writes its log message, and unlocks immediately. But the NFS unlock delay means that process B is then going to sit there for ten, twenty, or thirty seconds before it can do its quick write and release the lock in turn. Suppose during this time another process, C, also shows up to write a log message. Now C may be waiting too, and it too will have a big delay to acquire the lock (if locks are 'fair', eg FIFO, then it will have to wait both for B to get the lock and for the unlock delay after B is done). Pretty soon you have more and more processes piling up waiting to write to the log and things grind to a much slower pace.
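
For concreteness, the lock-append-unlock pattern these processes are using is roughly the following C sketch, using fcntl() byte-range locks (which are what NLM carries for NFS files); log_line() is a made-up name and the fd is assumed to be open with O_APPEND:

/* Sketch: take an exclusive lock on a shared log file, append a
   message, and release the lock. On NFS the unlock here is where
   the delays described above bite the next waiter. */
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int log_line(int fd, const char *msg)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;            /* exclusive write lock... */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* ...over the whole file */
    if (fcntl(fd, F_SETLKW, &fl) < 0)
        return -1;                  /* F_SETLKW waits for the lock */
    write(fd, msg, strlen(msg));    /* the quick part (error
                                       checking elided) */
    fl.l_type = F_UNLCK;
    return fcntl(fd, F_SETLK, &fl); /* release; the slow NLM part */
}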

I don't think that there's a really good solution to this for NFS, especially since an increasing number of Unixes are making all file locks be NFS aware (as we've found out the hard way before). It's hard to blame the Unixes that are doing this, and anyways the morally correct solution would be to make NLM unlocks wake up waiting people faster.

PS: this doesn't happen all of the time. Things are apparently somewhat variable based on the exact server and client versions involved and perhaps timing issues and so on. NFS makes life fun.

Sidebar: why the NFS server is involved here

Unless the client kernel wants to quietly transfer ownership of the lock being unlocked to another process instead of actually releasing it, the NFS server is the final authority on who has an NFS lock and it must be consulted about things. For all that any particular client machine knows, there is another process on another machine that very occasionally wakes up, grabs a lock on the log file, and does stuff.

Quietly transferring lock ownership is sleazy because it bypasses any other process trying to get the lock on another machine. One machine with a rotating set of processes could unfairly monopolize the lock if it did that.

NFSLockingSlowConsequence written at 01:47:25

2014-07-30

My view on FreeBSD versus Linux, primarily on the desktop

Today I wound up saying on Twitter:

@thatcks: @devbeard I have a pile of reasons I'm not enthused about FreeBSD, especially as a desktop with X and so on. So that's not really an option.

I got asked about this on Twitter and since my views do not in any way fit into 140 characters, it's time for an entry.

I can split my views up into three broad categories: pragmatic, technical, and broadly cultural and social. The pragmatic reasons are the simplest ones and boil down to that Linux is the dominant open source Unix. People develop software for Linux first and everything else second, if at all. This is extremely visible for an X desktop (the X server and all modern desktops are developed and available first on Linux) but extends far beyond that; Go, for example, was first available on Linux and later ported to FreeBSD. Frankly I like having a wide selection of software that works without hassles and often comes pre-packaged, and generally not having to worry if something will run on my OS if it runs on Unix at all. FreeBSD may be more pure and minimal here but as I've put it before, I'm not a Unix purist. In short, running FreeBSD in general usage generally means taking on a certain amount of extra pain and doing without a certain amount of things.

On the technical side I feel that Linux and Linux distributions have made genuinely better choices in many areas, although I'm somewhat hampered by a lack of deep exposure to FreeBSD. For example, I would argue that modern .deb and RPM Linux package management is almost certainly significantly more advanced than FreeBSD ports. As another one, I happen to think that systemd is the best Unix init system currently available with a lot of things it really gets right, although it is not perfect. There are also a horde of packaging decisions like /etc/cron.d that matter to system administrators because they make our lives easier.

(And yes, FreeBSD has sometimes made better technical choices than Linux. I just think that there have been fewer of them.)

On the social and cultural side, well, I cannot put it nicely so I will put it bluntly: I have wound up feeling that FreeBSD is part of the conservative Unix axis that worships at the altar of UCB BSD, System V, and V7. This is not required by its niche as the non-Linux Unix but that situation certainly doesn't hurt; a non-Linux Unix is naturally attractive to people who don't like Linux's reinvention, ahistoricality, and brash cultural attitudes. I am not fond of this conservatism because I strongly believe that Unix needs to grow and change and that this necessarily requires experimentation, a willingness to have failed experiments, and above all a willingness to change.

This is a somewhat complex thing because I don't object to a Unix being slow moving. There is certainly a useful ecological niche for a cautious Unix that lets other people play pioneer and then adopts the ideas that have proven to be good ones (and Linux's willingness to adopt new things creates churn; just ask all of the people who ported their init scripts to Upstart and will now be re-porting them to systemd). If I was confident that FreeBSD was just waiting to adopt the good bits, that would be one thing. But as an outsider I haven't been left with that feeling; instead my brushing contacts have left me with more the view that FreeBSD has an aspect of dogmatic, 'this is how UCB BSD does it' conservatism to it. Part of this is based on FreeBSD still not adopting good ideas that are by now solidly proven (such as, well, /etc/cron.d as one small example).

This is also the area where my cultural bad blood with FreeBSD comes into play. Among other more direct things, I'm probably somewhat biased towards seeing FreeBSD as more conservative than it actually is and I likely don't give FreeBSD the benefit of the doubt when it does something (or doesn't do something) that I think of as hidebound.

None of this makes FreeBSD a bad Unix. Let me say it plainly: FreeBSD is a perfectly acceptable Unix in general. It is just not a Unix that I feel any particular enthusiasm for and thus not something I'm inclined to use without a compelling reason. My default Unix today is Linux.

(It would take a compelling reason to move me to FreeBSD, not merely a situation where FreeBSD is a bit better, because of the costs of the inevitable differences.)

FreeBSDvsLinux written at 01:31:40

2014-07-28

FreeBSD, cultural bad blood, and me

I set out to write a reasoned, rational elaboration of a tweet of mine, but in the course of writing it I've realized that I have some of those sticky human emotions involved too, much like my situation with Python 3. What it amounts to is that in addition to my rational reasons I have some cultural bad blood with FreeBSD.

It certainly used to be the case that a vocal segment of *BSD people, FreeBSD people among them, were elitists who looked down their noses at Linux (and sometimes other Unixes too, although that was usually quieter). They would say that Linux was not a Unix. They would say that Linux was clearly used by people who didn't know any better or who didn't have any taste. There was a fashion for denigrating Linux developers (especially kernel developers) as incompetents who didn't know anything. And so on; if you were around at the right time you can probably think of other things. In general these people seemed to venerate the holy way of UCB BSD and find little or no fault in it. Often these people believed (and propagated) other Unix mythology as well.

(This sense of offended superiority is in no way unique to *BSD people, of course. Variants have happened all through computing's history, generally from the losing side of whatever shift is going on at the time. The *BSD attitude of 'I can't believe so many people use this stupid Linux crud' echoes the Lisp reaction to Unix and the Unix reaction to Windows and Macs, at least (and the reaction of some fans of various commercial Unixes to today's free Unixes).)

This whole attitude irritated me for various reasons and made me roll my eyes extensively; to put it one way, it struck me as more religious than well informed and balanced. To this day I cannot completely detach my reaction to FreeBSD from my association of it with a league of virtual greybeards who are overly and often ignorantly attached to romantic visions of the perfection of UCB BSD et al. FreeBSD is a perfectly fine operating system and there is nothing wrong with it, but I wish it had kept different company in the late 1990s and early to mid 00s. Even today there is a part of me that doesn't want to use 'their' operating system because some of the company I'd be keeping would irritate me.

(FreeBSD keeping different company was probably impossible, though, because of where the Unix community went.)

(I date this *BSD elitist attitude only through the mid 00s because of my perception that it's mostly died down since then. Hopefully this is an accurate perception and not due to selective 'news' sources.)

FreeBSDCulturalBadBlood written at 23:32:40

2014-07-25

An interesting picky difference between Bourne shells

Today we ran into an interesting bug in one of our internal shell scripts. The script had worked for years on our Solaris 10 machines, but on a new OmniOS fileserver it suddenly reported an error:

script[77]: [: 232G: arithmetic syntax error

Cognoscenti of ksh error messages have probably already recognized this one and can tell me the exact problem. To show it to everyone else, here is line 77:

if [ "$qsize" -eq "none" ]; then
   ....

In a strict POSIX shell, this is an error because test's -eq operator is specifically for comparing numbers, not strings. What we wanted is the = operator, ie '[ "$qsize" = "none" ]'.

What makes this error more interesting is that the script had been running for some time on the OmniOS fileserver without this error. However, until now the $qsize variable had always had the value 'none'. So why hadn't it failed earlier? After all, 'none' (on either side of the expression) is just as much not a number as '232G' is.

The answer is that this is a picky difference between shells in terms of how they actually behave. Bash, for example, always complains about such misuse of -eq; if either side is not a number you get an error saying 'integer expression expected' (as does Dash, with a slightly different error). But on our OmniOS, /bin/sh is actually ksh93 and ksh93 has a slightly different behavior. Here:

$ [ "none" -eq "none" ] && echo yes
yes
$ [ "bogus" -eq "none" ] && echo yes
yes
$ [ "none" -eq 0 ] && echo yes
yes
$ [ "none" -eq "232G" ] && echo yes
/bin/sh: [: 232G: arithmetic syntax error

The OmniOS version of ksh93 clearly has some sort of heuristic about number conversions such that strings with no numbers are silently interpreted as '0'. Only invalid numbers (as opposed to things that aren't numbers at all) produce the 'arithmetic syntax error' message. Bash and dash are both more straightforward about things (as is the FreeBSD /bin/sh, which is derived from ash).

Update: my description isn't actually what ksh93 is doing here; per opk's comment, it's actually interpreting the none and bogus as variable names and giving them a value of 0 when unset.

Interestingly, the old Solaris 10 /bin/sh seems to basically be calling atoi() on the arguments for -eq; the first three examples work the same, the fourth is silently false, and '[ 232 -eq 232G ]' is true. This matches the 'let's just do it' simple philosophy of the original Bourne shell and test program and may be authentic original V7 behavior.
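
You can see the shape of that behavior directly, since atoi() returns 0 for a string with no leading digits and silently ignores trailing junk:

/* Sketch: atoi()'s permissive parsing, which matches how the old
   test seems to treat its -eq arguments. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("%d\n", atoi("none"));   /* prints 0 */
    printf("%d\n", atoi("232G"));   /* prints 232 */
    return 0;
}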

(Technically this is a difference in test behavior, but test is a builtin in basically all Bourne shells these days. Sometimes the standalone test program in /bin or /usr/bin is actually a shell script to invoke the builtin.)

ShTestDifference written at 23:34:18

