A dive into the depths of yes `yes no`

March 15, 2013

The blog entry of the current time interval is m. tang's yes `yes no`, which winds up exploring just that; as the author says, doing it 'sort of slowed my computer below the threshold of usefulness, so I had to restart it'. Unfortunately the author's original explanation that the second, outer yes buffers the output of yes no endlessly (eating up all memory in the process) is totally wrong (as various people have noted and corrected since the entry started going around). As it happens I think that there are some interesting things hiding under the covers here, so I'm going to talk about them.

First off, let's understand why this command line probably explodes your system. To be clear I'll rewrite this in a more modern but less aesthetically pleasing shell syntax and call it 'yes $(yes no)'. This is more or less equivalent to:

shvar=$(yes no)
yes $shvar

Just the first line alone is enough to blow up your shell because it asks the shell to read an endless amount of input and try to hold it in memory (here in the form of a shell variable). The same thing happens in the original command line, just without the intermediate shell variable.
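(To watch the buffering happen without taking the machine down, here is a minimal sketch with a bounded stand-in for 'yes no'. The head cutoff and the count_args helper are my additions for illustration, not part of the original command line.)

```sh
# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# Bounded stand-in for 'yes no': head stops the flood after 1000 lines.
shvar=$(yes no | head -n 1000)

# The shell had to hold all of that output in memory before going on.
# 1000 lines of "no\n" is 3000 bytes; $() strips the trailing newline.
echo "${#shvar}"        # 2999

# Unquoted, the variable is split into one argument per "no"
# (assuming the default IFS); this is what the outer yes would be handed.
count_args $shvar       # 1000
```

Remove the head and the first assignment never finishes; the shell just keeps growing.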

You might wonder why the shell doesn't have some limit on how much input it's willing to read this way. While this is a self-inflicted accident, it's not as if Unix machines really deal well with running out of memory; on a 64-bit machine you could easily blow up the entire system doing this (on a 32-bit machine you might run into per-process address space limits before then). Saving you from this would be at least somewhat nice. I suspect that the real answer basically boils down to 'tradition'; this is such a rare (and self-inflicted) situation that no shell has bothered to deal with it yet and since no shell has, not dealing with it has become the default.

(Unix has a great deal of this sort of 'someone else did it this way first' historical practice that has basically fossilized over the years. Even if it doesn't necessarily entirely make sense it's often easier for people who are reimplementing commands to just go with existing (lack of) practice. Part of this ties into the social problems involved with changing things in Unix.)

Beyond that, though, there are some issues with having a limit. First you have to decide on the semantics of what happens when the limit is hit. Do you discard the output entirely or truncate it? Do you count this as a failure for the purposes of set -e or do you pass on the exit status of the 'yes no', whatever that is (and it may not be a failure)? In the case of 'yes $(yes no)' do you even try to run the second yes (with a truncated or empty argument list) or do you fail the entire command on the spot? There are arguments either way for much of this (and the choices interact with each other); you'll have to figure out what the most useful answers are in practice, whether your proposed change breaks any existing script practices, and then how much POSIX lets you get away with (if you care about being a POSIX-compatible Bourne shell; things like zsh have it easier here).

Then there's the issue of what the limit should be. We don't want to just limit the shell to the kernel's exec() limit (which is on the combined size of the arguments and the environment); it's valid to simply read a lot of output into an unexported shell variable and then process it. In fact in some situations this is how you deal with an 'arguments too big' problem. So what do you set? People are probably going to complain about almost any value.
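(A rough sketch of the difference between the two limits. The roughly 3 MB size is an arbitrary choice of mine, and the failure of the last step is what I would expect on a typical Linux system, where a single argument that large is rejected by the kernel; other systems may draw the line elsewhere.)

```sh
# The kernel's limit on arguments plus environment, as the system reports it.
getconf ARG_MAX

# About 3 MB of output read into an unexported shell variable.
# No exec() is involved, so the kernel limit does not apply.
big=$(yes no | head -n 1000000)
echo "variable holds ${#big} characters"

# Handing the same data to an external command does go through exec(),
# and on typical systems fails with 'Argument list too long'.
if env true "$big" 2>/dev/null; then
    exec_ok=yes
else
    exec_ok=no
fi
echo "passing it to an external command worked: $exec_ok"
```

A shell that capped $() at the exec() limit would break the first half of this, which works fine today.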

(I suppose the real answer is to have the limit be user-settable. You could even start out with the limit available but default to unlimited, then a year or two later introduce a default limit.)
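(No shell I know of has such a setting, but you can approximate one in userland. What follows is a sketch under my own assumptions, not a proposal for real shell semantics: capture_limited is a hypothetical helper, it picks 'truncate and report failure' out of the choices above, it relies on 'head -c' (common but not required by POSIX), and it assumes single-byte characters so that lengths count bytes.)

```sh
# capture_limited LIMIT CMD [ARGS...]
# Hypothetical helper: stores at most LIMIT bytes of CMD's output in
# $captured and returns 1 if the output had to be truncated.
capture_limited() {
    limit=$1
    shift
    # Read one byte past the limit so overflow can be detected. The
    # trailing 'x' stops $() from stripping newlines and is removed below.
    captured=$("$@" | head -c "$((limit + 1))"; printf x)
    captured=${captured%x}
    if [ "${#captured}" -gt "$limit" ]; then
        captured=${captured%?}      # drop the one extra byte
        return 1
    fi
    return 0
}

# An endless producer gets cut off instead of eating all memory.
if capture_limited 10 yes no; then
    echo "complete output, ${#captured} bytes"
else
    echo "truncated at ${#captured} bytes"
fi
```

Unlike a plain $(), this keeps trailing newlines in the captured text, which is one more semantic choice a real implementation would have to make.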

Sidebar: $() and set -e today

Since I just experimented with this:

set -e
echo 1 $(false)
v=$(false)
echo 2

This will echo '1' but not '2' in Bash, dash, ksh, FreeBSD's sh, and Solaris 10's /usr/xpg4/bin/sh, which makes me assume that this is actually what POSIX requires. Just to be different, Solaris 10's /bin/sh doesn't echo anything (even if you flip the order around).

(I have not been masochistic enough to obtain and boot a PDP-11 V7 image just to see what the V7 sh would do, but I suspect it's the same as Solaris /bin/sh.)
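(The difference between the two lines in the sidebar comes down to which exit status each command ends up with. A small sketch that records the statuses instead of relying on set -e; the variable names are mine, and the behavior shown is what I understand POSIX-style shells to do.)

```sh
# With a command name present, the status is that of echo itself;
# the failed substitution just contributes an empty argument.
echo 1 $(false) && echo_status=0 || echo_status=$?

# With only an assignment, the status is that of the command substitution.
v=$(false) && assign_status=0 || assign_status=$?

echo "echo status: $echo_status"
echo "assignment status: $assign_status"
```

In the shells listed above, the echo line reports success and the assignment reports failure, which is why set -e lets the first through and stops at the second.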


Comments on this page:

From 68.87.42.115 at 2013-03-15 05:57:32:

Back in the 90's, we had a student on a university web server create and advertise a file called something like 'amazingporn.gif'.

It turned out to be a symlink to /dev/zero (an infinite supply of zeros, very useful for zeroing out blocks of memory before you use them, etc.)

As far as we can tell, people would fire up NCSA Mosaic or whatever and go to that page. They'd sit there as it downloaded, downloaded, downloaded -- no doubt thinking "This is a huge file, it's going to be great!" -- until their browser filled up their PC's memory, and the whole system would crash.

Ah, those were the days. :)

From 208.44.138.156 at 2013-03-15 10:33:08:

I seem to remember Linux having a max argument length about ten years ago. (For all I know, that was just the default limit on RH.) Then, people got excited when the limit was lifted, which resulted in xargs becoming even less used.

Aha! This Linux Journal article from 2002 agrees with me :) http://www.linuxjournal.com/article/6060

- Josef 'Jeff' Sipek.

From 198.189.14.2 at 2013-04-18 11:57:35:

I tested V7 /bin/sh and it does in fact produce no output.

(I used SIMH 3.8-1 and the V7 Unix image available at http://simh.trailing-edge.com/kits/uv7swre.zip )

Written on 15 March 2013.

Last modified: Fri Mar 15 00:13:17 2013