Wandering Thoughts archives

2007-10-11

A gotcha with command order in pipes

Here's a mistake I've made more than once:

tail -f active-log | grep big-filter | grep -v one-thing

(Here the big-filter is something that selects only a small amount of actual interesting logfile output, and then you want to throw away a small bit more. For example, my most recent version of this was monitoring the exim log on our new mail system for failures and other anomalies, and then wanting to throw away one particular known failure.)

Why is this a mistake? Because you probably want to get interesting output as fast as possible, and this order doesn't do that; instead you get periodic large spurts of (delayed) output.

This happens because the big grep filter is only producing output periodically, and that happens because feeding grep's output into a pipe makes it block-buffered instead of line-buffered. So instead of immediately writing out any line that makes it past the filter, it sits around waiting for a buffer's worth, which may take a while.

(For GNU grep specifically you can use the --line-buffered option, but not all filtering tools have an equivalent one.)

The golden rule of pipelines like this is to put the small volume reductions first and the big volume reductions last. This keeps as much volume as possible moving through each stage of the pipeline, so each stage flushes its output buffers as fast as possible. So in this case the correct order is:

tail -f active-log | grep -v one-thing | grep big-filter

The tail -f here will produce enough volume that the first grep is constantly sending output to the big filter at the end, and the big filter's output is going to your terminal so it's line-buffered.

(You do not need to worry about tail -f's buffering; tail always writes anything new it finds, even if it is going to a pipe. Or at least sane versions of tail do, including the GNU one.)

PipeOrdering written at 22:15:37

2007-10-09

A silly trick with X

Once upon a time, I was at home and really needed to see something that was displayed on my screen at work. This being Unix, there was no convenient remote desktop add-on that would have let me mirror my work display to home, but this being Unix, there were ways around that.

(Also, I was connecting over a slow dialup PPP connection, so a live remote desktop thing would have been difficult anyways.)

My first attempt was simple: ssh in to the work machine and do DISPLAY=:0 xwd -root -out /tmp/screen.xwd (without -root, xwd wants you to click on a window to select it, which is no use remotely). This generated a large image that was unfortunately full of a lot of black, because I was running a screen locker, and the X screen locker actually puts a real window up on top of everything on the screen, instead of having a special side channel into X.

No problem; this is Unix and I had a hammer:

  • ssh in to work
  • export DISPLAY=:0
  • kill the xlock process
  • xwd the now-revealed screen, complete with the window I needed to read
  • run nohup xlock >/dev/null 2>&1 </dev/null & to re-lock my screen

(After all, xlock doesn't care exactly where it's being started from; all it needs is a $DISPLAY and enough privileges on the display.)

Technically I believe I didn't use xwd, but I've now forgotten whatever screen dump program it was. These days I would use import from ImageMagick; however much ImageMagick makes me wince vaguely with its Swiss army chainsaw approach to life, it's convenient.

XTrickI written at 23:00:12

2007-10-02

A gotcha with 'bidirectional' pipes to commands

By bidirectional pipes I mean a situation where you start a subordinate program and both write to its standard input and read from its standard output. A fair number of programs are written like the following pseudo-code:

to, from = pipecmd("subprogram", "rw")
write(to, stuff)
reply = read(from)
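To make the failure mode concrete, here is that pseudo-code written out in Python with subprocess (the helper name is mine; this is the broken pattern, not a recommendation):

```python
import subprocess

def naive_pipe_filter(data: bytes, argv) -> bytes:
    # The deadlock-prone pattern: write all the input, then read the reply.
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    proc.stdin.write(data)   # blocks forever once both pipe buffers fill up
    proc.stdin.close()
    reply = proc.stdout.read()
    proc.wait()
    return reply
```

This works for small amounts of data, where the pipes' kernel buffers absorb everything, and locks up for large ones.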

People write these programs, test them a bit, start using them, have them work, and then one day wake up to discover that their program has locked up; it and the subprogram are both running, doing nothing. (A system call tracer will tell them that both programs are blocked in write().)

It's not really surprising that people write pipes like this; this code is simple, obvious, and works most of the time. Specifically, it works as long as the subprogram reads almost all of its input before producing enough output to fill up its standard output pipe. After that the subprogram blocks trying to write output, because no one is reading it, and the main program blocks trying to write input to the subprogram, because the subprogram is blocked writing output.

(Often this means that the program works in testing and early in its production use, when it is only being asked to deal with relatively small amounts of data, and it is only when things grow that it blows up.)

Unfortunately there is no good simple way out of this. The best but most complicated approach is to make your program use a select() or poll() based loop to simultaneously write to the subprogram and read its output. The brute force way is to use either fork() or threads to let you do the write() and the read() at the same time.
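As a sketch of the select() approach, here is a non-deadlocking version of the earlier pseudo-code in Python (the function name and chunk sizes are mine; writes are capped at the POSIX minimum PIPE_BUF of 4096 bytes so that a pipe select() reports as writable really can take the whole write without blocking):

```python
import os
import select
import subprocess

def pipe_filter(data: bytes, argv) -> bytes:
    # Feed data to argv's stdin and collect its stdout simultaneously,
    # so we never block writing while the subprogram blocks writing to us.
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    to_fd = proc.stdin.fileno()
    from_fd = proc.stdout.fileno()
    pending = data
    chunks = []
    done_reading = False
    while pending or not done_reading:
        wlist = [to_fd] if pending else []
        rlist = [from_fd] if not done_reading else []
        readable, writable, _ = select.select(rlist, wlist, [])
        if from_fd in readable:
            chunk = os.read(from_fd, 65536)
            if chunk:
                chunks.append(chunk)
            else:
                done_reading = True   # subprogram closed its stdout
        if to_fd in writable:
            # write at most PIPE_BUF bytes so a 'writable' pipe can't block
            n = os.write(to_fd, pending[:4096])
            pending = pending[n:]
            if not pending:
                proc.stdin.close()    # EOF tells the subprogram we're done
    proc.wait()
    return b"".join(chunks)
```

In Python specifically, subprocess.Popen.communicate() does essentially this for you, so you rarely need to hand-roll the loop; the sketch is just to show the shape of the select()-based fix.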

PipeReadWriteIssue written at 20:54:26

