Wandering Thoughts archives

2007-10-11

A gotcha with command order in pipes

Here's a mistake I've made more than once:

tail -f active-log | grep big-filter | grep -v one-thing

(Here the big-filter is something that selects only a small amount of actually interesting logfile output, and then you want to throw away a little bit more. For example, my most recent version of this was monitoring the exim log on our new mail system for failures and other anomalies, and then wanting to throw away one particular known failure.)

Why is this a mistake? Because you probably want to get interesting output as fast as possible, and this order doesn't do that; instead you get periodic large spurts of (delayed) output.

This happens because the big grep filter only produces output periodically, which in turn happens because sending grep's output into a pipe makes it block-buffered instead of line-buffered (the usual stdio behavior when standard output is not a terminal). So instead of immediately writing out any line that makes it past the filter, grep sits around waiting for a buffer's worth of output, which may take a while when few lines match.

(For GNU grep specifically you can use the --line-buffered option, but not all filtering tools have an equivalent one.)
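To make the buffering visible, here is a small sketch that times when lines come out the far end of a pipeline, with and without --line-buffered. It assumes GNU grep and a date that understands +%s; slow_source and arrival_gap are made-up helpers for this illustration, with slow_source standing in for tail -f so that the example terminates.

```shell
# Hypothetical stand-in for 'tail -f active-log': one matching line now,
# a second one two seconds later, then end of input.
slow_source() {
    echo 'match 1'
    sleep 2
    echo 'match 2'
}

# Print the number of seconds between the arrival of the first and the
# last line on standard input.
arrival_gap() {
    first='' last=''
    while IFS= read -r _line; do
        last=$(date +%s)
        [ -n "$first" ] || first=$last
    done
    echo $((last - first))
}

# grep writes into a pipe here, so its output is block-buffered and both
# lines only show up when grep exits.
block_gap=$(slow_source | grep match | arrival_gap)

# Same pipeline, but GNU grep flushes after every line it prints.
line_gap=$(slow_source | grep --line-buffered match | arrival_gap)

echo "default: ${block_gap}s between lines; --line-buffered: ${line_gap}s"
```

On a typical GNU system I would expect the default gap to be about zero (both lines arrive together, late) and the --line-buffered gap to be about two seconds, but the exact numbers depend on timing.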

The golden rule of pipelines like this is to put the small volume reductions first and the big volume reductions last. This keeps as much volume as possible moving through each stage of the pipeline, so each stage fills and flushes its output buffers as fast as possible. So in this case the correct order is:

tail -f active-log | grep -v one-thing | grep big-filter

The tail -f here will produce enough volume that the first grep is constantly sending output to the big filter at the end, and the big filter's output is going to your terminal so it's line-buffered.
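Swapping the two greps changes only when lines show up, not which lines show up. A quick sanity check of that, as a sketch: sample_log is a made-up, finite stand-in for tail -f active-log, and its log lines are invented for the example.

```shell
# Finite, made-up stand-in for 'tail -f active-log' so the example terminates.
sample_log() {
    printf '%s\n' \
        'big-filter: disk full' \
        'routine noise' \
        'big-filter: one-thing happened again' \
        'big-filter: timeout'
}

# The order from the mistake: big reduction first, small reduction last.
slow_order=$(sample_log | grep big-filter | grep -v one-thing)

# The recommended order: small reduction first, big reduction last.
fast_order=$(sample_log | grep -v one-thing | grep big-filter)

if [ "$slow_order" = "$fast_order" ]; then
    echo "both orders select the same lines"
fi
printf '%s\n' "$fast_order"
```

Both filters are independent per-line tests, which is why reordering them is safe here; that would not hold for stages that rewrite lines in a way the other stage's pattern depends on.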

(You do not need to worry about tail -f's buffering; tail always writes anything new it finds, even if it is going to a pipe. Or at least sane versions of tail do, including the GNU one.)

unix/PipeOrdering written at 22:15:37


