2007-10-11
A gotcha with command order in pipes
Here's a mistake I've made more than once:
tail -f active-log | grep big-filter | grep -v one-thing
(Here the big-filter is something that selects only a small amount of actually interesting logfile output, and then you want to throw away a small bit more. For example, my most recent version of this was monitoring the exim log on our new mail system for failures and other anomalies, and then wanting to throw away one particular known failure.)
Why is this a mistake? Because you probably want to get interesting output as fast as possible, and this order doesn't do that; instead you get periodic large spurts of (delayed) output.
This happens because the big grep filter is only producing output periodically, and that happens because feeding grep's output into a pipe makes it block-buffered instead of line-buffered. So instead of immediately writing out any line that makes it past the filter, it sits around waiting for a buffer's worth, which may take a while.
(For GNU grep specifically you can use the --line-buffered option, but not all filtering tools have an equivalent one.)
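As a minimal sketch of that option, assuming a grep that accepts --line-buffered (GNU grep does), you can keep the original order if the big filter is told to flush every matching line. The function name and its two arguments are made up for illustration:

```shell
# Hypothetical helper: keep the original filter order, but make the big
# filter flush each matching line instead of waiting for a full buffer.
# Assumes a grep that supports --line-buffered (GNU grep does).
#   $1 = pattern that selects the interesting lines (the big filter)
#   $2 = pattern for the one thing to throw away
big_then_small() {
    grep --line-buffered -e "$1" | grep -v -e "$2"
}

# Intended use (runs until interrupted):
#   tail -f active-log | big_then_small big-filter one-thing
```

This only helps when the tool doing the big reduction has such an option, which is why reordering the pipeline is the more general fix.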
The golden rule of pipelines like this is to put the small volume reductions first and the big volume reductions last. This keeps as much volume as possible moving through each stage of the pipeline, so each stage flushes its output buffers as fast as possible. So in this case the correct order is:
tail -f active-log | grep -v one-thing | grep big-filter
The tail -f here will produce enough volume that the first grep is constantly sending output to the big filter at the end, and the big filter's output is going to your terminal so it's line-buffered.
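Reordering changes only when output shows up, not what shows up: both filters are plain line selections, so they pass the same lines in either order. Here is a small sketch for checking that on finite input; the function names and the sample lines are hypothetical stand-ins, not real exim log output:

```shell
# Two hypothetical helpers that read stdin; same filters, different order.
#   $1 = big filter pattern, $2 = pattern to throw away
big_first()   { grep -e "$1" | grep -v -e "$2"; }
small_first() { grep -v -e "$2" | grep -e "$1"; }

# Made-up sample input standing in for a log file.
sample() {
    printf '%s\n' \
        'routine delivery' \
        'failure: timeout' \
        'failure: known-issue' \
        'routine delivery' \
        'failure: refused'
}

# Both orders select the same lines; only the latency differs when the
# input is a slow live stream such as tail -f.
sample | big_first failure known-issue
sample | small_first failure known-issue
```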
(You do not need to worry about tail -f's buffering; tail always writes anything new it finds, even if it is going to a pipe. Or at least sane versions of tail do, including the GNU one.)