A gotcha with command order in pipes

October 11, 2007

Here's a mistake I've made more than once:

tail -f active-log | grep big-filter | grep -v one-thing

(Here the big-filter is something that selects only a small amount of actually interesting logfile output, and then you want to throw away a little bit more. For example, my most recent version of this was monitoring the exim log on our new mail system for failures and other anomalies, and then wanting to throw away one particular known failure.)

Why is this a mistake? Because you probably want to get interesting output as fast as possible, and this order doesn't do that; instead you get periodic large spurts of (delayed) output.

This happens because the big grep filter is only producing output periodically, and that happens because feeding grep's output into a pipe makes it block-buffered instead of line-buffered. So instead of immediately writing out any line that makes it past the filter, it sits around waiting for a buffer's worth, which may take a while.
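
A rough way to see the delay for yourself, as a sketch rather than a benchmark: slow_source and stamp below are made-up helpers (they are not part of any tool), standing in for a slowly growing log and for your eyes watching the clock.

```shell
#!/bin/sh
# Hypothetical stand-in for a log that slowly produces interesting lines.
slow_source() {
    for i in 1 2 3; do
        echo "match $i"
        sleep 1
    done
}

# Hypothetical helper: prefix each line with the clock time it arrived at.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%T)" "$line"
    done
}

# grep's stdout is a pipe here, so a stdio-based grep typically holds its
# matches in a buffer; the three lines then tend to arrive together, when
# grep exits, instead of one per second.
slow_source | grep match | stamp
```

If the timestamps come out (nearly) identical even though the lines were produced a second apart, you are looking at the buffering described above.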

(For GNU grep specifically you can use the --line-buffered option, but not all filtering tools have an equivalent one.)
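
If you would rather keep the original order, a minimal sketch of that option in use might look like the following. It assumes a grep that accepts --line-buffered (GNU grep does), and the function name interesting is invented for the example.

```shell
#!/bin/sh
# Hypothetical wrapper that keeps the big filter first but asks grep to
# flush after every matching line.
# Assumes a grep that supports --line-buffered, such as GNU grep.
interesting() {
    grep --line-buffered 'big-filter' | grep -v 'one-thing'
}

# Intended use:  tail -f active-log | interesting
# A finite stand-in input, so that this example terminates:
printf '%s\n' 'noise' 'big-filter: new failure' 'big-filter: one-thing' |
    interesting
```

Only the first grep needs the option; the last grep writes to your terminal and so is already line-buffered.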

The golden rule of pipelines like this is to put the small volume reductions first and the big volume reductions last. This keeps as much volume as possible moving through each stage of the pipeline, so each stage fills and flushes its output buffers as fast as possible. So in this case the correct order is:

tail -f active-log | grep -v one-thing | grep big-filter

The tail -f here will produce enough volume that the first grep is constantly sending output to the big filter at the end, and the big filter's output is going to your terminal so it's line-buffered.
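
Swapping the two greps changes only when lines show up, not which lines show up, since each grep judges every line on its own. A small sanity check of that, using an invented sample log (the log lines and function names here are made up for illustration):

```shell
#!/bin/sh
# Hypothetical sample of log lines; 'big-filter' and 'one-thing' are the
# same placeholder patterns as in the pipelines above.
sample_log() {
    printf '%s\n' \
        'routine delivery' \
        'big-filter: failure A' \
        'big-filter: one-thing known failure' \
        'routine delivery' \
        'big-filter: failure B'
}

# The two orderings, reading from standard input.
big_filter_first() { grep 'big-filter' | grep -v 'one-thing'; }
big_filter_last()  { grep -v 'one-thing' | grep 'big-filter'; }

# Both print the 'failure A' and 'failure B' lines and nothing else.
sample_log | big_filter_first
sample_log | big_filter_last
```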

(You do not need to worry about tail -f's buffering; tail always writes anything new it finds, even if it is going to a pipe. Or at least sane versions of tail do, including the GNU one.)
