Installing narf

narf, version 0.93.01
cleanfeed-inn 0.95 modified for MD5 body hashes (or as patches against 0.95)
our filter (last updated December 9 1997)

To start with you need a copy of narf and a perl filter for it. Narf is somewhat tuned for our filter (and has only been run in production with it), but with little or no customization it should work with any INN perl filter (a good source of links to them is Jeremy Nixon's Anti-Spam Software page). Unlike perl INN filters, narf's filters can use all of perl's features; in particular, they can load modules with perl's use statement. Narf requires perl 5.004. Our filter and the MD5-modified cleanfeed-inn both also require an installed copy of the perl MD5 module from CPAN.
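A quick way to check these two prerequisites from the shell is sketched below. It is not part of the narf distribution; the helper names perl_at_least and have_perl_module are made up for illustration, and the sketch only tests whichever perl is first on your PATH, which may not be the one narf will run under.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with narf.

# Succeed if the perl on $PATH is at least the given version.
perl_at_least() {
    perl -e "require $1;" 2>/dev/null
}

# Succeed if the named perl module can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

if perl_at_least 5.004; then
    echo "perl: 5.004 or later found"
else
    echo "perl: missing or older than 5.004"
fi

if have_perl_module MD5; then
    echo "MD5 module: installed"
else
    echo "MD5 module: not found"
fi
```

The MD5 check only matters if you plan to use our filter or the MD5-modified cleanfeed-inn.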

You should also have read the main narf page, particularly the how it works section.

Editing narf's configuration information

Narf needs to know the paths to various directories and files. Most important is $FILTDIR, the directory where the filter_innd.pl filter is; this must be set in narf itself and cannot be set at runtime by a command-line argument. Although you can set most others by command-line arguments (see the comments at the start of the script for details), you probably should set your normal defaults in the script.

There are a number of other defaults and options to be set in the configuration section. You should pay particular attention to the normal value of $verbose, since logging accepted articles can create very big logs. Our logs often hit sixty or seventy megabytes a day and while we find the information useful you may opt for a slimmer log.

Make sure that the file you have chosen for narf's saved list of recognized EMP signatures is writeable by your news user. If it is not, no recognized EMP signatures will be saved across runs. If the file doesn't exist yet, make sure that the news user can create the file in that directory; you may want to touch and chown the file the first time around.
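The touch and chown step might look like the sketch below. The function name prepare_sigfile is hypothetical, and the file path and user name are whatever you have configured; nothing here is taken from narf itself. The chown is only attempted when the script runs as root and the named user exists.

```shell
#!/bin/sh
# prepare_sigfile FILE USER   (hypothetical helper, not part of narf)
# Create FILE if it is missing and, when run as root, give it to USER.
prepare_sigfile() {
    file=$1
    user=$2
    [ -e "$file" ] || touch "$file" || return 1
    if [ "$(id -u)" -eq 0 ] && id "$user" >/dev/null 2>&1; then
        chown "$user" "$file" || return 1
    fi
    # Show the resulting owner and permissions for a visual check.
    ls -l "$file"
}
```

As root you would call it along the lines of prepare_sigfile /your/signature/file news, substituting the signature file path you set in narf and the name of your news user.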

If you will be feeding posts by local users through narf, make sure that $DUMPDIR is defined and is a directory that the news user can create files in.

Reconfiguring your NNTP daemon

Reconfigure your NNTP daemon to write incoming batches and articles into the source batch directory you've edited into narf. If you already have local posts filtered through some spam-checking NNTP filter, you may want to arrange for it to write POST transfers from your users into a different spot than IHAVE transfers from your peers. Narf does an acceptable job of filtering local posts if configured correctly, and we do strongly recommend that you filter local posts somehow.

If narf rejects a post that it identifies as local, it always saves a copy of the rejected article in $DUMPDIR under a naming scheme designed to make such saved articles easy to spot. This does rely on $DUMPDIR being configured; if you are going to feed local posts through narf, we strongly recommend that you configure it. Because of how narf works, the posting user gets no indication that their article has been rejected; all they will see is that their articles are not showing up.
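Since users get no notice, someone has to look in $DUMPDIR now and then. The sketch below lists saved local rejects, assuming they can be recognized by the local- prefix that narf adds to the names of rejected local posts. The function name list_local_rejects is hypothetical, and searching subdirectories is a guess about the layout, not something taken from narf.

```shell
#!/bin/sh
# list_local_rejects DUMPDIR   (hypothetical helper, not part of narf)
# Print rejected local posts saved under DUMPDIR, one path per line.
# Assumes the saved file names start with "local-".
list_local_rejects() {
    find "$1" -type f -name 'local-*' -print | sort
}
```

Run from cron with your own $DUMPDIR as the argument, this gives a simple daily list of rejected local articles to review.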

Pick a filter

Narf requires you to provide a filter to run articles through to determine whether to accept or reject them. Narf can use INN perl filters unmodified, although it will perform better if you modify either it or the filter or both. A discussion on how to write an INN perl filter is beyond the scope of this document; if you are interested, start from one of the existing ones. The filter you choose must be installed as filter_innd.pl in $FILTDIR.
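Installing the chosen filter under the required name can be sketched as below. The function name install_filter is hypothetical. The perl -c syntax check is an extra suggestion rather than anything narf requires, and a filter that depends on names narf defines before loading it could fail this check even though it works under narf.

```shell
#!/bin/sh
# install_filter SOURCE FILTDIR   (hypothetical helper, not part of narf)
# Copy SOURCE to FILTDIR/filter_innd.pl, then syntax-check it if perl exists.
install_filter() {
    src=$1
    dest=$2/filter_innd.pl
    cp "$src" "$dest" || return 1
    if command -v perl >/dev/null 2>&1; then
        # perl -c compiles without running the filter; it also loads any
        # modules named in use statements, so a missing module shows up here.
        perl -c "$dest"
    fi
}
```

The second argument should be the same directory you set as $FILTDIR inside narf.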

Our filter is oriented towards aggressive spam discarding, including lots of judgement calls about things we don't want. It's suitable mainly for people who want to be as aggressive as us or want to see what can be done.

Jeremy Nixon's cleanfeed filter is designed to be safer, rejecting only things that are pretty certain to be spam (although it is still healthily aggressive); a discussion of what it rejects is here. We recommend that you use our small revision of it to have it use MD5 body hashes as its primary method of detecting new spam, since this seems to work very well; you can get that version here (0.95 based version), or fetch a current copy of cleanfeed-inn and see if our diffs apply cleanly. We have successfully run narf with both this filter and an unmodified cleanfeed 0.95 in tests but do not use either in production (since we prefer our own filter).

Examine the policies in your chosen filter

Every filter, and especially our filter, contains policy decisions that you may not agree with. You should at least skim the source and make sure that you agree with everything it is doing and all the sorts of spam that it is killing. Your chosen filter may also contain options that you need to tweak, or sizing information for how much data it stores in memory. If you are using our filter (which we recommend, of course) then you probably want to read about why we have our filter do what it does.

If you do not do this and your chosen filter discards all of your newsfeed, you have only yourself to blame.

Edit narf if you are not using our filter

You should examine and if necessary edit narf if you are not using our filter. Potentially troublesome spots are its recognition of what spam rejections are safe to block cancels for and its method of saving any recognized EMP signatures your filter uses. If you are using a filter derived from cleanfeed-inn 0.95, both of these will work and you do not need to do anything; otherwise, see the narf source code.

There are a number of perl code tweaks that you can make to a filter to make what narf logs more interesting; generally they involve defining subroutines in the filter that narf calls. Subroutines to define:

&filter_bodyhash
Called to return a body hash of the article that will be logged for various purposes. Takes no arguments; returns either the hash or nothing.
&filter_logname
This routine is used to determine the filename in $DUMPDIR to save a rejected article to. It returns a string (null signals that the rejected article should not normally be saved) and is called with two arguments; the rejection reason that &filter_art returned, and the rnews batch source (usually the hostname received from). This routine should not attempt to prepend local- to the name of rejected local posts; narf does that on its own.
&filter_sync
Narf calls this before saving the recognized EMP signatures.

Do you want to reject cancels?

As shipped, both narf and our filter default to rejecting cancels when they can. Fuller explanations of what they do this for are on their main pages, here for narf and here for our filter. If you do not want this behavior you need to change the setting of $cancreject in narf and $block_cancels in our filter. Changing only one of them will not automatically shut the other off.
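Because the two settings live in different files, it can help to look at both at once. The sketch below simply greps for the two variable names; the function name show_cancel_settings is hypothetical, and it prints every line that mentions either variable, not just the assignment.

```shell
#!/bin/sh
# show_cancel_settings NARF_SCRIPT FILTER_FILE   (hypothetical helper)
# Print every line that mentions the two cancel-rejection variables.
# Passing /dev/null as a second file makes grep print the file name too.
show_cancel_settings() {
    grep -n 'cancreject' "$1" /dev/null
    grep -n 'block_cancels' "$2" /dev/null
}
```

Call it with the path to your narf script and the path to filter_innd.pl in $FILTDIR, then confirm that both settings agree with what you intend.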

Arrange to start narf on reboot

Narf should be run as your news user and it needs to be automatically restarted on reboot. Edit whatever startup scripts you have to arrange for this. When starting, narf's standard output and standard error should be redirected into a file. Anything that appears there is either a goof in editing your filter or a serious problem.
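One way to do this from a startup script is sketched below. The function name run_logged is hypothetical, and the su invocation is an assumption about your system: if your news account has no usable login shell, su will need an extra option or a different tool altogether.

```shell
#!/bin/sh
# run_logged USER LOGFILE COMMAND [ARGS...]   (hypothetical helper)
# Run COMMAND as USER with stdout and stderr appended to LOGFILE.
run_logged() {
    user=$1
    log=$2
    shift 2
    if [ "$(id -un)" = "$user" ]; then
        "$@" >>"$log" 2>&1
    else
        # The redirection is done by the calling shell (normally root),
        # so LOGFILE is opened by the caller rather than by USER.
        # "$*" flattens the arguments, so avoid arguments with spaces.
        su "$user" -c "$*" >>"$log" 2>&1
    fi
}
```

In a startup script this would be called with your news user, a log file of your choosing, and the full path to narf plus any arguments. This page does not say whether narf puts itself in the background, so you may need to add & to the call.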

Optionally: get our log summary software

narfsum (last updated November 16 1997)
narfhippo (last updated November 22 1997)

If you intend to do anything with the log narf produces, you may want to start with our log summary software. Narfsum is a shell and awk script that produces our rejection volume reports, while narfhippo is a perl program that produces our rejection sources reports.

Narfsum was written by P Kern, while narfhippo (modeled on the SpamHippo reports) was written by Chris Siebenmann.

Next: narf operation

Before starting narf for the first time you should read about narf operation.

Further information

This page is part of our narf pages.



This page and much of our precautions are maintained by Chris Siebenmann, who hates junk email and other spam.