Parallelizing DNS queries with
The obvious approach was to write an SBL lookup program that internally worked in parallel, perhaps using threads. I was using Python and it has decent thread support, but when I started going down this route it rapidly started looking like too much work.
So instead I decided to use brute force and Unix. I had all of the IP addresses I wanted to look up in a big file, one IP address per line, so:
$ mkdir /tmp/sbl
$ split -l 800 /tmp/ipaddrs /tmp/sbl/in-sbl.
$ for i in /tmp/sbl/in-sbl.*; do \
    o=`echo $i | sed 's/in-/out-/'`; \
    sbllookup <$i >$o & \
  done; wait
$ cat /tmp/sbl/out-sbl.* >/tmp/sbl-out
What this does is take /tmp/ipaddrs, the file of all of the IP addresses, and split it up into a whole bunch of smaller chunks. Once I had it in chunks, I could parallelize my DNS lookups by starting the (serial) SBL lookup program on each separate chunk in the background, letting 44-odd of them run at once. Each wrote its output to a separate file, and once the wait had waited for them all to finish I could glue /tmp/sbl/out-sbl.* back into a single output file.
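If you wanted to reuse this trick, the split, background, wait and cat steps could be wrapped up in a small shell function. What follows is only a sketch, not what I actually ran: run_chunked is a made-up name, it assumes mktemp is available (it is not in POSIX, although most systems have it), and it takes the serial lookup command as its trailing arguments.

```shell
# Sketch: run a serial stdin-to-stdout command over a big file in parallel chunks.
# run_chunked is a hypothetical helper name.
# Usage: run_chunked INPUT-FILE LINES-PER-CHUNK COMMAND [ARGS...]
# The body is a ( ... ) subshell so that 'wait' only waits for the jobs
# started here and the variables do not leak into the calling shell.
run_chunked() (
    input=$1
    lines=$2
    shift 2
    dir=$(mktemp -d) || exit 1

    # Split the input into chunks named in.aa, in.ab, ...
    split -l "$lines" "$input" "$dir/in."

    # Start one background copy of the command per chunk.
    for i in "$dir"/in.*; do
        [ -e "$i" ] || continue            # empty input: no chunks were made
        "$@" <"$i" >"$dir/out.${i##*.}" &  # ${i##*.} is the chunk suffix, eg 'aa'
    done

    # Wait for all of the background jobs, then glue the outputs back
    # together. The glob sorts by suffix, so the input order is kept.
    wait
    cat "$dir"/out.* 2>/dev/null
    rm -rf "$dir"
)
```

With this, the whole sequence above would become something like 'run_chunked /tmp/ipaddrs 800 sbllookup >/tmp/sbl-out'. Like the original, it starts every chunk at once, so the chunk size is the only control over how many copies run in parallel.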
Parallelized, it took about five or ten minutes the first time around, and then only a minute or so for the second pass. (I did a second pass because the replies from some DNS queries might have been late trickling in the first time; the second time around they were all in our local DNS cache.)