Parallelizing DNS queries with split
So there I was the other day with 35,000 IP addresses to look up in the SBL to see if they were listed. Looking up 35,000 IP addresses one after the other takes a long time. Too long a time.
The obvious approach was to write an SBL lookup program that internally worked in parallel, perhaps using threads. I was using Python, which has decent thread support, but when I started going down this route it rapidly started looking like too much work.
So instead I decided to use brute force and Unix. I had all of the IP addresses I wanted to look up in a big file, one IP address per line, so:
$ mkdir /tmp/sbl
$ split -l 800 /tmp/ipaddrs /tmp/sbl/in-sbl.
$ for i in /tmp/sbl/in-sbl.*; do \
      o=`echo $i | sed 's/in-/out-/'`; \
      sbllookup <$i >$o & \
  done; wait
$ cat /tmp/sbl/out-sbl.* >/tmp/sbl-out
What this does is take /tmp/ipaddrs, the file of all of the IP addresses, and split it up into a whole bunch of smaller chunks. Once I had it in chunks, I could parallelize my DNS lookups by starting the (serial) SBL lookup program on each separate chunk in the background, letting 44-odd of them run at once. Each wrote its output to a separate file, and once the wait had waited for them all to finish I could glue /tmp/sbl/out-sbl.* back into a single output file.
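For concreteness, a serial sbllookup could be little more than a loop that turns each address into a DNSBL query name and resolves it; the use of dig and the sbl.spamhaus.org zone here are my assumptions, not the actual program:

```shell
#!/bin/sh
# Sketch of a serial SBL lookup: DNSBLs are queried by reversing the
# IP address's octets and appending the list's zone name. The zone and
# the use of dig are assumptions, not the original sbllookup.
to_query() {
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.sbl.spamhaus.org\n", $4, $3, $2, $1 }'
}

# Read one IP address per line on stdin; dig +short prints nothing
# for addresses that are not listed.
while read -r ip; do
    echo "$ip $(dig +short "$(to_query "$ip")")"
done
```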
Parallelized, it took about five or ten minutes the first time around, and then only a minute or so for the second pass. (I did a second pass because the replies from some DNS queries might have been late trickling in the first time; the second time around they were all in our local DNS cache.)
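The whole split-and-merge pattern is easy to exercise end to end with a stand-in for sbllookup; this sketch uses sed to tag each line where the real program would do DNS queries:

```shell
#!/bin/sh
# Exercise the split / background-jobs / merge pattern end to end.
# A sed command stands in for the real sbllookup program.
work=$(mktemp -d)
seq 1 10 | sed 's/^/10.0.0./' >"$work/ipaddrs"    # ten fake IP addresses
split -l 3 "$work/ipaddrs" "$work/in-sbl."        # four chunks of <= 3 lines
for i in "$work"/in-sbl.*; do
    o=$(echo "$i" | sed 's/in-/out-/')
    sed 's/$/ listed/' <"$i" >"$o" &              # stand-in for: sbllookup <$i >$o &
done
wait
cat "$work"/out-sbl.* >"$work/sbl-out"
```

All ten input lines come back out the other end, just split across four parallel jobs in the middle.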