== Parallelizing DNS queries with _split_

So there I was [[the other day|../spam/SBLProblemSources]], with 35,000 IP addresses to look up in the [[SBL|http://www.spamhaus.org/sbl/]] to see if they were there. Looking up 35,000 IP addresses one after the other takes a long time. Too long a time.

The obvious approach was to write an SBL lookup program that internally worked in parallel, perhaps using threads. I was using Python, which has decent thread support, but when I started down this route it rapidly started looking like too much work. So instead I decided to use brute force and Unix. I had all of the IP addresses I wanted to look up in a big file, one IP address per line, so:

	$ mkdir /tmp/sbl
	$ split -l 800 /tmp/ipaddrs /tmp/sbl/in-sbl.
	$ for i in /tmp/sbl/in-sbl.*; do \
		o=`echo $i | sed 's/in-/out-/'`; \
		sbllookup <$i >$o & \
	  done; wait
	$ cat /tmp/sbl/out-sbl.* >/tmp/sbl-out

What this does is take _/tmp/ipaddrs_, the file of all of the IP addresses, and split it up into a whole bunch of smaller chunks. Once I had it in chunks, I could parallelize my DNS lookups by starting the (serial) SBL lookup program on each separate chunk in the background, letting 44-odd of them run at once. Each wrote its output to a separate file, and once the _wait_ had waited for them all to finish I could glue ((/tmp/sbl/out-sbl.*)) back into a single output file.

Parallelized, it took about five or ten minutes the first time around, and then only a minute or so for the second pass. (I did a second pass because the replies from some DNS queries might have been late trickling in the first time; the second time around they were all in our local DNS cache.)
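(For the curious: the serial ((sbllookup)) program itself isn't shown here, but the core of such a tool is small. The sketch below is my own assumption of roughly what it does, in Python; the query-name construction is the standard DNSBL convention of reversing the IPv4 octets and appending the list's zone, and the function names are made up for illustration.)

```python
import socket

# The SBL is queried via DNS under this zone (standard DNSBL convention).
ZONE = "sbl.spamhaus.org"

def sbl_query_name(ip, zone=ZONE):
    # Reverse the IPv4 octets and append the zone,
    # so 192.0.2.1 becomes 1.2.0.192.sbl.spamhaus.org.
    octets = ip.strip().split(".")
    return ".".join(reversed(octets)) + "." + zone

def in_sbl(ip):
    # A listed IP resolves (to an address in 127.0.0.0/8);
    # an unlisted IP gives a name-lookup error instead.
    try:
        socket.gethostbyname(sbl_query_name(ip))
        return True
    except socket.gaierror:
        return False

print(sbl_query_name("192.0.2.1"))  # → 1.2.0.192.sbl.spamhaus.org
```

Each call to ((socket.gethostbyname)) blocks until the DNS reply (or timeout) comes back, which is exactly why running one such program serially over 35,000 addresses is so slow, and why running 44-odd copies at once helps so much.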