How I managed to shoot myself in the foot with my local DNS resolver
I have my home machine's Twitter client configured so that it opens
links in my always-running Firefox, and in fact there's a whole
complicated lashup of shell scripting surrounding this in an attempt
to do the right thing with various sorts of links. For the past little
while, clicking on some of those links has often (although not
always) been very slow to take effect; I'd click a link and it'd
be several seconds before I got my new browser window. In the
beginning I wrote this off as just Twitter being slow (which it
sometimes is) and didn't think too much about it. Today this got
irritating enough that I decided to investigate a bit, so I ran
httpstat against twitter.com, expecting to see that all the delay was in
either connecting to Twitter or in getting content back.
(To be honest, I expected that this was something to do with IPv6, as has happened before. My home IPv6 routing periodically breaks or malfunctions even when my IPv4 routing is fine.)
To my surprise, httpstat reported that it'd spent just over 5000 milliseconds in DNS lookup. So much for blaming anyone else; DNS lookup delays are pretty much all my fault, since I run a local caching resolver. I promptly started looking at my configuration and soon found the problem, which comes in two parts.
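If you want to check the DNS part on its own, without the rest of an HTTP request, something like the following rough Python sketch can time a single lookup through the system resolver. The function name `time_lookup` and its `resolve` parameter are my own inventions for illustration; this is not what httpstat does internally.

```python
import socket
import time

def time_lookup(host, resolve=socket.getaddrinfo):
    """Return how long one name lookup takes, in milliseconds."""
    start = time.monotonic()
    try:
        # getaddrinfo goes through the system resolver, so it is subject
        # to whatever /etc/resolv.conf says (search domains, ndots, etc).
        resolve(host, 443)
    except socket.gaierror:
        # A failed lookup still tells us how long the resolver took to give up.
        pass
    return (time.monotonic() - start) * 1000.0
```

Calling `time_lookup("twitter.com")` a few times in a row should show whether the delay is in name resolution; an answer that is already cached by the local resolver will normally come back in a few milliseconds at most.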
First, I had (and have) my /etc/resolv.conf configured with an
ndots setting and several search (sub)domains. This is for good
historical reasons, since it lets me use short names like 'apps0.cs'
instead of having to always specify the long fully qualified domain.
However, this means that every reasonably short website name, like
twitter.com, was being checked to see if it was actually a university
host like twitter.com.utoronto.ca. Of course it isn't, but that means
that I was querying our DNS servers quite a lot, even for lookups
that I conceptually thought of as having nothing to do with the
university.
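To make the search behavior concrete, here is a small Python sketch of roughly how a glibc-style stub resolver decides which names to try and in what order. The specific search list and the ndots value of 2 are assumptions for illustration (the real values aren't given here), and `candidate_names` is a made-up helper, not a real resolver API.

```python
def candidate_names(name, search, ndots):
    """Return the query names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified; no search list.
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + searched
    # Too few dots: walk the search list first, then try the name as-is.
    return searched + [absolute]

# Hypothetical settings, assumed for this example only.
SEARCH = ["cs.utoronto.ca", "utoronto.ca"]
NDOTS = 2

print(candidate_names("twitter.com", SEARCH, NDOTS))
# ['twitter.com.cs.utoronto.ca.', 'twitter.com.utoronto.ca.', 'twitter.com.']
```

With settings like these, twitter.com has only one dot, so the real name is only tried after every search domain has produced a negative answer. Each of those extra queries has to be resolved by something before the real lookup even starts.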
Second, my home Unbound setup is basically a copy of my work Unbound setup, and when I set it up (and copied it) I deliberately configured explicit Unbound stub zones for the university's top level domain that pointed to our nameservers. At work, the intent of this was to be able to resolve in-university hostnames even if our Internet link went down. At home, well, I was copying the work configuration because that was easy and what was the harm in short-cutting lookups this way?
In case you are ever tempted to do this, the answer is that you have
to be careful to keep your list of stub zone nameservers up to date,
and of course I hadn't. As long as my configuration didn't break
spectacularly I didn't give it any thought, and it turned out that
one of the IP addresses I had listed as a stub-addr server doesn't
respond to me at all any more (and some of the others may not have
been entirely happy with me). If Unbound decided to send a query for
twitter.com.utoronto.ca to that IP, well, it was going to be
waiting for a timeout. No wonder I periodically saw odd delays like
this (and stalls when I was trying to pull from or check
and so on).
(Twitter makes this much more likely by having an extremely short TTL on their A records, so they fell out of Unbound's cache on a regular basis and had to be re-queried.)
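One way to catch a dead stub zone server before it bites is to periodically poke each listed address and see if it answers at all. The following is a minimal Python sketch of that idea using only the standard library. The helper names are hypothetical, it only handles IPv4 addresses over UDP, and it treats any reply with a matching query ID as 'alive' without looking at what the reply says.

```python
import socket
import struct

def build_query(name, query_id=0x1234, qtype=2):
    """Build a minimal DNS query packet (default QTYPE 2 = NS, class IN)."""
    # Header: ID, flags (recursion not requested), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def server_answers(addr, zone, timeout=2.0):
    """Return True if addr sends back any UDP reply matching our query ID."""
    query = build_query(zone)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (addr, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors.
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Running `server_answers(addr, "utoronto.ca")` over every address listed as a stub-addr would flag the ones that have gone silent. A server that answers but is no longer authoritative for the zone would still pass this check, so it is a floor and not a full health test.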
I don't know if short-cut stub zones for the university's forward
and reverse DNS are still a sensible configuration for my office
workstation's Unbound, but it definitely isn't for home usage. If
the university's Internet link is down, well, I'm outside it at
home; I'm not reaching any internal servers for either DNS lookups
or connections. So I've wound up taking it out of my home configuration,
and Unbound now looks utoronto.ca names up just like any other domain.
(This elaborates on a Tweet of mine.)
Sidebar: The situation gets more mysterious
It's possible that this is actually a symptom of more than me just setting up a questionable caching DNS configuration and then failing to maintain and update it. In the process of writing this entry I decided to take another look at various university DNS data, and it turns out that the non-responding IP address I had in my Unbound configuration is listed as an official NS record for various university subdomains (including some that should be well maintained). So it's possible that something in the university's DNS infrastructure has fallen over or become incorrect without having been noticed.
(I wouldn't say that my Unbound DNS configuration was 'right', at least at home, but it does mean that my configuration might have kept working smoothly if not for this broader issue.)