We've switched (back) to using Bind for our local DNS resolvers

December 12, 2023

As part of our local network environment, we have some local DNS resolvers that people here use (or at least are supposed to use). These resolvers handle multiple jobs; they resolve our own normal DNS names (or some of them) and our internal-only DNS names, and they handle all of the recursion for lookups of external names. Originally we ran these resolvers using Bind on OpenBSD. When OpenBSD stopped supporting Bind, we switched to a setup using Unbound and NSD. We needed NSD as well as Unbound because we wanted our resolvers to have a full copy of our local zones, so they wouldn't need our master DNS server to be up to answer those names. The local NSD was the authoritative secondary for our DNS zones, and the local Unbound knew to query it for them.

Unfortunately, we've recently had a variety of problems with this OpenBSD Unbound configuration that resulted in a series of serious DNS resolution failures. We tried some configuration shuffles like splitting out critical machines to a dedicated DNS resolver and uncertain ratelimit tuning, but we weren't happy with them and didn't really have confidence that they'd solve our problems. So to deal with these issues in a way we were more confident in, we switched over to using Bind on Ubuntu.

(We switched what is usually the less used DNS resolver over to Bind a week ago, and the more used one today.)

I'm not going to claim that Bind is the right answer for everyone; in general Unbound is a perfectly fine recursive resolver and I run it on my own machines (usually without problems). The advantage of Bind in our environment is that Bind has solid support for combining recursive DNS resolution with being an authoritative secondary for some zones, and we know how to configure this so that it works (and interacts smoothly with our Bind-based stealth master DNS server). The one area that Bind falls short in is ratelimits that are focused on recursive resolvers instead of authoritative servers, but we put a test Bind install through load tests and it held up fine (under conditions that had generally caused our Unbound servers to stop responding).

(Unbound originally had no support for acting as a secondary this way. Current versions of Unbound appear to have support for some form of it, but every time I read the "Authority Zone Options" section of unbound.conf(5) my head hurts and I'm left uncertain about what settings we'd actually want to set. We know exactly how to set up Bind to do what we want.)
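
(For illustration, the general shape of a Bind setup that combines recursion with being a secondary for a local zone looks something like the following. This is a minimal sketch and not our actual configuration; the zone name, the IP address of the primary, the client network, and the file path are all made-up placeholders. Older Bind releases spell 'type secondary' and 'primaries' as 'type slave' and 'masters'.)

```
// Sketch only: every name and address here is a placeholder.
acl "ourclients" { 192.0.2.0/24; localhost; };

options {
    directory "/var/cache/bind";
    // Answer recursive queries, but only for our own clients.
    recursion yes;
    allow-recursion { "ourclients"; };
    allow-query { "ourclients"; };
};

// Keep a full local copy of the zone, so we can answer for it
// even when the (stealth) primary is down.
zone "example.org" {
    type secondary;
    primaries { 192.0.2.10; };
    file "secondary/example.org.db";
};
```

(With a zone configured this way, Bind answers queries for names in it from its local copy and does normal recursion for everything else.)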

Switching to Ubuntu also has some pragmatic advantages, since we already run a lot of Ubuntu machines and have a lot of tools for dealing with them, including monitoring and metrics through our Prometheus environment. OpenBSD has only limited support for the Prometheus host agent, never mind other agents we might want to run. And Ubuntu LTS releases have longer support periods than OpenBSD nominally does, although OpenBSD's short support periods mostly don't matter to us.

(Even if we switch back to Unbound someday, I suspect that we might well run Unbound on Ubuntu instead of returning to OpenBSD. Our usage of OpenBSD is slowly but steadily shrinking down to mostly firewalls, where PF is still by far our favorite firewall system.)


Last modified: Tue Dec 12 22:54:20 2023