A good solution to our Unbound caching problem that sadly won't work
In response to my entry on our Unbound caching problem with local zones, Jean Paul Galea left a comment with the good suggestion of running two copies of Unbound with different caching policies. One instance, with normal caching, would be used to resolve everything but our local zones; the second instance, with no caching, would simply forward queries to either the authoritative server for our local zones or the general resolver instance, depending on what the query was for.
(Everything would be running on a single host, so the extra hops queries and replies take would be very fast.)
In many organizational situations, this is an excellent solution. Even in ours, at first glance it looks like it should work perfectly, because the issue we'd have is pretty subtle. I need to set the stage by describing a bit of our networking.
In our internal networks we have some machines with RFC 1918 addresses that need to be publicly reachable, for example so that research groups can expose a web server on a machine that they run in their sandbox. This is no problem; our firewalls can do 'bidirectional NAT' to expose each such machine on its own public IP. However, this requires that external people see a different IP address for the machine's official name than internal people do, because internal people are behind the BINAT step. This too is no problem, as we have a full 'split horizon' DNS setup.
So let's imagine that a research group buys a domain name for some
project or conference and has the DNS hosted externally. In that
domain's DNS, they want to
CNAME some name to an existing BINAT'd
server that they have. Now have someone internally do a lookup on
that name, say 'www.iconf16.org':
- the frontend Unbound sees that this is a query for an external name, not one of our own zones, so it sends it to the general resolver Unbound.
- the general resolver Unbound issues a query to the iconf16.org nameservers and gets back a CNAME to somehost.cs.toronto.edu.
- the general resolver must now look up somehost.cs itself and will wind up caching the result, which is exactly what we want to avoid.
This problem happens because DNS resolution is not segmented. Once we hand an outside query to the general resolver, there's no guarantee that it stays an outside query and there's no mechanism I know of to make the resolving Unbound stop further resolution and hot-potato the CNAME back to the frontend Unbound. We can set the resolving Unbound instance up so that it gives correct answers here, but since there's no per-zone cache controls we can't make it not cache the answers.
This situation can come up even without split horizon DNS (although split horizon makes it more acute). All you need is for outside people to be able to legitimately CNAME things to your hosts for names in DNS zones that you don't control and may not even know about. If this is forbidden by policy, then you win (and I think you can enforce this by configuring the resolving Unbound to fail all queries involving your local zones).