The additional complications in DNS updates that secondary DNS servers add

June 20, 2020

I was recently reading Julia Evans' What happens when you update your DNS? (which is a great, clear explanation of its subject), and it brought back some painful memories of the old days (which are still the current days for some people) that I might as well share.

Today, most of the DNS service that people deal with comes from managed DNS providers. When you make a DNS update through your DNS provider's API or website, magical things happen behind the scenes in the provider's infrastructure, and your update normally goes live more or less immediately on all of the authoritative DNS servers that answer queries for your domain. In this environment, where your changes appear on your authoritative DNS servers effectively instantly, the only thing that determines how fast your changes become visible is how long the various recursive DNS servers on the Internet have cached your existing information, as Julia Evans covers.

However, authoritative DNS servers didn't originally work that way, and even today things don't necessarily work out that way if you run your own DNS service using straightforward DNS servers like NSD or the venerable BIND. The original DNS server environment had the idea of primary and secondary authoritative DNS servers. The primary DNS servers got all of the data for your zone from files on their disk (or, more recently, perhaps from a database or some network data source), and the secondary DNS servers got the data for your zone by copying it from a primary DNS server (possibly one that wasn't advertised publicly, which is often called a 'stealth master'), generally with a full zone transfer (an AXFR). In effect, your secondary authoritative DNS servers were (and are) a cache.

(You could have multiple primary servers, at which point it was up to you to make sure they were all using the same DNS zone data. A very simple way to do this was to rsync the data files around to everyone before having the DNS servers reload their zones.)

Any time that you have what is effectively a cache, you should be asking about cache invalidation and refreshing, and DNS servers are no exception. The original answer to this is in the specification of the DNS SOA record, which has (zone) refresh, retry (of a failed refresh), and expire times, plus a zone serial number so that secondaries can tell when their copy of the zone is out of date compared to the DNS primary. Every refresh interval, a secondary would check the SOA serial number on its primary and fetch an update if necessary. If it couldn't talk to the primary for longer than the expire time, it would declare the zone stale and stop answering queries from its cached data.
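To make the refresh, retry, and expire rules concrete, here is a minimal Python sketch of how a secondary might act on them. It's a simulation under simplifying assumptions, not how NSD or BIND are actually structured: the class and method names are hypothetical and 'transferring' the zone is just copying the serial. The serial comparison does follow RFC 1982 serial number arithmetic, which is what DNS uses so that SOA serials can wrap around.

```python
from dataclasses import dataclass
from typing import Optional

SERIAL_HALF = 2 ** 31  # half of the 32-bit serial number space (RFC 1982)


def serial_newer(s1: int, s2: int) -> bool:
    """Return True if SOA serial s1 is 'greater than' s2 in RFC 1982 terms.

    Comparison is effectively modulo 2**32, so a serial that has wrapped
    around past 2**32 - 1 still counts as newer.
    """
    return (s1 < s2 and s2 - s1 > SERIAL_HALF) or (s1 > s2 and s1 - s2 < SERIAL_HALF)


@dataclass
class SoaTimers:
    refresh: int  # seconds between routine serial checks
    retry: int    # seconds to wait before retrying a failed check
    expire: int   # seconds without a successful check before the zone is stale


@dataclass
class SimulatedSecondary:  # hypothetical, for illustration only
    timers: SoaTimers
    serial: int          # serial of the zone copy this secondary holds
    last_success: float  # time of the last successful check of the primary
    next_check: float    # when the next check is due

    def check_primary(self, now: float, primary_serial: Optional[int]) -> str:
        """Run one serial check; primary_serial is None if the primary was unreachable."""
        if primary_serial is None:
            # Failed check: try again after the retry interval, not the refresh interval.
            self.next_check = now + self.timers.retry
            return "failed"
        self.last_success = now
        self.next_check = now + self.timers.refresh
        if serial_newer(primary_serial, self.serial):
            # A real secondary would do a zone transfer (for example an AXFR) here.
            self.serial = primary_serial
            return "transferred"
        return "current"

    def is_expired(self, now: float) -> bool:
        """True once the primary has been unreachable for longer than expire."""
        return now - self.last_success > self.timers.expire
```

An expired secondary in this model is one that should stop answering for the zone, which matches the behavior described above.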

This meant that once you made a DNS update, its propagation was governed by two timers. First, it had to propagate from the primary to all of the secondaries, which depended on the SOA refresh time. Then, even once all secondaries were answering queries using the new DNS data, recursive DNS servers could still have old answers cached for up to the record's TTL. In the worst case, where you make a change just after your last secondary has refreshed and a recursive DNS server queries that secondary just before its next refresh timer goes off, your update might not reach everyone until the sum of the record's TTL and the zone's SOA refresh time.
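The worst case is simple arithmetic, sketched here with a hypothetical function name. As an assumption beyond the basic case, it also lets you add failed refresh attempts, each of which delays the secondary by one SOA retry interval; it ignores how long the zone transfer itself takes.

```python
def worst_case_update_delay(refresh: int, ttl: int,
                            failed_checks: int = 0, retry: int = 0) -> int:
    """Worst-case seconds before every resolver sees a changed record.

    Assumes the change lands just after the slowest secondary has checked
    its primary, so that secondary waits a full refresh interval (plus one
    retry interval per failed check) before picking up the new zone, and
    that a resolver queried it just before that and cached the old answer
    for the record's full TTL.
    """
    propagation = refresh + failed_checks * retry
    return propagation + ttl
```

For example, with a refresh time of three hours and a one-hour TTL, `worst_case_update_delay(10800, 3600)` comes out to 14400 seconds, or four hours.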

(Adding a new DNS record could have a similar delay, but here the second timer was the SOA minimum value, which in theory set the TTL for negative replies. More or less.)
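The 'more or less' is partly because RFC 2308, which later pinned down negative caching, says a resolver should cache a negative answer for the smaller of the SOA minimum field and the TTL of the SOA record itself. Here's a minimal sketch of the resulting worst case for a brand new record; the function names are hypothetical and, as before, zone transfer time is ignored.

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """How long a resolver caches an NXDOMAIN or NODATA answer (RFC 2308)."""
    return min(soa_record_ttl, soa_minimum)


def worst_case_new_record_delay(refresh: int, soa_record_ttl: int, soa_minimum: int) -> int:
    """Worst-case seconds before a newly added record is visible everywhere.

    A resolver that asked a not-yet-refreshed secondary for the name just
    before it picked up the change holds on to the negative answer for the
    negative caching TTL, instead of for a normal record TTL.
    """
    return refresh + negative_cache_ttl(soa_record_ttl, soa_minimum)
```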

Having to wait for secondary DNS servers to hit their refresh timers has various issues. Obviously it slows down DNS updates, but it also means that there can be a significant amount of time when your various authoritative DNS servers give different answers to the same queries. All of this was recognized relatively early on and led to RFC 1996, which created the DNS NOTIFY mechanism that lets primary servers send a special DNS NOTIFY message to secondaries.
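To show how little there is to a NOTIFY, here's a sketch that builds one by hand along the lines of RFC 1996: an ordinary DNS message header with the opcode set to NOTIFY (4), and a question section naming the zone with type SOA. I've also set the AA bit, which I believe is what primary servers conventionally send. The function names are mine, the sketch only handles plain ASCII zone names and IPv4, and unlike a real primary it sends only once instead of retransmitting. In practice you'd let your DNS server send these.

```python
import random
import socket
import struct

OPCODE_NOTIFY = 4  # RFC 1996
QTYPE_SOA = 6
QCLASS_IN = 1
FLAG_AA = 0x0400   # authoritative answer bit
FLAG_QR = 0x8000   # set in responses


def encode_name(zone: str) -> bytes:
    """Encode a plain ASCII domain name in DNS wire format (no compression)."""
    labels = [label for label in zone.rstrip(".").split(".") if label]
    out = b""
    for label in labels:
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # terminating root label


def build_notify(zone: str, msg_id: int) -> bytes:
    """Build a NOTIFY request for zone: one question (QTYPE SOA), no other records."""
    flags = (OPCODE_NOTIFY << 11) | FLAG_AA
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    return header + encode_name(zone) + struct.pack("!HH", QTYPE_SOA, QCLASS_IN)


def send_notify(zone: str, secondary_ip: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Send one NOTIFY over UDP and report whether it was acknowledged.

    Raises socket.timeout if no reply arrives in time.
    """
    msg_id = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_notify(zone, msg_id), (secondary_ip, port))
        reply, _ = sock.recvfrom(4096)
    if len(reply) < 12:
        return False
    reply_id, reply_flags = struct.unpack("!HH", reply[:4])
    # Acknowledged if it's a response to our ID with RCODE 0 (NOERROR).
    return reply_id == msg_id and bool(reply_flags & FLAG_QR) and (reply_flags & 0x000F) == 0
```

Calling something like `send_notify("example.org", "192.0.2.53")` would ask that (made-up) secondary to check its copy of example.org, although a real secondary will probably ignore or refuse a NOTIFY from a host it isn't configured to accept them from.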

When you update your primary servers, they signal the secondary servers that a zone change has (probably) happened. Generally the secondaries will then immediately try to transfer the updated zone so they can use it to answer queries. A DNS NOTIFY doesn't guarantee that the secondaries are promptly up to date, but it makes it much more likely, and there is some protection against the NOTIFY being dropped in transit between the primary and the secondaries (the primary keeps resending it until the secondary acknowledges it or the primary gives up). In practice this seems to work fairly well, especially in network environments where the primaries and secondaries are close to each other (in network terms). However, it's still not guaranteed, so if you have a monitoring system, it's worth having a check that alerts if the SOA serial numbers of your zones stay out of sync between your primaries and secondaries for too long.
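Here's a minimal sketch of the sort of monitoring check I mean, assuming you already have some way of fetching each server's SOA serial (for example by running `dig +short @server zone SOA` and taking the third field, which is the serial). The class name and the `max_lag` threshold are hypothetical. The check remembers when each secondary first fell out of sync with the primary and only complains once that has lasted too long, so that the brief lag of a normal NOTIFY-driven transfer doesn't trigger alerts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SerialSyncMonitor:  # hypothetical monitoring check
    max_lag: float  # seconds a secondary may stay out of sync before we alert
    out_of_sync_since: Dict[str, float] = field(default_factory=dict)

    def observe(self, now: float, primary_serial: int,
                secondary_serials: Dict[str, Optional[int]]) -> List[str]:
        """Record one round of SOA serial checks and return the secondaries to alert on.

        A serial of None means the secondary couldn't be queried; it counts as
        out of sync, since we can't tell that it's current.
        """
        alerts = []
        for name, serial in secondary_serials.items():
            if serial == primary_serial:
                # In sync now; forget any earlier lag.
                self.out_of_sync_since.pop(name, None)
                continue
            first_seen = self.out_of_sync_since.setdefault(name, now)
            if now - first_seen > self.max_lag:
                alerts.append(name)
        return sorted(alerts)
```

Comparing for plain equality (rather than 'newer than') is deliberate here, since a secondary that somehow has a different serial from the primary in either direction is out of sync.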

(DNS providers hopefully have similar internal monitoring.)

Normally your primary DNS server software will automatically send out DNS NOTIFY messages to the appropriate secondary servers when you tell it to reload things. You can generally also trigger them manually, even without a zone change or reload; one use of this is to give a particular secondary (or all of them) a little prod to try doing an update.

PS: Since we run our DNS ourselves here, this whole area remains an issue that we have to think about and remember some aspects of. But that's another entry.

PPS: Usually secondary servers have restrictions on who they'll accept DNS NOTIFY messages from, and I believe the messages can optionally be authenticated in some way these days.


Comments on this page:

There are various services out there that show what the Internet at large sees for a particular record. I've recently been using the following to check propagation after making changes:

It lets you drill down into particular continents and countries.

By nanaya at 2020-06-21 02:10:37:

The SOA record is also used by caching resolvers to determine the TTL of NXDOMAIN responses.

Written on 20 June 2020.

Last modified: Sat Jun 20 19:19:05 2020