The DNS TTL problem

March 26, 2014

It all started with a tweet by @Twirrim:

DNS TTL records exist for a reason. For the love of all that is holy, honour them. Don't presume to think you know better.

On the one hand, as a sysadmin I'm in full agreement with this view. I certainly want all of the DNS caches and recursive DNS servers out there to respect the TTLs we set on our DNS entries and it makes me irritated when people don't. On the other hand I also have to sympathize with the operators of DNS caches out there, because I rather suspect that there are a huge number of mis-set TTLs in practice.

The problem with DNS TTLs is that they are almost always an example of information that doesn't have to be correct, and we all know what eventually happens to such information. Most people's DNS entries change very rarely and are not looked up in any huge volume, so it doesn't really matter what TTLs they have. If they have the minimum TTL you won't notice the extra lookup volume and if they have an absurdly long TTL you won't notice the lingering old entries because you aren't changing your DNS entries anyways.

(And I'm not throwing stones here. We have a number of DNS entries with short TTLs that haven't changed for years in our zones, more or less just because. It would take work to go back through our zones, find them all, verify that we really don't need short TTLs any more, and take them out. It's simpler to let them sit there and it doesn't do us any harm.)

But I bet that operators of large scale DNS caches notice those things. I rather suspect that they get customer complaints when someone updates their DNS but had really long TTLs on the old entries, so now the customers can't get to the new servers because the old entries are still cached. And I suspect that they notice the extra load from short TTLs forcing useful DNS entries to be discarded even when said DNS entries haven't actually been changed in the past year. I also suspect that there are more people doing DNS TTLs somewhat wrong than there are people doing them completely right. So I can see the engineering logic in overriding DNS TTLs in your large scale cache, however inconvenient it is for me as a sysadmin.
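(To make 'overriding DNS TTLs' concrete, here is a minimal sketch of what a cache might do: clamp whatever TTL the zone published into a range the cache operator is comfortable with. The function name and the 60 second and one day bounds are my own illustrative assumptions, not the behavior of any particular DNS cache.)

```python
def clamp_ttl(published_ttl: int, floor: int = 60, ceiling: int = 86400) -> int:
    """Return the TTL a cache would actually use for a record.

    floor and ceiling are hypothetical operator-chosen bounds in seconds;
    real caches (if they do this at all) pick their own values.
    """
    # Raise very short TTLs to the floor and cut very long ones to the ceiling.
    return max(floor, min(published_ttl, ceiling))


# A 5 second TTL is stretched, a one week TTL is cut, a one hour TTL is untouched.
for ttl in (5, 604800, 3600):
    print(ttl, "->", clamp_ttl(ttl))
```

The floor is what protects the cache from lookup load caused by needlessly short TTLs; the ceiling is what limits how long a stale entry can linger after someone changes their DNS.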

I don't have any answers to this and in a sense there are no answers. By that I mean that the large scale DNS caches that are currently monkeying around with people's DNS TTLs are not going to change their behavior any time soon, so the most I can do is live with it.

(Then there is the thornier issue of DNS lookups being remembered by long running programs that may have no idea of TTLs at all; instead they did a getaddrinfo() once and have held on to the result ever since. I suspect that web browsers no longer fall into this category, although they once did.)
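(For a long running program the usual workaround looks something like the following sketch. getaddrinfo() doesn't hand back a TTL at all, so the best a program using it can do is throw away its remembered answer after some fixed age and look the name up again. The class name, the five minute default, and the injectable lookup and clock parameters are all assumptions of mine for illustration.)

```python
import socket
import time


class ExpiringResolver:
    """Remember getaddrinfo() results, but only for max_age seconds.

    max_age is an arbitrary stand-in for the real TTL, which the
    getaddrinfo() interface does not expose.
    """

    def __init__(self, max_age=300.0, lookup=socket.getaddrinfo, clock=time.monotonic):
        self.max_age = max_age
        self._lookup = lookup    # replaceable so the class can be tested offline
        self._clock = clock
        self._cache = {}         # (host, port) -> (expiry time, results)

    def resolve(self, host, port):
        key = (host, port)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[0]:
            # Still fresh enough; reuse the remembered answer.
            return entry[1]
        # Missing or too old: ask the system resolver again.
        results = self._lookup(host, port)
        self._cache[key] = (now + self.max_age, results)
        return results
```

A program would create one ExpiringResolver and call resolve() each time it is about to make a new connection, instead of stashing the first answer in a variable forever.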


Last modified: Wed Mar 26 00:54:49 2014