Internet routing can now vary based on things you wouldn't expect

March 27, 2021

Today Toronto had a little issue with Cloudflare, which gave me a chance to learn a useful lesson about the modern Internet and how it routes traffic. The summary of the lesson is that the venerable Unix traceroute command may not be your friend any more.

I come from an era of a relatively simple Internet. Back then, the path that your packets took through the network was expected to depend only on the destination and source IPs. Things in the middle might drop some traffic or filter parts of it out, but the path was the same whether you were using ICMP, UDP, or TCP, and regardless of which TCP or UDP port you were connecting to. In this environment, ping and traceroute were reliable diagnostics in general; if routes weren't flapping, traceroute would tell you the path that all of your traffic was using, while ping could tell you that the target host was there.

(If something pinged but didn't respond to the port you wanted, it was a firewall issue.)

The Cloudflare issue today did not behave like that. In particular, plain traceroute reported one path, a short five-hop one, while 'traceroute -T -p 443' reported a rather different ten-hop path that seemed to take a detour off to Chicago before coming back to Toronto (and not reaching the target Cloudflare IP). At one level, port-based routing makes a certain amount of sense; it's a lower-level version of application load balancers, and why go to all the bother of carrying UDP packets that you don't handle deep into your network just to reject them? At another level, it makes troubleshooting and testing more complicated, especially for outside people. ICMP, random UDP traffic, and actual TCP traffic to specific ports (or emulations of it) may go to completely different places, so information gathered one way for one of them doesn't necessarily apply to anything else.

Fortunately not everything is like this. Unfortunately the people who are most likely to be like this are the large cloud providers and CDNs, and those collectively host a lot of websites and places of interest (and their complexity provides more room for subtle problems).

For myself, my lesson learned from this is that if I'm trying to check out the network path to some outside place, I should use 'traceroute -T -p 443' (or the applicable port, but HTTPS is the most likely). Once HTTP/3 becomes common, I'll potentially also want to check with UDP port 443 (although that gets complicated fast). Plain ping and traceroute are no longer as trustworthy as they used to be.
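As concrete commands, the checks above look something like the following (the hostname is illustrative, and flag behavior varies somewhat between traceroute implementations; these flags match the common Linux traceroute):

```shell
# Plain traceroute: UDP probes to high, incrementing destination ports.
# This path may be one your real traffic never takes.
traceroute www.example.org

# TCP SYN probes to port 443, approximating real HTTPS traffic
# (usually needs root or CAP_NET_RAW).
sudo traceroute -T -p 443 www.example.org

# For HTTP/3, probe with UDP to a fixed port 443 instead of
# traceroute's default incrementing ports.
sudo traceroute -U -p 443 www.example.org
```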

Comments on this page:

From at 2021-03-28 05:12:34:

One thing about traceroute and mtr that already existed before Cloudflare is that some ISPs use ECMP routing (equal cost multipath), where the same destination simply has more than one gateway of identical preference. The gateway for each packet is then chosen by hashing the layer-3 and often layer-4 headers. It isn't port-based routing, exactly, but the result is influenced by the ports.

(In Linux terms, they might have something like 'ip route add default nexthop via <gw1> nexthop via <gw2>' and so on.)

Which means, in the default traceroute mode where it probes successively higher UDP ports for each hop, it may actually end up choosing a different path for each probe, and may show nonsensical results with both paths mixed in. (In mtr this is especially obvious.) Specifying ICMP or at least a fixed UDP probe port then becomes necessary to avoid this.
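The ECMP behavior the comment describes can be sketched in a few lines of Python: the router hashes the flow's 5-tuple and uses the result to pick among equal-cost next hops, so probes that vary the destination port (as default traceroute does) can scatter across paths, while fixed-port probes stay on one. The hash function, addresses, and next-hop list here are all made up for illustration; real routers use their own vendor-specific hashes.

```python
import zlib

# Hypothetical equal-cost next hops (illustrative addresses).
NEXTHOPS = ["192.0.2.1", "192.0.2.2"]

def pick_nexthop(src_ip, dst_ip, proto, src_port, dst_port):
    """Pick a next hop by hashing the flow 5-tuple (ECMP sketch)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return NEXTHOPS[zlib.crc32(key) % len(NEXTHOPS)]

# Default traceroute increments the UDP destination port per probe,
# so successive probes may hash to different next hops...
varying = {pick_nexthop("198.51.100.7", "203.0.113.9", "udp", 40000, p)
           for p in range(33434, 33464)}

# ...while a fixed-port TCP probe always hashes to the same one.
fixed = {pick_nexthop("198.51.100.7", "203.0.113.9", "tcp", 40000, 443)
         for _ in range(30)}
```

Note that the source IP is part of the hashed tuple too, which is exactly why (as the next comment points out) adjacent hosts on the same subnet can take different paths.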

From at 2021-03-28 05:15:48:

Oh, and this obviously also means that your source IP has an effect as well. For example, odd IP addresses may hash down to one upstream and even IP addresses to another.

So if you're figuring out connectivity issues on one host x.y.z.6 but run a traceroute from an adjacent host x.y.z.7 (e.g. because that's what you happened to have a ssh session to), it may well be taking a different path in some situations.

[A] traceroute from an adjacent host […] may well be taking a different path in some situations.

That seems a given, and an always-been-a-given at that, no? Any time you run a traceroute from somewhere else, you might get results that don't apply to your host of interest, if only for such mundane reasons as diverging networking configurations. So even with instincts harking back to a simpler time on the internet, you would expect, or at least not be surprised by, that.

The newer reality Chris points out, that even traceroute from the same machine is increasingly likely to produce inapplicable and misleading results, is rather more dismaying.

By cks at 2021-03-29 10:07:40:

I think that sysadmins have generally assumed that traceroutes from hosts in the same /24 or perhaps more generally the same ASN would be routed more or less the same way (I certainly have). ECMP makes this not true even for adjacent IPs on the same /24.

As a pragmatic example, we had a puzzling case where Facebook's DNS servers gave different answers to different /24s within the university. It didn't occur to me until now that ECMP is one possible explanation of this.

(Now that I check, this DNS separation is still happening.)

From at 2021-03-31 06:43:27:

As a pragmatic example, we had a puzzling case where Facebook's DNS servers gave different answers to different /24s within the university. It didn't occur to me until now that ECMP is one possible explanation of this.

Hmm, if you mean your own recursors from different /24s were talking to different Facebook authoritative servers, it might be the case due to ECMP.

But this also reminds me of the "EDNS Client Subnet" extension that DNS resolvers now often add to forwarded queries, so that even if the entire university shares the same set of resolvers, they might still inform upstream authoritative servers about which /24 each query came from.
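The EDNS Client Subnet option mentioned here has a simple wire format, defined in RFC 7871. As a minimal sketch (the function name is my own), building the option data for an IPv4 /24 looks like this; note that only as many address bytes are sent as the prefix length covers, which is how resolvers avoid leaking the full client address:

```python
import ipaddress
import struct

def build_ecs_option(subnet: str) -> bytes:
    """Build EDNS Client Subnet option bytes (RFC 7871) for an
    IPv4 prefix such as '192.0.2.0/24'."""
    net = ipaddress.ip_network(subnet)
    # Send only the address bytes covered by the prefix length.
    addr_bytes = net.network_address.packed[: (net.prefixlen + 7) // 8]
    # FAMILY=1 (IPv4), SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH=0.
    data = struct.pack("!HBB", 1, net.prefixlen, 0) + addr_bytes
    # OPTION-CODE 8 (edns-client-subnet) and OPTION-LENGTH wrap the data.
    return struct.pack("!HH", 8, len(data)) + data

option = build_ecs_option("192.0.2.0/24")
```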

By cks at 2021-03-31 11:16:19:

There were separate DNS resolvers on the different subnets. I also tested this with direct dig queries to Facebook's listed NS servers and they answered this way (of course, direct queries expose the subnet). However, EDNS Client Subnet does create an interesting test: I can see if supplying a different network's subnet changes the answer that Facebook gives. The answer is yes; a query from our subnet with 'dig +subnet=<other>/24' gets a different answer than without +subnet. So this appears to be Facebook deciding that different subnets here get different servers, not ECMP diverting traffic to different DNS servers that give us different answers.
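The dig comparison described above can be run with commands along these lines (the query name is illustrative; a.ns.facebook.com is one of Facebook's listed NS servers, and answers will vary by vantage point):

```shell
# Baseline answer from one of the zone's authoritative servers.
dig @a.ns.facebook.com www.facebook.com A +short

# The same query, but claiming it originates from some other /24.
# Per the observation above, merely adding +subnet can change the answer.
dig @a.ns.facebook.com www.facebook.com A +short +subnet=198.51.100.0/24
```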

By cks at 2021-03-31 11:23:11:

In an additional surprise, the mere presence of a +subnet argument to dig appears to change what answer I get from Facebook's servers, even if I provide the same /24 as I'm querying from. This is slightly annoying as it means I can't map out the range that Facebook is supplying the different answer for.

My large scale lesson is that how big places do their DNS and what answers you'll get from them is both unpredictable and variable from outside. Facebook is clearly doing something, but only they know exactly what. The corollary of this is that if users have problems reaching some big outside site, we'd better ask them what IP address the site resolves to for them because it may not be the answer we get.
