2006-12-20
How many root DNS servers are reachable over Internet II?
After yesterday, I decided to spend some time working out how many of the root DNS servers are reachable over our Internet II link. The answer surprised me: 9 out of the 13 root nameservers are reachable over Internet II.
(The necessary disclaimer: your view of what is reachable over Internet II may well vary from ours due to routing and local policy issues.)
We prefer to send packets over our Internet II link, so working out this number was relatively simple. I got a list of the IP addresses of the root nameservers, ran traceroute against each of them (set to stop just past the hop where our Internet II and regular connections diverge), and looked for the ones that went out the I2 connection.
(To make it faster I used traceroute's -f option, which sets the hop count to start at; this meant that I was looking only at the specific hop that was different. Well, one of them; I would have looked at the first hop that's different, except my version of traceroute seems to refuse to stop at less than 6 hops out.)
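As a concrete sketch of the sort of invocation this involved (the starting and maximum hop counts below are made up for illustration; the real values depend on where your own paths diverge):

# probe a root server's IP starting at hop 6 and giving up after hop 7,
# then check whether the hop shown is on the Internet II path or the regular one
traceroute -f 6 -m 7 198.41.0.4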
The root nameservers that are not reachable over our Internet II link are:
| Server | IP address | Operator |
| a.root-servers.net | 198.41.0.4 | VeriSign |
| c.root-servers.net | 192.33.4.12 | Cogent |
| e.root-servers.net | 192.203.230.10 | NASA |
| g.root-servers.net | 192.112.36.4 | US Department of Defense |
The NASA one surprises me a little bit, although it may be the result of routing policy choices.
2006-12-19
An Internet dependency
Our main Internet connection is effectively down at the moment, and has been for over an hour and a half by now. It's startling how much of my work and the little stuff I do, including things I do more or less to fidget, turns out to be in some way dependent on the Internet.
(I have a pile of little fidgets that I do just to fill time, things like checking every so often to see if the mail server is OK or if there are any new Fedora Core updates.)
It's a good thing that this happened at the end of the day on a slow day, because our campus DNS system is probably in the process of melting down as a result of this. There are two related reasons this happens:
- DNS servers with simple setups need to go to the root nameservers for anything that's not already in the cache, including the nameservers for our own domains. Although on-campus connectivity is fine, these nameservers may not be able to do anything with it, because they can't do the DNS lookups they need; they don't know where to send them.
- the campus caching nameservers are suddenly backlogging on queries that usually finish fast, because they can't reach pretty much any outside DNS servers. This slows down responses to queries about on-campus stuff; under enough load, the queries even start timing out. (Plus you get a bonus deadly spiral of retries putting even more load on already loaded servers.)
The net result is that even entirely on-campus activity has a habit of grinding to a halt during an Internet outage of any length of time. Personally, I find it an interesting illustration of how interdependent things can be under the hood. (But then I carefully configure my own caching nameservers to know about the campus primary servers.)
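For what it's worth, 'knowing about the campus primary servers' can be done with forward or stub zones for the local domains; a minimal BIND-style sketch, using a placeholder domain and placeholder addresses instead of real ones:

// named.conf fragment: send queries for local domains straight to the
// campus primary servers, so local names keep resolving even when the
// outside world (and thus the roots) can't be reached.
zone "example.edu" {
	type forward;
	forward only;
	forwarders { 192.0.2.10; 192.0.2.11; };	// placeholder addresses
};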
There is a bit of irony involved, too. We have two Internet connections, one to the general Internet and one to Internet-II, the high-speed academic network. Our Internet-II connection is fine, but we can hardly use it, because we can't look up the IP addresses of people on Internet-II, because most of the root nameservers aren't accessible via Internet-II.
(I think about five out of thirteen are Internet-II accessible, which is actually more than I expected. It's hard to check, since our connectivity just came back as I was writing this. I suppose that's ironic too.)
2006-12-12
A limitation of OpenBSD bridging NAT firewalls
We put a new SMTP frontend machine in front of our core mail server today to handle incoming email from the outside world (for the usual reasons: dealing with spam better, shielding the core server from six zillion zombies all trying to talk to it at once, etc).
Rather than wait for updated MX entries for all our domains to propagate around and for people to switch to them, we figured we could speed up the process by having our bridging firewall send connections for the core server's port 25 off to the new frontend machine instead. (This would also give us a quick way to back out in case there were problems: just kill the redirect. In fact we could do the redirect before we did the MX switch, as a test.)
OpenBSD makes this pretty simple: just do more or less
rdr on $INT_IF proto tcp from any to <core-smtp> port 25 -> <smtp-front>
So we put such a rule in and enabled it and our clever plan promptly ground to a halt; it didn't work. Trying to talk to port 25 on the core SMTP server stalled out, not even connecting; the new SMTP frontend wasn't seeing any packets from the attempted connections, but the packet filter wasn't rejecting or dropping anything as far as we could tell.
Our network topology looks something like this, from the outside world:
campus backbone → backbone router → bridging firewall → subnet A → core router → all other subnets, including subnet B.
The core mail server is on subnet A; the new SMTP frontend is on subnet B.
The problem is that the OpenBSD bridging firewall doesn't rewrite the destination Ethernet address of the packet when it NATs the destination IP address. Because the destination Ethernet address is unchanged, the packet is still going to go to whatever machine on the local subnet it was originally going to, whether or not this machine will actually accept packets for the new destination IP address.
If the original destination IP is off the subnet, the wire packet's original destination is our core router, and everything works out. However, if the original destination IP is on the subnet, the backbone router sends the packet straight to that machine's Ethernet address; rewriting the destination IP address merely causes the original destination machine to drop the packet.
(If the original destination IP address is entirely virtual and there is no machine answering ARP requests for it, the backbone router will never put the packet on the wire in the first place.)
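One way to see this happening is to watch the traffic with link-level headers included; a sketch, with a made-up interface name:

# on the bridging firewall, print Ethernet headers as well as IP ones;
# the redirected packets still carry the core mail server's MAC address
# as their destination rather than the core router's
tcpdump -n -e -i em0 tcp port 25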
So for the future, we will have to bear in mind an important limitation of bridging NAT: you cannot easily have the pre-NAT destination IP address be on the local subnet. Fortunately we have relatively little on subnet A anyways, although it's somewhat annoying to sort of 'lose' a /24 just to be the touchdown point for the campus backbone.
I can think of a couple of ways around this:
- enable IP forwarding on the original destination machine. This should make it willing to reinject the 'misaddressed' post-NAT packet onto the wire, this time sending it to our core router; see the sketch after this list. (I don't think one can do any evil with this in this specific case, as the core mail server has only one network interface.)
- persuade the backbone router that it is actually on a very tiny subnet containing just itself and the core router, or better yet that it has a 'point to point over Ethernet' connection to the core router. This is a little bit inefficient if we have a lot of external traffic to things on subnet A, since that traffic crosses the wire twice, and I am not entirely sure that the backbone router would handle 'out of subnet' machines sending it packets directly and ARPing for its Ethernet address.
- split subnet A into a bunch of sub-subnets, one of them very small and just for the backbone router. This is more complicated (more machines have to change their configuration) but 'proper', and we can still freely use most of the IP address space in subnet A. The sub-subnets can share a wire or be put on separate VLANs.
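For the first option, enabling IP forwarding is a single sysctl on most Unixes; a sketch, since the exact knob depends on what the core mail server runs:

# BSD-style systems:
sysctl net.inet.ip.forwarding=1
# Linux:
sysctl -w net.ipv4.ip_forward=1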
(Another solution would be to rewrite the destination Ethernet address alongside the destination IP address, but as far as I can see from the OpenBSD 3.9 pf.conf manpage, there's no way to do that.)
2006-12-07
How not to set up your DNS (part 13)
In the traditional illustrated format:
; sdig ns aescorts.net.
ns1.bnmq.com.
ns2.bnmq.com.
; dig mx aescorts.net. @ns1.bnmq.com.
[...]
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
[...]
;; AUTHORITY SECTION:
. 1 IN SOA . abuse.opticaljungle.com. ...
That's an interestingly grandiose claim of authority bnmq.com is making there. (And also an interesting primary nameserver they claim the DNS root has.)
For bonus points, they actually return an A record for aescorts.net, although without the usual authority records that you'd expect. It's just queries for other records that they fail on. I'm left wondering if the bnmq.com nameservers are actually some species of caching nameservers, and bits have fallen out of their caches and haven't been refreshed.
(Given everything else, we probably didn't want to get email from a domain called 'aescorts.net' anyways.)
2006-12-06
Setting up switches to avoid unwanted VLAN leakage
We made a small mistake around here a while back: we used VLAN ID 1 for a real VLAN. In fact, we used it for our 'management network' VLAN.
This is a mistake because many switches insist that all ports be untagged members of some VLAN, and they pick VLAN 1 to be the default VLAN for this. The net effect is that unconfigured ports on many of our VLAN-aware switches are by default on our management network.
(Not all of our switches are VLAN aware; edge switches tend to handle only a single VLAN, all untagged. And not all of the VLAN-aware switches get fed the management VLAN from their upstream.)
What I'd like is for unconfigured ports to be dead. It would be easy for switches to allow this in a natural way: just turn off ports that are not members of any VLAN (untagged or tagged). (This doesn't require any new core functionality, since good switches can already disable ports through a direct interface.)
Short of this, we want to make sure that nothing leaks to or from unconfigured ports. We do this by creating a new do-nothing VLAN, by convention VLAN ID 4094, and switching all ports to be untagged members of it when we set up a new switch.
But this still leaves us with a little gotcha: all ports have to be untagged members of some VLAN, including the switch's uplink port, which will normally see only tagged traffic. Now we have a problem:
- we can't make the uplink port an untagged member of a real VLAN, or various undesirable things start happening, ranging from packet duplication and echoing to having to remember that one VLAN is special because it is uplinked and downlinked untagged instead of tagged.
- we can't leave the uplink port as a member of the new default VLAN, because that would allow traffic from inactive, unconfigured ports to leak off the switch.
The solution is to make the uplink port an untagged member of a second do-nothing VLAN, by convention VLAN 4093. With no other port in that VLAN, the uplink port should never send any untagged traffic.
(It may receive untagged traffic, depending on how the upstream switch is configured.)
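Our switches are managed through menus, but on a switch with a Cisco-style command line the whole scheme would look roughly like the following sketch (port numbers are made up; 'untagged member' corresponds to an access port or a trunk's native VLAN in that dialect):

! do-nothing VLANs: 4094 for unconfigured ports, 4093 for the uplink
vlan 4094
vlan 4093
! unconfigured edge ports: untagged members of VLAN 4094 and nothing else
interface range GigabitEthernet0/2 - 24
 switchport mode access
 switchport access vlan 4094
! the uplink: tagged for the real VLANs, untagged only in VLAN 4093
interface GigabitEthernet0/1
 switchport mode trunk
 switchport trunk native vlan 4093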
People who are less lazy and more thorough than I am can take this to the next level: make a new do-nothing VLAN for each port (perhaps numbering from 4000 on up, so you can easily keep them straight). This creates complete isolation for unconfigured ports.
(I might do this if our Allied Telesyn switches could be configured through some scriptable command-line tool. Slogging through the tedium of their serial console menu interface or their web server thingy to do this much work is, however, beyond me. In theory I could use expect or the like to create my own command-line tool, but that would be a lot of work, necessarily be somewhat fragile, and likely run like warmed-over maple syrup (while being a lot less appetizing).)
Sidebar: so why not explicitly disable unconfigured ports?
The simple answer is that it's too much make-work, because it means you have to touch two relatively widely separated areas of the switch's menu interface to turn up a port: you have to set its VLAN membership right, and then you have to remember to run off to a completely separate menu to actually turn it on. Forgetting to do one or the other will result in somewhat mysterious failures and a certain amount of annoyance.
(There's lots of reasons why a port might not come up when you plug something into it, such as a bad cable or something wonky on whatever you're using to test with.)
Since things only get plugged into unconfigured ports by accident anyways, I don't care that much and I'd rather avoid potential teeth-grinding mysteries.