Our network layout (as of May 2011)

May 13, 2011

Today, I feel like talking about how our networks are organized at the logical level of subnets and routing and interconnections (instead of the physical level of switches and so on; that's another entry). This will be simplified somewhat because going into full detail would make it feel too much like work.

We have a single connection to the campus backbone; these days it uses a touchdown network that connects to our core router (although it didn't use to, which caused some issues). Hanging off the core router are all of our public subnets.

Now, we have nowhere near enough public IP addresses to go around, and especially we don't have enough subnets to isolate research groups from each other. So basically the only thing on our public subnets is centrally managed servers and infrastructure; actual people and research groups use subnets in private IP address spaces that we call 'sandboxes'. Most sandboxes are routed sandboxes; they sit behind routing NAT gateways (although sandbox machines are not NAT'd when they talk to internal machines), are routed internally, and can reach both internal machines and the outside world (and with the right configuration, the outside world can reach them if desired). Generally each research group gets its own sandbox (which they have a lot of control over), and we have a few generic sandboxes as well.
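To make the 'NAT'd for the outside world but not for internal machines' rule concrete, here is a minimal Python sketch of the decision a routing NAT gateway has to make. Everything in it is an assumption for illustration: the address ranges are documentation and private ranges rather than our real networks, and `needs_nat` is a made-up name. A real gateway would make this decision in its firewall and NAT rules, not in Python.

```python
import ipaddress

# Hypothetical address ranges, for illustration only.
INTERNAL_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),  # stand-in for a public server subnet
    ipaddress.ip_network("10.0.0.0/8"),    # stand-in for all sandbox address space
]
SANDBOX_NET = ipaddress.ip_network("10.10.0.0/16")  # the sandbox behind this gateway


def needs_nat(src: str, dst: str) -> bool:
    """Return True if a packet from src to dst should be NAT'd by this gateway."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip not in SANDBOX_NET:
        # Not from the sandbox behind this gateway, so nothing to translate.
        return False
    # Internal destinations are reached by plain routing; only traffic
    # leaving for the outside world gets translated.
    return not any(dst_ip in net for net in INTERNAL_NETS)
```

With these made-up ranges, a sandbox machine talking to a server on the public subnet is simply routed, while the same machine talking to an outside address is NAT'd.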

(We provide a few services for research group sandboxes; we host the DNS data, provide DHCP if they want it, and of course manage the NAT gateways. DNS-wise, sandbox machines have names that look like sadat.core.sandbox; we have a complex split horizon DNS setup, and of course .sandbox exists only in the internal view.)
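As a rough illustration of what split horizon means here, the following Python sketch picks a DNS 'view' based on where the query comes from, so that .sandbox names resolve only for internal clients. This is a toy model and not how a real DNS server is configured; the client ranges, the addresses, and the `www.example.org` name are all hypothetical, and only the `sadat.core.sandbox` name comes from the text above.

```python
import ipaddress

# Hypothetical ranges that count as 'internal' clients.
INTERNAL_CLIENTS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]

# Two views of the same DNS data; the addresses are made up.
INTERNAL_VIEW = {
    "sadat.core.sandbox": "10.10.1.5",
    "www.example.org": "192.0.2.80",
}
EXTERNAL_VIEW = {
    "www.example.org": "192.0.2.80",  # no .sandbox names here at all
}


def pick_view(client: str) -> dict:
    """Choose the internal or external view based on the client's address."""
    client_ip = ipaddress.ip_address(client)
    if any(client_ip in net for net in INTERNAL_CLIENTS):
        return INTERNAL_VIEW
    return EXTERNAL_VIEW


def resolve(client: str, name: str):
    """Return the address for name as seen by client, or None if it doesn't exist."""
    return pick_view(client).get(name.lower())
```

An outside client asking for `sadat.core.sandbox` gets nothing back, because that name does not exist in the external view.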

The most important generic sandbox is what we sometimes call our 'laptop network', which is for general user-managed machines such as Windows machines. Unlike regular sandboxes, this sandbox is port isolated so that user machines can't infect each other. Because people sometimes want to run servers on their own machines and have them reachable by other people on the laptop network, we have a second port isolated subnet for 'servers' (broadly defined). We also have a port isolated wireless network that sits behind a separate NAT gateway. Unlike conventional sandboxes, the wireless network is NAT'd even for internal machines.

(There is also a PPTP VPN server, which needs yet another chunk of private address space for the tunnels it creates with clients.)

These NAT gateways sit on our normal public subnets, or I should actually say 'subnet'; we have been slowly relocating our servers so that almost everything lives on a single subnet. Among other advantages, this means that we avoid round trips through our core router when servers talk to each other or to sandbox machines. Since some research groups have compute clusters in their sandbox that NFS mount filesystems from our fileservers and care about their performance, we like avoiding extra hops when possible. Our core router has static routes configured for all of the routable sandbox subnets, and various important servers on the subnet also have them for efficiency.

(Well, okay, basically all of the servers on the subnet have the static routes because we automated almost all of the route setup stuff, and once you do that it's easy enough to make it happen everywhere.)
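The kind of automation involved can be sketched in a few lines of Python: keep one table of which sandbox subnet sits behind which NAT gateway, and generate the static route commands from it. This is only a sketch under assumptions, not our actual tooling. The subnets and gateway addresses are invented, and emitting Linux-style `ip route add` commands assumes Linux servers, which the text above doesn't say.

```python
import ipaddress

# Hypothetical table: sandbox subnet -> address of its NAT gateway
# on the server subnet.
SANDBOX_ROUTES = {
    "10.10.0.0/16": "192.0.2.1",
    "10.20.0.0/16": "192.0.2.2",
}


def route_commands(routes: dict) -> list:
    """Turn the subnet -> gateway table into 'ip route add' command lines."""
    commands = []
    for subnet, gateway in sorted(routes.items()):
        # Parsing validates the entries, so a typo in the table fails
        # loudly here instead of producing a bad route.
        net = ipaddress.ip_network(subnet)
        gw = ipaddress.ip_address(gateway)
        commands.append(f"ip route add {net} via {gw}")
    return commands


if __name__ == "__main__":
    for command in route_commands(SANDBOX_ROUTES):
        print(command)
```

Because every machine generates its routes from the same table, adding a new sandbox means changing one place and rerunning the script everywhere.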

There is a general campus-wide requirement that networks at least try to block anonymous access. For the wireless network, we deal with this by requiring authentication on the wireless gateway (or that you use our VPN, which does its own authentication); for the laptop network, we have a couple of self-serve DHCP registration systems with a tangled history. For research group sandboxes, we leave it up to the research group and anyways, most research group sandboxes can only be used from physical areas with good access control like the group's lab space.

We (the central people) use a number of sandboxes ourselves. Some of them are routed, but some are not, for various reasons (for example, we consider it a feature that the iSCSI interconnect subnets are not reachable even by internal machines). There are a few sandboxes that we don't route for especially interesting reasons, but that's another entry.

Sidebar: on some names

Our laptop network is conventionally called the red network or just 'the red', and the collective public subnets that we have our servers on are called the blue network. These names come from the colours of network cables used for each subnet, especially in areas that normal people see. Being strict on cable colour allows us to tell people things like 'only plug your laptop into a red cable, but you can plug it into any red cable you find and it should work'.

(Partly because there aren't enough commonly available cable colours to go around, sandbox network cables are generally blue and blue has thus become our generic cable colour in public areas. A few groups have different, specific cable colours for their sandboxes for historical reasons. The rules are different and much more complicated in our machine room and in wiring closets.)


Comments on this page:

From 70.26.88.153 at 2011-05-13 11:02:17:

It takes a bit of work to set up initially, but there are a bunch of places that use "dynamic VLANs":

http://en.wikipedia.org/wiki/Virtual_LAN#Establishing_VLAN_memberships
http://www.google.ca/search?q=dynamic+vlan

Basically when a machine is plugged in, the switch takes its MAC address and queries a RADIUS server (or VLAN Management Policy Server if you want to get fancy). The reply then tells the switch which VLAN the machine is registered to use.

If the machine is not registered in the database, then a default VLAN can be assigned that is completely isolated from the rest of the network. There was a USENIX LISA paper on it a while back, but it's become more mainstream:

"A key feature of the autoMAC system is that a user can plug their host into any public ethernet jack attached to one of our user switches, and it will be automatically connected to the correct VLAN. This is accomplished using a RADIUS [6] server that 'authenticates' the MAC address of the host attempting to connect to the network."

http://www.lib.unb.ca/engineering/usenix/lisa04/tech/tengi.html
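The lookup described in this comment (a registered MAC address maps to its VLAN, anything unregistered falls into an isolated default VLAN) can be sketched as below. This is a toy model of the decision only, not a RADIUS server; the MAC addresses, VLAN numbers, and function names are all hypothetical.

```python
# Hypothetical VLAN that is completely isolated from the rest of the network.
DEFAULT_VLAN = 999

# Hypothetical registration database: MAC address -> VLAN number.
REGISTERED = {
    "00:11:22:33:44:55": 10,  # e.g. a machine on the laptop network
    "66:77:88:99:aa:bb": 20,  # e.g. a machine in a research group sandbox
}


def normalize_mac(mac: str) -> str:
    """Lower-case the MAC and use ':' separators so lookups are consistent."""
    return mac.strip().lower().replace("-", ":")


def vlan_for(mac: str, registry: dict = REGISTERED) -> int:
    """Return the VLAN a machine should be placed in when it is plugged in."""
    return registry.get(normalize_mac(mac), DEFAULT_VLAN)
```

Here `vlan_for("00-11-22-33-44-55")` gives VLAN 10, while an unknown MAC address lands in the isolated VLAN 999.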

By cks at 2011-05-13 11:20:33:

That's a neat system, but sadly I expect that it's totally out of our budget range. Like a lot of other things, the network here runs on more or less a shoestring.

(We also have some machines that move between sandboxes, although that might go away if all sandboxes were pervasively available; the usual case is a laptop that can be on either the laptop network or a research group's sandbox.)

From 70.26.88.153 at 2011-05-13 11:42:09:

It shouldn't actually cost too much (in theory at least, and not counting the time involved).

Most recent "enterprise" switches already have the functionality built in, and the back-end is generally FreeRADIUS and perhaps some kind of database that is queried (SQL, OpenLDAP, etc.). The paper mentioned does it all with open-source software (like most LISA stuff).

For multi-VLAN machines: you don't have to have all ports dynamic. Certain rooms (labs, server) can be hard-coded to only be on certain VLANs, while others can be dynamic. This can even be done with wireless APs according to Cisco: although you have one SSID, an account's RADIUS attributes would determine which VLAN it actually gets dropped in:

http://tinyurl.com/2oxg32
http://www.cisco.com/en/US/tech/tk722/tk809/technologies_configuration_example09186a008076317c.shtml

Again, while you can buy commercial software to do this, IEEE and IETF standards run things, so you can get software running on Unix-y systems to accomplish this.

By cks at 2011-05-13 12:01:26:

Our switches aren't necessarily recent and (how can I put this) aren't anywhere near 'enterprise'. They work fine, but almost all of them are only basic managed/VLAN-capable switches.

(This would also require us to re-design our physical network topology, but that's actually another entry.)

Written on 13 May 2011.


Last modified: Fri May 13 02:11:23 2011