Our three generations of network implementations (over the time I've been here)

May 23, 2021

Over the time I've been here, we've had essentially the same network structure, but we've gone through three different major physical implementations of it. Between the three of them, we've pretty much covered the full spectrum of sensible ways to implement a multi-network environment.

When I arrived, we were at the tail end of the first implementation. This was built around a small number of big chassis switches from Foundry, the kind that take various sorts of 'blades', also known as cards (at the time, most of these were for 100 MBit ports). Every Foundry carried all of our networks as separate VLANs, and almost all of our drops were connected straight into the Foundry in the nearest building wiring closet or machine room. To 'rewire' a drop in a room to a different internal network, we changed the port's configuration on the relevant Foundry. The Foundries connected to each other over a set of trunk network connections (carried over fiber).
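(As a purely illustrative sketch of what such a per-port change looks like: the Foundry CLI of that era was broadly IOS-like, but the exact syntax, the interface name, and the VLAN number below are all made up rather than taken from our actual configs:

```
! illustrative IOS-style syntax; interface and VLAN are hypothetical
configure terminal
 interface ethernet 3/14
  ! move this drop's port from its current network to VLAN 20
  switchport access vlan 20
 exit
exit
write memory
```

The point is that "rewiring" a drop was a configuration change on one big switch, not a physical change.)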

The second implementation was an attempt to duplicate this setup using individual 24-port switches instead of new big chassis switches (because we couldn't afford them). Essentially we treated each 24-port switch as if it was a Foundry blade. Each such switch carried all of our VLANs and was configured on a port-by-port basis, just as the Foundries had been. If you wanted to change what network a drop was on, you could either switch it to a free port that was already on that network (if there was one somewhere) or you could reconfigure the switch to change the port's VLAN. Unfortunately this implementation turned out not to work very well. Making switch configuration changes isn't as easy as it looks, and we soon got fed up with the various problems that came up.

Our third implementation is the one I wrote up in 2011 and which we still use today with various pieces upgraded. We have a collection of core switches (now 10G) that carry all of our networks as separate VLANs, and then we hang single-VLAN switches off ports on the core switches (usually in a tree if we have more than 20-odd connections for a particular network in a machine room or building wiring closet). We change what network a particular drop is on by physically rewiring it to a different switch. The only time a core switch's configuration changes is if we need to add or remove a port for an entire network, which is pretty rare. This approach has worked out better for us than the previous two.
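(To make the split concrete, again in illustrative IOS-style syntax with invented interface names and VLAN numbers: on a core switch, inter-core links are trunk ports carrying everything tagged, while a port feeding a single-VLAN edge switch is an untagged access port on just that one network:

```
! illustrative IOS-style syntax; interfaces and VLANs are hypothetical
interface ethernet 1/1
 ! uplink to another core switch: carry all networks as tagged VLANs
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30,40
!
interface ethernet 1/10
 ! feeds a single-VLAN edge switch: untagged, one network only
 switchport mode access
 switchport access vlan 20
```

The edge switches themselves need no VLAN awareness at all, which is part of why their configurations almost never change.)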

(This neat description is not quite complete. There are a few machines that have to be fed multiple VLANs on a single port, such as our internal firewalls, and there's a single breakout switch for our office drops that also carries multiple VLANs.)
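(For those multi-VLAN machines, the switch side is a trunk port and the host side splits the tagged traffic into per-VLAN interfaces. On Linux this is done with iproute2 VLAN subinterfaces; here's a hedged sketch, with the device name and VLAN IDs invented for illustration rather than taken from our firewalls:

```shell
# host side of a multi-VLAN (trunk) port, using iproute2
# eth0 and VLAN IDs 10/20 are hypothetical
ip link add link eth0 name eth0.10 type vlan id 10
ip link add link eth0 name eth0.20 type vlan id 20
ip link set eth0.10 up
ip link set eth0.20 up
# each subinterface now behaves as if it were plugged into that
# network's switch
```

)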

I don't feel too bad about us doing the second implementation between the first and the third. At the time we started in on it, using modern switches as replacements for Foundry blades seemed the obvious approach, and we didn't yet have the experience to understand all of the annoyances and problems we were going to wind up running into. In practice it turned out that scale mattered (changing one Foundry on a regular basis is a lot easier than changing one of a half dozen or more 24-port switches), and quite possibly our Foundry setup was not ideal in the first place.

PS: We have some separate switches that use VLANs for brute force port isolation. They don't complicate this network implementation, partly because they've all got a fixed configuration and never plug directly into our core switches.

PPS: The one obvious way we haven't implemented a multi-network environment is to run each network over a completely separate switch tree and set of links between machine rooms and buildings. At a minimum, this would need far more fiber links than we have available.


Comments on this page:

By Andrew at 2021-05-23 11:37:13:

It's been a while since I had to deal with this kind of thing, but at the last place where I did, we had reasonable success with a setup with .1q trunking, where most of the switches consulted a RADIUS thing which looked up the device MAC address to decide whether to allow it and what VLAN to put it on. A few ports were dedicated to things like printers and IP phones, and they were connected to switches that would allow anything but were hard-configured to the printer or IP-phone VLAN with very limited access.
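(The dynamic VLAN assignment described here is conventionally done with the RFC 3580 tunnel attributes in the RADIUS reply. As a hedged sketch, a MAC-keyed FreeRADIUS `users`-style entry might look like this, with the MAC address and VLAN number invented for illustration:

```
# hypothetical FreeRADIUS 'users' entry: authorize by MAC, assign a VLAN
# (the MAC address and VLAN 20 are made up)
00deadbeef01    Cleartext-Password := "00deadbeef01"
        Tunnel-Type = VLAN,
        Tunnel-Medium-Type = IEEE-802,
        Tunnel-Private-Group-Id = "20"
```

The switch reads `Tunnel-Private-Group-Id` from the reply and puts the port on that VLAN.)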

The only real trouble was "black-starting" -- if everything was down you had to manually edit some switchports to bring up the needed set of servers to enable the switches to let everything else connect :)
