2024-07-06
"Out of band" network management is not trivial
One of the recent Canadian news items is that a summary of the official report on the 2022 Rogers Internet and phone outage has been released (see also the CBC summary of the summary, and the Wikipedia page on the outage). This was a major outage that took down both Internet and phone service for a lot of people for roughly a day and caused a series of failures in services and systems that turned out to rely on Rogers for (enough of) their phone and Internet connectivity. In the wake of the report, some people are (correctly) pointing to Rogers not having any "Out of Band" network management capability as one of the major contributing factors. Some people have gone so far as to suggest that out of band network management is an obvious thing that everyone should have. As it happens I have some opinions on this, and the capsule summary is that out of band network management is non-trivial.
(While the outage 'only' cut off an estimated 12 million people, the total population of Canada is about 40 million people, so it directly affected more than one in four Canadians.)
Obviously, doing out of band network management means that you need a dedicated set of physical hardware for your OOB network: separate switches, routers, local network cabling, and long distance fiber runs between locations (whether those are nearby university buildings or different cities). If you're serious, you probably want your OOB fiber runs to have different physical paths than your regular network fiber, so one backhoe accident can't cut both of them. This separate network infrastructure has to run to everything you want to manage and also to everywhere you want to manage your network from. This is potentially a lot of physical hardware and networking, and as they say, it can get worse.
(This out of band network also absolutely has to be secure, because it's a back door to your entire network.)
When you set up OOB network management, you have a choice to make: is your OOB network the only way to manage equipment, or can you manage equipment either 'in-band' through your regular network or through the out of band network? If your OOB network is your only way of managing things, you not only have to build a separate network, you have to make sure it is fully redundant, because otherwise you've created a single point of failure for (some) management. If your OOB network is a backup, you don't necessarily need as much redundancy (although you may want some), but now you need to actively monitor and verify that both access paths work. You also have two access paths to keep secure, instead of just one.
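To make 'actively monitor and verify that both access paths work' a bit more concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the device names, the addresses, the choice of a TCP connection to port 22 as the probe, and the idea that the check runs on a host attached to both the regular and the OOB networks. A real setup would more likely do this inside an existing monitoring system.

```python
import socket

# Hypothetical inventory: device name -> (in-band address, OOB address).
# The names and addresses are placeholders, not a real network.
DEVICES = {
    "core-sw1": ("10.0.0.1", "192.168.100.1"),
    "core-sw2": ("10.0.0.2", "192.168.100.2"),
}


def tcp_probe(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 22 (SSH) is an assumption; use whatever your management
    interfaces actually listen on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_paths(devices, probe=tcp_probe):
    """Return (device, path) pairs whose management address did not answer.

    Each device is probed on both its in-band and its OOB address, so a
    quietly broken backup path shows up before you need it.
    """
    failures = []
    for name, (inband, oob) in sorted(devices.items()):
        if not probe(inband):
            failures.append((name, "in-band"))
        if not probe(oob):
            failures.append((name, "oob"))
    return failures
```

The probe is passed in as a parameter so the reporting logic can be exercised without touching a network. Note that this only shows that something answers on each address; it doesn't show that you can actually log in over that path.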
Security, or rather access authentication, is another complication for out of band management networks. If you need your OOB network, you have to assume that all other networks aren't working, which means that everything your network routers, switches, and so on need to authenticate your access has to be accessible through the OOB management network (possibly in addition to being accessible through your regular networks, if you also have in-band management). This may not be trivial to arrange, depending on what sort of authentication system you're using. You also need to make sure that your overall authentication flow can complete using only OOB network information and services (so, for example, your authentication server can't reach out to a third party provider's MFA service to send push notifications to authentication apps on people's phones).
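One way to think about this requirement is as a dependency walk: start from 'logging in to a switch', follow everything it depends on, and flag anything that isn't reachable on the OOB network. The following Python sketch illustrates the idea. The service names, the dependency map, and the network labels are all hypothetical; in real life the hard part is discovering the dependencies, not walking them.

```python
# Hypothetical map of what each service needs in order to work.
DEPENDS_ON = {
    "switch-login": ["radius"],
    "radius": ["ldap", "mfa-push"],
    "ldap": [],
    "mfa-push": [],
}

# Hypothetical map of which networks each service is reachable on.
NETWORKS = {
    "switch-login": {"inband", "oob"},
    "radius": {"inband", "oob"},
    "ldap": {"inband"},
    "mfa-push": {"internet"},
}


def oob_gaps(root, depends_on, networks, oob="oob"):
    """Return the sorted services that root transitively needs but that
    are not reachable on the OOB network."""
    seen = set()
    stack = [root]
    while stack:
        service = stack.pop()
        if service in seen:
            continue
        seen.add(service)
        stack.extend(depends_on.get(service, []))
    # A service with no network information counts as a gap.
    return sorted(s for s in seen if oob not in networks.get(s, set()))
```

With the example data above, `oob_gaps("switch-login", DEPENDS_ON, NETWORKS)` reports `ldap` and `mfa-push`: the RADIUS server is on the OOB network, but two things it relies on are not, so authentication would still fail in a real outage.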
Locally, we have what I would describe as a discount out of band management network. It has a completely separate set of switches, cabling, and building to building fiber runs, and some things have their management interfaces on it. It doesn't have any redundancy, which is acceptable in our particular environment. Unfortunately, because it's a completely isolated network, it can be a bit awkward to use, especially if you want to put a device on it that would appreciate modern conveniences like the ability to send alert emails if something happens (or even send syslog messages to a remote server; currently our central syslog server isn't on this network, although we should probably fix that).
In many cases I think you're better off having redundant and hardened in-band management, especially with smaller networks. Running an out of band network is effectively having two separate networks to look after instead of just one; if you have limited resources (including time and attention), I think you're further ahead focusing on making a single network solid and redundant rather than splitting your efforts.