2008-12-21
Part of why managing firewalls is hard
Let me say it up front: managing a firewall of any decent complexity is hard. Sooner or later you start losing track of rules and what's actually going on, writing half-redundant rules, and so on; in short, your firewall ruleset descends into sysadmin superstition. I've recently realized that part of why this happens is that there are three views of your firewall's behavior that you need, and you can't get all of them just from reading your firewall rules; at most you can get one. (Often you don't get any.)
Firewalls are about certain sorts of sources being allowed to send certain sorts of traffic to certain destinations; in the abstract you say 'NFS clients are allowed to do NFS to NFS servers, and no one else is allowed to do NFS to anywhere'. You may want to look at your firewall from the perspective of any of those three things:
- what traffic is allowed to this machine (or group thereof)?
- what can this source of traffic do and reach?
- what are all of the rules for NFS?
(It is tempting to think that you only have one source of traffic, that being the outside world, but I think this is wrong twice over. First, internal machines making outgoing connections are also a source, and you probably have them; second, sooner or later you are going to be treating some outside machines specially.)
You can write a firewall rule system that makes any one of these three the central focus (and you can of course write rule systems that make none of them the central focus, because the rules are expressed at a lower level). But you cannot write a rule system that makes all of them the focus simultaneously, and so you are always going to have to slice up and analyze your firewall rules to get two out of these three views. Even if you can do this, it is quite difficult for people to keep track of all three views at once and synthesize an overall picture from the combination (and thus you get fragile complexity).
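To make the slicing concrete, here is a minimal sketch in Python. It assumes that rules can be reduced to simple (source, service, destination) triples, which real firewall rules generally cannot be without work; the `Rule` type, the group names, and the example policy are all hypothetical. The point is only that each view is a different regrouping of the same rules, and whichever way the rules are written down, the other two groupings have to be computed.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """One allowed flow: who may send what kind of traffic to whom."""
    source: str
    service: str
    destination: str


# Hypothetical policy, invented purely for illustration.
RULES = [
    Rule("nfs-clients", "nfs", "nfs-servers"),
    Rule("internal", "ssh", "nfs-servers"),
    Rule("internal", "http", "outside"),
    Rule("outside", "smtp", "mail-servers"),
]


def view_by(rules, field):
    """Group rules by one field, giving the view centered on that field."""
    view = defaultdict(list)
    for rule in rules:
        view[getattr(rule, field)].append(rule)
    return dict(view)


# The three views from the list above, each derived from the same flat rules.
by_destination = view_by(RULES, "destination")  # what is allowed to this machine?
by_source = view_by(RULES, "source")            # what can this source do and reach?
by_service = view_by(RULES, "service")          # what are all of the rules for NFS?

for rule in by_service["nfs"]:
    print(f"{rule.source} -> {rule.destination} ({rule.service})")
```

Even in this toy form, answering a question like 'what can reach the NFS servers, and what else can those sources do?' means hopping between two of the groupings by hand.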
This insight makes me vaguely depressed because it means that I can't solve my firewall problems by coming up with the right clever way and the right high level language to specify the firewall rules in. No matter how clever I get, no single thing can give me the overall view of what's going on; it's always going to be hard to get that.
(In my opinion, common OS level firewall rule systems, Linux's more than OpenBSD's, are best viewed as a kind of firewall assembly language; by themselves they are too low level to give you any of these views. You can no more easily understand your firewall by reading PF or iptables rules than you can easily understand anything but a tiny and trivial program by reading its assembly.)
The role of superstition and folklore in system administration
Just as users have folklore, system administration does too. Our sort of superstition is a bit different, though (well, usually): it is the kind of thing where you say 'I don't know why that's there, but let's not remove it just in case'. When our system environments reach a certain level of fragile complexity and we start losing track of the fine details, of course our informed actions start descending into rote procedures.
(This really accelerates when new people come on board; they weren't around to build the systems, so they don't have the picture of how everything fits together in their head, and I think that even good documentation will never really build it.)
Once you lose track of exactly why something is done, the principles of change control come into play. When something works as it is and you're not certain why, you have only two real choices: you can take the time to see if it's still correct and necessary, or you can just keep on doing it until something explodes. It should be no surprise which choice busy sysadmins usually make, and thus you get the superstitions, all of those things that once had a reason (we hope) but whose reason we no longer know.
(This growth of superstition shows up in any area of system administration where you can lose track of things, like expensive names.)
Descents into superstition are not fatal, but they are expensive to reverse; you have to actively make the time to reverse engineer how your system really works, and do it thoroughly enough that you're confident that you didn't miss anything. Sometimes you can only successfully get rid of the superstition when you replace the entire system (so you haven't so much fixed it as rendered it irrelevant).
Recognizing when a system is sliding into superstition is important, because it's a serious warning sign both that your system is too complex and that you do not understand it well enough. Continuing with things as they are is likely to result in more and bigger superstitions taking hold, with the attendant loss of real understanding and control of your system.