Our pragmatic approach to updating machines to match our baseline
A commentator on my entry on our approach to configuration management asked a good question:
the one thing that is problematic is development of the "gold installation standard". When I make some changes, sometimes it's more work to get all the older machines to the new standard state. Do you solve this some way, or are the machines singletons even over time?
Our answer is that we're pragmatic about this and as a result it depends on why we're changing the baseline installation. First off, changes to the baseline are basically always because of changes to at least some of the actual systems; the real question is thus not whether we update some systems to the new baseline but whether we update all of them to it. The answer to that depends on the change.
Some changes are things that we actively want on all of our systems (or all of the applicable type of system, like login servers), because they're driven by users requesting things like 'can you add package X to the login servers' or by us discovering that we need to turn off some new vendor security feature. Obviously these get applied to all of the relevant servers (or at least all of the ones that we care strongly about); the update to the baseline is just to make sure any new or rebuilt servers also get this change.

Some changes only really apply to certain sorts of machines, but we update the baseline to do them on every machine because it's easier that way and it does no harm. In this case we don't run around updating the machines the change doesn't really apply to, even though this means that a newly (re)built version of the machine will be different from the current version.
(In theory this is okay because the difference won't create any functional difference.)
One way of summarizing this is that we usually don't bother changing machines if we think that the change won't have any observable effect (in practice, not in theory; for example, if we'll never notice whether or not a package is installed on a machine, it qualifies).
Some very basic DNS blocklist hit information for the last 30 days
Our inbound mail gateway anti-spam stuff logs when a connection is from something listed in the CBL or in zen.spamhaus.org (and yes, we know that that's sort of redundant, it's a long story). Because of how it's implemented, we only check zen.spamhaus.org if we don't find the IP in the CBL.
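The actual checks happen inside our mail gateway, but the lookup order can be sketched in Python. This is an illustrative sketch, not our real implementation; the function names are mine, and I'm relying on the standard DNSBL convention that a listed IP resolves to an A record under the blocklist zone:

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: the IPv4 octets reversed, under the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def dnsbl_listed(ip, zone):
    """True if ip is listed in the given DNS blocklist zone (an A record exists)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

def check_sender(ip):
    """Check the CBL first; only consult Zen when the CBL has no listing."""
    if dnsbl_listed(ip, "cbl.abuseat.org"):
        return "CBL"
    if dnsbl_listed(ip, "zen.spamhaus.org"):
        return "zen.spamhaus.org"
    return None
```

The short-circuit in `check_sender` is why a Zen-only count understates Zen's coverage: Zen includes the CBL data, so anything we attribute to the CBL would also have hit Zen.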
(It turns out that the log message I'm looking at only fires when we accept a RCPT TO from such an IP address, and I think it may fire multiple times for multiple RCPT TOs. This makes me think that I need better logging, although I've already seen that spam filter stats can be misleading.)
Over the last 30 days, we accepted RCPT TOs from 90,000 different IP addresses that were in one or the other (some were detected as being in both at different times). The CBL is the dominant source, at 77,000 or so; Zen is good for another 15,000 or so. I also have stats for RCPT TOs that we rejected due to the source IP being in one of the DNS blocklists; over the same 30 day period we rejected 13,500 different IPs (for a total of 92,000 rejected RCPT TOs), again almost all specifically due to a CBL listing (12,000 to 1,500). Roughly 8,500 of these IPs also had some RCPT TOs accepted.
(For scale on the RCPT TO rejections, over the same time period we fully accepted somewhere around 540,000 RCPT TOs, counting email that got all the way to the end of the SMTP transaction.)
Generating ad-hoc stats like this makes me think that I should work out what stats are interesting in advance and then make sure that we're logging enough information to reconstruct them. Maybe I should also put together scripts to generate stats automatically on demand (which would mean that I might look at them more).
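As a sketch of what such a script might look like, here is a minimal tally of distinct source IPs per blocklist. The log line format in the regex is hypothetical (our real log messages differ); the point is just that counting distinct IPs needs a set per blocklist, not a simple counter, because the same IP can log many RCPT TOs:

```python
import re
from collections import defaultdict

# Hypothetical log line shape; the real regex would match our gateway's messages.
LINE_RE = re.compile(r"accepted RCPT TO from (?P<ip>\S+) listed in (?P<bl>\S+)")

def tally(lines):
    """Count distinct source IPs per DNS blocklist from an iterable of log lines."""
    ips_by_list = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ips_by_list[m.group("bl")].add(m.group("ip"))
    return {bl: len(ips) for bl, ips in ips_by_list.items()}
```

Running something like this on demand (rather than by hand each time) is the sort of thing that would make me actually look at the numbers.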
(The advanced version is having logstash or some equivalent digest all of the logs and provide real-time versions of the stats. But while that might look pretty, it's not really useful; there is nothing actionable in these stats (to use the jargon), just things of vague interest.)