2016-05-22
Our problem with OmniOS upgrades: we'll probably never do any more
Our fileserver infrastructure is currently running OmniOS r151014, and I have recently crystallized the realization that we will probably not upgrade it to a newer version of OmniOS over the remaining lifetime of this generation of server hardware (which I optimistically project to be another two to three years). This is kind of a problem for a number of reasons (and yes, beyond the obvious), but my pessimistic view right now is that it's an essentially intractable one for us.
The core issue with upgrades for us is that in practice they are extremely risky. Our fileservers are a core and highly visible service in our environment; downtime or problems on even a single production fileserver directly impacts the ability of people here to get their work done. And we can't even come close to completely testing a new fileserver outside of production. Over and over, we have only found problems (sometimes serious ones) under our real and highly unpredictable production load.
(We can do plenty of fileserver testing outside of production and we do, but testing can't show that production fileservers will be problem free, it can only find (some) problems before production.)
Since upgrades are risky, we need fairly strong reasons to do them. When our existing fileservers are working reasonably well, it's not clear where such strong reasons would come from (barring a few freak events, like a major ixgbe improvement, or the discovery of catastrophic bugs in ZFS or NFS service or the like). On the one hand this is a testimony to OmniOS's current usefulness, but on the other hand, well.
I don't have any answers to this. There probably really aren't any, and I'm wishing for a magic solution to my problems. Sometimes that's just how it goes.
(I'm assuming for the moment that we could do OmniOS version upgrades through new boot environments. We might not be able to, for various reasons (we couldn't last time), in which case the upgrade problem gets worse. Actual system reinstalls, hardware swaps, or other long-downtime operations crank the difficulty of selling upgrades up even more. Our round of upgrades to OmniOS r151014 took about six months from the first server to the last server, for a whole collection of reasons including not wanting to do all servers at once in case of problems.)
My view of Barracuda's public DNSBL
In a comment on this entry, David asked, in part:
Have you tried the Barracuda and Hostkarma DNSBLs? [...]
I hadn't heard of Hostkarma before, so I don't have anything to say about it. But I am somewhat familiar with Barracuda's public DNSBL and based on my experiences I'm not likely to use it any time soon. As for why, well, David goes on to mention:
[...] Barracuda in particular lists more aggressively and is willing to punish lower volume relays that fail to mitigate spammer exploitations. [...]
That's one way to describe what Barracuda does. Another way to put it is that in my experience, Barracuda is pretty quick to list any IP address that has even a relatively brief burst of outgoing spam, regardless of the long term spam-to-ham ratio of that IP address. In practical terms, whenever we have one of our rare outgoing spam incidents, we can count on the outgoing IP involved getting listed and on some amount of our entirely legitimate email starting to bounce as a result.
As a result, I expect that any attempt to use it in our anti-spam system would have far too high a false positive rate to be acceptable to our users. Given this, I haven't attempted any actual analysis comparing the sender IPs of accepted and rejected email against the Barracuda list; it's too much work for too little return.
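For anyone who does want to run that sort of comparison, the mechanics are simple. Under the usual DNSBL convention (RFC 5782), you reverse the octets of an IPv4 address, append the list's zone, and do an A lookup; an answer means "listed" and NXDOMAIN means "not listed". The following is a minimal sketch, not anything we actually run. The zone name `dnsbl.example.org` is a hypothetical placeholder rather than Barracuda's real zone, and `listed_fraction` is a made-up helper that takes the lookup function as a parameter so that it can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name for an IPv4 address (RFC 5782 style)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers with an A record for this IP.

    Simplification: any resolver failure, including temporary ones, is
    treated as "not listed". A careful analysis would distinguish NXDOMAIN
    from timeouts and would check the returned 127.0.0.x code.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def listed_fraction(ips: Iterable[str], lookup: Callable[[str], bool]) -> float:
    """Fraction of the given sender IPs that the lookup reports as listed."""
    ips = list(ips)
    if not ips:
        return 0.0
    return sum(1 for ip in ips if lookup(ip)) / len(ips)
```

Run over the sender IPs of accepted email, `listed_fraction(accepted_ips, lambda ip: is_listed(ip, "dnsbl.example.org"))` would give a rough upper bound on how much legitimate mail the list would have rejected. It is only a rough bound, because a lookup made today reflects today's listings rather than the listings at the time each message arrived.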
My suspicion is that how well Barracuda's list works for you is strongly influenced by your overall ratio of ham to spam, for the standard base-rate reasons. If most of your incoming email is spam anyway and you don't often receive email from places that are likely to be compromised by spammers from time to time, its misfires are not likely to matter to you. However, this does not describe our mail environment, either in ham/spam levels or in the type of sources we see.
(To put it one way, universities are reasonably likely to get one of their email systems compromised from time to time and we certainly get plenty of legitimate email from universities.)
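The base-rate effect can be made concrete with a little arithmetic. The sketch below works out what share of the messages a DNSBL rejects would actually be legitimate, given the fraction of incoming mail that is ham and the rates at which the list catches ham and spam senders. All of the numbers in the example are hypothetical and chosen purely for illustration; they are not measurements of Barracuda's list or of our mail.

```python
def legit_share_of_rejections(ham_fraction: float,
                              ham_listed_rate: float,
                              spam_listed_rate: float) -> float:
    """Share of DNSBL rejections that are legitimate email.

    ham_fraction: fraction of all incoming messages that are ham.
    ham_listed_rate: fraction of ham arriving from listed IPs
        (higher if your correspondents get compromised now and then).
    spam_listed_rate: fraction of spam arriving from listed IPs.
    """
    rejected_ham = ham_fraction * ham_listed_rate
    rejected_spam = (1.0 - ham_fraction) * spam_listed_rate
    total = rejected_ham + rejected_spam
    return rejected_ham / total if total else 0.0


# Same hypothetical list behaviour, two different mail environments.
mostly_spam = legit_share_of_rejections(0.1, 0.01, 0.8)
mostly_ham = legit_share_of_rejections(0.6, 0.01, 0.8)
print(f"{mostly_spam:.2%} vs {mostly_ham:.2%}")
```

With these made-up inputs, roughly 0.14% of rejections are legitimate mail when only 10% of incoming email is ham, but roughly 1.8% are when 60% is ham. That is over ten times worse with the list behaving identically, and `ham_listed_rate` goes up further if you get a lot of email from places like universities that are occasionally compromised.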
On my personal sinkhole spamtrap, I could probably use the Barracuda list (and the psky RBL) as a decent way of getting rid of known and thus probably uninteresting sources of spam, so that I only have to deal with the (more) interesting ones. But obviously this spamtrap gets only spam, so false positives are not exactly a concern. Certainly a significant number of recently trapped messages there come from IPs that are on one or the other list (and sometimes both), although I'm obviously taking a post-facto look at the hit rate.