Wandering Thoughts archives

2012-05-31

What OSes have succeeded or failed here

In light of my recent entries, people might wonder how we've fared with operating systems here and which specific OSes have succeeded or failed. The necessary disclaimer is that this is all from my personal perspective; my co-workers might have a somewhat different view of things.

OSes that we've actively done things with while I've been here:

  • OpenBSD: clear success within its domain.

    OpenBSD is our automatic choice for anything to do with firewalls (PF just works) and pretty much the default choice for related networking things like VPN servers, routers, and so on. We have a number of OpenBSD machines running additional simple network-related services and they just go and go without problems.

  • Ubuntu LTS: a success but I'm not enchanted with it.

    Ubuntu LTS is our default Linux and thus our default Unix. Its combination of a long 'support' period and a wide vendor-supplied package collection continues to be unmatched (and to be what we need), but it has various flaws and rough edges that leave me not really happy with it. Ubuntu LTS is more something that I put up with than something I actually like.

  • Solaris: failed, as discussed.

    (We continue to build new fileservers with Solaris to match the current ones.)

  • RHEL/CentOS: failed in practice.

    The short summary is that RHEL is not enough better than Ubuntu LTS and it has worse package availability (from Red Hat; we don't entirely trust EPEL, since it's a third party source). We've used RHEL on a few servers but the minor improvements (if any) haven't really been worth running another Linux distribution; the servers could as well run Ubuntu LTS and simplify our lives. I keenly regret this because I like RHEL better than Ubuntu LTS, but I have to face reality here; it's not enough better (and it has its own issues even apart from package availability).

    Our iSCSI backends run RHEL 5 and in theory the long support period is a great advantage for this. In practice they're appliances and we never update them anyways, so it's not clear how much this matters. We might stay with RHEL or CentOS in this specific situation even if we were redoing them from scratch today, just in case.

    (We continue to build new iSCSI backends with RHEL 5 to match the current ones.)

  • Windows: TILT. Not applicable in our environment.

    We have one Windows terminal server for a specific, narrow purpose: to give our non-Windows users a way to run Windows programs (especially the Office suite). There's no interest in using Windows servers for anything else; as far as servers go, we're a Unix shop.

Worth mentioning:

  • FreeBSD: never been seriously considered, would likely fail.

    I don't think we've ever seriously looked at running FreeBSD-based servers. FreeBSD kind of falls into the 'RHEL zone', where it's different but without any solidly compelling advantage that we've thought of. I've used FreeBSD enough in another environment to feel that it's nothing particularly special as Unixes go.

    It's possible that we'll wind up using FreeBSD as our next generation ZFS platform, but even then I don't think it would spread beyond that unless it turns out to be unbelievably cool.

    (In general, unless Linux somehow gets real native ZFS support I expect that any next generation ZFS platform would end up like Solaris is for us today, ie used as a special purpose appliance and no more. The most I'm hoping for is that the next generation platform is more pleasant than our current Solaris one.)

OSes that were pretty much before my time here:

  • Debian: failed due to being supplanted by Ubuntu LTS.

    The problem with Debian in practice is that the support period is too short unless Debian releases come only very slowly. Beyond that, my impression is that Debian wound up being considered an inferior version of Ubuntu LTS; the few Debian-based servers we used to have were replaced with ones based on Ubuntu LTS when we focused on the latter.

    (I actually built a Debian-based server here at one point (before we focused on Ubuntu LTS) but it never made it into production.)

  • Red Hat Linux (now Fedora): failed, I believe in part because the support period is too short. Supplanted by Ubuntu LTS.

I don't think we've ever really looked at other Linux distributions; in general it seems unlikely that they have enough of an advantage over Ubuntu LTS to be attractive.

None of the other commercial Unixes are even on the radar (nor any of the other *BSDs; if we're not considering FreeBSD, they're even further down the list). Mac OS X is not something we run on servers, although we have some Macs and Windows machines around as test clients (since our users need our services to work with both).

OSSuccessFailHere written at 16:14:18

2012-05-30

What it means for an OS to succeed or fail

When we say that an OS has succeeded or failed, what we broadly mean is that it did or did not catch on. An OS that succeeds is one that makes sysadmins want to use it more and more, an OS that spreads and grows. Conversely, an OS that fails is one that does not grow; usage either shrinks or stays static. You can look at this both globally and locally, but we should always remember that in the end the global is the aggregate of a whole lot of local decisions.

(Here I consider each different Linux distribution to sort of be a different 'OS'.)

It's important to understand that your use of an OS doesn't have to be a failure in order for it to fail. In fact, to fail is kind of the default state of an OS simply because there are so many of them (and it's crazy to try to use them all). There's a whole spectrum of reasons that OSes fail to catch on in any particular place, ranging from them blowing up on you (an actual failure) through the OS simply not being different enough from something else you're already using (ie, it fails to have a sufficiently compelling reason to use it).

Each organization and group is different, so it's common for an OS to succeed at one place and fail at another. Sometimes this is because your needs and priorities differ, so that something that matters very much to you doesn't matter to someone else and vice versa. Sometimes this is just a matter of which specific OS from a group of sufficiently closely related OSes got a foothold at your site first (for bonus points, what OSes cluster together this way depends on what your needs and priorities are).

(Since it's hard to accept that your preferred alternative won or lost just because of randomness, the latter is a situation that easily leads to vociferous debates among the adherents of the various OSes.)

Locally, we've dabbled in a number of OSes over the years. Very few of them have been actual failures, where the machines we built just didn't work out, but almost all of them have failed in this broader sense in that they have not given us any desire to make further use of the OS. Sometimes we are just indifferent to and uninspired by the OS; sometimes we turn out to more or less actively dislike the OS or find it a pain in the rear. Right now I think that only two OSes could really be counted as successes here (in that they are the two OSes we reach for automatically when building new machines), and I don't think we're entirely enthused about one of them.

(Note that you can still wind up using a failed OS. A number of the OSes that have failed here are (still) running today as vital parts of our production infrastructure. But their use has never spread and we are not too enthused about them. Succeeding and failing here is about changes in usage, not usage itself.)

OSSucceedFail written at 01:20:40

2012-05-24

How we do milter-based spam rejection with Exim

Suppose that you use Exim as your mailer, and you want to do SMTP-time rejection of incoming spam using some outside program that has a sendmail milter protocol. Exim has no native support for the milter protocol, but it is possible to hijack some existing Exim interfaces to more or less achieve this provided that you don't want to try to change the message in flight at this point (only either accept it as is or reject it).

Exim has a content-scanning interface; one of the things it can do is run an external program as an anti-virus scanner (the cmdline type of av_scanner). If you check for 'malware' in the acl_smtp_data ACL, the program you run here can signal Exim to reject the message and provide a relatively arbitrary message that Exim can put in the SMTP rejection. Since Exim documents this as an interface for detecting viruses, all of the examples talk about things like malware names, but you can use it for anything you want.

A simplified version of our setting for this looks like:

av_scanner = cmdline:/milter/eximdriver.py %s:^REJECT :^REJECT (.+)

Then our DATA ACL contains a stanza with:

deny
  malware = *
  message = Rejected: $malware_name

If eximdriver.py outputs a string that looks like 'REJECT some-reason', Exim will declare that the message contains malware and set $malware_name to the some-reason portion, which we use directly in the SMTP rejection message.

Our eximdriver.py has three important pieces. The first is a client-side library for the milter protocol, so that it can actually talk to the milter server and get results back. The second is code to load a message from the .eml spool format that Exim writes the message out in for the AV scanner program; this is basically a standard 'mailbox' format mail message augmented with some special Exim headers. The complication in one's life is that you need to recover the SMTP envelope information from various message headers, including the first Received line.

(You might think that the envelope information could be passed on the command line. Unfortunately not securely. Also, note that the %s in the argument here is not the .eml file itself but the directory it's in. Presumably Exim sets things up this way so that real AV scanners have a per-message directory where they can write whatever temporary files they need.)

The third chunk of eximdriver.py is site-dependent; it interprets the result of the remote milter in order to figure out whether Exim should be told to reject the message and if so, with what reasons. For reasons beyond the scope of this entry, our milter server doesn't give us direct answers on this; instead, it tells us about changes that should be made to the message. Our eximdriver.py reverse engineers these changes back to whether or not the message has a virus and how high a spam score it got and outputs appropriate messages.
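
To make the overall shape concrete, here is a heavily simplified and hypothetical sketch of what such a driver can look like. This is not our actual eximdriver.py; the milter conversation is stubbed out with a placeholder function, and the specific names, headers, and choices here are illustrative assumptions only.

#!/usr/bin/python
# A hypothetical sketch of an Exim 'AV scanner' driver of this sort.
# The milter conversation is a stub; names and details are made up.
import sys, os, glob, email

def load_message(scandir):
    # Exim hands the scanner the per-message directory, not the message
    # file itself, so find the spooled .eml file inside it.
    names = glob.glob(os.path.join(scandir, "*.eml"))
    if not names:
        return None
    with open(names[0]) as fp:
        return email.message_from_file(fp)

def recover_envelope(msg):
    # The SMTP envelope has to be reconstructed from message headers;
    # which headers you use (eg the first Received: line) is up to you.
    sender = msg.get("Return-Path", "")
    received = (msg.get_all("Received") or [""])[0]
    return sender, received

def query_milter(msg, sender, received):
    # Placeholder: this is where a milter client library would replay
    # the message to the milter server and interpret what it asks for.
    # Returning None means 'no opinion', ie the message is accepted.
    return None

def main():
    if len(sys.argv) < 2:
        return
    msg = load_message(sys.argv[1])
    if msg is None:
        return                  # say nothing, so the mail is accepted
    sender, received = recover_envelope(msg)
    verdict = query_milter(msg, sender, received)
    if verdict is not None:
        # Only output that starts with 'REJECT ' causes a rejection;
        # errors or no output fail safe and the mail is accepted.
        print("REJECT " + verdict)

if __name__ == "__main__":
    main()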

As crazy as this may sound, all of this actually works and works fine. It is nowhere near as efficient as direct milter support in Exim would be (we are at least potentially running a Python program for every incoming email message), but our external mail gateway is actually relatively overconfigured for our volume and it's never been a problem for us.

(There is another mitigating factor, which is about to be discussed.)

Our eximdriver.py also does some associated things, like syslog detailed information about rejected messages and some information about accepted ones.

As a side note, it is very deliberate that eximdriver.py must produce output with a specific prefix in order to have email rejected. This is much safer than having any output at all cause message rejection, because it means that if something goes wrong in eximdriver.py (or just with running it, perhaps because of machine overload) we fail safely; we default to accepting the email instead of bouncing it.

(I was going to say that Exim has no easy way for the 'AV scanner' to signal that the mail should be temporarily deferred with an SMTP 4xx, but actually that's wrong. Exim is explicitly documented to default to deferring messages if there's a visible AV scanner problem, and you can use a regular-expression based 'malware = ...' condition in a defer ACL stanza in order to defer based on the milter results as signaled back to Exim in the 'malware name'.)
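
As an illustration only (not part of our actual configuration), suppose the driver signaled scanner trouble by printing something like 'REJECT TEMPFAIL some-reason'; the TEMPFAIL prefix is made up here. A defer stanza placed before the deny stanza above could then turn that into an SMTP-time temporary failure:

defer
  malware = ^TEMPFAIL
  message = Temporarily unable to scan message; try again later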

Sidebar: the gory details

Now I have to confess that this is the simplified version of our configuration, because we have a problem: not all of our users have opted in to SMTP-time rejection of spam messages. In fact, some of our users have opted in to different levels of SMTP-time rejections. This leads to the SMTP DATA problem; because what we say at DATA time applies to all accepted RCPT TO addresses, we have to use the least aggressive level of SMTP-time rejection that everyone has agreed to (possibly right down to 'no SMTP-time rejection'). In turn this means that our Exim configuration has to keep track of this on the fly and eximdriver.py then gets invoked with this level as one of its arguments and only emits REJECT notices if the message qualifies.

Because the current scanning level comes from a dynamic expansion that must be re-done every time av_scanner is evaluated, our actual av_scanner setting has to look like this:

av_scanner = ${if bool{true} {cmdline:/milter/eximdriver.py -l SCANLVL %s: ^REJECT :^REJECT (.+)}}

The pointless ${if bool{true} ...} portion causes Exim to re-expand this every time it is used so that the current message's scanning level is substituted in.

The simplest way to track the scanning level turned out to be keeping a list of each address's scan level in an ACL variable, $acl_m0_milter. As we handle each address in the RCPT TO ACL, we append that address's scanning level to the variable (which is initialized to the maximum scan level, currently 3) with an expression like 'set acl_m0_milter = $acl_m0_milter:2' in an otherwise do-nothing warn ACL stanza; a sketch of such a stanza follows the SCANLVL definition below. The SCANLVL definition reduces this down to the lowest level seen:

SCANLVL = ${reduce {$acl_m0_milter} {3} {${if <{$item}{$value} {$item}{$value}}}}
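
As a sketch of the per-address bookkeeping (the lookup of a user's chosen level is entirely site-specific, and the file name here is made up), the otherwise do-nothing warn stanza in the RCPT TO ACL can look something like:

warn
  condition = ${lookup{$local_part}lsearch{/milter/level2-optins}{true}{false}}
  set acl_m0_milter = $acl_m0_milter:2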

As an efficiency measure we do not bother doing scanning at all if one or more of the RCPT TO addresses has not opted in to any SMTP-time rejection. This is actually our common case; for various reasons, relatively few people have opted in to any of our server side anti-spam options. This is done with a condition on the 'malware' deny stanza:

condition = ${if >{SCANLVL}{0}{true}{false}}

The result of all of this actually works well, believe it or not.

(Our approach does require that we can put people's anti-spam choices into a linear ordering of some sort. Extensions to schemes involving bitmaps of what people have opted in to are left as an exercise to the dedicated and perverse Exim configuration file writer.)

EximMilterHookup written at 02:30:44

2012-05-09

Using rsync to pull a directory tree to client machines

Suppose that you have a decent sized directory tree that you want some number of clients to mirror from a master server (with the clients pulling updates instead of the master pushing them), perhaps because you've just noticed undesired NFS dependencies. Things in the directory tree are potentially sensitive (so you want access control), it's updated at random, and it's not in a giant VCS tree or something; this is your typical medium-sized ball of local stuff. The straightforward brute force approach is to use rsync with SSH; give the clients special SSH identities, put them in the server's authorized_keys, and have them run 'rsync -a --delete' (or some close variant) to pull the directory tree over. However, this has the problem that normal rsync is symmetric; if you allow a client to pull from you, you also allow a client to push to you (assuming that the server side login has write access to the directory tree, and yes let's make that assumption for now).

(You also have to set the SSH access up so that the clients can't run arbitrary commands on the server.)

Rsync's solution to this is its daemon mode, which can be restricted to read-only operation. Normally rsync wants to be run this way as an actual daemon (listening on a port and so on), but that requires us to use rsync's weaker and harder to manage authentication, access control, and other things. I would rather keep running rsync over plain SSH, just in daemon mode, and take advantage of all of the existing, proven SSH features for various things.

(The rsync manpage suggests hacks like binding the rsync daemon to only listen on localhost on the server and then using SSH port forwarding to give clients access to it. But those are hacks and require making various assumptions.)

How to do this is not obvious from the documentation, so here is the setup I have come up with for doing this on both the server and the clients. First, you need an rsyncd.conf configuration file on the server. Don't use the normal /etc/rsyncd.conf; it's much more controllable to use your own in a different place. It should look something like:

use chroot = no
[somepath]
comment = Replication module
path = /some/path
read only = true
# if necessary:
uid = 0
gid = 0

(The '[somepath]' bit is what rsync calls the module name and can be anything meaningful for you; you'll need it on the client later. The comment is optional but potentially useful. You need to explicitly specify uid and gid if the server login is UID 0 for access to the directory tree and you need to keep that; otherwise rsync will drop privileges to a default UID.)

Next, you need a script on the server that will force an incoming SSH login to run rsync in daemon mode against this configuration file and do nothing else. We will set this as the command= value in the server login's authorized_keys to restrict what the incoming SSH connection from clients can do. This looks like:

#!/bin/sh
exec /usr/bin/rsync --server --daemon --config=/your/rsyncd.conf .

Note that this completely ignores any arguments that the client attempts to supply. However, this doesn't matter; as far as I can tell, the command line that the clients send will always be 'rsync --server --daemon .', regardless of what command line options and paths you use on the clients. (Certainly this is the only command line that clients seem to send for requests that you actually want to pay attention to.)

On the server, the login that you're using for this should have a .ssh/authorized_keys file with entries for the client SSH identities. These entries should all force incoming logins to run the command above and block various other activities (especially port forwarding, which could otherwise be done without command execution at all as Dan Astoorian mentioned in a comment here):

command="/your/rsyncd-shell",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty [...]

A from="..." restriction is optional but potentially recommended. Even a broad one may limit the fallout from problems.

Finally, on the client you need to run rsync with all of the necessary arguments. You probably want to put this in a script:

#!/bin/sh
rsync -a --delete --rsh="/usr/bin/ssh -i /client/identity" LOGIN@MASTER-HOST::somepath /some/path/

Potentially useful additional arguments for rsync are -q and --timeout=<something>. In a production script you probably also want an option to mirror the directory tree to somewhere other than /some/path on the client.

If you run this from cron, remember to add some locking to prevent two copies from running at once. If the directory tree is large and you have enough clients, you may want to add some amount of randomization of the start times for the replication in order to keep load down on the master server.
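
As one way of doing that (a sketch only; the lock file, script name, and delay are all made-up examples, and a shell wrapper around flock(1) would work just as well), a small wrapper run from cron might look like:

#!/usr/bin/python
# Hypothetical cron wrapper: skip this run if a previous one is still
# going, and scatter client start times so they don't all hit the
# master at once. All paths and numbers here are made-up examples.
import fcntl, random, subprocess, sys, time

LOCKFILE = "/var/run/pull-tree.lock"
PULL_CMD = ["/usr/local/sbin/pull-tree"]   # the rsync client script above

def main():
    lockfp = open(LOCKFILE, "w")
    try:
        fcntl.flock(lockfp, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        return 0        # another copy is already running; do nothing
    time.sleep(random.randint(0, 600))     # up to ten minutes of jitter
    return subprocess.call(PULL_CMD)

if __name__ == "__main__":
    sys.exit(main())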

(There may be a better way to do this with rsync; if you know of one, let me know in the comments. For various reasons we're probably not interested in doing this with any other tool, partly because we already have rsync and not the other tools. Another tool would have to be very much better than rsync to really be worth switching to.)

RsyncReplicationSetup written at 23:54:57

Things I will do differently in the next building power shutdown (part 2)

Back at the start of last September, we had an overnight building wide power shutdown in the building with our machine room and I wrote a lessons-learned entry in the aftermath. Well, we just had another one and apparently I didn't learn all of the lessons that I needed to learn the first time around. So here's another set of things that I've now learned.

Next time around I will:

  • explicitly save the previous time's checklist. If nothing else, the 'power up' portion makes a handy guide for what to do if you abruptly lose building power some day.

    (I sort of did this last time, not through active planning but just because I reflexively don't delete basically any of this sort of stuff. But I should do it deliberately and put it somewhere where I can easily find it, instead of just leaving it lying around.)

    Having last time's list isn't the end of the work, because things have undoubtedly changed since then. But it's a starting point and a jog to the memory.

  • start preparing the checklist well in advance, like more than a day beforehand. Things worked out in the end but doing things at the last moment was a bit nerve-wracking.

    (There's always stuff to do around here and somehow it always felt like there was plenty of time right up until it was Friday and we had a Monday night shutdown.)

  • update and correct the checklist immediately afterwards to cover things that we missed. My entry from last time is kind of vague; I'm sure I knew the specifics I was thinking of at the time, but I didn't write them down so they slipped away. I was able to reconstruct a few things from notes and email in the wake of last time, but others I only realized in the aftermath of this one.

  • add explanatory notes about why things are being done in a certain order and what the dependencies are. Especially in the bustle of trying to get everything down or up as fast as possible, it's useful to have something to jog our minds about why something is the way it is and whether or not it's that important.

    (Our checklists for this sort of thing are not fixed; they're more guidelines than requirements. We deviate from them on the fly and thus it's really useful to have some indication of how flexible or rigid things are.)

  • if any machines are being brought down and then deliberately not being brought back up, explicitly mention this so that people don't get potentially confused about a 'missing' machine.

My entry from last time was very useful in several ways. I reread it when I was preparing our checklist for this time and it jogged my memory about several important issues; as a result our checklist for this time around was (I think) significantly better than for last time (and also noticeably longer and more verbose). This time I at least made new mistakes, which is progress that I can live with.

I will also probably try to put more explanation into the checklist the next time around. I'm sure it's possible to put too much of it in, but I don't think that's been our problem so far. In the heat of the moment we're going to skim anyways, so the thing to do is to break the checklist up into skimmable blocks with actions and things to check off and then chunks of additional explanation after them.

(In a sense a checklist like this serves two purposes at once. During the power down or power up it is mostly a catalog of actions and ordering, but beforehand it's a discussion and a rationale for what needs to be done and why. Without the logic behind it being written out explicitly, you can't have that discussion; once you have that logic written out, you might as well leave it in to jog people's memories on the spot.)

On a side note, a full power up is an interesting and useful way to find problematic dependencies that have quietly worked their way into your overall network, ones that are not so noticeable when your systems are in their normal steady state. For example, DHCP service for several of our networks now depends on our core fileserver, which means that it can only come up fairly late in the power up process. We're going to be fixing that.

(There is a chain of dependencies that made this make sense in a steady state environment.)

PowerdownLessonsLearnedII written at 00:37:34

2012-05-05

Look for your performance analysis tools now

Last night and this morning we had a significant NFS performance problem on one of our ZFS fileservers, which was a bit stressful. Our fileservers have multiple ZFS pools, each with multiple NFS exported filesystems, and the fileserver was just not responding very well for pretty much any of them. We got as far as determining that one mirror pair of disks for one particular pool was probably saturated (based on iostat figures), but that was a long way from being able to identify who was doing what on what filesystem to cause this, or how it was affecting the whole fileserver (and that's if it was even the only problem, instead of the one that was easiest to notice with the tools we had at hand).

Our fileservers are Solaris machines, which means they have DTrace available. People have undoubtedly written DTrace scripts to analyze NFS server activity and performance, to track disk IO to ZFS events, and so on. Which is theoretically wonderful but leads me to a practical observation:

When you need performance analysis tools, it's too late to go find them.

When you're in the middle of a serious issue and need some diagnostic programs, you're in the position of someone who has waited until it's raining to buy roofing tools. At a minimum you're probably not going to do too well at evaluating your options and picking the best, most informative programs and then using them well; instead you're going to grab the first thing that looks like it might help and hammer on your problem some. If this doesn't work, grab the next script and see if it does any better. Repeat until you come up with something or the problem goes away on its own.

This situation may sound crazy but I think it's unfortunately a natural thing to have happen. If you don't currently have performance issues, it doesn't seem very urgent to spend limited time finding and playing around with performance analysis tools; you likely have plenty of higher priority things on your to-do list, things that either have to be done or that have high payoffs. Such low priority playing around is generally seen as a spare time activity (which in practice means it almost never gets done). This is certainly what happened here with me; I always knew that there might be interesting performance analysis tools available for Solaris but it never seemed sufficiently urgent to go investigate them and separate the wheat from the chaff, since I always had more important and engaging things to do.

What my experience today has rubbed my nose into is that this can easily be a false economy. The right thing to measure against isn't what else you could be doing with your time right now, it's how much time you would lose if (or when) you have performance problems and hadn't prepared ahead of time. Now is the right time to work out what tools you have available, how well they work, and how to use them; indeed, it may be the only time you'll have to do it well. Waiting until a crisis is too late. Preparing in advance is the smart thing to do (and it sounds so obvious when I write it like that).

(All of this is nice talk but I have no idea if I will be able to carry through with it, especially since trying to evaluate performance analysis tools without a performance problem is something that I usually find kind of tedious.)

PS: our ZFS fileserver issues turned out to have a somewhat interesting root cause that pretty much went away on its own.

LookForPerfToolsNow written at 01:46:15
