2010-06-27
How we propagate password information in our fileserver infrastructure
As mentioned earlier, we have a fileserver infrastructure and so we need some way of propagating account and password information around (and letting people actually update their passwords). The old traditional answer is NIS, the new traditional answer is LDAP, and we don't really like either so we wrote our own.
Given the Unix system UID problem, any such system has three parts: where each machine's account information lives, how global account information propagates around, and how you combine global accounts and system accounts together.
Our answer to the first question is that each machine has a complete
local copy of /etc/passwd, /etc/shadow, and /etc/group. This is
the simple approach; everything is guaranteed to work with local
files, since that is how single, isolated machines already work.
(We also feel nervous about adding another point of failure to our fileserver infrastructure in the form of a master account machine that must be up in order for anyone to be able to log in anywhere.)
We've also chosen a simple way to handle propagating the global account
information around; we use our existing fileserver infrastructure. We
have a central administrative filesystem where the global passwd,
shadow, and group files live, and every client machine NFS mounts it
under a standard name. The one complication of the NFS mount approach is
that client machines must have root access to the filesystem in order
to read the global shadow file, which means that we have to be very
careful about which machines we allow to have write access to it.
(The use of an NFS filesystem is really a small implementation detail.
Our Solaris fileservers use the same
system and programs to keep their /etc/passwd in sync, but they don't
NFS mount the administrative filesystem because we don't believe in NFS
crossmounts on the fileservers. Instead they copy the files over with
rsync.)
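For concreteness, here is a minimal sketch in Python of what that rsync step might look like. The host name and paths are invented stand-ins rather than our real ones, and the real process involves more care than this:

    # Hypothetical sketch: pull the global account files from the central
    # administrative filesystem with rsync instead of NFS mounting it.
    # The host and path names here are invented for illustration.
    import subprocess
    import sys

    ADMIN_SOURCE = "adminhost:/global/accounts/"    # hypothetical source
    LOCAL_COPY = "/var/local/global-accounts/"      # hypothetical staging area

    def pull_global_files():
        # -a preserves ownership and permissions; --delete keeps the local
        # copy from accumulating files that have been removed centrally.
        rc = subprocess.call(["rsync", "-a", "--delete", ADMIN_SOURCE, LOCAL_COPY])
        if rc != 0:
            sys.stderr.write("rsync of global account files failed (exit %d)\n" % rc)
        return rc

    if __name__ == "__main__":
        sys.exit(pull_global_files())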
The update process is somewhat complicated. First, the global passwd
et al is the authoritative source of global accounts, while each
machine's /etc/passwd et al is the authoritative source of its local
system accounts. We tell the two apart based on UID and GID; our global
user logins and groups are always within a specific UID and GID range
(one that is chosen to not clash with local system UIDs and GIDs).
We propagate updates to global accounts by periodically running
an update script that extracts all of the system accounts from
/etc/passwd, extracts all of the global accounts from our master
passwd, merges the two together, and writes out an updated
/etc/passwd et al if anything changed. Because it's convenient, this
also updates the passwords of any system accounts from the global
shadow file if they're present there.
(This avoids having to change the root password on every single system we have, which would be a great disincentive to changing it at all.)
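To make the merge concrete, here is a stripped-down sketch in Python of the kind of logic involved. The UID cutoff and the location of the global passwd file are made-up placeholders, and the real script also handles groups, shadow entries, and rewriting the files safely:

    # Sketch only: merge local system accounts with global accounts.
    # The UID cutoff and the global passwd path are hypothetical values.
    GLOBAL_UID_MIN = 10000                       # global logins are at or above this UID
    GLOBAL_PASSWD = "/global/accounts/passwd"    # hypothetical mount point

    def read_passwd(path):
        entries = []
        with open(path) as f:
            for line in f:
                fields = line.rstrip("\n").split(":")
                if len(fields) == 7:
                    entries.append(fields)
        return entries

    def is_global(fields):
        # Global logins are recognized purely by their UID range.
        try:
            return int(fields[2]) >= GLOBAL_UID_MIN
        except ValueError:
            return False

    def merge(local_path, global_path):
        # The local file is authoritative for system accounts; the global
        # file is authoritative for everything in the global UID range.
        system = [e for e in read_passwd(local_path) if not is_global(e)]
        glob = [e for e in read_passwd(global_path) if is_global(e)]
        return system + glob

    if __name__ == "__main__":
        # A real script would compare the result against the current
        # /etc/passwd and only rewrite the file if something changed.
        for fields in merge("/etc/passwd", GLOBAL_PASSWD):
            print(":".join(fields))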
In the process of handling global accounts, the program lets us both selectively exclude (or only include) some of them and mangle accounts in various ways, either selectively or across the board. We can change shells (for example, to give accounts a shell that just tells them they can't log in to this machine), remap where home directories are in various useful ways, and so on. Also, if a system login or group name conflicts with a global login or group name, it renames the global login or group by sticking a prefix on it.
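As an illustration of the flavour of this mangling, here is a small sketch; the no-login shell, the home directory remapping, and the conflict prefix are invented examples rather than our actual policies:

    # Sketch of per-machine mangling of a single global passwd entry.
    # All of the specific policy values here are invented examples.
    NOLOGIN_SHELL = "/usr/local/bin/nologin-msg"   # hypothetical "you can't log in here" shell
    CONFLICT_PREFIX = "g-"                         # hypothetical rename prefix

    def mangle(fields, local_logins, allow_login=True, home_prefix=None):
        login, pw, uid, gid, gecos, home, shell = fields
        if not allow_login:
            # Give the account a shell that just says "not on this machine".
            shell = NOLOGIN_SHELL
        if home_prefix is not None:
            # Remap the home directory, e.g. /h/1000/someone -> /local/h/1000/someone.
            home = home_prefix + home
        if login in local_logins:
            # The global login clashes with a local system login; rename it.
            login = CONFLICT_PREFIX + login
        return [login, pw, uid, gid, gecos, home, shell]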
(Naturally we put the update script itself in the central administrative filesystem too, because that makes maintaining it simpler. The only thing that lives on the client machines is the crontab entry that invokes the whole system every so often.)
Password changes are handled by a cover script for passwd that
ssh's off to our password master machine and runs the real passwd
program there. The global passwd et al are just straight copies of
the password master machine's /etc/passwd et al, although they get
run through a checking program before they get copied from /etc into
the central administrative filesystem. This is an important safeguard
against stupid mistakes when updating the master machine's /etc/passwd
et al.
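To give a sense of the flavour of the checking program, here is a minimal sketch; the real checks are rather more extensive than this:

    # Sketch of the kind of sanity checks run on a passwd file before it
    # is copied into the central administrative filesystem. The real
    # checking program does considerably more than this.
    import sys

    def check_passwd(path):
        errors = []
        logins = set()
        with open(path) as f:
            for n, line in enumerate(f, 1):
                fields = line.rstrip("\n").split(":")
                if len(fields) != 7:
                    errors.append("line %d: wrong number of fields" % n)
                    continue
                login, uid, gid = fields[0], fields[2], fields[3]
                if not uid.isdigit() or not gid.isdigit():
                    errors.append("line %d: non-numeric UID or GID" % n)
                if login in logins:
                    errors.append("line %d: duplicate login %s" % (n, login))
                logins.add(login)
        if "root" not in logins:
            errors.append("no root entry at all")
        return errors

    if __name__ == "__main__":
        problems = check_passwd(sys.argv[1])
        for p in problems:
            sys.stderr.write(p + "\n")
        sys.exit(1 if problems else 0)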
(The details of how things work on the password master machine are somewhat complicated, so I'm not going to put them here.)
2010-06-26
The Unix system UID and login name problem
Once upon a time, most every Unix system (or at least most every Unix system descended from Berkeley Unix) had a set of system logins and groups that looked more or less identical and had more or less identical UIDs and GIDs. This made it more or less possible for fileserver environments to have a single, global password and group file that was used on all of your machines.
Those days are long over. In fact, things have swung to drastically different system logins and groups, with complete anarchy not just between Unixes or between different distributions of Linux, but even between machines of the same Unix (or Linux distribution) with different package sets installed. The most pernicious problem is that you can wind up with the same login and group on multiple systems but with different UIDs and GIDs assigned for it, and local files owned by those UIDs and GIDs.
(This happens because packages often want to create some system logins and groups when installed and they don't necessarily have a fixed, preassigned UID and GID for their stuff; instead they just ask for the next free system UID or GID. The result is that the UID they get varies depending on what else the system has installed and even the order that the packages were installed in.)
The net result is that it is now somewhere between very difficult
and completely impossible to have a completely common and global
/etc/passwd and /etc/group in your fileserver environment, even if
you are running the same version of the same Unix (or Linux) on all
of your machines. Instead, you really need to design your password
propagation system around the assumption that you will have both
per-machine local accounts and environment-wide global accounts.
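One way to see the problem concretely is to compare the passwd files from two machines and look for logins that exist on both but with different UIDs. A quick sketch:

    # Quick sketch: report logins that exist in two passwd files but have
    # been given different UIDs on the two machines.
    import sys

    def uid_map(path):
        m = {}
        with open(path) as f:
            for line in f:
                fields = line.rstrip("\n").split(":")
                if len(fields) == 7:
                    m[fields[0]] = fields[2]
        return m

    if __name__ == "__main__":
        a, b = uid_map(sys.argv[1]), uid_map(sys.argv[2])
        for login in sorted(set(a) & set(b)):
            if a[login] != b[login]:
                print("%s: UID %s versus %s" % (login, a[login], b[login]))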
2010-06-24
The elements of fileserver infrastructure
As I sort of mentioned in an aside in the last entry, pretty much any sort of fileserver
infrastructure does a number of other things besides just serving
files. In turn this means that hooking a machine into your fileserver
environment generally involves a lot more than just some mount
commands.
For the purposes of this entry, let's ignore all of the services that only care about the fileservers themselves; backup, monitoring, and so on. We'll only look at the services that a client has to be tied into. My experience is that there are three major components:
- the actual fileservice itself, generally NFS on Unix machines.
By now this generally just works.
- some way of propagating around information about what filesystems to
mount from where and of getting them mounted when necessary. Some
form of automounter is the traditional Unix answer, although we
built our own (as have other people).
Even with an automounter you still need some mechanism to propagate
the automounter maps around (there's a small illustrative sketch of
the map idea just after this list).
- some way to propagate account and password information around to all of the clients and to let users change their password once instead of separately on every machine. This gets you into semi-policy areas such as dealing with systems that have different ideas of what system accounts exist and what UIDs they have.
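To make the second item a bit more concrete, here is a purely hypothetical sketch of a trivial 'what to mount from where' map and how a client might turn it into mount commands. Real automounter maps (and our own replacement) are considerably more involved than this:

    # Hypothetical sketch only: a trivial map of mount points to NFS
    # sources and a client-side loop that mounts everything in it.
    # The map format and everything else here is invented; a real system
    # would check what is already mounted, create mount points, pick NFS
    # options, handle errors, and so on.
    #
    # Assumed map format, one entry per line:
    #   /h/1000    fileserver1:/export/h/1000
    import subprocess
    import sys

    def read_map(path):
        entries = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                mountpoint, source = line.split()
                entries.append((mountpoint, source))
        return entries

    def mount_all(entries):
        for mountpoint, source in entries:
            subprocess.call(["mount", "-t", "nfs", source, mountpoint])

    if __name__ == "__main__":
        mount_all(read_map(sys.argv[1]))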
At one point NIS was the usual way to propagate both account and automounter map information around, and there were basically turnkey solutions to handle it all; you could just follow the vendor's manual when configuring your Unix machines and be done with it. I'm out of touch with what the modern usual way is, but I believe it's LDAP for distributing account information; I don't know what people do for automounter maps. My impression is that an LDAP setup is a lot less turnkey than the old NIS approach.
(Locally we have never been very fond of NIS, so we have always rolled our own distribution and account management systems. And as I mentioned, we've replaced automounter entirely.)
In theory you don't need to have an infrastructure at all, because you can just do all of these things by hand. This approach is not recommended and trying it out generally results in you building an infrastructure after all. And post-facto infrastructures are generally more painful than an infrastructure that you think about while you're setting up the whole environment.
2010-06-23
The advantages of separate machines for separate things
Sometimes it seems that system administration goes in cycles. Right now the cycle is moving back towards consolidation of services on fewer machines, so I want to talk about the advantages of using separate machines (whether virtual or physical) for different services, instead of putting them all on the same machine with various degrees of clever tricks.
(The genesis of this entry was a comment on an earlier entry, talking about how one could use one machine instead of two to do the job I was tackling.)
First off, it is usually simpler to configure the machines. This is especially so if you need two instances of the same system, such as a mailer or a web server, as many system setups are simply not designed with this in mind and require a bunch of changes to work right (and missing a change can be problematic). Running only one system instance of a service is the common case that everything is designed for, so you might as well go with the flow; it's easier.
Second, it gives you isolation and independence; when you do things to the underlying system environment, it affects as few services as possible. The obvious case is taking a machine down or rebooting it, where if you have a bunch of services on the machine you need a time when it's acceptable to have all of them down at once. Similarly, if you're planning an OS upgrade or change you need to have all of the services ready to go on the new OS instead of just one of them (and you can't do a split upgrade, keeping some services on the old OS version and moving others to the new one).
This also implies that your one machine needs to be configured for
the union of what all of its services require. This sounds abstract,
so I'll give a concrete example: do you need to mount all of your NFS
filesystems? Our main mail machine
has to have user home directories mounted from our fileservers and
use our full /etc/passwd file, but the spam forwarding machine does not. As a result, the spam
forwarding machine is almost entirely decoupled from our fileserver
infrastructure.
(Our fileserver infrastructure does much more than just plain NFS service, but that's another entry.)
And of course you get fault isolation; if something goes wrong on one machine, it only takes down one service instead of a whole bunch of them. Here, things going wrong can be anything from system crashes to CPU fan failures to someone accidentally nudging the network cable out when they were in the machine room doing something else.
Sometimes the services really are so tightly coupled that you would never use the freedom that one-service-per-machine isolation gives you. But in my experience this is the rare case; far more common are situations where some services are easier to interrupt than others.
Virtualization is not a cure-all for these issues; if anything it can make some of them worse, because it can concentrate a lot of machines and services onto one physical piece of hardware and one host OS. You can do better, but it gets expensive.
(From personal experience, doing anything to the host machine for a bunch of virtual machines is a pain in the rear if you don't have easy failover.)
2010-06-21
Applying low distraction design to alerting systems
Writing yesterday's entry has left me with some thoughts on creating low-distraction alerting and monitoring systems. Obviously this should only include informative monitoring, but once you've got that you still need to present the information on what alerts are active in a good way. And because you want sysadmins to check your alerts page relatively frequently, you want it to be low distraction in the same way that email checks should be.
A low distraction system needs to show you enough information for you to make at least a preliminary decision, present events in some useful order, and let you shut it up. So, what I think you want is:
- a display that is organized by severity of alarm and reverse
chronological order within that, with the most recent alarm on
top and thus the most visible, with either the ages or the start
times shown.
- some sort of one-line summary of each alarm's specific details,
so that you don't have to drill down further to find out what the
actual problem is.
- a way of hiding or dismissing specific alarms. Probably you should have a way of canceling this and re-revealing all current alarms.
For an added bonus, default to aggregating alarms together in some way if they are chronologically close enough (with an option to expand out the full details). This provides a natural way to condense cascade failures down into a single alert, crudely solving the alerting dependency problem.
Intuitively, I think that sorting by priority and then by recency is the right order for events. In most situations I care more about a recent issue than an older one (after all, if the systems haven't entirely melted down by now, the older issue can probably wait a bit longer), and more about an older high priority issue than a newer lower priority one. This is arguable and may depend on local circumstances.
(And the priorities may involve things like 'what machine is this reported on', with some machines being much more important than others.)
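As a sketch of the ordering and crude aggregation described above (the alarm representation, the priority scheme, and the five minute window are all made up for illustration):

    # Sketch: order alarms by priority and then most recent first, crudely
    # aggregate alarms of the same priority that start close together, and
    # show one line per alarm (or group). All the specifics are invented.
    AGGREGATION_WINDOW = 5 * 60   # seconds; made-up value

    def order_alarms(alarms):
        # alarms: list of dicts with 'priority' (lower is more severe),
        # 'start' (Unix timestamp), and 'summary' (one-line description).
        return sorted(alarms, key=lambda a: (a["priority"], -a["start"]))

    def aggregate(alarms):
        # Group already-ordered alarms of the same priority whose start
        # times are close to the first alarm of the group; a cascade
        # failure then shows up as one line instead of a pile of them.
        groups = []
        for alarm in alarms:
            if (groups and
                    groups[-1][0]["priority"] == alarm["priority"] and
                    abs(groups[-1][0]["start"] - alarm["start"]) <= AGGREGATION_WINDOW):
                groups[-1].append(alarm)
            else:
                groups.append([alarm])
        return groups

    def display(alarms, hidden=()):
        # 'hidden' is a crude stand-in for dismissed alarms; a real system
        # would track alarm identities properly.
        for group in aggregate(order_alarms(alarms)):
            group = [a for a in group if a["summary"] not in hidden]
            if not group:
                continue
            first = group[0]
            extra = " (+%d related)" % (len(group) - 1) if len(group) > 1 else ""
            print("P%d  %s%s" % (first["priority"], first["summary"], extra))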
2010-06-16
The problem of testing firewall rules changes
In an earlier entry, I mentioned that firewalls are a classical case of difficult testing where differences between your test and your production environments can be vitally important. Let's elaborate on that.
Suppose that you have some firewall rules changes that you want to make. As a good developer-style sysadmin, you are not going to just dump them on your production firewall; instead you have a test firewall that you push rules to first for testing. But here's the question: how is your test firewall's networking configured? Specifically, do you give it test IPs and networks, or do you configure it exactly identically to the production firewall, using the production firewall's IPs and networks?
If you give it production IPs and networks, it obviously has to be completely isolated from your production environment. In turn this means that it needs to have its own supporting (and testing) network infrastructure (with multiple machines, network connections, etc), and you have to somehow push configuration updates into that test network infrastructure.
(I'm going to assume that our only concern is testing firewall rules changes and that things like firewall monitoring systems continue to work fine, so we don't have to build something to test them inside this isolated environment.)
If your test firewall uses test IPs and networks, it doesn't have to be completely isolated from your production environment and can reuse a bunch of your existing update and management infrastructure. This sounds good, but there's a problem: errors in IP addresses and network blocks are exactly one of the problems with firewall changes, yet you can't test for these errors if your test firewall uses test IPs and network blocks. Your test version of the change, using test IPs, can be entirely correct even though you've made a mistake when writing out the production IPs; you'll only find out when you push the update to the production firewall and things start breaking.
(So what differences between your test and production environments are acceptable to have? My only thought right now is that differences in things that you don't change seem safe, because then you can verify all of those differences once and know that things are good from then onwards.)
2010-06-07
One problem with testing system changes
One of the strange things about system administration as compared to development is the general lack of testing that sysadmins do. I believe that one reason for this is that sysadmins have a hard time testing changes, especially on a budget.
Now, I will admit that I have a biased viewpoint on this; I work in a relatively complex environment (although one that's fairly small by the standards of large systems). As is common in multi-machine environments, we effectively have hierarchies of machines and systems, with a small number of core machines and then more and more machines as you move outwards.
In order to do system-level testing, you need test machines. More than test machines, you need a test environment, something where your changes can be isolated from your production environment. Testing changes at the periphery of our hierarchies is generally easy, because nothing depends on peripheral machines (or services) and thus changes only affect them and only have to be tested on them; you can easily set up a test machine, make a change just on it, and see if it works.
(Well, in theory. In practice even peripheral machines can be quite complex in their own right, offering what is in effect many services.)
But the more interesting and dangerous changes are usually nearer the center and thus have downstream effects on the systems 'below' them. In order to thoroughly test these changes, you need not just a test machine that duplicates your production machine, you need a duplicate of the downstream environment too. The more central the service you're testing a change to, the more infrastructure you need to duplicate even if you miniaturize it (with fewer machines than in your production environment).
(By the way, I'm not convinced that virtualization answers all of the problems here. Hardware differences do affect how systems behave, and virtualized hardware is different from real hardware (even once we set aside speed and load testing issues).)
In the extreme, fully testing changes before deploying them requires a somewhat miniaturized but faithful test version of your entire infrastructure, in order to make the test environment good enough that you will really catch problems before you go live. This is, as a minimum, a pain.
(There is also a tension here: for sysadmins, every difference between the production environment and the test environment is a chance for uncaught errors to creep in, yet too much similarity between them (even on peripheral machines) can complicate attempts to share elements of the overall infrastructure. The classical case of this is testing firewall changes.)
(This is a very slow reaction to On revision control workflows, which was itself a reaction to an entry of mine.)