2015-02-28
Sometimes the reason we have singleton machines is that failover is hard
One of our single points of failure around here is that we have any number of singleton machines that provide important services, for example DHCP for some of our most important user networks. We build such machines with amenities like mirrored system disks and we can put together a new instance in an hour or so (most of which just goes to copying things to the local disk), but that still means some amount of downtime in the event of a total failure. So why don't we build redundant systems for these things?
One reason is that there are a lot of services where failover and what I'll call 'cohabitation' are not easy. On the really easy side is something like caching DNS servers; it's easy to have two on the network at once and most clients can be configured to talk to both of them. If the first one goes down there will be some amount of inconvenience, but most everyone will wind up talking to the second one without anyone having to do anything. On the difficult side is something like a DHCP server with continually updated DHCP registration. You can't really have two active DHCP servers on the network at once, plus the backup one needs to be continually updated from the master. Switching from one DHCP server to the other requires doing something active, either by hand or through automation (and automation has hazards, like accidental or incomplete failover).
(In the specific case of DHCP you can make this easier with more automation, but then you have custom automation. Other services, like IMAP, are much less tractable for various reasons, although in some ways they're very easy if you're willing to tell users 'in an emergency change the IMAP server name to imap2.cs'.)
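To illustrate just how easy the caching DNS case is, the client side of that redundancy is nothing more than listing both servers in each client's resolver configuration; a minimal sketch, with made-up addresses:

    # /etc/resolv.conf on a client machine
    # The stub resolver tries nameservers in order, falling back to the
    # second one if the first stops answering.
    nameserver 192.0.2.10
    nameserver 192.0.2.11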
Of course this is kind of an excuse. Having a prebuilt second server for many of these things would speed up bringing the service back if the worst came to the worst, even if it took manual intervention. But it's a tradeoff here; prebuilding second servers would require more servers and at least partially complicate how we administer things. It's simpler if we don't wrestle with this and so far our servers have been reliable enough that I can't remember any failures.
(This reliability is important. Building a second server is in a sense a gamble; you're investing up-front effort in the hopes that it will pay off in the future. If there is no payoff because you never need the second server, your effort turns into pure overhead and you may wind up feeling stupid.)
Another part of this is that I think we simply haven't considered building second servers for most of these roles; we've never sat down to consider the pros and cons, to evaluate how many extra servers it would take, to figure out how critical some of these pieces of infrastructure really are, and so on. Some of our passive decisions here were undoubtedly formed at a time when our networks were used rather differently than they are now.
(Eg, it used to be the case that many fewer people brought in their own devices than today; the natural result of this is that a working 'laptop' network is now much more important than before. Similar things probably apply to our wireless network infrastructure, although somewhat less so since users have alternatives in an emergency (such as the campus-wide wireless network).)
2015-02-27
What limits how fast we can install machines
Every so often I read about people talking about how fast they can get new machines installed and operational, generally in the context of how some system management framework or another accelerates the whole process. This has always kind of amused me, not because our install process is particularly fast but instead because of why it's not so fast:
The limit on how fast we install machines is how fast they can unpack packages to the local disk.
That's what takes almost all of the time: fetching (from a local mirror or the install media) and then unpacking a variegated pile of Ubuntu packages. A good part of this is the read speed of the install media, some of it is the write speed of the system's disks, and some of it is all of the fiddling around that dpkg does in the process of installing packages, running postinstall scripts, and so on. The same thing is true of installing CentOS machines, OmniOS machines, and so on; almost all of the time is spent in the system installer and packaging system. What framework we wrap around this doesn't matter because we spend almost no time in said framework or doing things by hand.
The immediate corollary to this is that the only way to make any of our installs go much faster would be to do less work, ranging from installing fewer packages to drastic approaches where we reduce our 'package installs' towards 'unpack a tarball' (which would minimize package manager overhead). There are probably ways to approach this, but again they have relatively little to do with what system install framework we use.
(I think part of the slowness is simply package manager overhead instead of raw disk IO speed limits. But this is inescapable unless we somehow jettison the package manager entirely.)
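As a very rough sketch of how you might see this split, here's the kind of timing comparison I have in mind; the package list, target directory, and tarball are made-up stand-ins rather than anything we actually use:

    # time the package manager path for some set of packages
    PKGS="dovecot-imapd postfix rsync"    # a made-up stand-in package list
    time apt-get -y install $PKGS
    # versus unpacking a tarball of (roughly) the same files
    time tar -C /mnt/scratch -xpf /var/tmp/equivalent-files.tar

The gap between the two numbers would mostly be that package manager overhead (dpkg bookkeeping, postinstall scripts, and so on) rather than raw disk IO.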
Sidebar: an illustration of how media speeds matter
Over time I've observed that both installs in my testing virtual machines and installs using the virtual DVDs provided by many KVM over IP management processors are clearly faster than installs done from real physical DVDs plugged into the machine. I've always assumed that this is because reading a DVD image from my local disk is faster than doing it from a real DVD drive (even including any KVM over IP virtual device network overhead).
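A quick way to see the raw difference in read speeds is something like the following; the device name and image path are illustrative:

    # read speed of a physical DVD in the drive
    dd if=/dev/sr0 of=/dev/null bs=1M count=1024
    # versus reading the same ISO image from local disk
    dd if=/virt/images/ubuntu-server.iso of=/dev/null bs=1M count=1024

(dd reports its overall transfer rate when it finishes.)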
2015-02-24
How we do and document machine builds
I've written before about our general Ubuntu install system and I've mentioned before that we have documented build procedures but we don't really automate them. But I've never discussed how we do reproducible builds and so on. Basically we do them by hand, but we do them systematically.
Our Ubuntu login and compute servers are essentially entirely built through our standard install system. For everything else, the first step is a base install with the same system. As part of this base install we make some initial choices, like what sort of NFS mounts this machine will have (all of them, only our central administrative filesystem, etc).
After the base install we have a set of documented additional steps; almost all of these steps are either installing additional packages or copying configuration files from that central filesystem. We try to make these steps basically cut and paste, often with the literal commands to run interlaced with an explanation of what they do. An example is:
* install our Dovecot config files:
cd /etc/dovecot/conf.d/
rsync -a /cs/site/machines/aviary/etc/dovecot/conf.d/*.conf .
Typically we do all of this over a SSH connection, so we are literally cutting and pasting from the setup documentation to the machine.
(In theory we have a system for automatically installing additional Ubuntu packages only on specific systems. In practice there are all sorts of reasons that this has wound up relatively disused; for example it's tied to the hostname of what's being installed and we often install new versions of a machine under a different hostname. Since machines rarely have that many additional packages installed, we've moved away from preconfigured packages in favour of explicitly saying 'install these packages'.)
We aren't neurotic about doing everything with cut and paste; sometimes it's easier to describe an edit to do to a configuration file in prose rather than to try to write commands to do it automatically (especially since those are usually not simple). There can also be steps like 'recover the DHCP files from backups or copy them from the machine you're migrating from', which require a bit of hand attention and decisions based on the specific situation you're in.
(This setup documentation is also a good place to discuss general issues with the machine, even if it's not strictly build instructions.)
When we build non-Ubuntu machines the build instructions usually follow a very similar form: we start with 'do a standard base install of <OS>' and then we document the specific customizations for the machine or type of machine; this is what we do for our OpenBSD firewalls and our CentOS based iSCSI backends. Setup of our OmniOS fileservers is sufficiently complicated and picky that a bunch of it is delegated to a couple of scripts. There's still a fair number of by-hand commands, though.
In theory we could turn any continuous run of cut and paste commands into a shell script; for most machines this would probably cover at least 90% of the install. Despite what I've written in the past, doing so would have various modest advantages; for example, it would make sure that we would never skip a step by accident. I don't have a simple reason for why we don't do it except 'it's never seemed like that much of an issue', given that we build and rebuild this sort of machine very infrequently (generally we build them once every Ubuntu version or every other Ubuntu version, as our servers generally haven't failed).
(I think part of the issue is that it would be a lot of work to get a completely hands-off install for a number of machines, per my old entry on this. Many machines have one or two little bits that aren't just running cut & paste commands, which means that a simple script can't cover all of the install.)
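As a minimal sketch of what such a script could look like, using the Dovecot step from earlier as its only content (the overall structure and error handling here are my own invention, not something we actually have):

    #!/bin/sh
    # Hypothetical per-machine setup script: each block is simply one
    # cut-and-paste step from the build documentation, run in order.
    set -e    # stop immediately if any step fails

    # install our Dovecot config files
    cd /etc/dovecot/conf.d/
    rsync -a /cs/site/machines/aviary/etc/dovecot/conf.d/*.conf .

    # ... further cut-and-paste steps would go here ...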
2015-02-14
Planning ahead in documentation: kind of a war story
I'll start with my tweet:
Current status: changing documentation to leave notes for myself that we'll need in three or four years. Yes, this is planning way ahead.
What happened is that we just upgraded our internal self-serve DHCP portal from Ubuntu 10.04 LTS to Ubuntu 14.04 LTS because 10.04 is about to go out of support. After putting the new machine into production last night, we discovered that we'd forgotten one part of how the whole system was supposed to work and so that bit of it didn't work on the new server. Specifically, the part we'd forgotten involved another machine that needed to talk to our DHCP portal; the DHCP portal hostname had changed, and the DHCP portal system wasn't set up to accept requests from the other machine. That we'd forgotten this detail wasn't too surprising, given that the last time we really thought much about the whole collection of systems was probably four years or so ago when we updated it to Ubuntu 10.04.
So what I spent part of today doing was adding commentary to our build instructions that will hopefully serve as a reminder that parts of the overall DHCP portal extend off the machine itself. I also added some commentary about gotchas I'd hit while building and testing the new machine, and some more stuff about how to test the next version. I put all of this into the build instructions because the build instructions are the one single piece of documentation that we're guaranteed to read when we're building the next version.
As it happens, I can make a pretty good prediction of when the next version will be built: somewhat before when Ubuntu 14.04 stops being supported. On Ubuntu's current schedule that will be about a year after Ubuntu 18.04 LTS comes out, ie four years from now (but this time around we might rebuild the machine sooner than 'in a race with the end of support').
Preparing documentation notes for four years in the future may seem optimistic, but this time around it seemed reasonably prudent given our recent experiences. At the least it could avoid future me feeling irritated with my past self for not doing so.
(I'm aware that in some places either systems would hardly last four years without drastic changes or at the least people would move on so it wouldn't really be your problem. Neither is true here, and our infrastructure in particular is surprisingly stable.)
2015-02-09
'Inbox zero' doesn't seem to work for me but it's still tempting
Every so often I read another paean to the 'inbox zero' idea and get tempted to try to do it myself. Then I come to my senses, because what I've found over time is that the 'inbox zero' idea simply doesn't work for me because it doesn't match how I use email.
I do maintain 'inbox zero' in one sense; I basically don't allow unread email to exist. If it's in my actual MH inbox, I've either read it, am in the process of reading it, or I've been distracted by something being on fire. But apart from that my inbox becomes one part short term to-do tracker, one part 'I'm going to reply to this sometime soon', and one part 'this is an ongoing issue' (and there's other, less common parts).
What I do try to do is keep the size of my inbox down; at the moment my goal is 'inbox under 100', although I'm a bit short of achieving that (as I write this my inbox has 105 messages). Some messages naturally fall out as I deal with them or their issue resolves itself; other messages start quietly rotting until I go in to delete them or otherwise dump them somewhere else. Usually messages start rotting once they aren't near the top of my inbox, because then they scroll out of visibility. I try to go through my entire inbox every so often to spot such messages.
What it would take to get me to inbox zero is ultimately not a system but discipline. I need most or all of the things that linger in my inbox, so if they're not in my inbox they need to be somewhere else and I need to check and maintain that somewhere else just as I check and maintain my inbox. So far I've simply not been successful at the discipline necessary to do that; when I take a stab at it, I generally backslide under pressure and then the 'other places' that I established this time around start rotting (and I may forget where they are).
On the other hand, I'm not convinced that inbox zero would be useful for me as opposed to make-work. To the extent that I can see things that would improve my ability to deal with email and not have things get lost, 'inbox zero' seems like a clumsy indirect way to achieve them. More useful would be something like status tags so that I could easily tag and see, say, my 'needs a reply' email. You can do such status tagging via separate folders, but that's kind of a hack from one perspective.
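For what it's worth, (n)mh sequences could probably approximate this sort of status tagging without moving messages to other folders; a sketch of what I mean, with a made-up sequence name and message number:

    # tag message 105 as 'needs a reply', then see everything so tagged
    mark 105 -sequence needsreply
    scan needsreply
    # clear the tag once the reply has been sent
    mark 105 -sequence needsreply -delete

I haven't actually adopted this, but it's the direction that 'status tags' points in.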
(I'd also love to get better searching of my mail. Of course none of this is going to happen while I insist on clinging grimly to my current mail tools. But on the other hand my current tools work pretty well and efficiently for me and I haven't seen anything that's really as attractive and productive as they are.)
(A couple of years ago I wrote about how I use email, which touches on this from a somewhat different angle. This entry I'm writing partly to convince myself that trying for inbox zero or pining over it is foolish, at least right now.)
Sidebar: why the idea of inbox zero is continually tempting
I do lose track of things every so often. I let things linger without replies, I forget things I was planning to do and find them again a month later, and so on. Also I delete a certain amount of things because keeping track of them (whether in my inbox or elsewhere) is just too much of a pain. And I've had my inbox grow out of control in the past (up to thousands of messages, where of course I'm not finding anything any more).
A neat, organized, empty inbox where this doesn't happen is an attractive vision, just like a neat organized and mostly or entirely clear desk is. It just doesn't seem like a realistic one.
2015-02-05
All of our important machines are pets and special snowflakes
One of the general devops mantras that I've seen go around is the pets versus cattle metaphor for servers (eg); pets are lovingly curated servers that you care about individually, while cattle are a mass herd where you don't really care about any single member. My perception is that a lot of current best practices are focused on dealing with cattle and converting pets into cattle. Unfortunately this leaves me feeling relatively detached from these practices because essentially all of our important machines are pets and are always going to stay that way.
This is not particularly because of how we manage them or even how we think of them. Instead it is because in our environment, people directly use specific individual machines on a continuous basis. When you log into comps3 and run your big compute job on it, you care very much if it suddenly shuts down on you. We can't get around this by creating, say, a Hadoop cluster, because a large part of our job is specifically providing general purpose computing to a population of people who will use our machines in unpredictable ways. We have no mandate to squeeze people down to using only services that we can implement in some generic, distributed way (and any attempt to move in that direction would see a violent counter-reaction from people).
We do have a few services that could be generic, such as IMAP. However in practice our usage is sufficiently low that implementing these services as true cattle is vast overkill and would add significant overhead to how we operate.
(Someday this may be different. I can imagine a world where some container and hosting system has become the dominant way that software is packaged and consumed; in that world we'd have an IMAP server container that we'd drop into a generic physical server infrastructure, and we could probably easily also have a load balancer or something that distributed sessions to multiple IMAP server containers. But we're not anywhere near that level today.)
Similarly, backend services such as our fileservers are in effect all pets. It matters very much whether or not fileserver <X> is up and running happily, because that fileserver is the only source of a certain amount of our files. I'm not convinced it's possible to work around this while providing POSIX compatible filesystems with acceptable performance, but if it is it's beyond our budget to build the amount of redundancy necessary to make things into true cattle where the failure of any single machine would be a 'no big deal' thing.
(This leads into larger thoughts but that's something for another entry.)
2015-02-04
How our console server setup works
I've mentioned before that we have a central console server machine where all of our serial consoles and other serial things all get centralized, automatically logged, and so on. While I don't think we're doing anything unusual in this area, I've realized that doing decent sized console servers is probably no longer common and so it might be interesting to describe how ours works.
The obvious way to do a (serial) console server is just to build a machine with a bunch of serial ports. This kind of works at small or moderate scale, but once you're talking about thirty or fifty or a hundred or more serial ports, things break down. There are two problems with this; first, you just can't fit that many serial ports into one piece of hardware for sane amounts of money, and second, you can't feasibly run serial lines to everything in remote locations (like master switches in building wiring closets and so on).
The thing that makes it possible to deal with all of this is serial port to Ethernet concentrators; we use various models of Digi's Etherlite series, generally the rack-mountable 16 and 32 port versions. These have some number of RJ-45 ports which we plug serial connections into and an Ethernet port over which the system talks to their software on our console server, where a Digi kernel module turns those networked serial ports into /dev/... serial port entries that look just like hardware serial ports.
On the console server we use conserver to manage the serial ports; it logs their traffic, handles actual interactive access to them, and so on. Conserver is probably not the only system for this (and may well not be the best); it's just what we use. It works and you can probably find it packaged for your Linux distribution of choice.
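To give a concrete sense of what this looks like, here is a rough sketch of the shape of a conserver.cf configuration; the console name, device path, and logfile location are made up rather than taken from our real setup:

    # defaults shared by every console; '&' expands to the console's name
    default * {
        logfile /var/log/consoles/&;
        master localhost;
        rw *;
    }
    # one entry per serial port that conserver should manage
    console someserver {
        type device;
        device /dev/ttyN00;
        baud 9600;
        parity none;
    }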
(As far as I know there's nothing that will directly talk the Digi Etherlite protocol so you can cut out the middleman of the fake kernel serial ports. I believe this is partly because the protocol is at least undocumented. It's possible that there are other serial port to Ethernet concentrators with documented protocols and thus direct support in projects like conserver.)
If we need serial ports in a remote location, for example to give access to a switch's console, we put an Etherlite in the location and connect it up. The serial connection to the Etherlite is subject to reasonable length limitations but obviously the network traffic is not. We run Etherlites and similar things over a physically separated and independent management network (described at the end of here).
Sidebar: How you connect serial ports to Etherlites
Etherlites don't have conventional serial ports; instead they use plain RJ-45 ports for higher density (which makes them look like Ethernet switches). Since plain RJ-45 is the same connector that Ethernet uses, we wire things up with ordinary cat-5 Ethernet cables that plug into RJ-45 to DB9 adaptors, which then plug into the servers.
(I don't think we have anything left with full-sized serial ports; these days it's DB9 or nothing. Fortunately servers are still coming with DB9 serial ports.)
I don't think you need to use full-bore Ethernet cables for this, but we already have everything we need to make cat-5 Ethernet cables, so this way we keep everything standardized. To avoid confusion with our actual Ethernet cabling, we use a special colour for these serial cables.
2015-02-03
Why we've wound up moving away from serial consoles on our machines
Back some time ago we really liked serial consoles here; we configured all of our machines with them, whether they were Linux or OpenBSD or whatever. But lately we've been moving distinctly away from serial consoles, to the point where none of our current generation machines are set up with them any more. We're doing this because in the end serial consoles got in the way of our troubleshooting during serious issues.
What we found is that the times we're dealing with machines hands-on are often when they're bad enough to lock up mysteriously or otherwise need physical attention (for example, to swap network wires to make a spare firewall the active one). When we're already physically interacting with a machine to try to figure out the problem, what we found we wanted to do was wheel over a cart, plug in a keyboard and a monitor, and have full interaction with everything from the BIOS on up. We didn't want to have to go back and forth between the physical machine and a desktop that was connected to the console server that the serial console emerged on.
What would be ideal would be a serial console that was a real mirror of everything on the physical console; both would get all kernel messages and all boot time messages from init et al, both could be used as the console in single-user mode, and you could log in on both after boot. But nothing gives us that, and if we have to choose one thing to be the real console, the physical console wins. Remote administration is nice and periodically convenient, but it's not as important as easy troubleshooting when things really go bad and we're in the machine room trying to deal with it.
(We have the Linux kernel console configured to send messages to both the physical console and the serial console on our Linux machines, so we can at least capture kernel messages during a crash. Unfortunately I believe Linux is the only Unix that can do that. And we're still running gettys on the serial ports so we can log in over them if networking or the ssh daemon has problems.)
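For the record, this is roughly the sort of configuration involved on an Ubuntu 14.04 era machine; the baud rate, device, and terminal type here are illustrative rather than exactly what we use:

    # Kernel command line (eg via /etc/default/grub): list both consoles.
    # Every console= listed gets kernel messages; the last one becomes
    # /dev/console, so putting tty0 last keeps the physical console primary.
    GRUB_CMDLINE_LINUX="console=ttyS0,9600n8 console=tty0"

    # /etc/init/ttyS0.conf: an upstart job that runs a getty on the serial
    # port so we can still log in over it.
    start on stopped rc RUNLEVEL=[2345]
    stop on runlevel [!2345]
    respawn
    exec /sbin/getty -L 9600 ttyS0 vt102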
PS: IPMIs with KVM over IP are great but they're not a complete replacement for serial consoles. They give us the remote access but not the logging of all console output so that we can look back later to find messages and so on.