Wandering Thoughts archives

2012-12-31

How our fileserver infrastructure is a commodity setup (and inexpensive)

Our fileserver environment may sound luxurious and expensive (after all, it involves Solaris, ZFS, and iSCSI, all things that often mean lots of money), but it isn't really. I've mentioned (in comments) a couple of times that it's essentially commodity hardware and that I don't think we could do it much less expensively without fundamentally changing what it is, but I've never really explained that in one place.

The fundamental architecture is a number of Solaris fileservers which get their actual disk space from a number of iSCSI backends over, well, iSCSI. Raw backend disk space is sliced up into standard-sized chunks, mirrored across two different backends (the mirroring is done by ZFS on the fileservers), and aggregated together into various ZFS pools; filesystems in the pools are then NFS exported to our actual machines (all of which run Linux, currently Ubuntu 12.04 LTS).
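
(To make this concrete, here is a hypothetical sketch of the fileserver side with made-up pool, filesystem, and device names; our real naming and NFS export options are different and more involved.)

# each vdev mirrors a chunk from backend A with the same-sized chunk from backend B
zpool create tank mirror <backendA-chunk1> <backendB-chunk1> \
                  mirror <backendA-chunk2> <backendB-chunk2>
zfs create tank/homes
# NFS export the filesystem to our Linux machines
zfs set sharenfs='rw=@<our-subnet>' tank/homes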

All of the actual physical servers involved in this are basic 1U servers. They happen to be SunFire X2100s (backends) and X2200s (Solaris fileservers) from Sun, but surprisingly they weren't overpriced; up until Oracle bought Sun and ended the entire line, Sun actually had reasonably priced and very attractive 1U server hardware (well, at least at educational prices). The backends run RHEL 5 and the fileservers run Solaris, neither of which is generally cheap, but at the time that we set up our environment both were basically free; the university has a RHEL site license and had an inexpensive Solaris support agreement (Oracle has since changed that).

(We added more memory to the iSCSI backends over their default configuration, but even at the time the memory was pretty cheap. The open source iSCSI target software we use on the backends is free.)

The iSCSI data disks are consumer 7200 RPM SATA disks (all Seagates as it happens, because that was what we liked at the time); 'enterprise' grade high-speed SAS drives might have been nice but were well out of our price range. They're in relatively inexpensive (and not particularly impressively engineered) commodity external eSATA enclosures (with 12 disks each in 4U or so of rack space). The iSCSI backends are connected to the Solaris fileservers over two ordinary 1G Ethernet segments, each of which has its own switch but no other network infrastructure (well, besides cables). The fileservers talk NFS to our environment over standard 1G Ethernet.

One significant reason to call this a commodity storage setup is that once you accept the basic parameters of storage that's detached from the actual fileservers (for good reason) and mirrored disk space, I don't think hardware or software substitutions could save much money. The one obvious spot to do so is the backends, where you might be able to get a case and assemble a moderately custom box that held both the server board itself and the disks. We considered this option at the time but rejected it on the grounds that doing our own engineering was more risky for relatively modest amounts of savings.

(If we had wanted to put more than 12 or so data disks in a single backend it would have gotten more attractive, but we had various reasons for not liking this, including both the problem of putting too many eggs in one basket and what it would do to the cost of adding more storage later. Generally, the bigger your unit of storage, the more economy of scale you may get, but also the more expensive it is to add more storage in the future.)

We initially attempted to build this environment using canned iSCSI server appliances from a storage vendor. This was unfortunately an abject failure that cost us a significant amount of time (although in the end, no money). I'm not sure that using an iSCSI appliance would have saved us money, although it might have saved us rack space (which is not an issue for us).

Mirrored storage is the one serious luxury of this setup. I think it's been an important win, but it's undeniably increased costs; if we were using RAID 5 or RAID 6 we could offer significantly more storage with the same raw disk space (and thus cost). However, this would involve a significantly different overall design. Off the top of my head, I think we'd have to push the RAID stuff to the iSCSI backends instead of doing it on the fileservers, and based on our experience to date (where we've had a few total backend failures due to, eg, a disk enclosure's power supply failing), the result would probably have been less reliable.

(All of our servers and disk enclosures have only a single power supply. Yes, we know the dangers. But that's what you get with inexpensive commodity hardware.)

Sidebar: iSCSI versus other network disk protocols

My short summary of this complex issue is that iSCSI works and all of the pieces necessary are free (in our environment); Solaris 10 comes with a functional iSCSI initiator and, as mentioned, the iSCSI target software that we run on Linux is open source (and well supported as far as my experiences go). In some environments iSCSI would probably increase your costs, but in ours this is not the case, and while the protocol has that 'enterprisey' smell, other people have already done all of the hard work to deal with it. And the performance is okay (and doesn't need jumbo frames on 1G Ethernet); a single fileserver can saturate both of its 1G connections to the backends under the right circumstances.
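
(As an illustration of how little is involved on the Solaris side, the initiator setup is roughly the following, with a made-up backend IP address; the discovered LUNs then show up as ordinary disks that ZFS can use.)

# point the Solaris 10 iSCSI initiator at a backend and discover its targets
iscsiadm add discovery-address 192.168.1.10:3260
iscsiadm modify discovery --sendtargets enable
devfsadm -i iscsi
# list the targets and the disk devices they appeared as
iscsiadm list target -S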

(The last time I looked I didn't feel enthused about ATA-over-Ethernet.)

OurCommodityFileservers written at 21:49:29

2012-12-21

Version control comes first

A commentator on my last entry wrote (in part):

I also find that sometimes [version control] can fall short in providing context to a change. Good living documentation (wiki, OneNote, etc.) makes up for the area that strict version control cannot; providing the reader with some understanding on context, and "why."

It is my position that most organization can benefit most from a good wiki (or something to that affect), then introduce opportunities for file based version control at a later date.

I very much disagree with this view for both general and pragmatic reasons.

Wikis create (theoretically) living documentation. Documentation is nice, but it is not crucial. (Ask lots of people.)

Version control creates much safer changes (you can see exactly what you changed, you can see how things used to be back when they worked, and you can go back to them). It is an unusual environment that is not making changes frequently (sysadmins generally exist in large part to make changes), but changes are dangerous. Making changes safer is crucial and a major improvement in your environment.

That is the general reason. Now to the pragmatic one.

At this point in time, any place that's lacking both living documentation and version control is in a bad situation. Either they're so culturally backwards that they haven't been convinced of the virtues of either one, or they're so badly managed that their sysadmins can't create either one. In a place that's backwards, dysfunctional, or both, simple version control is going to be much easier to introduce than living documentation, partly because it's much easier than writing documentation and partly because it can start paying off almost immediately.

Simple version control has a major additional benefit in an organization with problems: it fails much safer. If people stop using simple version control, nothing particularly bad happens (you revert to the status quo ante where you had no change tracking). If people stop updating your living documentation, what you have is misleading, out of date documentation; like incorrect comments in code, this is worse and more dangerous than having no documentation at all.

(This applies in the small as well as in the large, where you don't do a checkin or don't do a documentation update. It's also much easier to catch a missed checkin than a missed documentation update.)

VersionControlFirst written at 02:19:11

2012-12-20

Sysadmins should pretty much version control everything

Today's Sysadvent contained a casual, matter-of-fact bit:

This code [for deployment scripts] has probably never been threatened with version control.

I had a reaction to this.

At work, we are not quite hep to all the current DevOps coolness. We don't have metrics and dashboards, we don't have everything automated with things like Puppet, Chef, or Cfengine, and so on. But even in our low state of evolution, we version control pretty much everything that moves. Our almost invariable rule is that anything we change gets put in version control first. Scripts, configurations, etc etc.

Here is the thing. Doing this is trivial. You don't need to start up some big version control infrastructure or system to keep /etc in a big repository, you don't have to decide between git and Mercurial, and you barely need to do anything extra.

Just use RCS. RCS is a trivially easy single-file version control system. You don't have to set up anything big or make any particular changes in how you work. All you need to do (after a quick initial setup) is to run one more command after you edit something and are satisfied with your work (commit messages are optional but useful). And for doing that, you get all of the usual version control goodness; you can see what changed and when, and you can revert back to a past version (or pull bits of it out). Since RCS operates on single files, you can use it selectively, in mixed and tangled directories like /etc, and only when you need it.
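
(For instance, the usual version control goodness looks something like this, using the same '<file>' placeholder and example revision numbers as in the sidebar below; these are all standard RCS commands.)

rlog <file>                     # when was this changed (and, if you wrote messages, why)
rcsdiff -u <file>               # what have I changed since the last checkin?
rcsdiff -u -r1.2 -r1.4 <file>   # what changed between two old versions?
co -p -r1.2 <file> > <file>.old # pull out an old version without touching the file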

By the way, don't worry that using RCS instead of something more sophisticated will prove a terrible decision in the future. You can migrate from RCS to other things when you reach that point.

(My writeup was for Mercurial, but you can use CVS-to-git stuff to move to git too.)

Sidebar: what it takes to use RCS the easy sysadmin way

apt-get install rcs     # or your system's equivalent
mkdir RCS               # in the directory with the file, eg /etc
ci -l <file>            # initial checkin; -l keeps the file in place
rcs -U <file>           # turn off strict locking so you can just edit the file
[edit]
ci -u <file>            # check in your change, leaving the file in place
[edit]
ci -u <file>

There. You're done. Repeat the last two steps every time you edit the file (really, the last step, since you're already editing the file).

(Obviously some of the initial steps can be skipped after the first time, or the first time in any particular directory.)

VersionControlForEverything written at 03:14:15

2012-12-17

Should you alert on the glaringly obvious?

First off, I will say that part of this question is due to a peculiarity of the academic environment that I work in: we don't (at least officially) do anything outside of the working day. This creates a category of system problems that are glaringly obvious: if we're at our computers at all, we're going to notice when they happen.

(All of these are actionable alerts, things that we need to act on.)

Which brings me around to the question of whether our alerting system should generate alerts for these glaringly obvious problems. As I see it, there is one argument against generating the alerts and one and a half for.

The argument against generating the alerts is that they're both unnecessary and potentially distracting in the resulting crisis. By definition the glaringly obvious is something that you notice, and the last thing you need in the middle of a problem is to be hit by more noise in the form of your alert system telling you what you already know.

(This is especially dangerous if your alert system is going to be very noisy about a glaringly obvious problem. At that point it becomes quite easy to miss other messages or to overlook alerts about other things that are also going wrong.)

On the other hand, generated alerts create a marker (at least if done well). When you go back later for post-facto analysis the alerts can tell you when things started happening and when they stopped, which is information that you're probably not going to meticulously note down in the middle of a crisis. You can deal with the noise problem by keeping the alerts as quiet as possible (no email or paging, for example, just red markers on your dashboard).
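
(As a sketch of what 'as quiet as possible' might look like, here is a hypothetical Nagios-style service definition; the host, template, and check command names are made up and this is not necessarily the alerting system we use. The check still goes red on the dashboard and into the alert history, but it never emails or pages anyone.)

define service {
    use                   generic-service    ; assumed local service template
    host_name             fileserver1
    service_description   NFS service
    check_command         check_nfs_service  ; whatever your NFS check actually is
    notification_options  n                  ; 'n' = notify nobody; dashboard only
}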

Finally, the half point is the question of whether what you expect to be glaringly obvious actually will be. A total catastrophe probably will be, but smaller failures might be overlooked under at least some circumstances. Relatedly, having alerts for the glaringly obvious may speed up your troubleshooting, because alerts effectively check a whole bunch of possibilities at once for you. Is the fact that DNS names suddenly aren't resolving a problem with your local DNS servers or a problem with your network link? Alerting may tell you immediately.

(The degenerate case of this is after-hours alerting, where you aren't in the office to notice the glaringly obvious.)

I don't have any handy answers to this question; it's just an issue that I want to note down and think about. I do think that the better your system deals with the alerting dependency problem, the easier it is to alert on the glaringly obvious, because you get less noise from such massive failures.

AlertingOnTheObvious written at 01:00:36

2012-12-16

Alerts should be actionable (and the three sorts of 'alerts')

One of my pet peeves with alerting systems, which I've touched on before, is bad alerts, or more exactly a specific sort of bad alert. It's my very strong opinion that all of your alerts should be actionable.

In fact, let's split alerts up into three categories:

  • alerts that your sysadmins can and should take immediate action on; these are actionable alerts. There is something to do right away in response to them.

  • alerts where the sysadmins need to think about and plan out what they'll do in response to the issue. These are developing situations that need considered responses, not red alerts that need to be dealt with immediately. Steadily shrinking disk space is one classic example.

  • alerts that the sysadmins can't do anything about either immediately or in the future.

(I'm using a broad view of 'alert' here. Alerts may send email or page your phone, but they may also turn an indicator red on your dashboard. Broadly, an alert is anything that is hopping up and down going 'pay attention to me!')

Partly because people seem to like alerting on everything that moves, a lot of alerting systems seem to start with most of their alerts being the third sort. This is bad for various reasons, including that it trains people to ignore alerts because there is too much noise.

My strong view is that you should never create an alert without asking yourself what people are going to do about the alert. If you can't answer the question or the answer is 'well, nothing', what you have is probably the third sort of alert and you should not generate it at all.

(Sometimes there are cases where you know there is a bad problem and somebody should do something but you don't know who and what. If you hit one of these while creating alerts, now is the time to figure out the answers. This may well require management decisions or approval.)

Okay, honesty compels me to add a fourth type of alert: alerts that you can't do anything about but that you're forced to generate for political reasons, often so that when the alert triggers you can say with a straight face that you knew about the situation and were doing your best to deal with it (when the best is often 'we can't really do anything at all'). I suspect that in some organizations a lot of the alerts are like this.

ActionableAlerting written at 00:19:20

2012-12-13

A drawback of short servers

Since Oracle killed off the inexpensive SunFire line, our new generation of 1U servers is from Dell and we've started migrating to them. One of the two models is the Dell R210. It has two features of particular note. The first is that it doesn't have easily accessible drive cages; you have to disassemble things to swap (or add) drives, and then if you actually add a second drive you need to update the BIOS. The second is that it is only a half-length server. Half-length servers are an interesting response to the ceaseless shrinking of component sizes; I actually like that you can get a perfectly good 1U server into half of the space that it needed not that long ago. But they turn out to have a little drawback.

We're not replacing our existing 1U SunFires in large blocks, the way some places might; instead we're pulling them out one by one as we 'upgrade' servers to Ubuntu 12.04 (by reinstalling them on new hardware). This leaves behind little 1U holes between servers that are still in production. If the Dells were full-length, it would be easy to fill these holes with the new servers. But it turns out that you can't do this with half-length servers; 1U and probably even 2U is simply not enough vertical space to give you access to the back of a half-length server (not unless you have very small hands). No plugging in power, networking, serial cables for the console, or what have you. Instead these Dell half-length servers have to go in clear areas of rack space (which, fortunately, we have enough of) that have several U of space around them.

(It's not a problem if you have a bunch of these half-length servers all in a stack, because then you have enough free space for your hands.)

A big operation would probably do something like declare periodic downtimes to compact the racks (or with sufficiently long cables and enough daring, pull and reinsert servers while they were running). We're probably just going to let things be; we'll decommission more SunFires over time anyways, so sooner or later those 1U holes will widen into bigger ones.

(At the moment decommissioned SunFires are not being thrown out but are instead being made into a spares pool for our iSCSI backends. The need for such a spares pool is really the only reason for us to decommission most of these SunFires anyways; with a few exceptions they still run fine despite being relatively old hardware by most people's standards.)

PS: I like Dell's half-length solution a lot better than the other one I saw, which was to cram two half-width servers into a single full length 1U server chassis. That struck me as just plain awkward in various ways.

ShortServerDrawback written at 22:24:24

2012-12-10

The general lesson from the need for metrics

The lesson I learned about why metrics are important is an important one, but it's a specific one. It would be a shame to stop there, because there is a general lesson lurking in the underbrush behind it. That is:

Fallible humans are always going to overlook something.

This is the real lesson of fragile complexity, in all its various specific facets. Our systems are too complex for us to genuinely understand, and that complexity means we are always going to overlook something (and sooner or later that something will matter).

One of the things we need to do in system administration is to engineer large-scale, high-level approaches to our problems that can deal with this messy realization and that do not depend on post-facto specific fixes. It's always tempting to apply post-facto fixes, to say things like 'I'll make sure to check for performance problems after future changes to our fileserver infrastructure', but this is never going to be good enough. Even apart from the pragmatic issues pointed out by Perry Lorier in a comment, this is a fundamentally backwards-looking solution; it deals with the problem we found this time around but it doesn't necessarily deal with a future problem.

This is the generalized reason for automated metrics collection and monitoring. If you gather metrics you're constructing a backstop for human fallibility. If and when something goes wrong because of something people overlooked, you have a chance to see it and catch it before things explode, a chance that you would not have if you relied purely on post-facto fixes.

A direct corollary of this is that it's important to gather all the metrics that you can, even for things that you don't think you have any use for. Gathering only the metrics you have a use for now is a backwards-looking solution; you're assuming that you know what you need. Fragile complexity says that you're wrong: you don't know yet what you're going to want to spot the next problem, a problem that you didn't even foresee being possible. So gather everything you can. That way you have a chance to beat the future.

MetricsGeneralLesson written at 23:07:40

2012-12-07

How I use virtualization (and what for)

The high level view is that I use virtualization for testing things on my workstation, which I think of as the typical sysadmin use of local virtualization. In terms of my taxonomy of virtualization usage, there are three different cases with a number of common elements. The common elements are that I only run VMs for a short time (I power them up, use them for a bit, and then shut them down again) and that I manage them by hand. Beyond that the cases split apart:

  • I have a small number of long-lived images that I use for desktop testing, primarily virtualized Windows desktops. These need good graphics support.

    (I wish you could legally and easily virtualize an OS X desktop, because it would make testing various things so much easier. Right now we have to resort to a small floating collection of old OS X machines, which in practice means that we don't routinely test our systems against OS X.)

  • my test server VM has disposable images and needs to be like real hardware (because it's usually the prototype for things that will wind up on real hardware). Today my actual usage has a highly variable setup and needs basic text-mode console access; however, in practice the basic setup of a new image is extremely constant (I could always start from one of six basic images) and it can be headless.

    (See the sidebar for a longer discussion.)

  • sometimes I wind up testing things other than yet another Ubuntu based server. These have disposable images, need to be like real hardware, have highly variable setup, and need at least basic console access.

    (In other words, today they're no different from my main test server VM but they could be if I handled my test server VM in a more efficient way.)

The result of all this is that my first priority is convenience. I don't care all that much about things like performance (provided that it's adequate, which it should be), scalability, or lots of management features, but because I interact with the virtualization system for much of the time that I'm using VMs at all, I care about how easy that is. A convenient, easy to use system avoids putting friction in the way of my testing and thus encourages me to do it more; an awkward, annoying one would tempt me to skimp on testing because really, do I need to wrestle with the VM system quite that much? Surely things are good enough as they are.

(Also, I don't want to discount the time saved and the lower aggravation from a system that's pleasant and smooth to use.)

(I didn't put this into my initial taxonomy, but image snapshots are relatively important to me. It's often more convenient to snapshot a server build partway through the customization process and then roll back to that point than to reinstall from scratch for yet another test cycle.)

PS: note that we don't currently have any production server virtualization. All of our virtualization right now is on people's desktops for testing.

Sidebar: Elaborating on my test server VM situation

The first step in bringing up any standard server here is to do a completely standardized Ubuntu basic install of 32-bit or 64-bit 8.04, 10.04, or 12.04 (almost always 12.04 right now, since it's the current LTS version). The result is specific to the server's IP address but is otherwise both fixed and independent of what the machine will be customized into, and once this basic install is done I do all of the remaining setup steps through ssh.

Currently I redo this basic install process from scratch almost every time around. But I don't actually need to do it this way. I could instead build six starter disk images (all with the same standard IP I use for the test server VM) and then just copy in the appropriate one every time I want to 'reinstall' my test server VM; this would cover almost everything that I do with it and save time and effort to boot.
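
(A hypothetical sketch of what that could look like, assuming qcow2 disk images and made-up file names; the details would obviously depend on the virtualization system involved.)

# 'reinstall' the test server VM by copying a prebuilt starter image back in
cd ~/vm-images
cp ubuntu1204-64-base.qcow2 testvm.qcow2
# or keep the starter image pristine and give the VM a copy-on-write overlay instead
qemu-img create -f qcow2 -b ubuntu1204-64-base.qcow2 testvm.qcow2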

(A small confession: this didn't occur to me until I was planning this entry and actively thinking about why I was going to say that my test VM had a highly variable setup.)

MyVirtualization written at 01:33:13

2012-12-05

In praise of KVM-over-IP systems

We're in the process of migrating to a new generation of server hardware and it is, for me, kind of a sad moment. You see, while the new servers have generally better specifications, the old servers have one big thing that the new servers don't: a built-in KVM-over-IP system as part of their remote management capabilities.

On the surface, perhaps KVM-over-IP ought not to be a big deal; all it really saves you is an occasional trip down to the machine room, and we're not supposed to be that lazy. But this is the wrong way to look at it. What KVM-over-IP really means is that you can (re)install servers while doing other things.

When installing servers requires a trip to wherever the server is, it's an interruption and it means that you basically have to drop everything else you're doing to trek down to the machine room and babysit the server (don't forget the right install media, either). Interruptions are a pain, unproductive time is a pain, and sitting in a noisy, cold machine room is a pain too, so there's an incentive to avoid the whole thing as much as possible. All of this is friction.

KVM-over-IP systems remove this friction. You don't need to stop working on anything else, you don't need to leave your office, and you don't need all of the minutiae of installs (media, keyboard and display and mouse, a chair if you're going to be there long, a piece of paper with vital details about the machine, etc etc). If your installs take twenty minutes with fifteen minutes of non-interaction where you're just waiting for things to finish, no problem; this is just like any other fifteen-minute process that sits in a window in the corner of your display until it's done. And if you need things during the install for some reason, you have the full resources of your usual working environment.

(A lot of this also applies to any other sort of testing or troubleshooting that needs console access to the machine. As long as you don't need to change hardware itself, you can do everything from your desk with your full working environment available to help; you don't have to stop everything else to relocate to the machine room so you can babysit the machine's console.)

Or in short, a KVM-over-IP system goes a long way towards making real servers just as convenient to deal with as virtualized ones on your desktop. And that's pretty convenient, when you get down to it.

PS: some people's answer to this is 'oh, I'll install the server in my office'. Given how noisy modern servers are, this is generally not going to be popular with people around you even if you can stand the noise yourself. It also doesn't help if the server is already set up down in the machine room and you're reinstalling it to, for example, repurpose it or upgrade its OS.

As a side note, where KVM-over-IP really shines is when you have several machines to (re)install at once, especially if you're trying to do it with as short a downtime as feasible. With KVM-over-IP, installing multiple machines just means a few more windows on your display.

(In addition to my experiences with our new servers, this entry was inspired by reading the start of this blog post (via). Yes, the wheels of blogging grind very slowly sometimes.)

Sidebar: remote power cycling and so on

I'm assuming that KVM-over-IP also includes 'remote media', where you can use ISO images on your desktop as (virtual) CD or DVD drives on the server. This seems to be a general feature these days.

I'm not including remote power cycling as a KVM-over-IP benefit because you can get that in a lot of ways. Particularly, most servers these days have a basic IPMI management interface that supports it.
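
(For example, with the common ipmitool utility and a made-up BMC hostname and credentials, remote power control and even a basic serial console look something like this.)

ipmitool -I lanplus -H server1-ipmi -U admin -P <password> chassis power status
ipmitool -I lanplus -H server1-ipmi -U admin -P <password> chassis power cycle
# many BMCs will also hand you the serial console over the network
ipmitool -I lanplus -H server1-ipmi -U admin -P <password> sol activate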

KVM-over-IP is highly useful in certain circumstances, for example if a machine has problems, needs to have its console inspected, and everyone's at home or whatever. But I'm assuming that the answer to that one without a KVM-over-IP system is either a shrug or calling a taxi, and anyways, that sort of thing is generally rare.

KVMOverIPImportance written at 02:42:40

