Wandering Thoughts archives

2013-08-31

Simple availability doesn't capture timing and the amount of warning

Here is a mistake that I have actually kind of made: a simple availability or 'amount of downtime' number does not fully capture your availability situation. In real life it matters a lot both when you go down and whether or not you have advance warning. To put it simply, an hour of planned downtime at 6pm is qualitatively different from an hour of unplanned downtime at 6pm (or at 11am on your busiest morning) even if they have exactly the same effect on your overall availability numbers.

(I've sometimes seen availability numbers cited as excluding planned downtimes. That strikes me as disingenuous unless it comes with very careful disclaimers and a bunch of additional information.)
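
To make that concrete with a bit of invented arithmetic: an hour of downtime in a 30-day month comes out to the same availability figure no matter when it happens or how much warning you had. A quick sketch in Python:

    MINUTES_IN_MONTH = 30 * 24 * 60

    def availability(downtime_minutes):
        # Plain availability percentage over a 30-day month.
        return 100.0 * (MINUTES_IN_MONTH - downtime_minutes) / MINUTES_IN_MONTH

    # A planned hour at 6pm and a surprise hour at 11am on your busiest
    # morning both come out to about 99.86%; the number can't tell them apart.
    print(availability(60), availability(60))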

Of course it's better to not have the downtime at all, but if you're going to have it, it's generally quite worthwhile to transform an unplanned downtime into a planned one (often even if the planned downtime is longer). There is a surprising amount of technology that effectively exists to do this conversion; for example, any non-hotswappable form of redundancy.

(If you have some form of redundancy that you can't hotswap and one half of it breaks (so now you have no redundancy), you're going to have to eventually take things down to restore the redundancy. This shifts the unplanned downtime of losing your only whatever-it-is to the planned downtime of replacing one.)

Sidebar: UPSes in this view

If you have a perfect UPS and no source of alternate or additional power (a redundant power supply, a transfer switch, etc), you're likely converting unplanned power failures into planned UPS battery replacements. In real life UPSes have been known to cause problems and it's usually not that difficult to have power redundancy. Overall a good setup probably simply decreases the chances of unplanned downtimes.

(Our UPSes exist not to prevent unplanned downtimes from power loss but to hopefully prevent unplanned downtimes from ZFS pool corruption due to power loss. This gives me an odd perspective on UPS issues.)

AvailabilityTiming written at 23:02:42

2013-08-22

I've changed my thinking about redundant power supplies

Back almost at the start of Wandering Thoughts, I wrote an entry in which I was pretty negative on redundant power supplies. Since I'm busy specifying redundant power supplies for our new generation of fileserver hardware, I think it's about time I admitted something: now that I'm older and somewhat wiser, I'm changing my mind. Redundant power supplies can be quite worth it. In fact I was at least partially wrong back then.

(In my defense, at the time I had very little experience with decent server hardware for reasons that do not fit in the margins of this entry but boil down to 'hardware budget? what's that?'. In retrospect this shows quite vividly in parts of that old entry.)

It's still true that in theory there are plenty of bits of hardware that can break in your server (and the power supplies in our servers have been very reliable). But in practice we've suffered several power supply failures (especially in our backend disk enclosures), and they are probably either the first or second most common cause of hardware failures around here. Apart from the spinning rust of system drives, those other bits of fragile hardware have almost never failed for us.

(Also, an increasing amount of server hardware effectively has some amount of redundancy for the other breakage-prone parts. For example, the whole system (CPUs included) may be cooled by shared airflow from multiple case fans rather than by per-component fans; if one fan fails, alarms go off but there's enough remaining airflow and cooling that the system doesn't die.)

There's also an important second thing that redundant power supplies enable for crucial servers: they let you deal easily with various sorts of UPS issues (as I noted in that entry). As we both want UPSes and have had UPS problems in the past, this is an important issue for us. We have a solution now but it adds an extra point of failure; redundant power supplies would let us get rid of it.

There is also a pragmatic side to this. In practice, hardware with redundant hot-swappable power supplies is almost always simply better built in general (power supplies included). Part of our disk enclosure power supply problems likely comes from the fact that the power supplies are generic PC power supplies that have had to power 12 disks on a continuous basis for years. Given our much better experience with server power supplies, it seems likely that a better grade of power supply would improve things in general.

(Part of this is probably just that hot-swap server power supplies are less generic and thus more engineered than baseline PC power supplies.)

I'm now all for redundant power supplies in sufficiently important servers. However, I'm still not sure that I'd put redundant power supplies into most of our servers unless I got them essentially for free; many of our servers are not quite that important, and for some we already have server-level redundancy.

RedundantPowerSuppliesII written at 00:22:01

2013-08-21

Disk enclosures versus 'all in one case' designs

There are two basic choices if you want a decent number of disks attached to a server (such as for iSCSI backends): you can use a basic generic server with an external disk enclosure, or you can use a fancy server that has all of those disk bays integrated. When we set up our iSCSI backends we used the first option, partly for historical reasons. I've recently been thinking about this issue again for our next generation of hardware, so I want to write something down about how the trade-offs balance out.

The advantage of the external disk enclosure approach in general is that you get independence and isolation. Your choice of server is not tied to what you can get with enough drive bays and similarly your choice of drive bay enclosures is not constrained by what options the server vendors choose to make available. Especially for servers, if a server goes out of production you don't really care; get another generic server and you're likely good to go. Disk enclosures may be a bit more of a problem but even they are getting pretty generic. Separate enclosures can also simplify the spares situation a lot, especially if you buy servers in bulk. They also simplify swapping dead hardware around.

(This is especially so if the disk enclosures are the parts most likely to die. A modern server has annoying dependencies on its physical hardware but the disk enclosure is usually generic. Pull the disks out of one, stick them all in another, and probably nothing will notice. We have magically recovered backends this way.)

The advantages of an 'all in one case' design are less obvious, but essentially they come down to having one case instead of two. This means fewer points of failure (for example, you have only one power supply that has to keep working instead of two), fewer external cables to cause heartburn, and less physical space and fewer power connectors required (it may also mean less power needed). It can also mean that you pay less. Potential savings are especially visible if you are basically assembling the same parts but deciding whether to put them in one case or two.

(In theory you should always pay less because you're buying one less case. In practice there are a lot of effects, including that vendors mark up what they feel are bigger servers and often have relatively cheap basic 1U servers. You can try to build your own custom bigger server using something like the guts of the 1U server, but you probably can't get the components of such a 1U server anywhere near as cheaply as a big vendor can. I wouldn't be surprised if the economics work out such that you're sometimes getting the case and power supply for almost free.)

I don't think one option is clearly superior to the other until you start to get into extreme situations. At very few disks or very many I think that all in one case designs start winning big. At the low end buying a second chassis for a few disk slots is absurd and expensive. At the high end you have the problem of connecting to all of those disks externally with decent bandwidth (and you start hating the two-case price penalties if you're trying to be very cheap).

DiskShelfVsOneCase written at 00:23:54

2013-08-20

The challenge for ARM servers, at least here

Every so often I think about the theoretical future coming of ARM-based servers (which various people have prophesied for years). Since hardware is on my mind lately anyways, one of the things I've been mulling over is what it would take to get us interested in such a server.

On the one hand, in theory a Linux-based ARM server ought to be easy to integrate into our environment. Assuming that it was supported by, say, Ubuntu, Linux is Linux and we almost entirely use Ubuntu packages instead of compiling our own programs. Adding a different architecture always causes a certain amount of heartburn and annoyance but it wouldn't be a particularly big problem overall.

On the other hand, well, why would we go to ARM over x86? We're not particularly limited by space, power, or cooling in our current machine room, so simply being smaller and cooler is nowhere near enough. I think what ARM would have to offer us is cost-competitive performance in some way, in a form factor that was easy to fit into our current rack infrastructure. Actually, being merely cost-competitive is not enough; since running another architecture does add complexity, ARM servers would have to be better somehow (probably by being cheaper for the same performance).

One way to do this would be to sell us lower-performing servers for less than anyone currently charges for x86 1U servers. There are a lot of jobs where we are currently using five or ten year old machines. We would be quite interested in reliable $500 servers that could take over those jobs, regardless of what architecture they used. On the other hand, I'm not sure that this is really possible due to fixed costs and general overheads in the server market (ARM CPU or otherwise, you still need the sheet metal and so on).

Similarly we're not interested in high-density blade designs. The problem with blade designs is more or less the big hardware problem: unless the chassis is somehow completely dumb you have an expensive single point of failure. Blade designs make sense if you're space constrained, but we're not.

(One of the things that this makes me believe is that the current rack sizes are now too big for small servers. Between 2.5" disks and ARM CPUs and so on, you can probably get a decent basic server into a half-width, half-depth 1U rack slot, especially if you had some sort of standard rack DC PDU. Expansion slots probably get problematic at that size, but most of our 1U servers don't use any. Just make sure that you have a bunch of 1G or 10GBASE-T Ethernet ports and you're pretty much done.)

ArmServerChallenge written at 00:23:19

2013-08-11

Multi-mount protection and SAN failover

Suppose that you have some machines, some shared disks, and a filesystem that has what gets called multi-mount protection, where it tries to prevent being mounted on two different machines at the same time. Do you have enough to do reliable SAN failover in the face of a crashed machine? Unfortunately the answer is no.

The first problem is that common implementations of multi-mount protection are not necessarily fully reliable. While it is possible to do reliable locking with only read and write operations to a shared disk (see e.g. Dekker's or Peterson's algorithm), many MMP implementations do not go to this extent; instead they rely on statistical properties, such as the fact that a check block gets written every so often by the server that owns the filesystem. This usually works but cannot be absolutely guaranteed to do so in the face of a machine that is in an unknown but broken state.

(At the simplest level the check block might be written by a separate 'check block writer' process that has gotten stuck somehow.)
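
To make the 'statistical' approach concrete, here is a rough Python sketch of what such a check block scheme might look like. The block layout, offset, and interval here are invented for illustration; this is not any particular filesystem's actual MMP implementation.

    import os, struct, time

    # Invented layout: the check block lives at a fixed offset on the shared
    # device and holds a sequence number plus a timestamp.
    CHECK_BLOCK_OFFSET = 4096
    CHECK_FORMAT = "<QQ"       # two unsigned 64-bit integers
    UPDATE_INTERVAL = 5        # seconds between heartbeat writes

    def heartbeat_writer(device_path):
        # Run by the filesystem's current owner: periodically rewrite the
        # check block so other machines can see that we're still alive.
        # If this loop stalls (say the writer process hangs), other machines
        # may wrongly conclude the owner is gone even though it could still
        # issue filesystem writes later -- which is exactly the weakness above.
        seq = 0
        fd = os.open(device_path, os.O_RDWR)
        try:
            while True:
                seq += 1
                block = struct.pack(CHECK_FORMAT, seq, int(time.time()))
                os.pwrite(fd, block, CHECK_BLOCK_OFFSET)
                os.fsync(fd)
                time.sleep(UPDATE_INTERVAL)
        finally:
            os.close(fd)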

The larger problem is that multi-mount protection is essentially solving the wrong problem for failover. For forced failover, system A legitimately owned the filesystem when it was healthy and active but is now in some unknown but unhealthy state. Since it is in an unknown state, it has not properly released its ownership and you cannot count on it to be inactive. You want to forcefully take ownership of the filesystem away from system A and make it so that system A will not write anything more to the filesystem, and you must do this without system A's cooperation (because it may not cooperate, since it is in a bad state).

At best multi-mount protection will tell you that system A does not seem to have recently written anything that MMP checks. It cannot assure you that system A will not do any writes in the future. To do this you must somehow forcefully fence system A away from filesystem writes, either with storage-level features or simply by a remote power off of system A.

You can get away with active, check-based MMP for SAN failover only if you trust things to not go too badly wrong, so that the check saying that system A is inactive is sufficient to guarantee that it actually is and that it will stay that way.

Sidebar: two levels of multi-mount protection

The basic level of multi-mount protection is simply an 'is active' marker of some sort in the filesystem; a system sets the marker when it mounts the filesystem and unsets it when the filesystem is unmounted. This sort of MMP doesn't help you at all in failover because system A is unlikely to have actually released the filesystem before it stopped working right.

The advanced level of MMP is something in the filesystem that is actively updated on a frequent basis. If you passively watch the filesystem and there are no updates to the marker over a sufficiently long time, you can conclude that the theoretical owner either doesn't exist any more or at least is not working right.
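
As a matching sketch of the watcher side (using the same invented layout as the sketch above, and again purely as an illustration), a would-be failover machine might passively watch the check block like this. Note that even a True answer is only the statistical assurance discussed earlier, not a guarantee that the old owner will never write again:

    import os, struct, time

    # Same invented check block layout as the writer sketch above.
    CHECK_BLOCK_OFFSET = 4096
    CHECK_FORMAT = "<QQ"
    CHECK_SIZE = struct.calcsize(CHECK_FORMAT)

    def owner_seems_inactive(device_path, watch_for=60, poll_every=5):
        # Watch the check block; return True only if the sequence number
        # never changes over the watch window.  This tells you the owner
        # hasn't updated the marker recently; it cannot promise that the
        # owner will never write to the filesystem again.
        fd = os.open(device_path, os.O_RDONLY)
        try:
            first, _ = struct.unpack(
                CHECK_FORMAT, os.pread(fd, CHECK_SIZE, CHECK_BLOCK_OFFSET))
            deadline = time.time() + watch_for
            while time.time() < deadline:
                time.sleep(poll_every)
                seq, _ = struct.unpack(
                    CHECK_FORMAT, os.pread(fd, CHECK_SIZE, CHECK_BLOCK_OFFSET))
                if seq != first:
                    return False   # the owner is still updating it
            return True            # no updates seen over the whole window
        finally:
            os.close(fd)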

MultimountAndSANFailover written at 00:01:52

