Wandering Thoughts archives

2013-10-31

Our likely future backend and fileserver hardware

At this point we've finalized almost all of the hardware for renewing our fileserver infrastructure, unless something terribly bad turns up (which is always possible). So today I feel like talking about what hardware we're choosing and the size and scope of our project, partly because it seems uncommon for people to write up this sort of thing.

The base backend hardware is a SuperMicro X9SRH-7TF motherboard with 8G of (ECC) RAM, an Intel E5-2603 CPU, an extra LSI 9207-8i SAS controller card, and some additional networking. This gives us a single-socket motherboard with dual Intel 10G-T ports, which is the important thing for us because it makes 10G-T cheap enough that we can afford it. We need the LSI card because we're talking to SATA disks (so we want to avoid SAS expanders) and the motherboard only has 8 SAS ports onboard. All of this is in a SuperMicro SC 836BA-R920 case, which gives us 16 3.5" front panel drive bays for iSCSI data disks and two rear 2.5" drive bays for mirrored SSDs as the system disks.

(For backends the additional networking is likely to be a cheap Realtek 1G card. For fileservers it'll be a dual Intel 1G or 10G-T, depending on what we can afford. Fileservers will also have other hardware variance, such as a lot more memory and probably no LSI card.)

Our current plan for backend disks is twelve 2TB WD Se drives (7200 RPM SATA drives with a five-year warranty) plus four SSDs for ZFS ZILs; we haven't selected the SSDs yet. It's possible that we'll shift to one or two more HDs and fewer ZIL SSDs. The system SSDs will be a pair of semi-random 60 GB SSDs, since you don't need more than that for your system disks (well, you hardly need even that).

At the moment we have three primary HD-based fileservers with two backends each, one SSD-based fileserver with three backends, one further fileserver which now doesn't need to be on separate hardware, a hot spare backend (with disks) and fileserver, and some test hardware that I'm going to ignore. The most urgent things to replace are the HD-based fileservers because our current disks are starting to die at an accelerating rate and you can't really get SATA drives with 512-byte sectors any more.

Thus a full scale replacement of the HD side requires eleven units (assuming we use the same case for fileservers and backends) and at least 84 WD Se drives. Replacing the SSD-based fileserver requires three units but no new data drives; our current SSDs are new enough to last us for a while. Due to the ZFS 4K sector mess we have to replace hardware in units of 'one fileserver and its backends', ie three units and 24 HDs. I'd like three units of test hardware (a fileserver and two backends), but I suspect we can't afford that.

(The current SSD-based fileserver has three backends for reasons that boil down to hardware issues with our current SSD enclosures. We wouldn't need to replicate this with new hardware.)
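
As a sketch of how I read those counts, here is the arithmetic written out as (trivial) Python; the variable names are mine, not anything we actually use:

    # Counting replacement units and data drives for the HD side.
    hd_fileservers = 3            # primary HD-based fileservers
    backends_per_fs = 2           # backends per HD fileserver
    spare_fileservers = 1         # hot spare fileserver
    spare_backends = 1            # hot spare backend (with disks)
    disks_per_backend = 12        # 2TB WD Se drives per backend

    units = hd_fileservers * (1 + backends_per_fs) + spare_fileservers + spare_backends
    drives = (hd_fileservers * backends_per_fs + spare_backends) * disks_per_backend
    print(units, drives)          # 11 units, 84 WD Se drives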

I'm going to skip doing a tentative costing out of all of this for fuzzy reasons. Interested parties can use the item and quantity counts here to do it for themselves.

Disclaimer: We haven't gone through any sort of competitive evaluation process to select this particular set of hardware out of the vast universe of possible hardware that meets our general specifications. We've just found hardware that meets our needs, has prices that seem sane, and works in our testing (so far). As such I can't say anything about whether or not this would be your best and/or cheapest option in this area. We've also deliberately chosen not to put too many disks in a single physical unit (or to use disks that are too large), partly because of a desire to keep up our IOPS.

Sidebar: software and other details

We'll use some Linux with our usual iSCSI target software on the backends. The frontends will run OmniOS (and use ZFS). Using a single CPU core on the fileservers may strike some people as eyebrow-raising, but we aren't going to be touching ZFS dedup at all, and after thinking about some of the issues involved I don't think we want compression either. This makes me feel that dual-core would be overkill.

(I've tested both Linux and OmniOS on this hardware and they work, although tuning 10G performance is clearly going to be interesting.)

sysadmin/FutureFileserverHardware written at 23:18:06

Naming disk devices: drive IDs versus drive locations

From my perspective there are two defensible ways of naming disk drives at the operating system level. You can do it by a stable identifier tied to the physical drive somehow, such as a drive serial number or WWN, or by a stable identifier based on its connection topology and thus ultimately the drive's physical location (such as the 'port X on card Y' style of name). I don't want to get into an argument about which one is 'better' because I don't think that argument is meaningful; the real question to ask is which form of naming is more useful under what circumstances.

(Since the boundaries between the two sorts of names may be fuzzy, my rule of thumb is that it is clearly a drive identifier if you have to ask the drive for it. Well, provided that you are actually speaking to the drive instead of a layer in between. The ultimate drive identifiers are metadata that you've written to the drive.)
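
To make the two styles concrete, Linux conveniently exposes both at once: /dev/disk/by-id names come from asking the drive about itself (model, serial number, WWN), while /dev/disk/by-path names come from the connection topology. A minimal Python sketch of looking at both; the details of what shows up will vary with your distribution and udev setup:

    # Minimal sketch: group both styles of Linux disk names by kernel device.
    # Assumes a Linux system with udev-populated /dev/disk directories.
    import os

    def names_in(directory):
        """Map each real device node to its names under 'directory'."""
        names = {}
        for entry in os.listdir(directory):
            dev = os.path.realpath(os.path.join(directory, entry))
            names.setdefault(dev, []).append(entry)
        return names

    print(names_in("/dev/disk/by-id"))    # drive-identifier based names
    print(names_in("/dev/disk/by-path"))  # topology/location based names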

Before I get started, though, let me put one inconvenient fact front and center: in almost all environments today, you're ultimately going to be dealing with drives in terms of their physical location. For all the popularity of drive identifiers as a source of disk names (among OS developers and storage technologies), there are very few environments right now where you can tell your storage system 'pull the drive with WWN <X> and drop it into my hands' and have that happen. As I tweeted, I really do need to know where a particular disk actually is.

This leads to my bias, which is that using drive identifiers makes the most sense when the connection topology either changes frequently or is completely opaque, or both. If your connection topology rearranges itself on a regular basis then it can't be a source of stable identifiers because it itself isn't stable. However, you can sometimes get around this by finding a stable point in the topology; for example, iSCSI target names (and LUNs) are a stable point whereas the IP addresses or network interfaces involved may not be.

(Topology rearrangement can be physical rearrangement, ranging from changing cabling all the way up to physically transferring disks between enclosures for whatever reason.)
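
As an illustration of keying off such a stable point, the topology-based names that Linux generates for iSCSI disks embed the target name and LUN, so you can index disks by those while ignoring the IP address portion. A hedged sketch (the exact by-path name format here is an assumption; check what your systems actually generate):

    # Sketch: map (iSCSI target name, LUN) -> current device node, using the
    # 'ip-<addr>:<port>-iscsi-<target>-lun-<N>' style names under by-path.
    # The name format is assumed, not guaranteed.
    import os
    import re

    ISCSI_PAT = re.compile(r'iscsi-(?P<target>.+)-lun-(?P<lun>\d+)$')

    def stable_iscsi_map(bypath="/dev/disk/by-path"):
        mapping = {}
        for entry in os.listdir(bypath):
            m = ISCSI_PAT.search(entry)
            if m:
                dev = os.path.realpath(os.path.join(bypath, entry))
                mapping[(m.group('target'), int(m.group('lun')))] = dev
        return mapping

On a fileserver you could then refer to backend disks by (target, LUN) pairs regardless of which network path or interface they happened to arrive over.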

Conversely, physical location makes the most sense when topology is fixed (and drives aren't physically moved around). With stable locations and stable topology to map to locations, all of the important aspects of a drive's physical location can be exposed to you so you can see where it is, what the critical points are for connecting to it, what other drives will be affected if some of those points fail or become heavily loaded, and so on. Theoretically you don't have to put this in the device name if it's visible in some other way, but in practice visible names matter.
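
One way to see why visible location names matter: with a fixed topology you can write down a once-and-for-all mapping from 'port X on card Y' style names to physical drive bays, which is exactly what you need when a disk has to be pulled. A toy sketch; the controllers, ports, and bay labels here are entirely invented for illustration:

    # Toy sketch: translate topology-based names into physical drive bays.
    # The controller names, ports, and bay labels are hypothetical.
    BAY_OF = {
        ("lsi0", 0): "front bay 1",
        ("lsi0", 1): "front bay 2",
        ("onboard", 0): "rear bay 1 (system SSD)",
    }

    def where_is(controller, port):
        return BAY_OF.get((controller, port), "unknown location")

    print(where_is("lsi0", 1))    # -> front bay 2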

My feeling is that stable topology is much more common than variable topology, at least once you identify the useful fixed points in connection topology. Possibly this is an artifact of the environment I work in; on the other hand, I think that relatively small and simple environments like mine are much more common than large and complex ones.

Sidebar: the cynic's view of OS device naming

It's much easier to give disks an identifier-based device name than it is to figure out how to decode a particular topology and then represent the important bits of it in a device name, especially if you're working in a limited device naming scheme (such as 'cXtYdZ'). And you can almost always find excuses for why the topology might be unstable in theory (eg 'the sysadmin might move PCI cards between slots and oh no').

tech/DiskNamingIDVsLocation written at 01:14:16

