Wandering Thoughts archives

2012-12-31

How our fileserver infrastructure is a commodity setup (and inexpensive)

Our fileserver environment may sound luxurious and expensive (after all, it involves Solaris, ZFS, and iSCSI, all things that often mean lots of money), but it isn't really. I've mentioned (in comments) a couple of times that it's essentially commodity hardware and that I don't think we could do it much less expensively without fundamentally changing what it is, but I've never really explained that in one place.

The fundamental architecture is a number of Solaris fileservers which get their actual disk space from a number of iSCSI backends over, well, iSCSI. Raw backend disk space is sliced up into standard-sized chunks, mirrored between two backends (the mirroring is done on the fileservers), and aggregated into various ZFS pools; filesystems in the pools are then NFS exported to our actual machines (all of which run Linux, currently Ubuntu 12.04 LTS).
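As a rough sketch of what this looks like on a fileserver (with hypothetical pool and device names; the real ones are whatever the iSCSI chunks show up as), assembling and exporting a mirrored pool is just ordinary ZFS administration:

# a minimal sketch with hypothetical names; each mirror pair takes
# one chunk from each of two different iSCSI backends
zpool create tank mirror c2t1d0 c3t1d0 mirror c2t2d0 c3t2d0
zfs create tank/homes
zfs set sharenfs=on tank/homes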

All of the actual physical servers involved in this are basic 1U servers. They happen to be SunFire X2100s (backends) and X2200s (Solaris fileservers) from Sun, but surprisingly they weren't overpriced; up until Oracle bought Sun and ended the entire line, Sun actually had reasonably priced and very attractive 1U server hardware (well, at least at educational prices). The backends run RHEL 5 and the fileservers run Solaris, neither of which is generally cheap, but at the time that we set up our environment both were basically free; the university has a RHEL site license and had an inexpensive Solaris support agreement (Oracle has since changed that).

(We added more memory to the iSCSI backends over their default configuration, but even at the time the memory was pretty cheap. The open source iSCSI target software we use on the backends is free.)

The iSCSI data disks are consumer 7200 RPM SATA disks (all Seagates as it happens, because that was what we liked at the time); 'enterprise' grade high speed SAS drives might have been nice but were well out of our price range. They're in relatively inexpensive (and not particularly impressively engineered) commodity external eSATA enclosures (with 12 disks each in 4U or so of rack space). The iSCSI backends are connected to the Solaris fileservers over two ordinary 1G Ethernet segments, each of which has its own switch but no other network infrastructure (well, besides cables). The fileservers talk NFS to our environment over standard 1G Ethernet.

One significant reason to call this a commodity storage setup is that once you accept the basic parameters (storage that's detached from the actual fileservers, for good reason, and mirrored disk space), I don't think hardware or software substitutions could save much money. The one obvious spot to do so is the backends, where you might be able to get a case and assemble a moderately custom box that held both the server board itself and the disks. We considered this option at the time but rejected it on the grounds that doing our own engineering was more risky for relatively modest savings.

(If we had wanted to put more than 12 or so data disks in a single backend it would have gotten more attractive, but we had various reasons for not liking this, including both the problem of putting too many eggs in one basket and what this would do to the costs of adding more storage later. Generally, the bigger your unit of storage, the more economies of scale you get, but also the more expensive it is to add more storage in the future.)

We initially attempted to build this environment using canned iSCSI server appliances from a storage vendor. This was unfortunately an abject failure that cost us a significant amount of time (although in the end, no money). I'm not sure that using an iSCSI appliance would have saved us money, although it might have saved us rack space (which is not an issue for us).

Mirrored storage is the one serious luxury of this setup. I think it's been an important win, but it's undeniably increased costs; if we were using RAID 5 or RAID 6 we could offer significantly more storage with the same raw disk space (and thus cost). However, this would involve a significantly different overall design. Off the top of my head, I think we'd have to push the RAID work to the iSCSI backends instead of doing it on the fileservers, and based on our experience to date (where we've had a few total backend failures due to, eg, a disk enclosure's power supply failing), the result would probably have been less reliable.
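To put rough numbers on the capacity difference (assuming, hypothetically, a pair of backends with twelve 1 TB data disks each):

mirroring across the pair:  12 x 1 TB = 12 TB usable of 24 TB raw (50%)
RAID 6 on each backend:     2 x (12 - 2) x 1 TB = 20 TB usable (~83%)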

(All of our servers and disk enclosures have only a single power supply. Yes, we know the dangers. But that's what you get with inexpensive commodity hardware.)

Sidebar: iSCSI versus other network disk protocols

My short summary of this complex issue is that iSCSI works and all of the pieces necessary are free (in our environment); Solaris 10 comes with a functional iSCSI initiator and, as mentioned, the iSCSI target software that we run on Linux is open source (and well supported as far as my experiences go). In some environments iSCSI would probably increase your costs, but in ours this is not the case, and while the protocol has that 'enterprisey' smell, other people have already done all of the hard work to deal with it. And the performance is okay (and doesn't need jumbo frames on 1G Ethernet); a single fileserver can saturate both of its 1G connections to the backends under the right circumstances.
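For illustration, pointing the Solaris 10 iSCSI initiator at a backend takes only a couple of commands (the discovery address here is hypothetical):

# hypothetical backend address; enable SendTargets discovery
iscsiadm add discovery-address 192.168.1.10:3260
iscsiadm modify discovery --sendtargets enable
# make the newly discovered iSCSI disks appear as device nodes
devfsadm -i iscsi

After that the backend's chunks show up as ordinary disk devices that ZFS can put in pools.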

(The last time I looked I didn't feel enthused about ATA-over-Ethernet.)

sysadmin/OurCommodityFileservers written at 21:49:29

GNU sort's -h option

I only recently became aware of GNU sort's -h option, which strikes me as a beautiful encapsulation of everything (both good and bad) that people attribute to GNU programs and their profusion of options.

GNU sort's -h is like -n (sort numerically) except that it sorts GNU's 'humane' numbers, as produced by (for example) GNU du's -h option, in their correct numeric order. This leads naturally to a variant of a little script that I've already talked about:

du -h | sort -hr | less
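As a quick illustration of what -h gets you (the sample sizes are made up):

printf '512M\n2.0K\n1.1G\n' | sort -h

prints 2.0K, 512M, and 1.1G in that order, while plain sort -n would order them 1.1G, 2.0K, 512M because it only looks at the leading digits.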

On the one hand, -h is clearly useful in both commands. Humane numbers are a lot easier to read and grasp than plain numbers, and now GNU sort will order them correctly for you. On the other hand, you can see the need for a -h argument to sort as evidence of an intrinsic problem with du -h; in this view, GNU is piling hack on top of hack. The arguable Unix way might be a general hum command that humanized all numbers (or specific columns of numbers if you wanted); that would make the example into 'du | sort -nr | hum | less', which creates a general tool at the price of making people add an extra command to their pipelines.
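No such hum command exists as far as I know, but a minimal sketch of one is easy enough. This version humanizes just the leading number on each line and assumes it's a size in the 1 KB units that du reports by default:

#!/bin/sh
# hum: hypothetical sketch; humanize the leading number on each line,
# assuming it is a size in 1 KB units (as du reports by default)
awk '{
    n = $1; u = "K"
    if (n >= 1048576)   { n /= 1048576; u = "G" }
    else if (n >= 1024) { n /= 1024;    u = "M" }
    sub(/^[0-9]+/, "")
    printf "%.1f%s%s\n", n, u, $0
}'

A real version would want to handle terabyte sizes and column selection, but that's the general idea.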

I don't have any particular view on whether GNU sort's -h option is Unixly wrong or not. I do think that it's (seductively) convenient, and now that I've become aware of it, it's probably going to work its way into various things I do.

(This could spark a great debate on what the true Unix way is, but I'm not going to touch that one right now.)

unix/GNUSortHOption written at 03:12:33

