How big our fileserver environment is (part 1)

October 13, 2010

I've said before that I call us a midsized environment (for a number of reasons, including that we're clearly not small and also nowhere near the size of large environments). Now I feel like sharing some actual numbers on how big and not-big we are. Today's entry is about the hardware of our fileserver environment; a future one will cover data size.

(This is necessarily a snapshot in time and is likely to change in the future.)

Our fileserver environment is made up of fileservers and backends. We currently have seven fileserver machines: four production fileservers, one hot spare, and two test fileservers. One of the test fileservers is currently inactive because it hasn't been reinstalled after we used it to build a fast OS upgrade for the production machines.

(To the extent that we have roles for the test machines, one is intended to be an exact duplicate of the production machines so that we have a test environment for reproducing problems, and the other is for things like OS upgrades or patches.)

We currently have nine backends. Six are in production, one is a hot spare, and two are for testing; which one is the hot spare backend has changed over time as failures have caused us to bring the hot spare into production and turn the old production backend into the hot spare. Currently, all backends have a full set of disks; four backends (the first four we built) use 750 GB disks and the other five use 1 TB disks, including both test backends.

(We plan to raid the test backends for spare disks at some point when our spares pool is depleted but haven't needed to so far.)

As you could partly guess from the pattern of disk sizes, our initial production deployment was three fileservers and four backends; we expanded into test machines, hot spares, and then a fourth production fileserver and its two production backends from there.
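As a sanity check on the numbers above, here is a minimal Python sketch that tallies the machines by role and confirms the counts agree with each other. The dictionary and variable names are made up for illustration; only the counts come from this entry.

```python
# Machine counts as stated in this entry (a snapshot from October 2010).
# All names here are illustrative, not anything we actually run.
fileservers = {"production": 4, "hot spare": 1, "test": 2}
backends = {"production": 6, "hot spare": 1, "test": 2}

# Backends grouped by the size (in GB) of the disks they are filled with.
backends_by_disk_gb = {750: 4, 1000: 5}

# The initial deployment, plus the two backends added with the
# fourth production fileserver.
initial_production_backends = 4
added_production_backends = 2


def total(counts):
    """Sum the machine counts across all roles (or disk sizes)."""
    return sum(counts.values())


if __name__ == "__main__":
    print("fileservers:", total(fileservers))
    print("backends:", total(backends))
    print("backends by disk size:", total(backends_by_disk_gb))
    print(
        "production backends:",
        initial_production_backends + added_production_backends,
    )
```

The role breakdown and the disk-size breakdown both have to add up to the same nine backends, which is a quick way to catch a miscount in an inventory like this.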

Using so much hardware for hot spares and testing is a lot less extravagant than it seems. Because of university budget issues we pretty much bought all of this hardware in two large chunks, and once we had the hardware we felt that we might as well use it for something productive instead of having it sit in storage until we needed to expand the production environment. And we still have more hardware in storage, although we've more or less run out of unclaimed 1 TB drives.

(Some environments would be space, power, or cooling constrained; we haven't run into that yet.)

Sidebar: how much hardware we need for backend testing

We need two backends because that's how production machines are set up; all ZFS vdevs are mirrored pairs, with each side of a mirror coming from a different backend. There's a fair amount of testing where we need to be able to duplicate this mirroring.
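To illustrate why the mirroring needs two backends, here is a rough Python sketch of the pairing rule. It is a model of the idea only, not our actual tooling; the `Lun` type, the backend names, and the function names are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Lun(NamedTuple):
    backend: str  # hypothetical backend name, e.g. "test1"
    disk: int     # disk slot on that backend


def build_mirrors(side_a: List[Lun], side_b: List[Lun]) -> List[Tuple[Lun, Lun]]:
    """Pair LUNs into two-way mirrors, one side from each backend."""
    if len(side_a) != len(side_b):
        raise ValueError("both sides need the same number of LUNs")
    mirrors = list(zip(side_a, side_b))
    for a, b in mirrors:
        # The rule from the text: the two sides of a mirror must
        # come from different backends.
        if a.backend == b.backend:
            raise ValueError(f"mirror {a} / {b} uses only one backend")
    return mirrors


def survives_backend_loss(mirrors: List[Tuple[Lun, Lun]], lost: str) -> bool:
    """True if every mirror still has a side on some other backend."""
    return all(any(lun.backend != lost for lun in m) for m in mirrors)


if __name__ == "__main__":
    a = [Lun("test1", d) for d in range(3)]
    b = [Lun("test2", d) for d in range(3)]
    mirrors = build_mirrors(a, b)
    print(survives_backend_loss(mirrors, "test1"))
```

With only one backend there is no valid way to build such a mirror, which is why a single test backend can't reproduce the production layout.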

We need enough disks to fill a single backend, because we want to be able to do a full-scale load test on a single backend; this verifies, among other things, that the software and hardware can really handle driving all of the disks at once.

We don't strictly need enough disks to fill up both test backends, although it's convenient to have that many because then you don't have to worry about shuffling disks around based on what testing you want to do.


Comments on this page:

From 198.102.62.250 at 2010-10-14 13:30:01:

Chris, a couple of questions....

  • Why do you use 250GB LUN chunks instead of the full disk? For consistency or management ease?
  • Have you considered switching to COMSTAR for iSCSI target functionality or are you so pleased with IET's performance that there's no need to fix what ain't broke?

Ray Van Dolson

By cks at 2010-10-14 16:07:39:

We use fixed-size LUN chunks because it's the only way to manage a scalable long-term environment; it means we can always mirror any two LUNs together without worrying about how big they are. (There's more discussion of this in SANPartitionSizes.)

I have no interest in COMSTAR because it would require me to run (Open)Solaris on the iSCSI backends. As far as I can see this would be a massive loss on all sorts of dimensions (including practical hardware support).



Last modified: Wed Oct 13 23:50:32 2010