How big our fileserver environment is (part 1)
I've said before that I call us a midsized environment (for a number of reasons, including that we're clearly not small and also nowhere near the size of large environments). Now I feel like sharing some actual numbers on how big and not-big we are. Today's entry is about the hardware of our fileserver environment; a future one will cover data size.
(This is necessarily a snapshot in time and is likely to change in the future.)
Our fileserver environment is made up from fileservers and backends. We currently have seven fileserver machines: four production fileservers, one hot spare, and two test fileservers. One of the test fileservers is currently inactive because it hasn't been reinstalled after we used it to build a fast OS upgrade for the production machines.
(To the extent that we have roles for the test machines, one is intended to be an exact duplicate of the production machines so that we have a test environment for reproducing problems, and the other is for things like OS upgrades or patches.)
We currently have nine backends. Six are in production, one is a hot spare, and two are for testing; which one is the hot spare backend has changed over time as failures have caused us to bring the hot spare into production and turn the old production backend into the hot spare. Currently, all backends have a full set of disks; four backends (the first four we built) use 750 GB disks and the other five use 1 TB disks, including both test backends.
(We plan to raid the test backends for spare disks at some point when our spares pool is depleted but haven't needed to so far.)
As you could partly guess from the pattern of disk sizes, our initial production deployment was three fileservers and four backends; from there we expanded into test machines, hot spares, and then a fourth production fileserver and its two production backends.
Using so much hardware for hot spares and testing is a lot less extravagant than it seems. Because of university budget issues we pretty much bought all of this hardware in two large chunks, and once we had the hardware we felt that we might as well use it for something productive instead of having it sit in storage until we needed to expand the production environment. And we still have more hardware in storage, although we've more or less run out of unclaimed 1 TB drives.
(Some environments would be space, power, or cooling constrained; we haven't run into that yet.)
Sidebar: how much hardware we need for backend testing
We need two backends because that's how production machines are set up; all ZFS vdevs are made up from mirrored pairs, each side coming from a different backend. There's a fair amount of testing where we need to be able to duplicate this mirroring.
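As an illustration of this mirroring (not our actual configuration), here is a hypothetical sketch of creating such a pool, assuming the backends export their disks to the fileservers (for example over iSCSI) and that they show up as two groups of device names; all of the pool and device names here are made up:

```shell
# Hypothetical sketch: each mirror vdev pairs one disk from backend A
# (seen here as c1tXd0) with the matching disk from backend B (c2tXd0),
# so either backend can fail without losing data. Names are illustrative.
zpool create fspool \
    mirror c1t0d0 c2t0d0 \
    mirror c1t1d0 c2t1d0 \
    mirror c1t2d0 c2t2d0
```

With a layout like this, a test environment needs two backends simply so that each side of a mirror can genuinely come from a different machine, the same as in production.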
We need enough disks to fill a single backend, because we want to be able to do a full-scale load test on a single backend; this verifies, among other things, that the software and hardware can really handle driving all of the disks at once.
We don't strictly need enough disks to fill up both test backends, although it's convenient to have that many because then you don't have to worry about shuffling disks around based on what testing you want to do.