Moving to smaller fileservers for us probably means no more iSCSI SAN

July 2, 2017

In our environment, one major thing that drives us towards relatively big fileservers is amortizing the fixed overhead of each server. Regardless of how big or small it is, any server has a certain minimum cost because it needs things like a case and power supply, a motherboard, and a CPU. The result of this per-server overhead is economies of scale; a single server with 16 disk bays almost certainly costs less than two servers with 8 disk bays each.

We have a long history of liking to split our fileservers from our disk storage. Our current fileservers and our past generation of fileservers have both used iSCSI to talk to backend disk enclosures, and the generation of fileservers before them used Fibre Channel to talk to FC hardware RAID boxes. Splitting up the storage from the fileservice this way requires buying extra machines, which costs more; what has made this affordable is aggregating a fairly decent number of disks in each box, so we don't have to buy too many extra machines.

If we're going to have smaller fileservers, as I've come to strongly believe we want, we're going to need more of them. If we're going to keep a similar design to our current setup, we would need more iSCSI backends to go with them. All of this means more machines and more costs. In theory we could lower costs by continuing to use 16-disk backends and sharing them between smaller fileservers, so that two new fileservers would share a pair of backends. In practice this would make our multi-tenancy issues worse and we would likely resist the idea fairly strongly. And we'd still be buying more fileservers.

If we want to buy a similar number of machines in total for our next generation fileservers but shrink the number of disks and the amount of storage space supported by each fileserver, the obvious conclusion is that we must get rid of the iSCSI backends. Hosting disks on the fileservers themselves has some downsides (per my entry about our SAN tradeoffs), but at a stroke it cuts the number of machines per fileserver from three to one. We could double the number of fileservers and still come out ahead on raw machine count. In a 10G environment, it also eliminates the need for two expensive 10G switches for the iSCSI networks themselves (and we'd want to go to 10G for the next generation of fileservers).

If we want to reduce the size of our fileservers but keep an iSCSI environment, we're almost certainly going to be faced with unappetizing tradeoffs. Considering the cost of 10G switch ports as well as everything else, our most likely choice would be to stop using two backends per fileserver; instead each fileserver would talk to a single 16-disk iSCSI backend (still using mirrored pairs of disks). This would increase the overall number of servers, but not hugely. Our HD-based production fileservers are currently 9 servers in total (three fileservers with two backends each); we would go to 12 servers, because the three fileservers would become six and then we'd need six backends to go with them.
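The machine counts in this entry are simple enough to check with a few lines of Python. This is only a sketch of the arithmetic; the `Design` class and its field names are made up for illustration, and the 18-machine row is my own extrapolation of what "keep the current design but double the fileservers" would mean, not a configuration we have costed out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A hypothetical fileserver layout: N fileservers, each with its own backends."""
    name: str
    fileservers: int
    backends_per_fileserver: int

    @property
    def machines(self) -> int:
        # Each fileserver counts as one machine plus its dedicated iSCSI backends.
        return self.fileservers * (1 + self.backends_per_fileserver)


designs = [
    Design("current: 3 fileservers, 2 backends each", 3, 2),
    Design("smaller: 6 fileservers, 2 backends each", 6, 2),
    Design("smaller: 6 fileservers, 1 backend each", 6, 1),
    Design("smaller: 6 fileservers, local disks only", 6, 0),
]

for d in designs:
    print(f"{d.name}: {d.machines} machines")
```

This counts only servers. It ignores the 10G switches for the iSCSI networks, which would push the comparison further in favour of local disks.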

(It turns out that I also wrote about this issue a couple of years ago. At the time we weren't as totally convinced that our current fileservers are too big as designed, although we were certainly thinking about it, and I was less pessimistic about the added costs of the extra servers we'd need if we shrink each fileserver. Or maybe the extra costs just hadn't struck me yet.)

Written on 02 July 2017.
