How big our fileserver environment is (as of May 2020)
A decade ago I wrote some entries about how big our fileserver environment was back then (part 1 and part 2). We're now on our third generation fileservers and for various reasons it's a good time to revisit this and talk about how big our environment has become today.
Right now we have six active production fileservers, with a seventh waiting to go into production when we need the disk space. Each fileserver has sixteen 2 TB SSDs for user data, which are divided into four fixed size chunks and then used to form mirrored pairs for ZFS vdevs. Since we always need to be able to keep one disk's worth of chunks free to replace a dead disk, the maximum usable space on any given fileserver is 30 pairs of chunks. After converting from disk vendor decimal TB to powers of two TiB and ZFS overheads, each pair of chunks gives us about 449 GiB of usable space, which means that the total space that can be assigned and allocated on any given fileserver is a bit over 13 TiB. No fileserver currently has all of that space allocated, much less purchased by people. Fileservers range from a low of 1 remaining allocatable pair of chunks to a high of 7 such pairs (to be specific, right now it goes 1, 2, 5, 6, 6, 7 across the six active production fileservers, so we've used 153 pairs of chunks out of the total possible of 180).
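(For the curious, the per-fileserver arithmetic above can be sketched out like this; the constants are just the numbers from this entry, not anything our tools actually use.)

```python
# Back-of-the-envelope capacity arithmetic using the numbers in this entry.
DISKS_PER_SERVER = 16
CHUNKS_PER_DISK = 4
SPARE_CHUNKS = CHUNKS_PER_DISK   # one disk's worth of chunks kept free for a replacement
GIB_PER_PAIR = 449               # usable GiB per mirrored pair of chunks, after overheads

usable_chunks = DISKS_PER_SERVER * CHUNKS_PER_DISK - SPARE_CHUNKS
pairs_per_server = usable_chunks // 2             # chunks are used in mirrored pairs
server_tib = pairs_per_server * GIB_PER_PAIR / 1024

print(pairs_per_server)        # 30 pairs of chunks per fileserver
print(round(server_tib, 1))    # ~13.2 TiB, i.e. "a bit over 13 TiB"
```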
We don't put all of this space into a single ZFS pool on each fileserver for all sorts of reasons, including that we sell space to people and groups, which makes it natural to split up space into different ZFS pools based on who bought it. Right now we have sold and allocated 58.8 TiB of space in ZFS pools (out of 67 TiB of allocated chunks, so this is somewhat less dense than I expected, but not terrible). In total we have 40 ZFS pools; the largest pool is 5.6 TiB of sold space and the smallest pool is 232 GB. Exactly half the pools (20 out of 40) have 1 TiB or more of allocated ZFS space.
Not all of this allocated ZFS space is actually used, thankfully (just like every other filesystem, ZFS doesn't like you when you keep your disks full). Currently people have actually used 38 TiB of space across all of the filesystems in all of those pools. The largest amount of space used in a single pool is 4 TiB, and the largest amount of space used in a single filesystem is 873 GiB. /var/mail is the third largest filesystem in used space, at 600 GiB.
So in summary we've allocated about 85% of our disk chunks, sold about 87% of the space from the allocated chunks, and people are using about 64% of the space that they've purchased. The end-to-end number is that people are using about 48% of the space that we could theoretically allocate and sell. However, we have less room for growth on existing fileservers than you might think from this raw number, because the people who tend to buy lots of space are also the people who tend to use it. One consequence of this is that the fileserver with 1 free pair of chunks left is also one with two quite active large pools.
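(Those percentages all fall out of the figures earlier in this entry; here's a little sketch that reproduces them. The exact rounding wobbles by a point here and there depending on how precisely you carry the intermediate numbers.)

```python
# Reproducing the summary utilization figures from the numbers quoted above.
TOTAL_PAIRS = 180      # maximum pairs of chunks across six fileservers
USED_PAIRS = 153       # pairs of chunks currently allocated
GIB_PER_PAIR = 449     # usable GiB per pair

allocated_tib = USED_PAIRS * GIB_PER_PAIR / 1024     # ~67 TiB of allocated chunks
theoretical_tib = TOTAL_PAIRS * GIB_PER_PAIR / 1024  # ~79 TiB theoretical maximum
sold_tib = 58.8        # space sold and allocated in ZFS pools
used_tib = 38.0        # space people have actually used

print(round(100 * USED_PAIRS / TOTAL_PAIRS))    # ~85% of chunks allocated
print(round(100 * sold_tib / allocated_tib))    # ~88% of allocated space sold
print(round(100 * used_tib / sold_tib))         # ~65% of sold space in use
print(round(100 * used_tib / theoretical_tib))  # ~48% end to end
```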
We have 305 meaningful ZFS filesystems across all of those ZFS pools (in addition to some 'container' filesystems that just exist for ZFS reasons and aren't NFS exported). The number of ZFS filesystems in each pool is somewhat correlated with the pool's size, but not entirely; multiple ZFS filesystems get created and used for all sorts of reasons. The most populated ZFS pool has 23 filesystems, while there are three pools with only one filesystem in them (those are a somewhat complicated story).
(We have more NFS mounts than ZFS filesystems for various reasons beyond the scope of this entry.)