Where we have multi-tenancy in our fileserver environment
One of the things that people worry about, and often consider a bad idea, when designing systems is multi-tenancy, where one resource serves multiple people (or, well, upstream things). The problems inherent in multi-tenancy are well known to people who use public clouds, namely that other people's activity (often other people's invisible activity) can adversely impact you. We have a number of levels of multi-tenancy in our fileserver environment, and today I feel like enumerating them.
(All of this is theoretically something that you can deduce from my writeups, but writing it down explicitly doesn't hurt.)
At the moment, we have multi-tenancy on three different levels: fileservers, iSCSI backends, and individual disks. On fileservers we have multiple pools, each of which generally serves a different set of users. Since each fileserver only uses two backends to support all of its pools, this implies that we have multi-tenancy on the iSCSI backends as well; each backend hosts disks for multiple pools and thus for multiple user groups. We also have multi-tenancy on individual backend disks, because we slice each physical disk up into fixed-size chunks to make them more manageable and then parcel out the chunks to different pools.
(When a fileserver doesn't have much of its disk space allocated, the disks themselves may only have one chunk used and so not be multi-tenanted yet. As more disk space gets allocated, we can and do run out of disks and start having to reuse them. On our current generation, the trigger point for disk multi-tenancy is more than about 5.7 TB of space getting allocated to pools.)
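As a sketch of why disk multi-tenancy only kicks in past a certain allocation level, here is the arithmetic with illustrative numbers (the chunk size and disk counts below are assumptions for the example, not our real figures):

```python
# Illustrative sketch of chunk-based disk allocation. The chunk size and
# disk counts here are assumed for illustration; they are not the real
# numbers from our fileserver environment.

CHUNK_SIZE_GB = 250          # assumed size of each fixed-size chunk
DISKS_PER_BACKEND = 12       # assumed physical disks per iSCSI backend
BACKENDS_PER_FILESERVER = 2

total_disks = DISKS_PER_BACKEND * BACKENDS_PER_FILESERVER

def disks_multi_tenanted(allocated_gb):
    """A disk becomes multi-tenanted once every disk has handed out its
    first chunk, so further allocations must reuse disks."""
    chunks_needed = -(-allocated_gb // CHUNK_SIZE_GB)  # ceiling division
    return chunks_needed > total_disks

# With these assumed numbers, the trigger point is
# 24 disks * 250 GB = 6000 GB of allocated space.
print(disks_multi_tenanted(5000))   # False: each chunk can be the only one on its disk
print(disks_multi_tenanted(7000))   # True: some disks must now host two chunks
```

The same calculation with the real chunk size and disk counts is what gives our current generation its roughly 5.7 TB trigger point.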
We don't share backends or backend disks between fileservers any more, so backends are not multi-tenanted to that degree. This at least makes it somewhat easier to work out what's causing a choke point; you only have to look at one fileserver's activity instead of more than one.
Each of these multi-tenancy points creates obvious choke points. The most significant one is individual disks, since they have strict and often very low performance limits in the face of any significant volume of seeks or writes (including resilvers). I think that our current generation of backends doesn't have an internal limit on aggregate disk bandwidth, but with only two network interfaces (currently 1G each) they can easily hit their total iSCSI bandwidth limit (200 Mbytes/sec is nothing these days if you've got a lot of disks going at once with sequential activity). The fileservers are limited on both NFS and iSCSI network bandwidth (1G and 2x1G respectively), but less obviously they're also limited on RAM for caching (which is effectively shared between all pools) and on NFS and iSCSI processing in general (a single fileserver can only do so many NFS and iSCSI things at once).
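To put the backend bandwidth limit in perspective, here is the back-of-the-envelope arithmetic (the per-disk streaming rate and disk count are assumptions for illustration):

```python
# Back-of-the-envelope look at how easily sequential activity saturates a
# backend's 2x1G iSCSI links. The per-disk streaming rate and disk count
# are assumed for illustration.

ISCSI_LINKS = 2
LINK_MBYTES_PER_SEC = 100            # a 1G link delivers roughly 100 Mbytes/sec
backend_limit = ISCSI_LINKS * LINK_MBYTES_PER_SEC    # 200 Mbytes/sec total

DISK_SEQ_MBYTES_PER_SEC = 150        # assumed streaming rate of one modern HDD
DISKS_PER_BACKEND = 12               # assumed

# Just two disks streaming sequentially can already exceed the network limit:
print(2 * DISK_SEQ_MBYTES_PER_SEC > backend_limit)

# And when many disks are busy at once, each one's fair share of the
# network is a small fraction of what the disk itself could deliver:
print(backend_limit / DISKS_PER_BACKEND)   # Mbytes/sec per disk
```

In other words, the network limit, not the disks themselves, is what many pools on one backend end up contending for during sequential activity.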
(I'm ignoring the multi-tenancy created by having more than one person or project in a single ZFS pool as more or less out of scope. Most ZFS deployments will probably have some degree of multi-tenancy at that level for various reasons.)