Some additional information on ZFS performance as you approach quota limits
My first entry on this subject got some additional information from Allan Jude and others on Twitter, which I'm going to replicate here:
@allanjude: re: <my entry> - Basically, when you are close to the quota limit, ZFS will rate-limit incoming writes as it has to be sure you won't go over your quota. You end up having to wait for the pending transactions to flush to find out how much room you have left
I was turned on to the issue by @garrett_wollman who uses quotas at a large institution similar to yours. I expect you won't see the worst of it until you are within 100s of MB of the quota. So it isn't being over 95% or something, so much as being 'a few transactions' from full
@garrett_wollman: Turning off compression when the dataset gets near-full clears the backlog (obviously at a cost), as does increasing the quota if you have the free space for it.
@thatcks: Oh interesting! We have compression off on most of our datasets; does that significantly reduce the issue (although presumably not completely eliminate it)?
(Sadly we have people who (sometimes) run pools and filesystems that close to their quota limits.)
@garrett_wollman: I don't know; all I can say is that turning compression off on a wedged NFS server clears the backlog so requests for other datasets are able to be serviced.
All of this makes a bunch of sense given the complexity of enforcing filesystem size limits, and it especially makes sense that compression might cause issues here. Any sort of compression creates a very uncertain difference between the nominal size of a write and its actual on-disk size, and ZFS quotas are applied to the physical space used, not the logical space. So ZFS presumably can't know exactly how much quota a pending write will consume until the data has actually been compressed.
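As a rough illustration of how uncertain that difference is, here is a small Python sketch that compresses two buffers of the same logical size. It uses zlib purely as a stand-in (ZFS has its own compressors, and the helper name compressed_size is made up for this example), so the specific numbers mean nothing for ZFS itself; the point is only how far apart the results are.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Return how many bytes data takes after zlib compression.

    zlib is only a stand-in here; it is not what ZFS uses on disk.
    """
    return len(zlib.compress(data))


# Two writes with the same logical (nominal) size of 1 MiB.
logical_size = 1024 * 1024
zeros = bytes(logical_size)             # extremely compressible
random_data = os.urandom(logical_size)  # effectively incompressible

if __name__ == "__main__":
    print("logical size:     ", logical_size)
    print("zeros compressed: ", compressed_size(zeros))
    print("random compressed:", compressed_size(random_data))
```

The two buffers are the same size as far as the writer is concerned, but one shrinks to almost nothing and the other doesn't shrink at all. Anything that has to charge the compressed size against a limit can't do it accurately in advance.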
(I took a quick look in the ZFS on Linux source code but I couldn't spot anything that was obviously different when there was a lot of quota room left.)
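If you want to find filesystems that are in the danger zone Allan Jude describes, one option is to look at absolute headroom instead of percentage used. What follows is a minimal sketch, not anything we run in production. It parses the output of 'zfs get -Hp -o name,property,value used,quota' and reports filesystems within some margin of their quota. The function names and the 512 MiB default margin are my own assumptions (the margin is a guess at 'within 100s of MB'), and it only looks at the quota property, not refquota.

```python
import subprocess

# Assumed margin: a guess at what "within 100s of MB of the quota" means.
DEFAULT_MARGIN = 512 * 1024 * 1024


def _to_int(value: str) -> int:
    """Convert a parsable property value to an int; non-numeric means 0."""
    try:
        return int(value)
    except ValueError:
        return 0


def parse_zfs_get(output: str) -> dict:
    """Parse tab-separated 'name<TAB>property<TAB>value' lines."""
    datasets = {}
    for line in output.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, prop, value = fields
        datasets.setdefault(name, {})[prop] = _to_int(value)
    return datasets


def near_quota(datasets: dict, margin: int = DEFAULT_MARGIN) -> list:
    """Return (name, headroom_bytes) for datasets within margin of quota."""
    result = []
    for name, props in sorted(datasets.items()):
        quota = props.get("quota", 0)
        if quota <= 0:
            continue  # no quota set on this dataset
        headroom = quota - props.get("used", 0)
        if headroom <= margin:
            result.append((name, headroom))
    return result


def read_zfs_usage() -> str:
    """Run zfs get for all filesystems; needs the zfs command available."""
    cmd = ["zfs", "get", "-Hp", "-o", "name,property,value",
           "-t", "filesystem", "used,quota"]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


if __name__ == "__main__":
    for name, headroom in near_quota(parse_zfs_get(read_zfs_usage())):
        print(f"{name}: {headroom / (1024 * 1024):.0f} MiB below quota")
```

Checking absolute headroom matters because a 10 TB filesystem at 99% full still has around 100 GB of room, while a 10 GB one at 99% is about 100 MB from its limit, which is where the thread suggests the trouble starts.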