== Optimizing finding unowned files on our ZFS fileservers

One of the things we do every weekend is look for files on [[our fileservers ZFSFileserverSetupII]] that have wound up being owned by people who don't exist (or, more commonly, who no longer exist). For a long time this was done with the obvious approach using _find_, which was basically this:

.pn prewrap on

> SFS=$(... generate FS list ...)
> gfind -H $SFS -mount '(' -nogroup -o -nouser ')' -printf ...

The problem with this is that we have enough data in enough filesystems that running a _find_ over the entire thing can take a significant amount of time. On our biggest fileserver, we've seen this take on the order of ten hours, which either delays the start of [[our weekly pool scrubs ZFSPeriodicScrubbing]] or collides with them, slowing them down ([[and they can already be slow enough ZFSFasterScrubsDesire]]).

Recently I realized that we can do much better than this by not checking most of our filesystems. The trick is to use ZFS's existing infrastructure for quotas. As part of this, ZFS maintains information on the amount of space used by every user and every group on each filesystem, which the '_zfs userspace_' and '_zfs groupspace_' commands will print out. As a side effect this gives you a complete list of every UID and GID that uses space in the filesystem, so all we have to do is scan the lists to see if there are any unknown ones in them. If all UIDs and GIDs using space on the filesystem exist, we can completely skip running _find_ on it; we know our _find_ won't find anything.

Since our filesystems don't normally have any unowned files on them, this turns into a massive win. In the usual case we won't scan any filesystems on a fileserver, and even if we do scan some we'll generally only scan a handful. It may even make this particular process fast enough that we can just run it after deleting accounts, instead of waiting for the weekend.

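
As a rough illustration of the check described above, here is a minimal sketch in shell. It is not our actual script. The function names _unknown_ids_ and _needs_scan_ are made up for this example, and it assumes that '_getent passwd_' and '_getent group_' are the right way to decide whether a UID or GID exists on your systems. It also only looks at POSIX users and groups, not SMB ones.

```shell
#!/bin/sh
# Sketch only: decide whether a ZFS filesystem needs a full find scan.
# unknown_ids and needs_scan are hypothetical names for this example.

# Read numeric IDs on stdin (one per line) and print the ones that
# 'getent <database>' cannot resolve. $1 is 'passwd' or 'group'.
unknown_ids() {
    db=$1
    while read -r id; do
        [ -n "$id" ] || continue
        getent "$db" "$id" >/dev/null 2>&1 || echo "$id"
    done
}

# Succeed (exit 0) if the filesystem has space charged to at least one
# UID or GID that does not exist, ie if it is worth running find on it.
# -H drops headers, -n prints numeric IDs, -o name prints only the ID.
needs_scan() {
    fs=$1
    baduids=$(zfs userspace -H -n -o name -t posixuser "$fs" | unknown_ids passwd)
    badgids=$(zfs groupspace -H -n -o name -t posixgroup "$fs" | unknown_ids group)
    [ -n "$baduids$badgids" ]
}
```

A driver would loop over the generated filesystem list, keep only the filesystems where _needs_scan_ succeeds, and hand just those to the existing _gfind_ command (skipping _gfind_ entirely if the list comes out empty).
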
By the way, the presence of unknown UIDs or GIDs in the output of '_zfs *space_' doesn't mean that there definitely are files that a _find_ will pick up. The unowned files could be only in a snapshot, or they could be deleted files that are being held open by various things, including [[the NFS lock manager ZFSDeleteQueueNLMLeak]].