== Your Illumos-based NFS fileserver may be 'leaking' deleted files

By now you may have guessed the punchline of my sudden interest in [[ZFS delete queues ZFSDeleteQueue]]: we had [[a problem with ZFS leaking space for deleted files https://twitter.com/thatcks/status/598527622205853696]] that was ultimately traced to [[pending deletes ../unix/UnixPendingDeleteProblem]] that our fileserver wasn't cleaning up when it should have been.

As a well-debugged filesystem, ZFS should not outright leak pending deletions, where there are no remaining references anywhere yet the files haven't been cleaned up (well, more or less; snapshots come into the picture, [[as mentioned ZFSDeleteQueue]]). However, it's possible for both user-level and kernel-level things to hold references to now-deleted files in the traditional way and thus keep them from actually being removed. User-level things holding open files should be visible in, eg, _fuser_, and anyways this is a well-known issue that savvy people will immediately ask you about. Kernel-level things may be less visible, and there is at least one such reference holder in mainline Illumos and thus in OmniOS r151014 (the current release as I write this entry).

Per George Wilson on the illumos-zfs mailing list [[here http://permalink.gmane.org/gmane.os.illumos.zfs/4836]], Delphix found that the network lock manager (the _nlockmgr_ SMF service) could hold references to (deleted) files under some circumstances (see the comment in [[their fix https://github.com/delphix/delphix-os/commit/276ccda5fa7fba148b00480f3cf2081d36fa6aff#diff-221559eaeba658f454998a0ce9f1c5d9R377]]). Under the right circumstances this can cause significant space lossage over time; we saw loss rates of 5 GB a week.

The workaround is to restart _nlockmgr_; the restart drops the old references and thus allows ZFS to actually remove the files and free up [[potentially significant amounts of your disk space http://permalink.gmane.org/gmane.os.illumos.zfs/4831]].
Rebooting the whole server will do it too, for obvious reasons, but is somewhat less graceful.

(Restarting _nlockmgr_ is said to be fully transparent to clients, but we have not attempted to test that. When we did our _nlockmgr_ restart we did as much as possible to make any locking failures a non-issue.)

As far as I know there is no kernel-level equivalent of _fuser_ that would let you list, eg, all currently active kernel-level references to files in a particular filesystem (never mind which kernel subsystem is holding them). I'd love to be wrong here; [[it's an annoying gap in Illumos's observability https://twitter.com/thatcks/status/598970487394414592]].
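
In the absence of such a tool, one crude way to notice this sort of leak is to compare how much space ZFS says the live dataset uses with how much you can still find by walking the filesystem. What follows is only a rough sketch under several assumptions, not a description of what we actually ran. The function names are made up for illustration, it assumes that _zfs get -Hp -o value usedbydataset_ prints a single byte count, and it assumes that the lock manager can be restarted under the abbreviated SMF name _network/nfs/nlockmgr_. Metadata overhead and files that change during the walk make the comparison inexact, so only a large and steadily growing gap is suggestive.

```python
import os
import subprocess


def parse_used_bytes(zfs_output):
    """Parse the single number printed by `zfs get -Hp -o value usedbydataset FS`."""
    return int(zfs_output.strip())


def dataset_used_bytes(dataset):
    """Ask ZFS how much space the live dataset (snapshots excluded) is using."""
    result = subprocess.run(
        ["zfs", "get", "-Hp", "-o", "value", "usedbydataset", dataset],
        check=True, capture_output=True, text=True,
    )
    return parse_used_bytes(result.stdout)


def _lstat_or_none(path):
    """lstat() a path, returning None if it vanished or is unreadable."""
    try:
        return os.lstat(path)
    except OSError:
        return None


def visible_bytes(mountpoint):
    """Sum the allocated space of everything still reachable by name."""
    root_dev = os.lstat(mountpoint).st_dev
    seen = set()   # (device, inode) pairs, so hard links are counted once
    total = 0
    for dirpath, dirnames, filenames in os.walk(mountpoint):
        # Stay on this filesystem: prune directories that are other mounts.
        kept = []
        for d in dirnames:
            st = _lstat_or_none(os.path.join(dirpath, d))
            if st is not None and st.st_dev == root_dev:
                kept.append(d)
        dirnames[:] = kept
        # Count the directory itself plus every file directly inside it.
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            st = _lstat_or_none(path)
            if st is None or (st.st_dev, st.st_ino) in seen:
                continue
            seen.add((st.st_dev, st.st_ino))
            total += st.st_blocks * 512   # st_blocks is in 512-byte units
    return total


def unexplained_bytes(used, visible):
    """Space ZFS accounts for that no visible file explains (never negative)."""
    return max(used - visible, 0)


def restart_nlockmgr():
    """Restart the lock manager (assumed SMF name; needs suitable privileges)."""
    subprocess.run(["svcadm", "restart", "network/nfs/nlockmgr"], check=True)
```

You would call _dataset_used_bytes()_ and _visible_bytes()_ for the same filesystem, record the result of _unexplained_bytes()_ over several days, and only consider _restart_nlockmgr()_ if the gap keeps growing and no user-level process shows up in _fuser_. Nothing here tells you which kernel subsystem is holding the references; it only tells you that something probably is.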