== The ZFS delete queue: ZFS's solution to the pending delete problem

Like every other always-consistent filesystem, ZFS needs a solution to [[the Unix pending delete problem ../unix/UnixPendingDeleteProblem]] (files that have been deleted on the filesystem but that are still in use). ZFS's solution is implemented with a type of internal ZFS object called the 'ZFS delete queue', which holds a reference to any and all ZFS objects that are pending deletion. You can think of it as a kind of directory (and technically it's implemented with the same underlying storage as directories are, namely a {{AB:ZAP:ZFS Attribute Processor}} store).

Each filesystem in a ZFS pool has its own ZFS delete queue object, holding pending deletes for objects that are in (or were originally in) that filesystem. Each snapshot has a ZFS delete queue as well, because the current state of a filesystem's ZFS delete queue is captured as part of making a snapshot. This capture of delete queues in snapshots has some interesting consequences; the short version is that once a delete queue with entries is captured in a snapshot, the space used by those pending-deleted objects cannot be released until the snapshot itself is deleted.

(I'm not sure that this space usage is properly accounted for in the '_usedby*_' space usage properties, but I haven't tested this specifically.)

There is no simple way to find out how big the ZFS delete queue is for a given filesystem. Instead you have to use the magic _zdb_ command to read it out, using '_zdb -dddd DATASET OBJNUM_' to dump details of individual ZFS objects so that you can find out how many ZAP entries a filesystem's 'ZFS delete queue' object has; the number of current ZAP entries is the number of pending deletions. See the sidebar for full details, because it gets long and tedious.

(In some cases it will be blatantly obvious that you have some sort of problem because _df_ and '_zfs list_' and so on report very different space numbers than eg _du_ does, and you don't have any of the usual suspects like snapshots.)

Things in the ZFS delete queue still count against per-user and per-group space usage and quotas, which makes sense because they're still not quite deleted. If you use '_zfs userspace_' or '_zfs groupspace_' for space tracking and reporting purposes this can result in potentially misleading numbers, especially if pending deletions are 'leaking' ([[which can happen ZFSDeleteQueueNLMLeak]]). If you actually have and enforce per-user or per-group quotas, well, you can wind up with users or groups that are hitting quota limits for no readily apparent reason.
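As a rough sketch of how you might cross-check this (the dataset name and the path here are placeholders, not anything from our systems), you can compare what ZFS says a user or group is being charged for against what you can actually find on disk:

  # zfs userspace DATASET
  # zfs groupspace DATASET
  # du -sh /path/to/that/users/stuff

Space that '_zfs userspace_' charges to a user but that _du_ can't find has to be hiding somewhere, and if there are no snapshots in play the delete queue is one of the candidates. A big gap is suggestive, not conclusive.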
(Needing to add things to the ZFS delete queue has apparently caused problems on full filesystems at least in the past, per [[this interesting opensolaris discussion from 2006 http://zfs-discuss.opensolaris.narkive.com/Ao4kR0XZ/can-t-remove-directory-when-over-quota]].)

=== Sidebar: A full example of finding how large a ZFS delete queue is

To dump the ZFS delete queue for a filesystem, first you need to know what its object number is; this is usually either 2 (for sufficiently old filesystems) or 3 (for newer ones), but the sure way to find out is to look at the ZFS master node for the filesystem (which is always object 1). So to start with, we'll dump the ZFS master node to find out the object number of the delete queue.

.pn prewrap on

  # zdb -dddd fs3-corestaff-01/h/281 1
  Dataset [....]
      Object  lvl  iblk  dblk  dsize  lsize  %full  type
           1    1   16K    1K     8K     1K 100.00  ZFS master node
  [...]
      microzap: 512 bytes, 3 entries
          DELETE_QUEUE = 2
  [...]

The object number of this filesystem's delete queue is 2 (it's an old filesystem, having been originally created on Solaris 10). So we can dump the ZFS delete queue:

  # zdb -dddd fs3-corestaff-01/h/281 2
  Dataset [...]
      Object  lvl  iblk  dblk  dsize  lsize  %full  type
           2    2   16K   16K   144K   272K 100.00  ZFS delete queue
      dnode flags: USED_BYTES USERUSED_ACCOUNTED
      dnode maxblkid: 16
      Fat ZAP stats:
  [...]
          ZAP entries: 5
  [...]
          3977ca = 3766218
          3977da = 3766234
          397a8b = 3766923
          397a87 = 3766919
          397840 = 3766336

(The final list here is the ZAP entries themselves, going from some magic key (on the left) to the ZFS object numbers on the right. If we wanted to, we could use these object numbers to inspect (or even read out) the actual things that are pending deletion. This is probably most useful to find out how large they are and thus how much space they should be consuming.)

There are two different forms of ZAPs and _zdb_ reports how many entries they have somewhat differently. In the master node we saw a 'microzap', used when the ZAP is and always has been small. Here we see a 'Fat ZAP', which is what a small ZAP turns into if at some point it grows big enough. Once the ZFS delete queue becomes a fat ZAP it stays that way even if it later only has a few entries, as we see here.

In this case the ZFS delete queue for this filesystem holds only five entries, which is not particularly excessive or alarming. [[Our problem filesystem ZFSDeleteQueueNLMLeak]] had over ten thousand entries by the time we resolved the issue.

PS: You can pretty much ignore the summary line with its pretty sizes; as we see here, they have very little to do with how many delete queue entries you have right now. [[A growing ZFS delete queue size may be a problem indicator http://lists.open-zfs.org/pipermail/developer/2014-October/000887.html]], but here the only important thing in the summary is the _type_ field, which confirms that we have the right sort of objects both for the ZFS master node and the ZFS delete queue.

PPS: You can also do this exercise for snapshots of filesystems; just use the full snapshot name instead of the filesystem.

(I'm not going to try to cover _zdb_ usage details at all, partly because I'm just flailing around with it. See Ben Rockwood's [[zdb: Examining ZFS At Point-Blank Range http://www.cuddletech.com/blog/pivot/entry.php?id=980]] for one source of more information.)
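To make the PPS and the ZAP entries note a bit more concrete, here is the hedged shape of the commands involved; the snapshot name is made up, while the object number is one of the pending-deletion objects from the ZAP entry list above:

  # zdb -dddd fs3-corestaff-01/h/281@SOMESNAP 1
  # zdb -dddd fs3-corestaff-01/h/281@SOMESNAP 2
  # zdb -dddd fs3-corestaff-01/h/281 3766218

The first two repeat the master node and delete queue dumps against a snapshot instead of the live filesystem; the third dumps one of the pending-deletion objects themselves, whose summary line (the dsize and lsize columns) should give you at least a rough idea of how much space it's still holding down.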