== An important addition to how ZFS deduplication works on the disk

My entry on [[how ZFS deduplication works on the disk ZFSDedupStorage]] turns out to have missed one important aspect of how deduplication affects the on-disk ZFS data. Armed with this information we can finally answer [[some long-standing uncertainties about ZFS deduplication ZFSDedupBadDocumentation]].

As I mentioned in passing [[earlier ZFSDedupStorage]], ZFS uses *block pointers* to describe where the actual data for blocks is. Block pointers have the *data virtual addresses* of up to three copies of the block's data, the block's checksum, and a number of other bits and pieces. Crucially, block pointers are specially marked if they were written with deduplication on.

It is the deduplication flag in any particular block pointer that controls what happens when the block pointer is deleted. If the flag is on, the delete does a DDT lookup so that the reference counts can be maintained; if the flag is off, there's no DDT lookup needed.

(When the reference count of a DDT entry goes to zero, the DDT entry itself gets deleted. A ZFS pool always has DDT tables, even if they're empty.)

As mentioned in [[the first entry ZFSDedupStorage]], deduplication has basically no effect on reads, because reads of a dedup'd BP don't normally involve the DDT; the BP contains the DVAs of some copies of the block and ZFS will just read directly from them. However, if there is a read error on a dedup'd BP, ZFS does a DDT lookup to see if there's another copy of the block available (for example in the 'ditto' copies).

(I'm waving my hands about deduplication's potential effects on how fragmented a file's data gets on the disk.)

Only file data is deduplicated. ZFS metadata such as directories is not subject to deduplication, so block pointers for metadata blocks will never be dedup'd BPs. This is pretty much what you'd expect, but I feel like mentioning it explicitly since I just checked this in the code.
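To make the delete-path logic concrete, here is a toy model of it. This is my own sketch, not actual ZFS code; all of the names (!(BlockPointer)!, !(DDT.release())!, and so on) are invented for illustration, and a real DDT is keyed by more than just the checksum.

```python
# Toy model (not real ZFS code) of how the dedup flag in a block
# pointer decides whether freeing a block touches the DDT at all.
from dataclasses import dataclass

@dataclass
class BlockPointer:
    checksum: str   # stands in for the block's checksum
    dedup: bool     # the "written with deduplication on" flag

class DDT:
    def __init__(self):
        self.refcounts = {}   # checksum -> reference count

    def add_ref(self, bp):
        self.refcounts[bp.checksum] = self.refcounts.get(bp.checksum, 0) + 1

    def release(self, bp):
        """Drop one reference; delete the DDT entry when it hits zero.
        Returns True when the last reference is gone (data can be freed)."""
        n = self.refcounts[bp.checksum] - 1
        if n == 0:
            del self.refcounts[bp.checksum]   # the entry itself is deleted
            return True
        self.refcounts[bp.checksum] = n
        return False

def free_block(bp, ddt):
    if not bp.dedup:
        return True          # no DDT lookup needed; free the data directly
    return ddt.release(bp)   # DDT lookup to maintain the reference count
```

Note that in this model, once every dedup'd BP for a given checksum has been freed, its DDT entry vanishes; this is the mechanism behind the 'nothing is irreversibly tainted' point below.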
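The read-error fallback can be sketched the same way. Again this is a toy of my own devising, not the real read path; I'm using "does the data match what we expected" as a stand-in for an actual checksum verification.

```python
# Toy sketch (names and structure are mine, not ZFS's) of the read path:
# normal reads use only the DVAs in the BP itself, and only when all of
# those fail does a dedup'd BP fall back to a DDT lookup for any extra
# ('ditto') copies.

def read_block(bp_dvas, dedup, ddt_ditto_dvas, storage, expected):
    # First try every copy the block pointer itself names; no DDT involved.
    for dva in bp_dvas:
        data = storage.get(dva)
        if data == expected:          # stand-in for a checksum match
            return data
    # All BP copies were bad; only a dedup'd BP can ask the DDT for more.
    if dedup:
        for dva in ddt_ditto_dvas:
            data = storage.get(dva)
            if data == expected:
                return data
    raise IOError("unrecoverable read error")
```

The design point is visible in the control flow: the DDT branch is never reached on a healthy read, which is why deduplication costs reads essentially nothing.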
So turning ZFS deduplication on does not irreversibly taint anything, as far as I can see. Any data written while deduplication is on will be marked as a dedup'd BP, and when it's deleted you'll hit the DDT; but after deduplication is turned off and all of that data is deleted, the DDT should be empty again. If you never delete any of the data, the only effect is that the DDT will sit there taking up some extra space. But you *will* take [[the potential deduplication hit ZFSDedupMemoryProblem]] when you delete data written while deduplication is on, even if you later turn it off, and [[this includes deleting snapshots ZFSDedupMemoryProblem]].

=== Sidebar: Deduplication and ZFS scrubs

As you'd expect, ZFS scrubs and resilvers do check and correct DDT entries, and they check all DVAs that DDT entries point to (even ditto blocks, which are not directly referred to by any normal data BPs). The scanning code tries to do DDT and file data checks efficiently, basically checking DDT entries and the DVAs they point to once, no matter how many references they have. The exact mechanisms are a little bit complicated.

(My paranoid instincts see corner cases with this code, but I'm probably wrong. And if they happened, they would probably be the result of ZFS code bugs, not disk IO errors.)
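The efficiency point about scrubs can be illustrated with a trivial sketch. This is my own model, not the real scan code; it just shows that the work is proportional to the number of DVAs per DDT entry, not to the number of references to it.

```python
# Toy sketch (my invention, not the real ZFS scanning code) of a
# dedup-aware scrub: each DDT entry's DVAs are read and verified once,
# regardless of how many BPs reference the entry.

def scrub_ddt(ddt_entries):
    """ddt_entries: list of (checksum, refcount, dvas) tuples.
    Returns how many block copies were actually read and verified."""
    reads = 0
    for checksum, refcount, dvas in ddt_entries:
        # Every DVA (including ditto copies) is checked exactly once;
        # refcount does not multiply the work.
        for dva in dvas:
            reads += 1   # stand-in for "read this copy, verify checksum"
    return reads
```

A block referenced a thousand times with three on-disk copies costs three reads here, not three thousand.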