'Deduplicated' ZFS send streams are now deprecated and on the way out

April 10, 2020

For a fair while, 'zfs send' has had support for a -D argument, aka --dedup, that causes it to send what is called a 'deduplicated stream'. The zfs(1) manpage describes this as:

Generate a deduplicated stream. Blocks which would have been sent multiple times in the send stream will only be sent once. The receiving system must also support this feature to receive a deduplicated stream. This flag can be used regardless of the dataset's dedup property, but performance will be much better if the filesystem uses a dedup-capable checksum (for example, sha256).

This feature is now on the way out in the OpenZFS repository. It was deprecated in a commit on March 18th, and the commit message explains the situation:

Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. This is a very common misconception among not only users, but developers, and makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it.

Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the dataset(s) being sent is (are) not already using a dedup-strength checksum.

Dedup send adds developer burden. It expands the test matrix when developing new features, causing bugs in released code, and delaying development efforts by forcing more testing to be done.

As a result, we are deprecating the use of `zfs send -D` and receiving of such streams. This change adds a warning to the man page, and also prints the warning whenever dedup send or receive are used.
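To make the first point of that commit message concrete, here is a minimal Python sketch of what deduplicating 'over the set of blocks in the send command' amounts to. This is my own toy model, not the real ZFS send stream format or code; the names dedup_stream and stream_stats are made up for illustration, and I'm assuming fixed-size blocks and SHA-256 checksums.

```python
import hashlib


def dedup_stream(blocks):
    """Toy model of a deduplicated send stream (not the real ZFS format).

    Yields ("data", block) the first time a block's checksum is seen in
    THIS stream, and ("ref", index_of_first_copy) for later repeats.
    """
    # The table is built from scratch for every stream. Nothing is read
    # from any on-disk dedup table, and nothing carries over between sends.
    seen = {}  # checksum -> index of the first occurrence in this stream
    for index, block in enumerate(blocks):
        # If the blocks do not already carry a strong checksum, one has to
        # be computed here, which is where the CPU cost comes from.
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            yield ("ref", seen[digest])
        else:
            seen[digest] = index
            yield ("data", block)


def stream_stats(blocks):
    """Return (payload bytes sent, payload bytes without dedup) for one stream."""
    records = list(dedup_stream(blocks))
    sent = sum(len(payload) for kind, payload in records if kind == "data")
    total = sum(len(block) for block in blocks)
    return sent, total
```

In this model, stream_stats([b"A" * 4096, b"B" * 4096, b"A" * 4096]) returns (8192, 12288), because the repeated block is sent once. But two separate sends that each contain that same block both send it in full, since each stream starts with an empty table. The table also grows with the number of unique blocks in the stream, which is a rough picture of the memory cost the commit message mentions.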

I actually had the reverse misconception about how deduplicated sends worked; I assumed that they required deduplication to be on in the filesystem itself. Since we will never use deduplication, I never looked any further at the 'zfs send' feature. It probably wouldn't have been a net win for us anyway, since our OmniOS fileservers didn't have particularly fast CPUs and we definitely weren't using one of the dedup-strength checksums.

(Our current Linux fileservers have better CPUs, but I think they're still not all that impressive.)

The ZFS people are planning various ways to deal with the eventual removal of this feature so that people will still be able to use saved deduplicated send streams. However, if you have such streams in your backup systems, you should probably think about aging them out. And you should definitely move away from generating new ones, even though this change is not yet in any release of ZFS as far as I know (on any platform).

