'Deduplicated' ZFS send streams are now deprecated and on the way out
For a fair while, 'zfs send' has had support for a -D argument, aka
--dedup, that causes it to send what is called a 'deduplicated stream'.
The zfs(1) manpage describes this as:
Generate a deduplicated stream. Blocks which would have been sent multiple times in the send stream will only be sent once. The receiving system must also support this feature to receive a deduplicated stream. This flag can be used regardless of the dataset's dedup property, but performance will be much better if the filesystem uses a dedup-capable checksum (for example,
Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. This is a very common misconception among not only users, but developers, and makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it.
Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the dataset(s) being sent is (are) not already using a dedup-strength checksum.
Dedup send adds developer burden. It expands the test matrix when developing new features, causing bugs in released code, and delaying development efforts by forcing more testing to be done.
As a result, we are deprecating the use of `zfs send -D` and receiving of such streams. This change adds a warning to the man page, and also prints the warning whenever dedup send or receive are used.
I actually had the reverse misconception about how deduplicated
sends worked; I assumed that they required deduplication to be on
in the filesystem itself. Since we will never use deduplication, I
never looked any further at the 'zfs send' feature. It probably
wouldn't have been a net win for us anyway, since our OmniOS
fileservers didn't have all that fast CPUs
and we definitely weren't using one of the dedup-strength checksums.
(Our current Linux fileservers have better CPUs, but I think they're still not all that impressive.)
The ZFS people are planning various features to deal with the removal of this feature so that people will still be able to use saved deduplicated send streams. However, if you have such streams in your backup systems, you should probably think about aging them out. And definitely you should move away from generating new ones, even though this change is not yet in any release of ZFS as far as I know (on any platform).
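To make the 'only over the set of blocks in the send stream' point concrete, here is a toy sketch in Python. It is not ZFS's actual stream format or code; the record layout and the names `dedup_stream` and `receive_stream` are made up for illustration. The thing to notice is that the `seen` set lives only for one call (one 'send'), nothing consults any on-disk dedup table, and every block has to be hashed, which is where the CPU and memory cost comes from if the data isn't already carrying a strong checksum.

```python
import hashlib


def dedup_stream(blocks):
    """Yield records for one hypothetical 'send' of a sequence of blocks.

    The first time a block's digest is seen, emit the data itself; on
    repeats, emit only a reference to the earlier digest. The 'seen' set
    is local to this call, so nothing is shared between separate sends.
    """
    seen = set()
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # hashing cost paid per block
        if digest in seen:
            yield ("ref", digest, None)
        else:
            seen.add(digest)
            yield ("data", digest, block)


def receive_stream(records):
    """Rebuild the original block sequence from the records."""
    table = {}
    out = []
    for kind, digest, payload in records:
        if kind == "data":
            table[digest] = payload
        out.append(table[digest])
    return out


blocks = [b"aaaa", b"bbbb", b"aaaa", b"cccc", b"bbbb"]
records = list(dedup_stream(blocks))
kinds = [kind for kind, _, _ in records]
restored = receive_stream(records)
```

In this sketch the receiver also has to keep a digest-to-block table for the whole stream, which is a small-scale version of why the receiving side must support the feature too.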
Why my commit messages for configuration files describe my changes
Over the years, I have wound up adopting a particular and somewhat
unusual style of commit message for many of my changes to system
files like /etc/group, to things like DNS and DHCP control files,
and to configuration files. The unusual thing I do is that in my
commit message I don't just say why the change is being made, I say
what the change itself is (in the abstract). For instance, for a
change to our /etc/group, I might say "added <x>, <y>, and <z>
to group 'fred'" (with the <>'s as part of the text, because '<cks>'
is our local style for writing out logins).
On the surface, this is strange. What I changed is right there in the diff itself; putting it in the commit message appears redundant and feels somewhat like putting a '// add x and y together' comment in code. However, this is not quite true. The diff is what I did change, while the commit message is what I intended to change. When all goes well, the two are the same. But things don't always go well, and when that happens having an explicit description of the intent can be important.
Of course, programmers can have this problem too. But as a sysadmin and sometimes programmer, I've wound up feeling that sysadmins are both more prone to this problem and better placed to deal with it through commit messages. On the bad side, many more mistakes with the files we deal with are perfectly valid and functional results, just not what we intended. And generally we don't have the sort of tests that programmers do, which would catch some of these mistakes. On the good side, many of our changes are small enough that what we intended to do can be described in high detail in a short commit message, in a way that's not the case for many code changes.
(Generally, our intentions will also appear in our worklog system. But having them in the commit message saves finding the relevant worklog, and since I generally commit right after looking at a diff (and with it still on the screen), writing out what the diff should show may help me actively notice an error.)
PS: It doesn't help that many control and configuration files are rather
less readable than well formatted code is, and often give you diffs
where what actually changed is harder to see than in most code changes.
If you're just adding a login or two to a group, the diff
has a lot of noise that can make it hard to see the important signal.
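One way to cut through that noise is to reduce the old and new versions of a group file to a plain statement of what actually changed, and then compare that against the intent written in the commit message. What follows is only a rough sketch of the idea in Python, assuming the usual `name:password:GID:member,member` line format; `parse_group` and `summarize_changes` are hypothetical helper names, and the logins are invented.

```python
def parse_group(text):
    """Map group name -> set of member logins from /etc/group-style text."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines rather than guess at them
        name, members = fields[0], fields[3]
        groups[name] = {m for m in members.split(",") if m}
    return groups


def summarize_changes(old_text, new_text):
    """Describe membership changes between two versions of a group file."""
    old, new = parse_group(old_text), parse_group(new_text)
    summary = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, set())
        after = new.get(name, set())
        added = sorted(after - before)
        removed = sorted(before - after)
        if added:
            summary.append(f"added {', '.join(added)} to group '{name}'")
        if removed:
            summary.append(f"removed {', '.join(removed)} from group '{name}'")
    return summary


old = "fred:x:1000:alice\nstaff:x:50:bob,carol\n"
new = "fred:x:1000:alice,xavier,yolanda\nstaff:x:50:bob\n"
changes = summarize_changes(old, new)
```

Here the intended change was only to add two logins to 'fred', so the extra "removed carol from group 'staff'" line in `changes` is exactly the sort of unintended but perfectly valid edit that a stated intent makes visible. The sketch only looks at membership; it ignores GID changes and treats a brand new group as a set of additions.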