Storing ZFS send streams is not a good backup method

May 9, 2021

One of the eternally popular ideas for people using ZFS is doing backups by using 'zfs send' and storing the resulting send streams. Although appealing, this idea is a mistake, because ZFS send streams do not have the properties you want for a backup format.

A good backup format is designed for availability. No matter what happens, it should let you extract as much from it as possible, from both full backups and incremental backups. If your backup stream is damaged, you should still be able to find and restore as much as possible, both before and after the damage. If a full backup is missing or destroyed, you should still be able to recover something from whatever incrementals you have. This requires incremental backups to have more information in them than they specifically need, but that's a tradeoff you make for availability.

A better backup format should also be convenient to operate, and one big aspect of this is selective restores. A lot of the time you don't need to restore absolutely everything; you just want to get back one file or some files that you need because they got removed, damaged, or whatever. If you have to do a complete restore (both full and incremental) in order to get back a single file, you don't have a convenient backup format. Other nice things are, for example, being able to readily get an index of what is captured in any particular backup stream (full or incremental).

Incremental ZFS send streams do not have any of these properties and full ZFS send streams only have a few of them. Neither full nor incremental streams have any resilience against damage to the stream; a stream is either entirely intact or it's useless. Neither has selective restores or readily available indexes. Incremental streams are completely useless without everything they're based on. All of these issues will sooner or later cause you pain if you use ZFS streams as a backup format.
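To make the dependency problem concrete, here is a minimal Python sketch. It doesn't touch ZFS at all; it just models a pile of stored streams as (base, target) pairs, where a full stream has a base of None. The function name and the snapshot names are made up for illustration. All it shows is how one missing or damaged stream makes every later incremental unusable.

```python
from typing import List, Optional, Tuple

# Each stored stream is (base_snapshot, target_snapshot).
# A full stream has no base, so its base is None.
Stream = Tuple[Optional[str], str]


def restorable_snapshots(streams: List[Stream]) -> List[str]:
    """Return the snapshots reachable from a full stream via intact streams."""
    # Index the streams by the snapshot they are based on.
    by_base = {}
    for base, target in streams:
        by_base.setdefault(base, []).append(target)

    restorable = []
    # Start from the full streams, which need nothing else to be received.
    frontier = list(by_base.get(None, []))
    while frontier:
        snap = frontier.pop(0)
        if snap in restorable:
            continue
        restorable.append(snap)
        # An incremental is only usable once its base has been restored.
        frontier.extend(by_base.get(snap, []))
    return restorable


intact = [(None, "monday"), ("monday", "tuesday"), ("tuesday", "wednesday")]
damaged = [(None, "monday"), ("tuesday", "wednesday")]  # monday->tuesday lost
no_full = [("monday", "tuesday"), ("tuesday", "wednesday")]

print(restorable_snapshots(intact))
print(restorable_snapshots(damaged))
print(restorable_snapshots(no_full))
```

With the middle incremental gone, only "monday" comes back, even though the "wednesday" stream itself is perfectly intact. With the full stream gone, nothing comes back at all.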

ZFS send streams are great at what they're for, which is replicating ZFS filesystems from one ZFS pool to another in an environment where you can immediately deal with any problems that come up (whether by retrying the send of a corrupted stream, changing what it's based on, or whatever you need to do). The further you pull 'zfs send' away from this happy path, the more problems you're going to have.
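As a rough illustration of that happy path, here is a Python sketch that pipes 'zfs send' straight into 'zfs receive' instead of storing the stream anywhere. The function names and the dataset and snapshot names in the usage are hypothetical, and this assumes both pools are on the same machine; a real setup would typically run the receive side over ssh and add whatever options it needs.

```python
import subprocess
from typing import List, Optional, Tuple


def replication_commands(
    new_snap: str, dest_fs: str, prev_snap: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Build the send and receive command lines for one replication step."""
    send = ["zfs", "send"]
    if prev_snap is not None:
        # Incremental send: only the changes between prev_snap and new_snap.
        send += ["-i", prev_snap]
    send.append(new_snap)
    recv = ["zfs", "receive", dest_fs]
    return send, recv


def replicate(new_snap: str, dest_fs: str, prev_snap: Optional[str] = None) -> bool:
    """Pipe 'zfs send' into 'zfs receive'; return True if both succeeded."""
    send_cmd, recv_cmd = replication_commands(new_snap, dest_fs, prev_snap)
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    # Close our copy of the pipe so the sender notices if the receiver exits.
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    return send_rc == 0 and recv_rc == 0


# Hypothetical names, shown only to illustrate the command lines.
send_cmd, recv_cmd = replication_commands(
    "tank/home@tuesday", "backup/home", prev_snap="tank/home@monday"
)
print(" ".join(send_cmd), "|", " ".join(recv_cmd))
```

The point of the sketch is that a failure shows up right away as a non-zero exit status, while both the source snapshots and the destination are still around, so the caller can simply run the step again.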

(The design decisions of ZFS send streams make a great deal of sense for this purpose. As a replication format they're designed to be easy to generate, easy to receive, and compact, especially for incremental send streams. They have no internal redundancy or recovery from corruption because the best recovery is 'resend the stream to get a completely good one'.)

(This comes up on the ZFS on Linux mailing list periodically and I write replies each time, so it's time to write this down in an entry.)
