2017-02-22
ZFS bookmarks and what they're good for
Regular old-fashioned ZFS has filesystems and snapshots. Recent versions of ZFS add a third type of object, called bookmarks. Bookmarks are described like this in the zfs manpage (for the 'zfs bookmark' command):
Creates a bookmark of the given snapshot. Bookmarks mark the point in time when the snapshot was created, and can be used as the incremental source for a zfs send command.
ZFS on Linux has an additional explanation here:
A bookmark is like a snapshot, a read-only copy of a file system or volume. Bookmarks can be created extremely quickly, compared to snapshots, and they consume no additional space within the pool. Bookmarks can also have arbitrary names, much like snapshots.
Unlike snapshots, bookmarks can not be accessed through the filesystem in any way. From a storage standpoint a bookmark just provides a way to reference when a snapshot was created as a distinct object. [...]
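As a small illustration of this (with made-up filesystem and bookmark names), creating a bookmark from an existing snapshot and then listing it looks something like the following; note that bookmarks only show up in zfs list when you ask for them by type:

  # create a bookmark from an existing snapshot
  zfs bookmark pool/fs@snap pool/fs#mark

  # bookmarks are listed only when asked for explicitly
  zfs list -t bookmark pool/fs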
The first question is why you would want bookmarks at all. Right now bookmarks have one use, which is saving space on the source of a stream of incremental backups. Suppose that you want to use zfs send and zfs receive to periodically update a backup. At one level, this is no problem:

  zfs snapshot pool/fs@current
  zfs send -Ri previous pool/fs@current | ...
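For concreteness, the other end of that pipe is a zfs receive on the backup side, often run over ssh to another machine. A hedged sketch of one full cycle might look like this (backuphost and backuppool/fs are made-up names, and backuppool/fs must already hold the previous snapshot):

  # take a new snapshot on the source
  zfs snapshot pool/fs@current

  # send everything since the previous snapshot to the backup filesystem
  zfs send -Ri previous pool/fs@current | ssh backuphost zfs receive backuppool/fs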
The problem with this is that you have to keep the previous snapshot around on the source filesystem, pool/fs. If space is tight and there is enough data changing on pool/fs, this can be annoying; it means, for example, that if people delete some files to free up space for other people, they actually haven't done so because the space is being held down by that snapshot.
The purpose of bookmarks is to allow you to do these incremental sends without consuming extra space on the source filesystem. Instead of having to keep the previous snapshot around, you make a bookmark based on it, delete the snapshot, and then do the incremental zfs send using the bookmark:

  zfs snapshot pool/fs@current
  zfs send -i #previous pool/fs@current | ...
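The 'make a bookmark and delete the snapshot' part isn't shown above. A minimal sketch of that step might look like the following (naming the bookmark after the snapshot it comes from is just a convention, not a requirement):

  # turn the previous snapshot into a bookmark, then reclaim its space
  zfs bookmark pool/fs@previous pool/fs#previous
  zfs destroy pool/fs@previous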
This is apparently not quite as fast as using a snapshot, but if you're using bookmarks here it's because the space saving is worth it. You may also appreciate not having to worry about unpredictable fluctuations in how much space a snapshot is holding down as the amount of churn in the filesystem varies.
(We have a few filesystems that get frequent snapshots for fast recovery of user-deleted files, and we live in a certain amount of concern that someday, someone will dump a bunch of data on the filesystem, wait just long enough for a scheduled snapshot to happen, and then either move the data elsewhere or delete it. Sorting that one out to actually get the space back would require deleting at least some snapshots.)
Using bookmarks does require you to keep the previous snapshot on the destination (aka backup) filesystem, although the manpage only tells you this by implication. I believe this implies that while you're receiving a new incremental, you may need extra space over and above what the current snapshot requires, since you won't be able to delete previous and recover its space until the incremental receive finishes. The relevant bit from the manpage is:
If an incremental stream is received, then the destination file system must already exist, and its most recent snapshot must match the incremental stream's source. [...]
This means that the destination filesystem must have a snapshot, and that snapshot will (and must) match a bookmark made from it; otherwise, incremental send streams from bookmarks wouldn't work at all.
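Putting the destination side together, one incremental cycle there might look something like the sketch below, assuming a hypothetical backup filesystem backuppool/fs whose most recent snapshot is @previous:

  # the receive works only because backuppool/fs's newest snapshot is
  # @previous, matching the stream's incremental source
  ... | zfs receive backuppool/fs

  # only once the receive has finished can @previous be deleted on the
  # destination to get its space back
  zfs destroy backuppool/fs@previous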
(In theory bookmarks could also be used to generate an imprecise 'zfs diff' without having to keep the origin snapshot around. In practice I doubt anyone is going to implement this, and why it's necessarily imprecise requires an explanation of why and how bookmarks work.)
Sometimes it can be hard to tell one cause of failure from another
I mentioned recently how a firmware update fixed a 3ware controller so that it worked. As it happens, my experiences with this machine nicely illustrate the idea that sometimes it can be hard to tell one failure from another, or, to put it another way, that when you have a failure it can be hard to tell what the actual cause is. So let me tell the story of trying to install this machine.
Like many places within universities, we don't have a lot of money, but we do have a large collection of old, used hardware. Rather than throw away, say, five-year-old hardware because it's beyond its nominal service life, we keep around anything that's not actively broken (or at least doesn't seem broken) and press it into use again in sufficiently low-priority situations. One of the things we have as a result is an assorted collection of SATA HDs in various sizes. We've switched over to SSDs for most servers, but we don't really have enough money to use SSDs for everything, especially when we're reconditioning an inherited machine under unusual circumstances.
Or in other words, we have a big box of 250 GB Seagate SATA HDs that have been previously used somewhere (probably as SunFire X2x00 system disks), all of which had passed basic tests when they were put into the box some time ago. When I wanted a pair of system disks for this machine I turned to that box. Things did not go well from there.
One of the disks from the first pair had really slow IO, which of course manifested as a far too slow Ubuntu 16.04 install. After I replaced the slow drive, the second install attempt ended with the original 'good' drive dropping off the controller entirely, apparently dead. The replacement for that drive also turned out to be excessively slow, which brought me to four 250 GB SATA drives, of which only one might be good, and three slow, failed attempts to bring up one of our Ubuntu 16.04 installs. At that point I gave up and used some SSDs that we had relatively strong confidence in, because I wasn't sure whether our 250 GB SATA drives were terrible or the machine was eating disks. The SSDs worked.
Before we did the 3ware firmware upgrade and it made other things work great, I would have confidently told you that our 250 GB SATA disks had started rotting and could no longer be trusted. Now, well, I'm not so sure. I'm perfectly willing to believe bad things about those old drives, but were my problems because of the drives, the 3ware controller's issues, or some combination of both? My guess now is a combination of both, but I don't really know, and that shows the problem nicely.
(It's not really worth finding out, either, since testing disks for slow performance is kind of a pain and we've already spent enough time on this issue. I did try the 'dead' disk in a USB disk docking station and it worked in light testing.)