2017-01-13

The ZFS pool history log that's used by 'zpool history' has a size limit

I have an awkward confession. Until Aneurin Price mentioned it in his comment on my entry on 'zpool history -i', I had no idea that the internal, per-pool history log that zpool history uses has a size limit. I had assumed that the size and volume of events were small enough that ZFS could just keep everything, which is silly in retrospect. This unfortunately means that the long-term 'strategic' use of zpool history that I talked about in my first entry has potentially significant limits, because you can only go back so far in the log. How far back depends on a number of factors, including how often you take snapshots and perform other logged operations.

(If you're just inspecting the output of 'zpool history', it's easy to overlook that it's gotten truncated, because it always starts with the pool's creation. This is because the ZFS code that maintains the log goes out of its way to make sure that the initial pool creation record is kept forever.)

The ZFS code that creates and maintains the log is in spa_history.c. As far as the log's size goes, let me quote the comment in spa_history_create_obj:

/*
 * Figure out maximum size of history log.  We set it at
 * 0.1% of pool size, with a max of 1G and min of 128KB.
 */
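
To make this concrete, here is a minimal standalone sketch of that sizing rule. The function name and its pool_size parameter are my own illustrative inventions; the real spa_history_create_obj() works from the pool's allocatable space, but the clamping is the same:

#include <stdint.h>
#include <stdio.h>

#define KB (1024ULL)
#define MB (1024ULL * KB)
#define GB (1024ULL * MB)

/*
 * Illustrative version of the sizing rule: 0.1% of the pool size,
 * clamped to a maximum of 1 GB and a minimum of 128 KB.
 */
static uint64_t
history_log_size(uint64_t pool_size)
{
        uint64_t sz = pool_size / 1000;   /* 0.1% of pool size */

        if (sz > 1 * GB)                  /* max of 1G */
                sz = 1 * GB;
        if (sz < 128 * KB)                /* min of 128KB */
                sz = 128 * KB;
        return (sz);
}

int
main(void)
{
        printf("100 GB pool:  %llu bytes\n",
            (unsigned long long)history_log_size(100 * GB));
        printf("2000 GB pool: %llu bytes\n",
            (unsigned long long)history_log_size(2000 * GB));
        return (0);
}

A 100 GB pool gets a log of a bit over 100 MB; the 1 GB cap only kicks in at 1000 GB and beyond, and the 128 KB floor only matters for pools under 128 MB.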

Now, there is a complication: the pool history log is sized and set up only once, at initial pool creation. So that size is not 0.1% of the current pool size, it is 0.1% of the initial pool size, whatever that was. If your pool has been expanded since its creation and started out smaller than 1000 GB, its history log is smaller (possibly much smaller) than the log you would get by recreating the pool at 1000 GB or more today; a pool that started at 100 GB keeps its roughly 100 MB log no matter how large it grows later. Unfortunately, based on the code, I don't think ZFS can easily resize the history log after creation (and it certainly doesn't attempt to now).

The ZFS code does maintain some information about how many records have been lost and how many total bytes have been written to the log, but this information doesn't seem to be exposed to user-level code in any way; it's simply there in the on-disk and in-memory data structures. You'd have to dig it out of the depths of the kernel with DTrace or the like, or read it off disk with zdb.

(It turns out that our most actively snapshotted pool, which probably has the most records in its log, only has an 11% full history log at the moment.)

Sidebar: Using zdb to see history log information

These are brief notes, in the style of using zdb to see the ZFS delete queue. First we need to find out the object ID of the SPA history information, which is always going to be in the pool's meta-object set (MOS) object directory (as far as I know); the output below shows this as 'Dataset mos [META]':

# zdb -dddd rpool 1
Dataset mos [META], [...]

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         1    1    16K    16K  24.0K    32K  100.00  object directory
[...]
               history = 32 
[...]

The history log is stored in a ZFS object; here that is object number 32. Since it was object 32 in all three pools that I checked, it may almost always be that.

# zdb -dddd rpool 32
Dataset [...]
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        32    1    16K   128K  36.0K   128K  100.00  SPA history
                                         40   bonus  SPA history offsets
        dnode flags: USED_BYTES 
        dnode maxblkid: 0
                pool_create_len = 536
                phys_max_off = 79993765
                bof = 536
                eof = 77080
                records_lost = 0

The bof and eof values are logical byte positions in the ring buffer, and so at least eof will be larger than phys_max_off if you've started losing records. For more details, see the comments in spa_history.c.
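
As a side note, the way those logical positions map to physical offsets in the ring buffer also explains why the pool creation record is never overwritten: the wraparound skips over it. Here is a hedged sketch of the mapping, modeled on the logic in spa_history.c (the function and field names are simplified from the real code):

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of how a logical log offset maps to a physical offset in the
 * on-disk ring buffer. The ring only covers the space after the pool
 * creation record, so that record is never overwritten.
 */
static uint64_t
log_to_phys(uint64_t log_off, uint64_t pool_create_len,
    uint64_t phys_max_off)
{
        uint64_t phys_len = phys_max_off - pool_create_len;

        return ((log_off - pool_create_len) % phys_len + pool_create_len);
}

int
main(void)
{
        /* The values from the zdb output above. */
        uint64_t pool_create_len = 536;
        uint64_t phys_max_off = 79993765;
        uint64_t eof = 77080;

        printf("eof is at physical offset %llu\n",
            (unsigned long long)log_to_phys(eof, pool_create_len,
            phys_max_off));
        return (0);
}

Since this pool's eof is still well under phys_max_off, the log has never wrapped and nothing has been discarded, which matches its records_lost = 0.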
