2017-01-13
The ZFS pool history log that's used by 'zpool history' has a size limit
I have an awkward confession. Until Aneurin Price mentioned it in his comment on my entry on 'zpool history -i', I had no idea that the internal, per-pool history log that zpool history uses has a size limit. I thought that perhaps the size and volume of events were small enough that ZFS just kept everything, which is silly in retrospect. This unfortunately means that the long-term 'strategic' use of zpool history that I talked about in my first entry has potentially significant limits, because you can only go back so far in history. How far depends on a number of factors, including how many snapshots and so on you take.
(If you're just inspecting the output of 'zpool history', it's easy to overlook that it's gotten truncated, because it always starts with the pool's creation. This is because the ZFS code that maintains the log goes out of its way to make sure that the initial pool creation record is kept forever.)
The ZFS code that creates and maintains the log is in spa_history.c. As far as the log's size goes, let me quote the comment in spa_history_create_obj:

/*
 * Figure out maximum size of history log.  We set it at
 * 0.1% of pool size, with a max of 1G and min of 128KB.
 */
Now, there is a complication, which is that the pool history log is only sized and set up once, at initial pool creation. So that size is not 0.1% of the current pool size, it is 0.1% of the initial pool size, whatever that was. If your pool has been expanded since its creation and started out smaller than 1000 GB, its history log is smaller (possibly much smaller) than it would be if you recreated the pool at 1000 GB or more now; at 0.1%, a pool must be at least 1000 GB at creation for its log to reach the 1 GB maximum. Unfortunately, based on the code, I don't think ZFS can easily resize the history log after creation (and it certainly doesn't attempt to now).
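To make the sizing concrete, here is a minimal C sketch of the clamping rule that the quoted comment describes. The function and constant names are mine for illustration; the real logic lives in spa_history_create_obj() in spa_history.c.

#include <stdio.h>
#include <stdint.h>

#define HIST_MIN ((uint64_t)128 << 10)   /* 128 KB floor */
#define HIST_MAX ((uint64_t)1 << 30)     /* 1 GB ceiling */

/* Hypothetical helper: the history log size for a pool of the
 * given size at creation, per the comment quoted above. */
static uint64_t
history_log_size(uint64_t pool_bytes)
{
	uint64_t size = pool_bytes / 1000;   /* 0.1% of pool size */
	if (size < HIST_MIN)
		size = HIST_MIN;
	if (size > HIST_MAX)
		size = HIST_MAX;
	return (size);
}

int
main(void)
{
	/* A 100 GB pool gets roughly a 100 MB log; only pools of
	 * 1000 GB and up at creation hit the 1 GB ceiling. */
	printf("%llu\n",
	    (unsigned long long)history_log_size(100ULL << 30));
	return (0);
}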
The ZFS code does maintain some information about how many records have been lost and how many total bytes have been written to the log, but these don't seem to be exposed in any way to user-level code; they're simply there in the on-disk and in-memory data structures. You'd have to dig them out of the depths of the kernel with DTrace or the like, or you can use zdb to read them off disk.
(It turns out that our most actively snapshotted pool, which probably has the most records in its log, only has an 11% full history log at the moment.)
Sidebar: Using zdb to see history log information
These are brief notes, in the style of using zdb to see the ZFS delete queue. First we need to find out the object ID of the SPA history information, which is always going to be in the pool's metadata object set (as far as I know):
# zdb -dddd rpool 1
Dataset mos [META], [...]

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         1    1    16K    16K  24.0K    32K  100.00  object directory
[...]
        history = 32
[...]
The history log is stored in a ZFS object; here that is object number 32. Since it was object 32 in three pools that I checked, it may almost always be that.
# zdb -dddd rpool 32
Dataset [...]

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        32    1    16K   128K  36.0K   128K  100.00  SPA history
                                         40   bonus  SPA history offsets
    dnode flags: USED_BYTES
    dnode maxblkid: 0
        pool_create_len = 536
        phys_max_off = 79993765
        bof = 536
        eof = 77080
        records_lost = 0
The bof and eof values are logical byte positions in the ring buffer, and so at least eof will be larger than phys_max_off if you've started losing records. For more details, see the comments in spa_history.c.
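As an illustration of what you can do with those zdb numbers, here is a small C sketch that works out how full the log is. This is my reading of the on-disk fields, not an official formula: I'm assuming the ring portion of the log runs from pool_create_len up to phys_max_off, and that eof - bof is the amount of live record data.

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* The values from the zdb output above. */
	uint64_t pool_create_len = 536;
	uint64_t phys_max_off = 79993765;
	uint64_t bof = 536;
	uint64_t eof = 77080;

	uint64_t used = eof - bof;            /* live record bytes */
	uint64_t capacity = phys_max_off - pool_create_len;

	/* Once eof outruns this capacity, old records have been
	 * overwritten and logical offsets wrap around the ring. */
	printf("history log is %.2f%% full\n",
	    100.0 * (double)used / (double)capacity);
	return (0);
}

With the values above this works out to well under 1% full, which squares with records_lost being 0 for this pool.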