What zpool list's new FRAG fragmentation percentage means

December 2, 2015

Recent versions of 'zpool list' on Illumos (and elsewhere) have added a new field of information called 'FRAG', reported as a percentage, which the zpool manpage will tell you is 'the amount of fragmentation in the pool'. To put it politely, this is very under-documented (and in a misleading way). Based on an expedition into the current Illumos kernel code, as far as I can tell:

zpool list's FRAG value is an abstract measure of how fragmented the free space in the pool is.

A pool with a low FRAG percentage has most of its remaining free space in large contiguous segments, while a pool with a high FRAG percentage has most of its free space broken up into small pieces. The FRAG percentage tells you nothing about how fragmented (or not fragmented) your data is, and thus nothing about how many seeks it will take to read it back. Instead it is one factor in how hard ZFS will have to work to find space for large chunks of new data (and how fragmented those chunks may be forced to be when they get written out).

(How hard ZFS has to work to find space is also influenced by how much total free space is left in your pool. There's likely to be some correlation between low free space and higher FRAG numbers, but I wouldn't assume that they're inextricably yoked together.)

FRAG also doesn't tell you how evenly the free space is distributed across your disk(s). As far as I know, adding a new vdev or expanding an existing one will generally result in the new space being seen as essentially unfragmented; this can drop your overall FRAG percent even if your old disk space had very fragmented free space. In practice this probably doesn't matter, since ZFS will generally prefer to write things to that new (and unfragmented) space.

(Such a drop in FRAG is 'fair' in the sense that the chances that ZFS will be able to find a large chunk of free space have gone way up.)

The percentages relate to the size of free space segments roughly as follows. Based on the current Illumos kernel code, if all free space was in segments of the given size, the reported fragmentation would be:

  • 512 B and 1 KB segments are 100% fragmented.
  • 2 KB segments are 98% fragmented; 4 KB segments are 95% fragmented.
  • 8 KB to 1 MB segments start out at 90% fragmented and drop 10 percentage points for every power of two (eg 16 KB is 80% fragmented and 1 MB is 20%). 128 KB segments are 50% fragmented.
  • 2 MB, 4 MB, and 8 MB segments are 15%, 10%, and 5% fragmented respectively.
  • 16 MB and larger segments are 0% fragmented.

Of course the free space is probably not all in segments of one size. ZFS does the obvious thing and weights each segment size bucket by the amount of free space that falls into that range. This makes FRAG essentially an average, which means it has the usual hazards of averages.
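To make the weighting concrete, here is a minimal Python sketch of it. This is my own illustration and not the kernel code: the table values are copied from the list above, while the names (FRAG_TABLE, MIN_SHIFT, frag_for_segment, pool_frag), the dict input format, and the use of integer division are all assumptions made for the example. It also ignores how the real code combines numbers across metaslabs and vdevs.

```python
# Fragmentation percentage for each power-of-two segment size,
# from 512 B (index 0) up to 16 MB and larger (index 15).
# Values are taken from the list above; the names are hypothetical.
FRAG_TABLE = [100, 100, 98, 95, 90, 80, 70, 60, 50, 40, 30, 20, 15, 10, 5, 0]
MIN_SHIFT = 9  # 512 bytes is 2**9


def frag_for_segment(size_bytes):
    """Return the fragmentation percentage for one free segment size."""
    if size_bytes <= 0:
        raise ValueError("segment size must be positive")
    # Round down to a power of two, then clamp into the table.
    shift = size_bytes.bit_length() - 1
    idx = min(max(shift - MIN_SHIFT, 0), len(FRAG_TABLE) - 1)
    return FRAG_TABLE[idx]


def pool_frag(free_by_segment_size):
    """Average the per-size percentages, weighted by free bytes.

    free_by_segment_size maps a segment size in bytes to the total
    number of free bytes sitting in segments of that size.
    """
    total = sum(free_by_segment_size.values())
    if total == 0:
        return 0
    weighted = sum(frag_for_segment(size) * free
                   for size, free in free_by_segment_size.items())
    return weighted // total


if __name__ == "__main__":
    GB = 2 ** 30
    # Half the free space in 128 KB segments (50%), half in 16 MB ones (0%).
    print(pool_frag({128 * 1024: 10 * GB, 16 * 2 ** 20: 10 * GB}))  # prints 25
```

The example at the bottom also shows the averaging hazard: a result of 25% here does not mean that free segments are typically somewhere between 512 KB and 1 MB. It comes from two quite different populations of free space.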

Note that these fragmentation percents are relatively arbitrary, as comments in the Illumos kernel code admit; they are designed to produce what the ZFS developers feel is a useful result, not by following any strict mathematical formula. They may also change in the future. As far as relative values go, according to comments in the source code, 'a 10% change in fragmentation equates to approximately double the number of segments'.

(The source code explicitly calls the fragmentation percentage a 'metric' as opposed to a direct measurement.)

I believe that one interesting consequence of the current OmniOS code is that a pool on 4K sector disks (a pool with ashift=12) can never be reported as more than 95% fragmented, because 4K is the minimum allocation size and thus the minimum free segment size. I would not be surprised if in the future ZFS modifies the fragmentation percents reported for such pools so that 4K segments become '100% fragmented'.

(Technically it would be a per-vdev thing, but in practice I think that very few people mix vdevs with different ashifts and block sizes.)
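The 95% ceiling falls straight out of the table of percentages. As a small sketch (again my own illustration, with the hypothetical names FRAG_TABLE, MIN_SHIFT, and max_reportable_frag, and assuming that the smallest possible free segment is 2**ashift bytes), the worst case for a given ashift is simply the table entry for that smallest segment size:

```python
# Fragmentation percentage per power-of-two segment size, 512 B to 16 MB+,
# copied from the list earlier in this entry. Names are hypothetical.
FRAG_TABLE = [100, 100, 98, 95, 90, 80, 70, 60, 50, 40, 30, 20, 15, 10, 5, 0]
MIN_SHIFT = 9  # 512 bytes is 2**9


def max_reportable_frag(ashift):
    """Worst-case percentage if every free segment is 2**ashift bytes.

    Assumes no free segment can be smaller than the minimum
    allocation size implied by ashift.
    """
    idx = min(max(ashift - MIN_SHIFT, 0), len(FRAG_TABLE) - 1)
    return FRAG_TABLE[idx]


if __name__ == "__main__":
    for ashift in (9, 12):
        print(ashift, max_reportable_frag(ashift))
```

With these assumptions an ashift=9 pool can reach 100%, while an ashift=12 pool tops out at 95%.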

I was initially planning on writing up the technical details too, but this entry is already long enough as it is so I'm deferring them to another entry.
