An update to the ZFS excessive prefetching situation
A while back I wrote about how I had discovered that ZFS could do excessive readahead when faced with many streams of sequential read IO, and wind up throwing away 90% to 95% of the IO that it had done (with terrible consequences for application performance). It's time for an update on that situation.
First, for various reasons we wound up moving to Solaris server machines with 8 GB of memory (SunFire X2200s instead of X2100s), so I re-enabled ZFS file prefetching and re-ran my experiments. Initial testing was encouraging; with 8 GB, the ZFS ARC was big enough that even under my heavy test load ZFS could keep prefetched data around for long enough to not kill application-level performance.
Well. Usually big enough, but sometimes the ZFS ARC would spontaneously decide to limit itself down to 2 GB (instead of the usual 5 to 7 GB), despite the test machines being otherwise unused and idle. This destroyed performance, and worse, I could find no way of resetting the adaptive ARC target size (what you see as c in the output of '-m zfs') to recover from the situation. So we turned off ZFS file prefetching again and there things sat for a while.
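As a rough illustration of how you might watch for this sort of ARC collapse, here is a minimal Python sketch. It assumes the statistics arrive as text in the parseable 'module:instance:name:statistic value' form that Solaris kstat prints with -p; the sample numbers are made up for illustration, and the function names and the 5 GB floor are my own choices, not anything ZFS provides.

```python
GIB = 1024 ** 3

def parse_kstat_parseable(text):
    """Parse 'module:instance:name:statistic <value>' lines into a dict.

    Only integer-valued statistics are kept; the key is the last
    colon-separated component (for example 'c' or 'c_max').
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        key, value = parts
        stat = key.rsplit(":", 1)[-1]
        try:
            stats[stat] = int(value)
        except ValueError:
            pass  # skip non-integer values such as 'class'
    return stats

def arc_target_collapsed(stats, floor_bytes):
    """Return True if the ARC target size 'c' is below the chosen floor."""
    return stats["c"] < floor_bytes

# Made-up sample in the assumed format: the target size has dropped to 2 GB.
SAMPLE = (
    "zfs:0:arcstats:c\t2147483648\n"
    "zfs:0:arcstats:c_max\t7516192768\n"
    "zfs:0:arcstats:c_min\t536870912\n"
    "zfs:0:arcstats:size\t2100000000\n"
)

stats = parse_kstat_parseable(SAMPLE)
print(arc_target_collapsed(stats, 5 * GIB))
```

This only detects the shrinkage; it does nothing to recover from it.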
Recently I discovered the under-documented zfs_arc_min ZFS tuning parameter, which sets the minimum ZFS ARC size (it is the mirror of the better documented zfs_arc_max tuning parameter for setting the maximum size). Since a large minimum size should prevent the catastrophic ARC shrinkage, our test systems now have it set to 5 GB and it seems to be working so far (in that the ARC hasn't shrunk on either of them).
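For concreteness, here is a small Python sketch that works out the byte value for a 5 GB minimum and formats it as an /etc/system line. It assumes zfs_arc_min is set the same way zfs_arc_max usually is, with a 'set zfs:...' line that takes effect at the next reboot, and it treats 5 GB as 5 * 2^30 bytes; the helper name is mine. Treat it as a sketch of the arithmetic, not a tested recipe.

```python
GIB = 1024 ** 3

def etc_system_line(tunable, gib):
    """Format an /etc/system 'set' line for a ZFS tunable sized in GiB."""
    return "set zfs:%s = 0x%x" % (tunable, gib * GIB)

# Assumed form, mirroring how zfs_arc_max is commonly set.
print(etc_system_line("zfs_arc_min", 5))
# prints: set zfs:zfs_arc_min = 0x140000000
```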
(On dedicated NFS servers, I am pretty sure that we actively want most of the memory to be reserved for ZFS caches. Nothing that is particularly memory-consuming should ever run on them, and if it does, I would prefer that it swap itself to death rather than impacting NFS server performance.)
Update, October 22nd: see an important update. I can no longer recommend that you do this.