I don't think you should increase ZFS on Linux's write buffering
While looking at my Referer logs here one day, I wound
up stumbling over SvennD's Tuning of ZFS module. I have ambivalent
feelings about its suggestions in general, but there is one bit that
I have a strong reaction to, and that is the suggestion to substantially
increase zfs_dirty_data_max_percent. This setting controls how
much asynchronous buffered write data ZFS will allow you to have before
it forces processes doing writes to slow down and then stop.
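If you want to check what your system is actually using, ZFS on Linux exposes its tunables under /sys/module/zfs/parameters, and the kernel's general writeback knobs live in /proc/sys/vm. Here's a minimal sketch (in Python, just because it's handy) that reads both sets; the parameter names are the standard ZoL ones, but which ones exist depends on your ZFS version.

    import pathlib

    ZFS_PARAMS = pathlib.Path("/sys/module/zfs/parameters")
    VM = pathlib.Path("/proc/sys/vm")

    def read_value(path):
        # Module parameters and vm sysctls are one-value text files.
        try:
            return path.read_text().strip()
        except OSError:
            return "<not available>"

    # ZFS's own dirty data limits (a percentage of RAM plus absolute caps).
    for name in ("zfs_dirty_data_max_percent", "zfs_dirty_data_max",
                 "zfs_dirty_data_max_max"):
        print(name, "=", read_value(ZFS_PARAMS / name))

    # The kernel's general writeback limits, which govern non-ZFS filesystems.
    for name in ("dirty_ratio", "dirty_background_ratio"):
        print(name, "=", read_value(VM / name))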
To start with, write buffering is complicated in general and it's not clear that having any substantial amount of it helps you outside of very specific workloads and relatively specific disk systems. The corollary is that it's pretty hard to give generic write buffering tuning advice unless the default settings are somehow clearly inadequate or wrong, and if you believe they are you should probably write up why.
On Linux specifically, there is at least some evidence that giving
the kernel too much in the way of buffered writes has bad effects,
and further that the kernel's default settings are too high, not too
low. It's not clear how the kernel's general dirty_ratio setting
interacts with ZFS's zfs_dirty_data_max_percent, but dirty_ratio
defaults to 20%. If 20% is too high for non-ZFS IO, and ZFS is
controlled only by its own setting, moving that setting from 10% to
40% is probably not what you want.
Things get worse if the two settings are additive, so that the
general kernel limit will give you 20% and then ZFS will give you an
additional 10% on top of it. Even if they're separate, you may have
problems if you have active ZFS and non-ZFS filesystems on the same
machine, since then ZFS can be holding its 10% while extN holds
another 20%, for 30% of RAM in total.
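To make the arithmetic concrete, here is a back of the envelope sketch for a hypothetical 64 GB machine (the size is made up, and for simplicity this ignores the extra caps that zfs_dirty_data_max and zfs_dirty_data_max_max can impose):

    # Rough dirty data ceilings on a hypothetical 64 GB machine.
    ram_gb = 64
    dirty_ratio = 0.20      # the kernel's default vm.dirty_ratio
    zfs_default = 0.10      # the default zfs_dirty_data_max_percent
    zfs_raised = 0.40       # the suggested increase

    print(f"ZFS alone at the default 10%:  {zfs_default * ram_gb:.1f} GB")
    print(f"ZFS alone raised to 40%:       {zfs_raised * ram_gb:.1f} GB")

    # If the limits are independent and you also have busy non-ZFS
    # filesystems, the two pools of dirty data can stack up.
    print(f"ZFS 10% plus extN 20%:         {(zfs_default + dirty_ratio) * ram_gb:.1f} GB")
    print(f"ZFS 40% plus extN 20%:         {(zfs_raised + dirty_ratio) * ram_gb:.1f} GB")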
(Given this, I should probably permanently turn my dirty_ratio
down to 10% at most, and reduce dirty_background_ratio
as well,
since I have a mix of ZFS and non-ZFS filesystems on my ZoL machines
and I've had problems in this area before,
although they got fixed.)
(Some experimentation suggests that writes to ZFS filesystems don't
change nr_dirty and nr_writeback in /proc/vmstat, which may be an
indication that the kernel's general dirty_ratio et al don't apply
to ZFS IO and that the two settings are completely separate.
Unfortunately, on a casual look I can't spot any ZFS kstats for how
much pending write data there is.)
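The experiment is simple to repeat; a sketch of it looks something like the following, where /tank/scratch is a stand-in for some ZFS filesystem of yours. On an ext4 filesystem you should see nr_dirty jump after the write; on ZFS it apparently doesn't.

    def vmstat_counts():
        # Pull nr_dirty and nr_writeback (both in pages) out of /proc/vmstat.
        counts = {}
        with open("/proc/vmstat") as f:
            for line in f:
                name, value = line.split()
                if name in ("nr_dirty", "nr_writeback"):
                    counts[name] = int(value)
        return counts

    before = vmstat_counts()

    # Do a buffered write of 256 MB; /tank/scratch is a hypothetical
    # ZFS filesystem, substitute one of your own.
    with open("/tank/scratch/vmstat-test", "wb") as f:
        f.write(b"\0" * (256 * 1024 * 1024))

    after = vmstat_counts()
    print("before:", before)
    print("after: ", after)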
Next is that tuning ZFS write behavior is complicated in general,
because ZFS has a significant number of controls and they interact
with each other in non-obvious ways. ZFS on Linux has some
discussion of this in the zfs-module-parameters manpage, complete
with ASCII art diagrams. zfs_dirty_data_max_percent is only one part
of ZFS's write tuning; if you change it, you may well want to adjust
other parts as well.
(Tuning of ZFS module suggests changing some additional write
parameters, but it doesn't discuss how they're related to each other
and I think it's wrong about zfs_vdev_async_write_min_active
because of what it means.)
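If you want to look at the whole family of related knobs at once, it's easy enough to dump them from /sys/module/zfs/parameters. This is only a sketch; which parameters actually exist (and what their defaults are) depends on your ZFS version.

    import pathlib

    params = pathlib.Path("/sys/module/zfs/parameters")

    # The dirty data limits, the write throttle's delay knobs, and the
    # async write queue depths all interact with each other.
    prefixes = ("zfs_dirty_data_", "zfs_delay_", "zfs_vdev_async_write_")
    for p in sorted(params.iterdir()):
        if p.name.startswith(prefixes):
            print(f"{p.name} = {p.read_text().strip()}")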
Finally, there is the issue that a traditional weak area of ZFS on Linux has been that its memory management is not entirely well integrated with the kernel's general memory management. Although things have gotten better here, I'm still not sure it's a good idea to let ZoL potentially hold a substantial amount of memory for buffered writes (especially since I'm not sure they count against the ARC size). This is definitely an area that I would want to experiment with and be cautious about, especially on a machine that was doing anything else that wanted memory.
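If you want to keep an eye on this area, one starting point is comparing the dirty data limit against the ARC's current and maximum size, which ZFS on Linux exposes in /proc/spl/kstat/zfs/arcstats. A rough sketch:

    def arcstats():
        # arcstats is 'name type data' lines after two header lines.
        stats = {}
        with open("/proc/spl/kstat/zfs/arcstats") as f:
            for line in f.readlines()[2:]:
                name, _kind, value = line.split()
                stats[name] = int(value)
        return stats

    a = arcstats()
    with open("/sys/module/zfs/parameters/zfs_dirty_data_max") as f:
        dirty_max = int(f.read())

    gib = 1024 ** 3
    print(f"ARC current size:    {a['size'] / gib:.2f} GiB")
    print(f"ARC target maximum:  {a['c_max'] / gib:.2f} GiB")
    print(f"zfs_dirty_data_max:  {dirty_max / gib:.2f} GiB")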