Wandering Thoughts archives

2017-06-28

I don't think you should increase ZFS on Linux's write buffering

While looking at my Referer logs here one day, I wound up stumbling over SvennD's Tuning of ZFS module. I have ambivalent feelings about its suggestions in general, but there is one bit that I have a strong reaction to, and that is the suggestion to substantially increase zfs_dirty_data_max_percent. This setting controls how much asynchronously written, buffered data ZFS will allow you to have pending before it forces processes doing writes to slow down and then stop.

To start with, write buffering is complicated in general, and it's not clear that having any substantial amount of it helps you outside of very specific workloads and relatively specific disk systems. The corollary is that it's pretty hard to give generic write buffering tuning advice unless the default settings are somehow clearly inadequate or wrong, and if you believe they are, you should probably write up why.

On Linux specifically, there is at least some evidence that letting the kernel buffer too many writes has bad effects, and further that the kernel's default settings are too high, not too low. It's not clear how the kernel's general dirty_ratio setting interacts with ZFS's zfs_dirty_data_max_percent, but dirty_ratio defaults to 20%. If 20% is too high for non-ZFS IO, and ZFS is controlled only by its own setting, moving that setting from 10% to 40% is probably not what you want. Things get worse if the two settings are additive, so that the general kernel will give you 20% and then ZFS will give you an additional 10% on top of it. Even if they're separate, you may have problems if you have active ZFS and non-ZFS filesystems on the same machine, since then ZFS is taking 10% and extN is taking 20%.
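To put rough numbers on those percentages, here is a small back-of-the-envelope sketch. The 32 GiB machine is hypothetical, and the arithmetic is deliberately simplified: it treats both limits as a flat percentage of total RAM, while the real limits are computed differently (the kernel works from available rather than total memory, and ZFS has its own caps), so treat the output as an order-of-magnitude illustration only.

```python
GIB = 1024 ** 3


def dirty_limit_bytes(ram_bytes: int, percent: int) -> int:
    """Bytes of dirty data allowed by a flat percentage-of-RAM limit.

    Simplifying assumption: the limit is a plain percentage of total RAM.
    """
    return ram_bytes * percent // 100


if __name__ == "__main__":
    ram = 32 * GIB  # hypothetical machine size, not from the original entry

    kernel = dirty_limit_bytes(ram, 20)  # dirty_ratio default of 20%
    for zfs_percent in (10, 40):         # ZFS default vs. the suggested value
        zfs = dirty_limit_bytes(ram, zfs_percent)
        print(
            f"zfs_dirty_data_max_percent={zfs_percent}: "
            f"ZFS alone {zfs / GIB:.1f} GiB, "
            f"kernel 20% plus ZFS {(kernel + zfs) / GIB:.1f} GiB"
        )
```

The "kernel plus ZFS" figure is the pessimistic case where the two limits are additive, or where both ZFS and non-ZFS filesystems are busy at the same time.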

(Given this, I should probably permanently turn my dirty_ratio down to 10% at most, and reduce dirty_background_ratio as well, since I have a mix of ZFS and non-ZFS filesystems on my ZoL machines and I've had problems in this area before, although they got fixed.)

(Some experimentation suggests that writes to ZFS filesystems don't change nr_dirty and nr_writeback in /proc/vmstat, which may be an indication that the kernel's general dirty_ratio et al don't apply to ZFS IO and the two settings are completely separate. Unfortunately, on a casual look I can't spot any ZFS kstats for how many pending writes there are.)
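If you want to repeat this sort of experiment, one rough way is to sample the two counters before and during a large write to a ZFS filesystem and see whether they move. The sketch below only does the parsing and a single sample; the function name is mine, and it assumes the usual one "name value" pair per line layout of /proc/vmstat, with the values being page counts.

```python
from pathlib import Path


def parse_vmstat(text: str, keys=("nr_dirty", "nr_writeback")) -> dict:
    """Pick selected counters out of /proc/vmstat-style 'name value' lines."""
    wanted = set(keys)
    found = {}
    for line in text.splitlines():
        parts = line.split()
        # Match on the exact counter name, so nr_writeback_temp is not
        # mistaken for nr_writeback.
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]] = int(parts[1])
    return found


if __name__ == "__main__":
    vmstat = Path("/proc/vmstat")
    if vmstat.exists():
        # Values are in pages, not bytes.
        print(parse_vmstat(vmstat.read_text()))
```

Running it in a loop while writing to a ZFS filesystem and then to a non-ZFS one should make any difference in behavior visible, although this is no more than a casual check.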

Next is that tuning ZFS write behavior is complicated in general because ZFS has a significant number of controls and they interact with each other in somewhat complicated ways. ZFS on Linux has some discussion of this in the zfs-module-parameters manpage, complete with ASCII art diagrams. zfs_dirty_data_max_percent is only one part of ZFS's write tuning; if you change it, you may well want to adjust other parts as well.
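Before changing anything, it's worth seeing what the related settings currently are on your own machine. This is a minimal sketch that assumes the ZFS module's parameters are exposed in the usual Linux place for module parameters, /sys/module/zfs/parameters; the particular list of names is my selection of write-related parameters from the manpage, not a complete set, and some may not exist in every ZoL version (those are reported as None).

```python
from pathlib import Path

# Assumed location of ZFS on Linux module parameters on a running system.
ZFS_PARAM_DIR = Path("/sys/module/zfs/parameters")

# A selection of write-related parameters; not exhaustive.
WRITE_PARAMS = (
    "zfs_dirty_data_max",
    "zfs_dirty_data_max_percent",
    "zfs_delay_min_dirty_percent",
    "zfs_vdev_async_write_active_min_dirty_percent",
    "zfs_vdev_async_write_active_max_dirty_percent",
    "zfs_vdev_async_write_min_active",
    "zfs_vdev_async_write_max_active",
)


def read_params(param_dir: Path = ZFS_PARAM_DIR, names=WRITE_PARAMS) -> dict:
    """Return {name: value string, or None if the parameter can't be read}."""
    values = {}
    for name in names:
        try:
            values[name] = (param_dir / name).read_text().strip()
        except OSError:
            # Missing parameter, no ZFS module loaded, or no permission.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_params().items():
        print(f"{name} = {value}")
```

Seeing these side by side is mostly a reminder that zfs_dirty_data_max_percent does not act alone; several of the other values are expressed as percentages of the dirty data limit.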

(Tuning of ZFS module suggests changing some additional write parameters, but it doesn't discuss how they're related to each other and I think it's wrong about zfs_vdev_async_write_min_active because of what it means.)

Finally, there is the issue that a traditional weak area of ZFS on Linux has been that its memory management is not entirely well integrated with the kernel's general memory management. Although things have gotten better here, I'm still not sure it's a good idea to let ZoL potentially hold a substantial amount of memory for buffered writes (especially since I'm not sure they count against the ARC size). This is definitely an area where I would want to experiment and be cautious, especially on a machine that was doing anything else that wanted memory.

ZFSOnLinuxWriteBuffering written at 01:17:33

