2017-01-17
Making my machine stay responsive when writing to USB drives
Yesterday I talked about how writing things to USB drives made my machine not very responsive, and in a comment Nolan pointed me to LWN's The pernicious USB-stick stall problem. According to LWN's article, the core problem is an excess accumulation of dirty write buffers, and they give some VM system sysctls that you can use to control this.
I was dubious that this was my problem, for two reasons. First, I have a 16 GB machine and I rarely use all of that memory, so I thought that allowing a process to grab a bit over 3 GB of it for dirty buffers wouldn't make much of a difference. Second, I had actually been running sync frequently (in a shell loop) during the entire process, because I have sometimes had it make a difference in these situations; I figured frequent syncs should limit the amount of dirty buffers accumulating in general. But I figured it couldn't hurt to try, so I used the dirty_background_bytes and dirty_bytes settings to limit these to 256 MB and 512 MB respectively and tested things again.
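(For the record, applying these limits looks something like the following sketch; the values are the ones I used, run as root, and both settings take effect immediately:

    # background flushing starts at 256 MB of dirty buffers
    sysctl -w vm.dirty_background_bytes=268435456
    # writers are throttled once 512 MB of dirty buffers accumulate
    sysctl -w vm.dirty_bytes=536870912

Setting the _bytes versions automatically zeroes the corresponding dirty_background_ratio and dirty_ratio settings. To make this persist across reboots you'd put the same vm.* lines in a file under /etc/sysctl.d/.)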
It turns out that I was wrong. With these sysctls turned down, my machine stayed quite responsive for once, despite me doing various things to the USB flash drive (including things that had had a terrible effect just yesterday). I don't entirely understand why, though, which makes me feel as if I'm doing fragile magic instead of system tuning. I also don't know if setting these down is going to have a performance impact on other things that I do with my machine; intuitively I'd generally expect not, but clearly my intuition is suspect here.
(Per this Bob Plankers article, you can monitor the live state of your system with egrep 'dirty|writeback' /proc/vmstat. This will tell you the number of currently dirty pages and the thresholds (in pages, not bytes). I believe that nr_writeback is the number of pages actively being flushed out at the moment, so you can also monitor that.)
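(If you want to watch this continuously, something like this works; multiplying the page counts by the system page size, usually 4096 bytes, converts them to bytes:

    # refresh the dirty/writeback counters every second
    watch -n1 "egrep 'dirty|writeback' /proc/vmstat"
    # report the page size in bytes for converting the counts
    getconf PAGESIZE

)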
PS: In a system with drives (and filesystems) of vastly different speeds, a global dirty limit or ratio is a crude tool. But it's the best we seem to have on Linux today, as far as I know.
(In theory, modern cgroups support the ability to have per-cgroup dirty_bytes settings, which would let you add extra limits to processes that you knew were going to do IO to slow devices. In practice this is only supported on a few filesystems and isn't exposed (as far as I know) through systemd's cgroups mechanisms.)