Chris's Wiki :: blog/linux/FixingUSBDriveResponsiveness Commentshttps://utcc.utoronto.ca/~cks/space/blog/linux/FixingUSBDriveResponsiveness?atomcommentsDWiki2019-11-21T09:49:30ZRecent comments in Chris's Wiki :: blog/linux/FixingUSBDriveResponsiveness.By jrudolph on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:20051128e1b1f062f2269e51d97d8f99d40cadd8jrudolphhttp://blog.virtual-void.net<div class="wikitext"><p>I actually found your blog back in 2016 when we were observing something similar in a scenario where the main job of our application was copying large amounts of data between devices. (Not sure which article I found back then, but I think you had some older ones about dirty_ratio and similar from before 2016)</p>
<p>On some servers, we had some fast NFS-mounted storage and some slower USB-attached drives to copy data from. In that scenario, we also saw systems almost grinding to a halt. When it happened, we generated system-wide kernel stack traces (maybe using `echo l > /proc/sysrq-trigger`?). What we saw was that once you got to the point where there was an excess of dirty pages, waiting for them to be flushed could get into basically any call path that dealt with allocating pages. Unfortunately, I don't have the stack traces around any more, but iirc it was basically "you want to allocate a page? Too many pages are dirty, let's clean them up before handing out a new one. Ah, someone is already flushing, let's wait for that to finish". That would explain why almost everything can be affected even if a process is not really accessing the slow device or writing to disk at all.</p>
</div>2019-11-21T09:49:30ZBy Chris Siebenmann on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:5bc8b916667b4fff414d8608e7b438c474363e53Chris Siebenmann<div class="wikitext"><p>I'm afraid that I don't know. I stuck sysctl settings for this into
/etc/sysctl.d back in 2017 and on top of that I don't write to USB
stuff very much any more, so I don't have any recent experience with
how this goes.</p>
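<p>(For anyone landing here wanting the same fix: the usual knobs are <code>vm.dirty_bytes</code> and <code>vm.dirty_background_bytes</code>, set from a drop-in file under /etc/sysctl.d. The values below are illustrative examples, not necessarily the ones I used:)</p>

```
# /etc/sysctl.d/10-dirty-writeback.conf -- example values, tune to taste.
# Start background writeback once 64 MB of pages are dirty...
vm.dirty_background_bytes = 67108864
# ...and block writers outright once 256 MB of pages are dirty.
vm.dirty_bytes = 268435456
```

<p>(Apply with <code>sysctl --system</code> or a reboot.)</p>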
</div>2019-06-26T21:35:54ZBy Anon on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:242c3080371dffc37e37b2d1421f79c054400bfaAnon<div class="wikitext"><p>Chris: Did this issue ever get any better "out of the box" (i.e. without hand tuning) with more recent kernels (e.g. see <a href="https://unix.stackexchange.com/questions/526124/what-are-the-outstanding-problems-stalls-which-might-be-mitigated-by-limiti">https://unix.stackexchange.com/questions/526124/what-are-the-outstanding-problems-stalls-which-might-be-mitigated-by-limiti</a> )?</p>
</div>2019-06-25T12:45:38ZBy Anon on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:64cff8f5bb546c4f5b255c4e7bbcaab169e53b3eAnon<div class="wikitext"><p>While there are per-device queues for I/O below the block layer, none of these individual queues extends all the way up to the program doing buffered writes.</p>
<p>The buffered writes queue in memory, and only through writeback do they make their way to disk. In this case there's a shared initial queue (dirty pages) that later feeds other queues, so if you overwhelm any intermediary queue, everyone who has to pass I/O through it is punished too. If one of these queues can't service any more writes, then writes for any device can't be serviced until it is less full. What's worse, reads can become trapped behind the writes too if a sync takes place, because a flush has to follow all those writes before the new read is allowed through. Another problem is that the program writing to the USB disk can probably dirty the page cache faster than anything else (it's not reading anything, so why does it have to slow down?), so soon many queues will be clogged with its data the moment there's a space that no one else needs... yet. Once data destined for the USB disk gets into one of these queues, that queue will drain slowly because the underlying device is slow, applying back pressure. You need a way to quickly apply the pressure all the way back to the source so as to throttle the speed at which the initial program dirties pages and allow everything else a look in...</p>
<p>A smaller writeback limit helps because it means that no program can dirty many pages before being throttled, which in turn means it can't fill later queues so deeply without others getting a look in, and said queues will drain faster (at a cost of throughput) as there is less in flight at any given time. Ultimately, if the queues are shallow and a program is writing to a slow device, then the program will be throttled sooner thanks to back pressure than if the queues are deep.</p>
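<p>The shared-pool-feeds-a-slow-queue argument can be sketched as a toy simulation. Everything here is invented for illustration (page counts, drain rates, the limit values); it is not a model of the actual kernel code:</p>

```python
# Toy model: two writers dirty pages into one shared pool (the global
# dirty limit); writeback drains each device's pages at that device's
# own speed. All numbers are made up for illustration.

def max_slow_backlog(limit, ticks=2000):
    """Worst-case backlog of pages queued for the slow device."""
    dirty_slow = dirty_fast = 0
    worst = 0
    for _ in range(ticks):
        # Writers dirty pages freely until the shared limit is hit,
        # then everyone is throttled, slow and fast writer alike.
        if dirty_slow + dirty_fast < limit:
            dirty_slow += 50   # e.g. rsync to the USB stick
            dirty_fast += 50   # unrelated writer to a fast disk
        # Writeback drains each device at its own speed.
        dirty_slow = max(0, dirty_slow - 1)    # slow device: 1 page/tick
        dirty_fast = max(0, dirty_fast - 100)  # fast device: keeps up easily
        worst = max(worst, dirty_slow)
    return worst

# A deep shared pool lets the slow device's backlog grow huge (and a
# sync then has to wait for all of it); a shallow pool caps it early.
print(max_slow_backlog(limit=1000))
print(max_slow_backlog(limit=100))
```

<p>Note that the fast writer never builds a backlog of its own, yet with the deep pool it still ends up throttled on the shared limit once the slow device's pages fill it.</p>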
<p>Any more convincing?</p>
</div>2017-01-17T20:38:14ZBy Guus Snijders on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:85388d4c46af6cb0fcb2148529ac2bc1a5c8b873Guus Snijders<div class="wikitext"><blockquote><p>What puzzles me about my situation is that there are completely
different devices involved. The only thing that was doing IO to
the USB flash drive was an rsync, yet everything lurched to a pause,
even though at most they wanted to do IO to drives that were only
being read from by rsync.</p>
</blockquote>
<p>After reading your entry (and the comments), I got curious and did some reading (luckily, I had a slow day at $work ;)).</p>
<p>As I understand it, there is a global limit on dirty memory; once that limit is reached, the kernel has to drain some of it. The really nasty bit comes from sync()'s:
The kernel /has/ to write all of its dirty memory, including what is destined for the <em>slow</em> USB stick. </p>
<p>From that point on, it's a simple calculation; the bigger the buffers, the longer it takes to write out. Not much of a problem for fast HDD/SSD storage, but when you include slow storage in the mix...</p>
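<p>The arithmetic is worth writing down. Assuming a ~10 MB/s USB stick (the figure Chris quotes elsewhere in this thread), the time a sync-induced flush takes scales linearly with the amount of dirty data allowed to pile up:</p>

```python
SLOW_DEVICE_MBPS = 10  # assumed USB stick write speed

def drain_seconds(dirty_mb):
    """Seconds to flush dirty_mb megabytes of dirty pages to the slow device."""
    return dirty_mb / SLOW_DEVICE_MBPS

for pool_mb in (512, 64, 16):
    print(f"{pool_mb:3d} MB dirty -> ~{drain_seconds(pool_mb):.1f}s to drain")
```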
<blockquote><p>If writeback is global, I'm not sure I understand why a smaller
dirty pool helps</p>
</blockquote>
<p>AFAIK: a smaller dirty pool means less data to write, meaning less time waiting on the device to complete. The (optional) per-device limits look like a fairly recent addition, and a nice target for a UDEV rule.</p>
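<p>For the curious: the per-device knob lives in sysfs under /sys/class/bdi/. A sketch of setting it by hand (the major:minor pair is an example; check <code>ls -l /dev/sdb</code> for the real numbers):</p>

```
# Let the device backing 8:16 (e.g. /dev/sdb) hold at most 1% of the
# global dirty pool. Needs root; the device numbers are illustrative.
echo 1 > /sys/class/bdi/8:16/max_ratio
```

<p>A udev rule matching <code>SUBSYSTEM=="bdi"</code> could automate this, though matching only the USB-backed bdi devices from there takes some care.</p>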
<p>Why the whole system slows down; well, you're much better qualified to explain than me ;).</p>
</div>2017-01-17T20:35:33ZBy Chris Siebenmann on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:5971b90e459d5c6ccc9944055575a73190963491Chris Siebenmann<div class="wikitext"><p>What puzzles me about my situation is that there are completely
different devices involved. The only thing that was doing IO to the USB
flash drive was an <code>rsync</code>, yet everything lurched to a pause, even
though at most they wanted to do IO to drives that were only being read
from by <code>rsync</code>. The only things I can think of are that either there was way
more memory eviction happening than I expected or all of the pending
writes got merged into one big pool across all of the devices, and
the writes to the USB flash drive caused other writes to stall or be
(significantly) delayed.</p>
<p>Descriptions of dirty write buffering and ways to deal with it that
I've read (<a href="https://lwn.net/Articles/682582/">eg</a>) generally seem to
talk about it in per-device terms. Per-device limits and operation are
clearly what you want in general; if I'm doing IO to device A and you
flood completely unrelated device B, I shouldn't be affected by your
activity. But I'm now unclear if the Linux kernel actually operates
this way.</p>
<p>(If writeback is global, I'm not sure I understand why a smaller dirty
pool helps. My USB flash drive writes at roughly 10 MB/sec, so even a 512
MB pool is going to rapidly fill up and force everyone to throttle on
writebacks, even with background writes starting at 256 MB of dirty
data.)</p>
</div>2017-01-17T17:03:32ZBy Anon on /blog/linux/FixingUSBDriveResponsivenesstag:CSpace:blog/linux/FixingUSBDriveResponsiveness:4b37b607d39d5dba39af964202c118c556d9cde1Anon<div class="wikitext"><p>It's something of a bufferbloat situation where reads get trapped behind slow writes.</p>
<p>You start doing a large amount of writes to a USB device. The USB device can't keep up with the writes being done to it. With normal I/O, you do your write and, if there's buffer space, the kernel tells the program "got your writes" immediately (hence the need to fsync if you care about whether the data really is on disk), which just makes the application send more quickly. The Linux kernel continues to buffer the writes up to some limit, at which point the write call will block in the original application. A sync comes along that says "you must flush all writes everywhere now". At that point new reads (which can't be deferred and will make a program waiting on them hang) aren't allowed to be reordered past any writes, and anything doing those reads is made to wait while the writeback is drained (and draining happens slowly because the USB device is slow). Once the writeback is empty those new reads are allowed past again, but the system quickly accumulates a large amount of writeback/dirty data again because the USB device is still slow, and the next time a sync comes along...</p>
<p>Your dd oflag=direct works because you are saying you want the I/O to bypass Linux's caches, forcing only the application that issued the writes to wait directly rather than filling a multi-megabyte buffer in the kernel to fullness and then waiting. Setting the maximum accumulated writeback to be smaller works because when the sync comes along you have fewer writes to flush before those new reads can be serviced.</p>
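<p>For concreteness, a sketch of that direct-I/O variant (the paths here are made up):</p>

```
# Copy with O_DIRECT so the writer is paced by the USB device itself
# instead of by the page cache filling up. Paths are illustrative.
dd if=big.img of=/mnt/usb/big.img bs=4M oflag=direct status=progress
```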
</div>2017-01-17T06:53:35Z