Chris's Wiki :: blog/sysadmin/MetricsAndUnderstandingMore Commentshttps://utcc.utoronto.ca/~cks/space/blog/sysadmin/MetricsAndUnderstandingMore?atomcommentsDWiki2019-01-03T05:06:13ZRecent comments in Chris's Wiki :: blog/sysadmin/MetricsAndUnderstandingMore.By Chris Siebenmann on /blog/sysadmin/MetricsAndUnderstandingMoretag:CSpace:blog/sysadmin/MetricsAndUnderstandingMore:4f0ef388440d4b0a012d440a2de75da01d774358Chris Siebenmann<div class="wikitext"><p>The average IO queue size is in outstanding IO operations and the disk
utilization is in percent. The yellow disk is not RAIDed or clustered,
although there is a pair of disks somewhat visible that are in a RAID-0
(the dark blue and barely visible red that are being read from as well
as written to). The writes are basically asynchronous and limited by
the bandwidth to the yellow disk, and at 100% utilization the kernel is
pushing write IO to disk as fast as it can.</p>
<p>It'd be interesting to get the actual disk service time, but we can't;
Linux's disk IO stats don't provide it. All you can get is total wait
time from submission to the block IO system through completion of the
actual disk IO; that can rise (as it does here) because of disk service
time or because of increasing queues. With flat average IO queue sizes
we can sort of assume that the rise in overall write wait time is
due to increased disk service times, but we can't be sure.</p>
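<p>As a rough illustration of what can be derived, here is a minimal sketch that turns two samples of a <code>/proc/diskstats</code> line into an average write wait time, a utilization percentage, and an average queue size. The field positions follow the kernel's documented iostats layout; the names <code>DiskSample</code>, <code>parse_diskstats_line</code> and <code>derive</code> are made up for this example, and the sample lines in the usage note are synthetic, not taken from the graphed machine.</p>

```python
from typing import NamedTuple, Tuple


class DiskSample(NamedTuple):
    """The cumulative /proc/diskstats counters this sketch uses."""
    writes: int       # field 5: writes completed
    write_ms: int     # field 8: ms spent on writes (queue wait + service)
    io_ms: int        # field 10: ms during which any IO was in flight
    weighted_ms: int  # field 11: ms in flight, weighted by IOs outstanding


def parse_diskstats_line(line: str) -> Tuple[str, DiskSample]:
    """Parse one /proc/diskstats line into (device name, counters)."""
    parts = line.split()
    name = parts[2]
    # Fields 1..11 follow major, minor and name; newer kernels append
    # more fields, which this slice simply ignores.
    f = [int(x) for x in parts[3:14]]
    return name, DiskSample(f[4], f[7], f[9], f[10])


def derive(prev: DiskSample, cur: DiskSample, interval_ms: float) -> dict:
    """Derive per-interval figures from two cumulative samples."""
    d_writes = cur.writes - prev.writes
    return {
        # Total wait per write, submission through completion. This
        # cannot be split into queue time and disk service time.
        "write_await_ms": (cur.write_ms - prev.write_ms) / d_writes
        if d_writes else 0.0,
        "util_percent": 100.0 * (cur.io_ms - prev.io_ms) / interval_ms,
        "avg_queue_size": (cur.weighted_ms - prev.weighted_ms) / interval_ms,
    }
```

<p>With two synthetic samples taken one second apart, <code>derive(a, b, 1000)</code> reports the per-write wait, but nothing in these counters says how much of that wait was spent in the queue and how much on the disk itself.</p>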
</div>2019-01-03T05:06:13ZBy Evelyn Mitchell on /blog/sysadmin/MetricsAndUnderstandingMoretag:CSpace:blog/sysadmin/MetricsAndUnderstandingMore:df9bc648f9b01d9693a5f1d02d645900d612f18bEvelyn Mitchellhttps://www.tummy.com<div class="wikitext"><p>What I see is the yellow disk showing slower writes due to backpressure from IO contention. The average IO queue size for the yellow disk is often maxed out at 100 (percent?), and the disk utilization is also maxed out at 100 (percent? space used?).</p>
<p>If your disks are clustered or RAIDed, then the yellow disk could be failing, with higher rates of errors causing a higher need for correction.</p>
<p>Use smartctl to check the SMART report on the disk. Think about replacing it.</p>
<p>We usually monitor for SMART errors, and replace when they start showing up. This graph tells a story, but not the clearest, most direct story.</p>
</div>2019-01-03T03:56:37ZBy Etienne Dechamps on /blog/sysadmin/MetricsAndUnderstandingMoretag:CSpace:blog/sysadmin/MetricsAndUnderstandingMore:9f7e1937f3f9af1bc7a6370385fae7f55b807515Etienne Dechamps<div class="wikitext"><p>...and the physical reason for the above is that spinning disks start writing at the outer edge of the platter, where linear speed is maximal. As the offset increases, the head moves closer to the centre of the disk, where linear speed is minimal. Every single hard drive in existence will show this behaviour; it is extremely mundane.</p>
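<p>The shape of that curve can be sketched with an idealized model. It assumes constant rotation speed, uniform areal density, and data laid out from the outer radius inward, so sequential throughput is proportional to the track radius. The function name and the inner-to-outer radius ratio of 0.45 are illustrative assumptions, not measurements of any real drive, and real drives step down in discrete zones rather than smoothly.</p>

```python
import math


def relative_throughput(fraction_written: float,
                        inner_radius: float = 0.45,
                        outer_radius: float = 1.0) -> float:
    """Sequential throughput relative to the outermost track.

    Idealized model: capacity between the outer edge and radius r is
    proportional to (outer^2 - r^2), and throughput at radius r is
    proportional to r. The default radii are assumed values.
    """
    # Clamp to the valid range of 0 (empty) to 1 (full).
    f = min(1.0, max(0.0, fraction_written))
    r_squared = outer_radius ** 2 - f * (outer_radius ** 2 - inner_radius ** 2)
    return math.sqrt(r_squared) / outer_radius


if __name__ == "__main__":
    for pct in (0, 25, 50, 75, 100):
        print(f"{pct:3d}% written: {relative_throughput(pct / 100):.2f}x")
```

<p>In this model the slowdown is gentle at first and steepens towards the end of the disk, because the inner tracks hold less data each and so are crossed faster per byte written.</p>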
</div>2018-12-28T09:55:09ZFrom 188.212.132.204 on /blog/sysadmin/MetricsAndUnderstandingMoretag:CSpace:blog/sysadmin/MetricsAndUnderstandingMore:a55c415833721b9a7387cffee8d2d48a08d4cffcFrom 188.212.132.204<div class="wikitext"><p>It's caused by the normal speed vs offset curve for hard disks (and other spinning media), for example:<br>
<a href="https://macperformanceguide.com/Storage-Drive-Toshiba-3TB.html">https://macperformanceguide.com/Storage-Drive-Toshiba-3TB.html</a> <br>
<a href="https://techreport.com/review/25391/wd-red-4tb-hard-drive-reviewed/4">https://techreport.com/review/25391/wd-red-4tb-hard-drive-reviewed/4</a> <br>
<br>
It becomes visible in your graph as speed vs time because you're using the disks as tapes, writing sequentially to them from start to finish.</p>
</div>2018-12-28T08:30:52Z