Averages mislead: an illustrated example
Over about an hour recently, while backups were running, one of
our Solaris fileservers had an
average iSCSI operation completion time of 33 milliseconds. Not
great, especially since its iSCSI backends
are SSD-based, but not terrible. This is pretty typical for this
fileserver under load and while these particular numbers were
gathered with DTrace, they agree with the output of things like
iostat -zxn 60' (individual iSCSI disks could sometimes be slower
over a 60-second period, but not hugely so and it fluctuated back
But averages mislead. (We all know that, right?)
That fileserver did a bit over twice as many reads as writes. Writes had an average completion time of 1 millisecond, but reads had an average completion time of 47 milliseconds. Suddenly things are not looking as great.
Still, would you have guessed that just over 6.5% of reads and writes took 100 milliseconds or more? That's about one in twenty (and they were almost entirely reads). On the sort of good side, 2.6% took 'only' between 200 and 300 milliseconds (and 0.8% between 100 and 200 milliseconds). But the long tail is what you could politely call extended; the slowest operation took just over a whopping 3200 milliseconds (3.2 seconds). It was probably a read; 3.4% of the reads took 512 milliseconds or longer.
(There was exactly one write that was quantized into the 2048-4095 millisecond bucket, so the longest operation prize just might go to it. I was not tracking the maximum service time by operation type at the time I pulled these stats.)
Looking at the distributions of write times against read times shows even more (and DTrace let me get them). The graph of write speeds shows a big peak all the way to the left in the fastest power of two quantization bucket and then a rapid decay and a tiny tail of slow operations; intuitively this is sort of what I'd expect. Reads peaked slower (in the 4-7 millisecond bucket) and somewhat more evenly, but they have a significant tail of slower operations with clear secondary peaked areas; this I did not expect at all.
(The write peak is 83% of all writes with 7% next to it, the read peak is only 49% and has 30% more in the two buckets immediately beside it.)
All of this was hiding inside an average of 33 milliseconds.
I knew intellectually that averages were like this, that they can hide all sorts of things and that you want to look at 99% percentiles and all the rest. I've read any number of presentations and writeups about this, looked at illustrative graphs showing spikes that were hiding in broad overviews, and so on. But there turns out to be a difference between reading about something and having it smack you in the face, and I never really expected to find quite so many disturbing things under this particular rock.
(Once I started looking I could see some of this even in tools that
showed averages. For example, a 1-second
iostat on the fileserver
periodically showed drastic average service time spikes even though they
got smoothed away in my usual 60-second view.)