2018-11-30
Today I (re-)learned that top's output can be quietly system dependent
I'll start with the background story. A few days ago I tweeted:
Current status: zfs send | zfs recv at 33 Mbytes/sec. This will take a while, and the server with SSDs and 10G networking is rather bored.
(It's not CPU-limited at either end and I don't think it's disk-limited. Maybe too many synchronous reads or something.)
I was wrong about this not being CPU-limited, as it turned out, and then Allan Jude had the winning suggestion:
Try adding '-c aes128-gcm@openssh.com' to your SSH invocation.
See also: <pdf link>
(If you care about 10G+ SSH, you want to read that PDF.)
This made a huge difference, giving me basically 1G wire speeds for
my ZFS transfers. But that difference made me scratch my head,
because why was switching SSH ciphers making a difference when ssh
wasn't CPU-limited in the first place? I came up with various
theories and guesses, until today I had a sudden terrible suspicion.
The result of testing and confirming that suspicion was another tweet:
Today I learned or re-learned a valuable lesson: in practice, top output is system dependent, in ways that are not necessarily obvious. For instance, CPU % on multi-CPU systems.
(On some systems, CPU % is the percent of a single CPU; on some it's a % of all CPUs.)
You see, the reason that I had confidently known that SSH wasn't
CPU-limited on the sending machine, which was one of our OmniOS
fileservers, is that I had run top and seen that the ssh process
was only using 25% of the CPU. Case closed.
Except that OmniOS top and Linux's top report CPU usage percentages
differently. On Linux, CPU percentage is relative to a single CPU,
so 25% is a quarter of one CPU, 100% is all of it, and over 100%
is a multi-threaded program that is using up more than one CPU's
worth of CPU time. On OmniOS, the version of top we're using comes
from pkgsrc (in what is by now a very old version), and that version
reports CPU percentage relative to all CPUs in the machine. Our
OmniOS fileservers are 4-CPU machines, so that '25% CPU' was actually
'all of a single CPU'. In other words, I was completely wrong about
the sending ssh not being CPU-limited.
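The arithmetic behind that mistake is small enough to sketch in Python. This is only an illustration of the two conventions described above; the function names and the `relative_to_all_cpus` flag are made up for the example, and you have to know (or test) which convention your particular top uses.

```python
def to_single_cpu_percent(reported_percent, ncpus, relative_to_all_cpus):
    """Convert a top-style CPU % into 'percent of one CPU'.

    relative_to_all_cpus is a hypothetical flag for this sketch:
    True if the tool reports % of the whole machine (as our old
    pkgsrc top on OmniOS does), False if it reports % of a single
    CPU (as Linux top does).
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    if relative_to_all_cpus:
        # 25% of a 4-CPU machine is one entire CPU.
        return reported_percent * ncpus
    return reported_percent


def to_all_cpus_percent(single_cpu_percent, ncpus):
    """Go the other way: 'percent of one CPU' to 'percent of the machine'."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return single_cpu_percent / ncpus


if __name__ == "__main__":
    # The ssh process from the story: 25% on a 4-CPU fileserver.
    print(to_single_cpu_percent(25.0, 4, relative_to_all_cpus=True))
```

A single-threaded process that comes out at or near 100% of one CPU after this conversion is a good candidate for being CPU-limited, whatever the raw number looked like.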
Since ssh was CPU-limited after all, it's suddenly no surprise that
switching ciphers sped things up to basically wire speed.
(Years ago I established that the old SunSSH that OmniOS was using
back then was rather slow, but then later we upgraded to OpenSSH and
I sort of thought that I could not worry about SSH speeds any more.
Well, I was wrong. Of course, nothing can beat not doing SSH at all
but instead using, say, mbuffer. Using mbuffer also means that you
can deliberately limit your transfer bandwidth to leave some room
for things like NFS fileservice.)
PS: There are apparently more versions of top than you might think.
On the FreeBSD 10.4 machine I have access to, top reports CPU
percentage in the same way Linux does (100% is a single-threaded
process using all of one CPU). Although both the FreeBSD version and
our OmniOS version say they're the William LeFebvre implementation
and have similar version numbers, apparently they diverged
significantly at some point, probably when people had to start
figuring out how to make the original version of top deal with
multi-CPU machines.
I've learned that sometimes the right way to show information is a simple one
When I started building some Grafana dashboards, I of course reached for everyone's favorite tool, the graph. And why not? Graphs are informative and beyond that, they're fun. It simply is cool to fiddle around for a bit and have a graph of your server's network bandwidth usage or disk bandwidth right there in front of you, to look at the peaks and valleys, to be able to watch a spike of activity, and so on.
For a while I made pretty much everything a graph; for things like bandwidth, this was obviously a graph of the rate. Then one day I was looking at a graph of filesystem read and write activity on one of our dashboards, with one filesystem bouncing up here and another one bouncing up there over Grafana's default six hour time window, and I found myself wondering which of these filesystems was the most active one over the entire time period. In theory the information was in the graph; in practice, it was inaccessible.
As an experiment, I added a simple bar graph of 'total volume over the time range'. It was a startling revelation. Not only did it answer my question, but suddenly things that had been buried in the graphs jumped out at me. Our web server turned out to use our old FTP area far more than I would have guessed, for example. The simple bar graph also made it much easier to confirm things that I thought I was seeing in the more complex and detailed graphs. When one filesystem looked like it was surprisingly active in the over-time graph, I could look down to the bar graph and confirm that yes, it was (and also see how much its periodic peaks of activity added up to).
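To make 'total volume over the time range' concrete, here is a minimal Python sketch of what such a summary computes from rate data. It is not how Grafana or its data source does it; the sample format (timestamps in seconds paired with bytes per second), the function names, and the example filesystem names are all assumptions for illustration.

```python
def total_volume(samples):
    """Approximate total bytes from (time_seconds, bytes_per_second) samples.

    samples must be sorted by time. Each interval contributes the
    average of its two endpoint rates times its duration (the
    trapezoid rule).
    """
    total = 0.0
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        total += (r0 + r1) / 2.0 * (t1 - t0)
    return total


def rank_by_volume(series):
    """Return (name, total_bytes) pairs, most active first.

    series maps a name (for example a filesystem) to its samples.
    """
    totals = {name: total_volume(s) for name, s in series.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: one steady filesystem, one with a single spike.
    series = {
        "steady": [(0, 10), (60, 10), (120, 10)],
        "spiky": [(0, 0), (60, 100), (120, 0)],
    }
    for name, volume in rank_by_volume(series):
        print(name, volume)
```

On a rate graph the spike and the steady line are hard to compare by eye, which is the problem described above; reduced to one number each, the ordering is immediate.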
Since that first experience I have become much more appreciative of the power of simple ways to show summary information. Highly detailed graphs have an important place and they're definitely showing us things we didn't know, but simple summaries reveal things too.
(I'd love the ability to get ad-hoc simple summaries from more complex graphs. I don't need 'average bandwidth over the graph's entire time range' very often, but sometimes I'd like to have it instead of having to guess by eyeball. It's sort of a pity that you can't give Grafana graphs alternate visualizations that you can cycle through, or otherwise have two (or more) panels share the same space so you can flip between them. As it stands, we have some giant dashboards.)