When using Illumos's lockstat
, check the cumulative numbers too
Suppose, not entirely hypothetically that you
have an Illumos (or OmniOS or etc) system that is experiencing
something that looks an awful lot like kernel contention; for
example, periodic 'mpstat 1
' output where one CPU is spending
100% of its time in kernel code. Perhaps following Brendan Gregg's
Solaris USE method, you stumble
over lockstat
and decide to give it a try. This is a fine thing,
as it's a very nice tool and can give you lots of fascinating output.
However, speaking from recent experience, I urge you to at some
point run lockstat
with the -P
option and check that output
too. I believe that lockstat normally sorts its output by count,
highest first; -P
changes this to sort by total time (ie the count
times its displayed average time). The very important thing that
this does is it very prominently surfaces relatively rare but really
long things. In my case, I spent a bunch of time and effort looking
at quite frequent and kind of alarming looking adaptive mutex spins,
but when I looked at 'lockstat -P
' I discovered a lock acquisition
that only had 30 instances over 60 seconds but that had an average
spin time (not block time) of 55 milliseconds.
(Similarly, when I looked at the adaptive mutex block times I discovered the same lock acquisition, this time blocked 37 times in 60 seconds with an average block time of 1.6 seconds.)
In theory you can spot these things when scanning through the full
lockstat
output even without -P
, but in practice humans don't
work that way; we scan the top of the list and then as everything
starts to dwindle away into sameness our eyes glaze over. You're
going to miss things, so let lockstat
do the work for you to
surface them.
(If you specifically suspect long things you can use -d
to only
report on them, but picking a useful -d
value probably requires
some guesswork and looking at basic lockstat
output.)
By the way, there turn out to be a bunch of interesting tricks you
can do with lockstat
. I recommend reading all the way through the
EXAMPLES
section and especially paying attention to the discussion
of why various flags get used in various situations. Unlike the usual
manpage examples, it only gets more interesting as it goes along.
(And if you need really custom tooling you can use the lockstat DTrace provider in your own DTrace scripts. I wound up doing that today as part of getting information on one of our problems.)
|
|