What I think I want out of a hypothetical nfsiotop for Linux
I wish there was a version of Linux's nfsiostat that worked gracefully when you have several hundred NFS mounts across multiple NFS fileservers.
(I'm going to have to write one, aren't I.)
Linux exposes a very large array of per-filesystem NFS client
statistics in /proc/self/mountstats
(see here)
and there are some programs that digest this data and report it,
such as nfsiostat(8). Nfsiostat
generally works decently to give you useful information, but it's
very much not designed for systems with, for example, over 250 NFS
mounts. Unfortunately that describes us, and we would rather like
to have a tool which tells us what the NFS filesystem hotspots are
on a given NFS client if and when it's clearly spending a lot of
time waiting for NFS IO.
(We have some machines with this sort of problem.)
As suggested by the name, a hypothetical nfsiotop would have to report
on only the top N filesystems, which raises the question of
how you sort NFS filesystems here. Modern versions of nfsiostat
sort by operations per second, which is a start, but I think that one
should also be able to sort by total read and write volume and
probably also by write volume alone. Other likely interesting things
to sort on are the average response time and the current number of
operations outstanding. An ideal tool would also be able to aggregate
things into per-fileserver statistics.
(All of this suggests that the real answer is that you should be able to sort on any field that the program can display, including some synthetic ones.)
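The sorting and per-fileserver aggregation could look something like the following minimal sketch. It assumes, hypothetically, that each mount has already been boiled down to a few cumulative counters (`ops`, `read_bytes`, `write_bytes`, `execute_ms`) sampled twice, `interval` seconds apart; `top_n`, `derive` and the field names are all made up for illustration. Every field it can display, including synthetic ones like total read plus write volume and average execute time per operation, lands in one dict, so sorting on any of them is just a matter of picking the key.

```python
from typing import Dict, List, Tuple

# Hypothetical input: mountpoint -> (device, cumulative counters), where the
# counters would come from something that parses /proc/self/mountstats.
Counters = Dict[str, int]
Snapshot = Dict[str, Tuple[str, Counters]]
COUNTER_KEYS = ("ops", "read_bytes", "write_bytes", "execute_ms")


def derive(delta: Counters, interval: float) -> Dict[str, float]:
    """Turn raw counter deltas into the fields we display and can sort on."""
    ops = delta["ops"]
    rw_bytes = delta["read_bytes"] + delta["write_bytes"]
    return {
        "ops_s": ops / interval,
        "read_kb_s": delta["read_bytes"] / 1024 / interval,
        "write_kb_s": delta["write_bytes"] / 1024 / interval,
        "rw_kb_s": rw_bytes / 1024 / interval,                    # synthetic
        "avg_exe_ms": delta["execute_ms"] / ops if ops else 0.0,  # synthetic
    }


def top_n(old: Snapshot, new: Snapshot, interval: float, key: str = "ops_s",
          n: int = 10, per_server: bool = False) -> List[Tuple[str, Dict[str, float]]]:
    """Return the n busiest mounts (or fileservers), sorted on a derived field."""
    groups: Dict[str, Counters] = {}
    for mnt, (dev, cur) in new.items():
        if mnt not in old:
            continue  # newly mounted; no earlier sample to diff against
        prev = old[mnt][1]
        # "server:/path" -> "server" when aggregating per fileserver
        name = dev.split(":", 1)[0] if per_server else mnt
        acc = groups.setdefault(name, dict.fromkeys(COUNTER_KEYS, 0))
        for k in COUNTER_KEYS:
            acc[k] += cur[k] - prev[k]
    rows = [(name, derive(delta, interval)) for name, delta in groups.items()]
    rows.sort(key=lambda row: row[1][key], reverse=True)
    return rows[:n]
```

Summing the raw deltas per fileserver before deriving anything means that a server's average execute time is weighted by how many operations each of its mounts did, instead of being an average of per-mount averages.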
As my aside in the tweet suggests, I suspect that I'm going to have
to write this myself, and probably mostly from scratch. While
nfsiostat is written in Python and so is probably reasonably
straightforward for me to modify, I suspect that it has too many
things I'd want to change. I don't want little tweaks for things
like its output, I want wholesale restructuring. Hopefully I can
reuse its code to parse the mountstats
file, since that seems
reasonably tedious to write from scratch. On the other hand, the
current nfsiostat Python code seems amenable to a quick gut job to
prototype the output that I'd want.
(Mind you, prototypes tend to drift into use. But that's not necessarily a bad thing.)
PS: I've also run across kofemann/nfstop, which has some interesting features such as a per-UID breakdown, but it works by capturing NFS network traffic and that's not the kind of thing I want to have to use on a busy machine, especially at 10G.
PPS: I'd love to find out that a plausible nfsiotop already exists, but I haven't been able to turn one up in Internet searches so far.