The bytes and events data for NFS mounts in /proc/self/mountstats
The per NFS mount mountstats performance stats (see here for an introduction) have two sets of high level statistics, reported in the bytes: and events: lines. Both of these come from counters that are described in comments in include/linux/nfs_iostat.h in the kernel source.
Of the two, the simpler is bytes:. A typical bytes: line looks like:
bytes: 2320629391 2297630544 0 0 2298347151 2297630544 718354 717816
In order, let's call these fields nread, nwrite, dread, dwrite, nfsread, nfswrite, pageread, and pagewrite. These count, respectively, bytes read and written via simple read() and write() calls, bytes read and written via read() and write() calls in O_DIRECT mode, the actual number of bytes read from and written to the NFS server (regardless of how), and the number of pages (not bytes) read or written via directly mmap()'d files. I believe that the page size is basically always 4 KB (at least on x86). It's routine for the O_DIRECT numbers to be zero.
The most useful numbers of these for performance are what I've
called nfsread and nfswrite, the fifth and sixth fields, because
these represent the actual IO to the server.
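As a concrete illustration of the field order, here is a minimal sketch in Python (mine, not from any official tool; the field names are just my labels from above and the function name nfs_bytes is made up) that picks the bytes: line out of /proc/self/mountstats for each NFS mount:

	# A minimal sketch: label the eight 'bytes:' fields for each NFS mount
	# in /proc/self/mountstats, using the field names from the text above.
	BYTES_FIELDS = ("nread", "nwrite", "dread", "dwrite",
	                "nfsread", "nfswrite", "pageread", "pagewrite")

	def nfs_bytes(path="/proc/self/mountstats"):
	    stats = {}
	    mount = None
	    with open(path) as f:
	        for line in f:
	            words = line.split()
	            if not words:
	                continue
	            if words[0] == "device":
	                # Header lines look roughly like:
	                #   device server:/export mounted on /mnt with fstype nfs statvers=1.1
	                # Only remember the mount point if this is an NFS mount.
	                fstype = words[words.index("fstype") + 1] if "fstype" in words else ""
	                mount = words[4] if fstype.startswith("nfs") else None
	            elif mount and words[0] == "bytes:":
	                stats[mount] = dict(zip(BYTES_FIELDS, map(int, words[1:9])))
	    return stats

	if __name__ == "__main__":
	    for mnt, st in nfs_bytes().items():
	        print(mnt, "server reads:", st["nfsread"], "server writes:", st["nfswrite"])

To get rates instead of lifetime totals you would sample this twice and take the difference, since all of these counters only ever count up from when the filesystem was mounted.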
A typical events: line looks like this:
events: 3717478 126331741 28393 1036981 3355459 1099901 133724160 1975168 3589 2878751 1405861 5669601 720939 96113 3235157 225053 30643 3026061 0 23184 1675425 24 0 0 0 0 0
The events: line tracks various sorts of high level NFS events. There are a lot of them, so I am just going to list them in order (with field numbers and some commentary):
- inode revalidate (#1): How many times cached inode attributes have to be re-validated from the server.
- dnode revalidate (#2): How many times cached dentry nodes (ie, name to inode mappings) have to be re-validated. I suspect that this spawns inode revalidations as well.
- data invalidate (#3): How many times an inode had its cached data thrown out.
- attribute invalidate (#4): How many times an inode has had cached inode attributes invalidated.
- vfs open (#5): How many times files or directories have been open()'d.
- vfs lookup (#6): How many name lookups in directories there have been.
- vfs access (#7): How many times permissions have been checked via the internal equivalent of access().
- vfs update page (#8): Count of updates (and potential writes) to pages.
- vfs read page (#9): This is the same as what I called pageread in the bytes: field. (Quite literally. The counters are incremented next to each other in the source.)
- vfs read pages (#10): Count of how many times a group of (mapped?) pages have been read. I believe it spawns 'vfs read page' events too but I'm not sure.
- vfs write page (#11): Same as pagewrite in bytes:.
- vfs write pages (#12): Count of grouped page writes. Probably spawns 'vfs write page' events too.
- vfs getdents (#13): How many times directory entries have been read with getdents(). These reads can be served from cache and don't necessarily imply actual NFS requests.
- vfs setattr (#14): How many times we've set attributes on inodes.
- vfs flush (#15): How many times pending writes have been forcefully flushed to the server (which can happen for various reasons).
- vfs fsync (#16): How many times fsync() has been called on directories (which is a no-op for NFS) and files. Sadly you can't tell which is which.
- vfs lock (#17): How many times people have tried to lock (parts of) a file, including in ways that are basic errors and will never succeed.
- vfs file release (#18): Basically a count of how many times files have been closed and released.
- congestion wait (#19): Not used for anything as far as I can tell. There doesn't seem to be anything in the current kernel source that actually increments the counter.
- truncation (#20): How many times files have had their size truncated.
- write extension (#21): How many times a file has been grown because you're writing beyond the existing end of the file.
- silly rename (#22): How many times you removed a file while it was still open by some process, forcing the kernel to instead rename it to '.nfsXXXXXX' and delete it later.
- short read (#23): The NFS server gave us less data than we asked for when we tried to read something.
- short write (#24): The NFS server wrote less data than we asked it to.
- jukebox delay (#25): How many times the NFS server told us EJUKEBOX, which is theoretically for when the server is slowly retrieving something from offline storage. I doubt that you will ever see this from normal servers.
- pnfs read (#26): A count of NFS v4.1+ pNFS reads.
- pnfs write (#27): A count of NFS v4.1+ pNFS writes.
All of the VFS operations are for VFS level file and address space operations. Fully understanding what these counters mean requires understanding when those operations are used, for what, and why. I don't have anywhere near this level of understanding of the Linux VFS layer, so my information here should be taken with some salt.
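If you want to look at an events: line programmatically, here is a small sketch (again in Python, and again the names are just my labels from the list above, not anything official) that pairs the 27 counters with those names:

	# A sketch: pair the 'events:' counters with the labels used in the list
	# above, so a raw events: line becomes something readable.
	EVENT_FIELDS = (
	    "inode revalidate", "dnode revalidate", "data invalidate",
	    "attribute invalidate", "vfs open", "vfs lookup", "vfs access",
	    "vfs update page", "vfs read page", "vfs read pages", "vfs write page",
	    "vfs write pages", "vfs getdents", "vfs setattr", "vfs flush",
	    "vfs fsync", "vfs lock", "vfs file release", "congestion wait",
	    "truncation", "write extension", "silly rename", "short read",
	    "short write", "jukebox delay", "pnfs read", "pnfs write",
	)

	def parse_events(line):
	    # line is a full 'events: N N N ...' line from mountstats.
	    values = [int(v) for v in line.split()[1:]]
	    return dict(zip(EVENT_FIELDS, values))

	example = ("events: 3717478 126331741 28393 1036981 3355459 1099901 "
	           "133724160 1975168 3589 2878751 1405861 5669601 720939 96113 "
	           "3235157 225053 30643 3026061 0 23184 1675425 24 0 0 0 0 0")
	for name, value in parse_events(example).items():
	    if value:
	        print(f"{name}: {value}")

Running this on the example line above prints only the counters that are non-zero, which makes it easier to see things like the 24 silly renames.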
As you can see from my example events:
line, some events are common,
some are rare (eg #22, silly renames, of which there have been 24 over
the lifetime of this NFS mount), and some basically never happen (eg
everything from #23 onwards). Looking at our own collection of quite a lot of NFS v3 filesystem mounts, the only thing we've seen even a handful of (on three filesystems) is short writes. I suspect that those happen when a filesystem runs out of space on the fileserver.
Disclaimer: I'm somewhat fuzzy on what exactly a number of the events counted here really represent because I haven't traced backwards from the kernel code that increments the counters to figure out just what calls it and what it does and so on.
(This is one reason why the lack of good documentation on mountstats
is really frustrating. Decoding a lot of this really needs someone who
actively knows the kernel's internals for the best, most trustworthy
results.)