The Solaris 10 NFS server's caching of filesystem access permissions
Earlier, I mentioned that modern NFS
servers don't have a comprehensive list of NFS filesystem access
permissions stored in the kernel; instead they have a cache and
some sort of upcall mechanism where the kernel asks user space
whether a given client has access to a given filesystem when it needs to know.
I've recently been investigating how Solaris 10 handles this, so
here's what I know of the kernel authorization cache:
First, the Solaris kernel does cache negative entries (this IP address is not allowed to access this filesystem at all). This turns out to be fairly dangerous, because the cache has no timeout. If a negative entry is ever checked and cached, it will stay there until you flush the filesystem's cache entirely.
(The same is true of positive entries that you want to get rid of. You might have removed a client's authorization, or you might want to change how the filesystem is exported to it, since part of the cache entry is whether the client has read-write or read-only access and whether root is remapped. Or a machine may simply have changed IP address and you want to get rid of any permissions that the old IP address has.)
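To make the danger concrete, here is a minimal Python sketch of a cache with no timeout. It is a toy model, not the Solaris code: `AuthAnswer`, `FsAuthCache`, and `fake_mountd` are names invented for illustration, and the real kernel structures surely differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AuthAnswer:
    """Hypothetical cache entry: the access decision plus its export flags."""
    allowed: bool                 # False models a negative entry
    read_write: bool = False
    root_remapped: bool = True


class FsAuthCache:
    """Toy model of one filesystem's auth cache: entries never time out."""

    def __init__(self, upcall: Callable[[str], AuthAnswer]) -> None:
        self._upcall = upcall     # stand-in for asking mountd
        self._entries: Dict[str, AuthAnswer] = {}
        self.upcalls = 0

    def check(self, client_ip: str) -> AuthAnswer:
        entry = self._entries.get(client_ip)
        if entry is None:
            # Cache miss: ask once, then remember the answer indefinitely.
            self.upcalls += 1
            entry = self._upcall(client_ip)
            self._entries[client_ip] = entry
        return entry

    def flush(self) -> None:
        # The only way to get rid of an entry in this model.
        self._entries.clear()


# Stand-in for mountd's current view of who may access the filesystem.
exports: Dict[str, AuthAnswer] = {}


def fake_mountd(client_ip: str) -> AuthAnswer:
    return exports.get(client_ip, AuthAnswer(allowed=False))


cache = FsAuthCache(fake_mountd)
first = cache.check("192.0.2.10")    # not exported yet: negative entry cached
exports["192.0.2.10"] = AuthAnswer(allowed=True, read_write=True)
stale = cache.check("192.0.2.10")    # still the cached negative answer
cache.flush()
fresh = cache.check("192.0.2.10")    # revalidated after the flush
print(first.allowed, stale.allowed, fresh.allowed)  # prints: False False True
```

The same staleness applies in the other direction: a cached positive answer would keep granting access after the client is removed from `exports`, until the flush.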
The overall cache has no size limit at all, beyond a general one set by kernel memory limits. It will get shrunk if the kernel needs to reclaim memory, but even then no entry less than 60 minutes old will be removed. In our environment, such cache reclaims appear to be vanishingly uncommon (i.e., completely unseen), based on kernel stats.
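The reclaim rule can be sketched as follows. This is only an illustration of the 60-minute floor. The function name, the dictionary layout, and the choice to measure age from a single stored timestamp are my assumptions; I have not checked whether the kernel measures age from creation or from last use.

```python
from typing import Dict, List

MIN_AGE_SECONDS = 60 * 60  # entries younger than this survive a reclaim


def reclaim(entries: Dict[str, float], now: float) -> List[str]:
    """Remove entries at least MIN_AGE_SECONDS old; return the removed keys.

    `entries` maps a client IP to the time (in seconds) its entry was
    cached. Both the layout and the name are hypothetical.
    """
    victims = [ip for ip, cached_at in entries.items()
               if now - cached_at >= MIN_AGE_SECONDS]
    for ip in victims:
        del entries[ip]
    return victims


entries = {"192.0.2.10": 0.0, "192.0.2.11": 1800.0, "192.0.2.12": 3500.0}
removed = reclaim(entries, now=4000.0)
print(removed, sorted(entries))
```

With these numbers only the first entry (4000 seconds old) is eligible; the other two are under an hour old and stay in the cache even though memory is being reclaimed.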
There is a separate auth cache for each exported filesystem. As far as I can tell, a filesystem's auth cache is discarded entirely if it is unshared or reshared, including if it is reshared with the same sharing settings. It otherwise effectively never expires entries. Flushing a filesystem's auth cache causes every client to be revalidated the next time that they make an NFS request to that filesystem.
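A small sketch of the per-filesystem arrangement, again as a toy model with invented names (`ExportTable`, `share`, `remember`, `cached`). It mirrors what I described above: resharing a filesystem, even with identical settings, throws away that filesystem's cache and leaves the others alone.

```python
from typing import Dict, Optional


class ExportTable:
    """Toy model: one independent auth cache per exported filesystem."""

    def __init__(self) -> None:
        # export path -> {client IP -> allowed?}
        self._caches: Dict[str, Dict[str, bool]] = {}

    def share(self, path: str) -> None:
        # Sharing (or resharing) always starts from an empty cache,
        # even if the sharing settings have not changed.
        self._caches[path] = {}

    def unshare(self, path: str) -> None:
        self._caches.pop(path, None)

    def remember(self, path: str, client_ip: str, allowed: bool) -> None:
        self._caches[path][client_ip] = allowed

    def cached(self, path: str, client_ip: str) -> Optional[bool]:
        # None means "no entry": the client must be revalidated.
        return self._caches.get(path, {}).get(client_ip)


table = ExportTable()
table.share("/export/a")
table.share("/export/b")
table.remember("/export/a", "192.0.2.10", True)
table.remember("/export/b", "192.0.2.10", True)

table.share("/export/a")  # reshare with the same settings
after_a = table.cached("/export/a", "192.0.2.10")
after_b = table.cached("/export/b", "192.0.2.10")
print(after_a, after_b)  # prints: None True
```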
Because all of this is only in kernel memory, all auth caches are lost if the system reboots. Thus on fileserver reboot all clients are revalidated for all filesystems on a rolling basis, as each client tries to do NFS to each filesystem that they have mounted. This may provoke a storm of revalidations after the reboot of a popular fileserver with a bunch of clients.
The cache is populated by upcalls to mountd, made on (hopefully
infrequent) demand through mechanisms that are beyond the scope of
this entry. If mountd answers properly, its answer of the moment,
whatever that is, gets cached. There are presumably timeout and load
limits on these upcalls, but I don't understand the (Open)Solaris code
well enough yet to find them. (I hope that more than one upcall can be
in progress at a time.)
Sidebar: Getting cache stats
This is for the benefit of people (such as me) poking around with
mdb -k. The internal NFS server auth cache stats are in
nfsauth_cache_reclaim, which counts how many times a reclaim
has been done (but not how many entries have been reclaimed).
To see them (in hex) one uses the
The code for most of this is in
usr/src/uts/common/fs/nfs; see also
nfs_export.c, which has the
overall NFS server export list.