Wandering Thoughts archives


The Solaris 10 NFS server's caching of filesystem access permissions

Earlier, I mentioned that modern NFS servers don't keep a comprehensive list of NFS filesystem access permissions in the kernel; instead they have a cache and some sort of upcall mechanism where the kernel asks mountd, when necessary, whether a given client has access to a given filesystem. I've recently been investigating how Solaris 10 handles this, so here's what I know about the kernel authorization cache:

First, the Solaris kernel does cache negative entries (this IP address is not allowed to access this filesystem at all). This turns out to be fairly dangerous, because the cache has no timeout. Once a negative answer has been looked up and cached, it stays there until you flush the filesystem's cache entirely.

(The same is true of positive entries that you want to get rid of, whether because you've removed a client's authorization, because you want to change how the filesystem is exported to it, or just because a machine has changed IP address and you want to get rid of any permissions that the old IP address has. Part of the cache entry is whether the client has read-write or read-only access, and whether root is remapped or not.)

The overall cache has no size limit at all, beyond a general one set by kernel memory limits. It will get shrunk if the kernel needs to reclaim memory, but even then no entry less than 60 minutes old will be removed. In our environment, such cache reclaims appear to be vanishingly uncommon (ie, completely unseen), based on kernel stats.

There is a separate auth cache for each exported filesystem. As far as I can tell, a filesystem's auth cache is discarded entirely if it is unshared or reshared, including if it is reshared with the same sharing settings. It otherwise effectively never expires entries. Flushing a filesystem's auth cache causes every client to be revalidated the next time that they make an NFS request to that filesystem.
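To make the behavior described so far concrete, here is a small Python model of one filesystem's auth cache. It is only a sketch of the semantics as I understand them, not a translation of the Solaris code: the class name, the method names, the `upcall` callback that stands in for asking mountd, and the plain strings used for access ("rw", "ro", "none") are all made up for illustration. It also assumes that an entry's age is measured from when it was cached.

```python
import time

# Hypothetical model of the described behavior, not the Solaris kernel code.
RECLAIM_MIN_AGE = 60 * 60  # seconds; younger entries survive a reclaim


class ExportAuthCache:
    """Auth cache for a single exported filesystem, keyed by client IP."""

    def __init__(self, upcall, clock=time.monotonic):
        self._upcall = upcall    # stands in for the kernel -> mountd upcall
        self._clock = clock
        self._entries = {}       # ip -> (access, time it was cached)
        self.hits = 0
        self.misses = 0

    def check(self, ip):
        """Return the cached access for ip, asking 'mountd' only on a miss."""
        entry = self._entries.get(ip)
        if entry is not None:
            # No timeout: a cached answer, including "none", is reused forever.
            self.hits += 1
            return entry[0]
        self.misses += 1
        access = self._upcall(ip)
        self._entries[ip] = (access, self._clock())
        return access

    def reclaim(self):
        """Memory-pressure reclaim: drop only entries at least an hour old."""
        now = self._clock()
        old = [ip for ip, (_, cached_at) in self._entries.items()
               if now - cached_at >= RECLAIM_MIN_AGE]
        for ip in old:
            del self._entries[ip]
        return len(old)

    def flush(self):
        """What an unshare or reshare does: discard every entry."""
        self._entries.clear()
```

In this model, a client that was checked before it was authorized keeps getting "none" from `check()` no matter what mountd would say now; only `flush()` (or a `reclaim()` once the entry is over an hour old) makes the next request go back to mountd.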

Because all of this is only in kernel memory, all auth caches are lost if the system reboots. Thus on fileserver reboot all clients are revalidated for all filesystems on a rolling basis, as each client tries to do NFS to each filesystem that they have mounted. This may provoke a storm of revalidations after the reboot of a popular fileserver with a bunch of clients.
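As a rough way to size that storm, each (client, filesystem) pair needs one revalidation the first time the client touches that filesystem after the reboot. The sketch below just counts those pairs; the function name and the shape of the `mounts` mapping are hypothetical, and it assumes each client shows up as a single IP address.

```python
def revalidations_after_reboot(mounts):
    """Count the upcalls needed to repopulate every auth cache.

    mounts maps a client IP to the set of filesystems it has mounted
    from this fileserver (a made-up input format for illustration).
    """
    return sum(len(filesystems) for filesystems in mounts.values())
```

This says nothing about how quickly those upcalls arrive; that depends on when each client next does NFS to each filesystem.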

The cache is populated by upcalling to mountd on hopefully infrequent demand (through mechanisms that are beyond the scope of this entry). If mountd answers properly, its answer of the moment, whatever that is, gets cached. There are presumably timeout and load limits on these upcalls, but I don't understand the (Open)Solaris code well enough yet to find them. (I hope that more than one upcall can be in progress at once.)

Sidebar: Getting cache stats

This is for the benefit of people (such as me) poking around with mdb -k. The internal NFS server auth cache stats are in three variables: nfsauth_cache_hit, nfsauth_cache_miss, and nfsauth_cache_reclaim, which counts how many times a reclaim has been done (but not how many entries have been reclaimed). To see them (in hex) one uses the mdb command:

nfsauth_cache_hit ::print
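Since the values come out in hex, a little Python can turn them into decimal numbers and a hit ratio. This is just a convenience sketch: the function names are made up, and it assumes you have copied each value out by hand as a string in a form like 0x1f4.

```python
def parse_mdb_hex(text):
    """Convert one hex value copied from mdb, such as '0x1f4', to an int."""
    return int(text.strip(), 16)


def hit_ratio(hit_hex, miss_hex):
    """Fraction of auth cache lookups that were hits (0.0 if no lookups)."""
    hits = parse_mdb_hex(hit_hex)
    misses = parse_mdb_hex(miss_hex)
    total = hits + misses
    return hits / total if total else 0.0
```

The same `parse_mdb_hex()` works for nfsauth_cache_reclaim, where a value of zero matches what we see here.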

The code for most of this is in nfs_auth.c in usr/src/uts/common/fs/nfs; see also nfs_export.c, which has the overall NFS server export list.

solaris/SolarisNFSAuthCaching written at 01:06:51
