The Illumos NFS server's caching of filesystem access permissions

October 30, 2017

Years ago I wrote The Solaris 10 NFS server's caching of filesystem access permissions. I was recently digging in this area of the Illumos source code and discovered that there have been a few changes, so here is a brief update. The background is that that Illumos NFS server code, like basically all modern NFS servers, does not maintain a full list of what clients are authorized to access what filesystems. Instead it maintains a cache and upcalls to user level code whenever it feels that the cache is insufficient information.

As in Solaris 10, the Illumos kernel NFS authorization cache holds both positive and negative entries on a per-filesystem basis. However, in Illumos this cache now sort of has a timeout; if a cache entry is older then 600 seconds (ten minutes), the kernel will try to refresh it the next time the entry is used. This attempt to refresh the entry doesn't immediately cause it to expire or be revalidated; instead, it's added to a queue for the refresh thread to process. Until the refresh queue gets around to processing the entry (and gets an answer back from its upcall), the kernel will continue to use the current cached state as the best current answer.

(As in Solaris 10, the cache for a filesystem is discarded entirely if the filesystem is unshared or reshared, including being reshared with exactly the same settings.)

As far as I can tell, this refreshing only happens when the entry is used. There doesn't appear to be anything that runs around trying to revalidate old entries. So you can try a mount once, get a failure, have that failure cached in the kernel, come back a day later, try the mount again, and for at least the first access the kernel will still use that day-old cached entry unless memory pressure has pushed it out in the mean time.

(The easiest way for this to happen is for a client to try a NFS mount before it's been added to the netgroup that controls access. Merely updating the netgroup membership doesn't re-export the filesystem and thus doesn't flush the authorization cache for it.)

As far as I can tell, the refresh process is single-threaded; only one refresh thread is started, and it only makes one upcall at a time. The initial upcalls to mountd (when there's no existing authorization cache entry for a client/filesystem combination) are done directly in the NFS authorization lookup and so there can be several of them at once, although presumably there are limits on simultaneous requests and so on.

The cache size continues to be unlimited and shrinks only under memory pressure (if that ever happens; it doesn't appear to on our OmniOS NFS servers). During shrinking, only cache entries that have been unused for at least 60 minutes are candidates to be discarded; entries in active use are never dropped. Entries are kept active by clients doing NFS operations to filesystems, so if you never touch a particular filesystem from a particular client, the cache entry may eventually become a candidate for eviction.

(But note that this is any NFS operation, including things like df.)

Sidebar: Illumos NFS authorization cache stats

As in Solaris 10, the easiest way to get access to cache stats is with mdb -k. Illumos has added some additional stats beyond nfsauth_cache_hit, nfsauth_cache_miss, and nfsauth_cache_reclaim. nfsauth_cache_refresh counts how many refreshes have been queued up; exi_cache_auth_reclaim_failed and exi_cache_clnt_reclaim_failed appear to count a couple of ways that reclaims due to kernel memory pressure can fail.

There are a number of DTrace probes embedded in this whole process. I haven't looked into this enough to say anything about them, so you're going to need to read the source code.

Written on 30 October 2017.
« There are several ways to misread specifications
There are two sorts of TLS certificate mis-issuing »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Oct 30 01:10:11 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.