Wandering Thoughts archives

2017-10-30

The Illumos NFS server's caching of filesystem access permissions

Years ago I wrote "The Solaris 10 NFS server's caching of filesystem access permissions". I was recently digging in this area of the Illumos source code and discovered that there have been a few changes, so here is a brief update. The background is that the Illumos NFS server code, like basically all modern NFS servers, does not maintain a full list of what clients are authorized to access what filesystems. Instead it maintains a cache and upcalls to user level code whenever it feels that the cache has insufficient information.

As in Solaris 10, the Illumos kernel NFS authorization cache holds both positive and negative entries on a per-filesystem basis. However, in Illumos this cache now sort of has a timeout; if a cache entry is older than 600 seconds (ten minutes), the kernel will try to refresh it the next time the entry is used. This attempt to refresh the entry doesn't immediately cause it to expire or be revalidated; instead, it's added to a queue for the refresh thread to process. Until the refresh thread gets around to processing the entry (and gets an answer back from its upcall), the kernel will continue to use the current cached state as the best current answer.

(As in Solaris 10, the cache for a filesystem is discarded entirely if the filesystem is unshared or reshared, including being reshared with exactly the same settings.)

As far as I can tell, this refreshing only happens when the entry is used. There doesn't appear to be anything that runs around trying to revalidate old entries. So you can try a mount once, get a failure, have that failure cached in the kernel, come back a day later, try the mount again, and for at least the first access the kernel will still use that day-old cached entry unless memory pressure has pushed it out in the meantime.

(The easiest way for this to happen is for a client to try an NFS mount before it's been added to the netgroup that controls access. Merely updating the netgroup membership doesn't re-export the filesystem and thus doesn't flush the authorization cache for it.)
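
To make the refresh-on-use behaviour concrete, here is a toy Python model of it. This is only a sketch of the behaviour described above, not the kernel's code: the class, the `upcall` function, the `netgroup` set and the host name are all made up for illustration, and the rule that an entry is queued at most once is my simplification.

```python
REFRESH_AFTER = 600  # seconds; the ten-minute threshold described above


class AuthCache:
    """Toy model of one filesystem's authorization cache (not kernel code)."""

    def __init__(self, upcall):
        self.upcall = upcall       # hypothetical stand-in for asking mountd
        self.entries = {}          # client -> (allowed, time_fetched)
        self.refresh_queue = []    # clients waiting for the refresh thread

    def check(self, client, now):
        if client not in self.entries:
            # No entry at all: ask directly and cache the answer,
            # whether it is positive or negative.
            self.entries[client] = (self.upcall(client), now)
            return self.entries[client][0]
        allowed, fetched = self.entries[client]
        if now - fetched > REFRESH_AFTER and client not in self.refresh_queue:
            # Old entry: queue a refresh, but keep answering from the cache.
            self.refresh_queue.append(client)
        return allowed

    def run_refresh(self, now):
        # What the refresh thread would do once it gets to the queue.
        while self.refresh_queue:
            client = self.refresh_queue.pop(0)
            self.entries[client] = (self.upcall(client), now)


netgroup = set()                   # hypothetical access list
cache = AuthCache(lambda client: client in netgroup)

first = cache.check("newhost", now=0)          # denied, and the denial is cached
netgroup.add("newhost")                        # access granted, but no reshare
day_later = cache.check("newhost", now=86400)  # still answered from the old entry
cache.run_refresh(now=86401)
after_refresh = cache.check("newhost", now=86402)
print(first, day_later, after_refresh)         # False False True
```

The second check is the interesting one: the entry is a day old, but because nothing used it in the meantime, nothing refreshed it, so the client gets the stale denial once more before the queued refresh catches up.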

As far as I can tell, the refresh process is single-threaded; only one refresh thread is started, and it only makes one upcall at a time. The initial upcalls to mountd (when there's no existing authorization cache entry for a client/filesystem combination) are done directly in the NFS authorization lookup and so there can be several of them at once, although presumably there are limits on simultaneous requests and so on.
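
The single-threaded refresh pattern can be sketched in Python as one worker draining a queue. Everything here is hypothetical (the `slow_upcall` function, its access rule and the host names); the point is only that with one worker, queued refreshes are answered strictly one after another.

```python
import queue
import threading

refresh_queue = queue.Queue()
results = {}


def slow_upcall(client):
    # Hypothetical stand-in for the upcall to mountd.
    return client.endswith(".example.org")


def refresh_worker():
    # One thread, one upcall at a time: entries are handled in queue order.
    while True:
        client = refresh_queue.get()
        if client is None:         # sentinel used only to stop this sketch
            break
        results[client] = slow_upcall(client)


worker = threading.Thread(target=refresh_worker)
worker.start()
for name in ["a.example.org", "b.example.com", "c.example.org"]:
    refresh_queue.put(name)
refresh_queue.put(None)
worker.join()
print(results)
```

One consequence of this shape is that a slow answer for one entry delays every refresh queued behind it, while the cache keeps serving the old answers in the meantime.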

The cache size continues to be unlimited and shrinks only under memory pressure (if that ever happens; it doesn't appear to on our OmniOS NFS servers). During shrinking, only cache entries that have been unused for at least 60 minutes are candidates to be discarded; entries in active use are never dropped. Entries are kept active by clients doing NFS operations to filesystems, so if you never touch a particular filesystem from a particular client, the cache entry may eventually become a candidate for eviction.

(But note that this is any NFS operation, including things like df.)
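
The reclaim rule can be sketched the same way. This is a toy model under my own assumptions: the `reclaim` function, the dictionary layout and the host and filesystem names are invented, and the real kernel data structures are more involved.

```python
RECLAIM_IDLE = 60 * 60  # seconds; the 60-minute idle threshold described above


def reclaim(entries, now):
    """Drop entries idle for at least RECLAIM_IDLE; return how many were dropped.

    `entries` maps (client, filesystem) -> time of last use. In this toy
    model the caller invokes this only under memory pressure.
    """
    idle = [key for key, last_used in entries.items()
            if now - last_used >= RECLAIM_IDLE]
    for key in idle:
        del entries[key]
    return len(idle)


entries = {
    ("hosta", "/h/1"): 0,      # untouched for two hours at now=7200
    ("hostb", "/h/1"): 7000,   # used (even just by df) a few minutes ago
}
dropped = reclaim(entries, now=7200)
print(dropped, sorted(entries))   # 1 [('hostb', '/h/1')]
```

If `reclaim` is never called, nothing is ever dropped, which matches the observation that the cache only shrinks under memory pressure.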

Sidebar: Illumos NFS authorization cache stats

As in Solaris 10, the easiest way to get access to cache stats is with mdb -k. Illumos has added some additional stats beyond nfsauth_cache_hit, nfsauth_cache_miss, and nfsauth_cache_reclaim. nfsauth_cache_refresh counts how many refreshes have been queued up; exi_cache_auth_reclaim_failed and exi_cache_clnt_reclaim_failed appear to count a couple of ways that reclaims due to kernel memory pressure can fail.

There are a number of DTrace probes embedded in this whole process. I haven't looked into this enough to say anything about them, so you're going to need to read the source code.

IllumosNFSAuthCaching written at 01:10:11

2017-10-24

Our frustrations with OmniOS's 'KYSTY' minimalism

OmniOS famously follows a principle called KYSTY, where OmniOS itself ships with minimal amounts of software (and the versions can be out of date). As far as I know, OmniOS CE has continued this practice, which has an obvious appeal for people trying to maintain an OS distribution on limited amounts of time (especially an LTS version, where you might be stuck patching old versions of programs that aren't supported upstream any more). All of this is well and good, but in practice the results of this KYSTY approach have been one of our significant points of frustration with OmniOS.

As sysadmins operating servers (primarily Linux ones), we have come to expect that our systems will have a certain basic collection of workable standard programs that we use for basic system management. For instance, we want every system to be able to send us email, and we really want to do this with Postfix (Exim is an acceptable substitute). Almost every system needs a program that can talk to disks to get SMART information, and while there are alternatives to tcpdump, we have tcpdump everywhere else and we really want one standard program. I could go on; there's an entire collection of things that we consider standard that just aren't there on a baseline OmniOS machine.

(I can't not mention top, though.)

We were able to mostly fix this with various third party package sources, but the result is complicated, requires a large magic $PATH in order to work relatively seamlessly, has gaps, and is quietly fragile over the long term. As an example of something that has quietly worried me, at this point there's probably no way to exactly reproduce one of our fileservers because it's very likely that at least some of the third party package sources we use have moved on from the package versions we installed. Does this matter? Probably not, which is why we didn't spend a significant amount of effort to figure out how to get and freeze local copies of all those packages.

(The exact version of top that's installed is probably not important for our NFS fileservers. We could even live without top at all, although it would be annoying.)

I sympathize with OmniOS here in the abstract, but in the concrete it was and is a point of friction when we work with our OmniOS machines. They're different, and from our biased perspective, gratuitously so. The result makes our life harder and leaves us less happy with OmniOS.

(I think that a great many of the problems would go away if there were an OmniOS CE equivalent of Ubuntu's 'universe' repository and it could easily be enabled. The main OmniOS CE developers wouldn't be responsible for maintaining software there; instead it would be open for reasonably vetted community contributions. Officially embracing pkgsrc might be another option, but I don't like that as much for various reasons.)

OmniOSMinimalismFrustration written at 00:41:36


