Wandering Thoughts archives

2009-06-22

Solaris 10 NFS server parameters that we change and why

One of the ways that Solaris does not make me happy is that Sun does not seem to have changed various system defaults since, oh, 1996, when machines were much smaller than they are now. As a result, we have accumulated a set of NFS server parameters that we have had to change in order to get decent performance and functionality.

(This set is not particularly novel, which is part of the irritation; pretty much everyone winds up making many of these changes sooner or later. But instead of the system shipping with sensible defaults, you are left to discover them on your own, or not discover them and wonder why your theoretically powerful and modern Solaris NFS server is performing pathetically badly. Or why it is exploding.)

Unless mentioned otherwise, all of these parameters are set (or changed, really) in /etc/default/nfs; a consolidated sketch of all our changes follows the list:

  • NFSD_SERVERS, from 16 to 512

    The maximum number of NFS requests that the server will handle at once. The default is too low to get decent performance under load, and has been for years. This is one of the standard tuneables that everyone says you should change, but beware: the usual advice on the Internet is to set it to 1024, and on our fileservers that many NFS server threads locked up my test system (which was running on reasonably beefy hardware).

    (Apparently the NFS server threads are very high priority, and if you have too many of them they will happily eat all of your CPU.)

  • LOCKD_SERVERS, from 20 to 128
    LOCKD_LISTEN_BACKLOG, from 32 to 256

    The maximum number of simultaneous NFS lock requests that the server will handle; LOCKD_LISTEN_BACKLOG is the listen queue for incoming lock connections over TCP. We saw NFS locking failures under production load that were cured by making these changes. I believe that LOCKD_SERVERS is the important one, but we haven't tested this.

  • NFS_SERVER_VERSMAX, from 4 to 3

    The maximum NFS protocol version that the server will use.

    We're wimps. NFS v4 is peculiar and we've never tested it, and I have no desire to find out all the ways that Linux and Solaris don't get along about it. So even if machines think that they're capable of doing it, we don't want them to.

  • set nfssrv:nfs_portmon = 1, which goes in /etc/system instead of /etc/default/nfs.

    Require NFS requests to come from reserved ports. In theory you might be able to change this on a live system with mdb -kw, but really, just schedule a reboot.
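
Putting this all together, here is a sketch of what the relevant bits of our configuration end up looking like (the values are the ones discussed above; adjust them to your own hardware and load):

  # /etc/default/nfs
  NFSD_SERVERS=512
  LOCKD_SERVERS=128
  LOCKD_LISTEN_BACKLOG=256
  NFS_SERVER_VERSMAX=3

  # /etc/system (needs a reboot to take effect)
  set nfssrv:nfs_portmon = 1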

As a cautionary note on Solaris 10 x86, remember to update the boot archive with 'bootadm update-archive' every time you change /etc/system. I don't think that changing /etc/default/nfs requires updating the boot archive, but it can't hurt to run the command anyway.
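
For example, the whole dance is just this (the final grep is only a sanity check, and assumes that /etc/system shows up in bootadm's listing of the archive's contents):

  # after editing /etc/system
  bootadm update-archive
  bootadm list-archive | grep etc/system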

Necessary disclaimer: these work for us but may not work for you. Always test your system.

SolarisNFSServerTuning written at 23:57:18

2009-06-12

What I know about Solaris 10 NFS server file lock limits

Suppose that you have a Solaris fileserver and a bunch of NFS clients (in this case they're Linux machines, but I don't think it matters). Perhaps one or two of them are Samba servers. Further suppose that your client machines start reporting things like 'fcntl() failed: No locks available'. What is limiting you, and what do you do about it?

(We'll assume that you use tools like lslk to rule out some process on a client machine holding on to a lot of locks.)
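
As a quick sketch of how you might look for such a lock hog (this assumes that your version of lslk prints the owning PID in its second output column):

  # list all file locks currently held on this client
  lslk
  # count held locks per PID; a big number here points at a lock hog
  lslk | awk 'NR > 1 { print $2 }' | sort | uniq -c | sort -rn | head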

Although it's dangerous to extrapolate from current OpenSolaris code (and the lack of documented tuneable parameters), it seems as if Solaris 10 has no intrinsic limit on how many file locks it supports; if you ask for too many for the system to handle, your processes just start going to sleep as they wait for kernel memory to be freed up.

However, there does appear to be a limit on how many concurrent NFS locking requests the server will handle; this is the setting for LOCKD_SERVERS in /etc/default/nfs, which defaults to 20. It appears that if you run into this limit, clients start getting these 'no locks available' messages. Since this is based on concurrent requests, not the total number of locks used, it looks like you can run into this if you merely have some clients doing a lot of short term locking requests.

(It's also possible that you can run into this if the server is short of memory in general, so that the NFS locking requests start running into the 'sleep waiting for kernel memory' issue, clogging up this limited supply of threads.)

From our experience, using svcadm to disable and re-enable the nlockmgr service is enough to pick up changes to /etc/default/nfs. This did make file locking on the clients stall for about thirty seconds, but even so that was a lot less drastic than, say, rebooting the fileserver.
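
For the record, a sketch of the sequence (using the full FMRI; SMF will also accept an unambiguous abbreviation like plain 'nlockmgr'):

  # after raising LOCKD_SERVERS in /etc/default/nfs
  svcadm disable svc:/network/nfs/nlockmgr
  svcadm enable svc:/network/nfs/nlockmgr
  # confirm that lockd came back up
  svcs -p nlockmgr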

SolarisNFSLockLimits written at 01:28:51
