2009-06-22
Solaris 10 NFS server parameters that we change and why
One of the ways that Solaris does not make me happy is that Sun does not seem to have changed various system defaults since, oh, 1996, when machines were much smaller than they are now. As a result, we have accumulated a set of NFS server parameters that we have had to change in order to get decent performance and functionality.
(This set is not particularly novel, which is part of the irritation; pretty much everyone winds up making many of these changes sooner or later. But instead of the system shipping with sensible defaults, you are left to discover them on your own, or not discover them and wonder why your theoretically powerful and modern Solaris NFS server is performing pathetically badly. Or why it is exploding.)
Unless mentioned otherwise, all of these parameters are set (or changed,
really) in /etc/default/nfs:
NFSD_SERVERS, from 16 to 512
The maximum number of concurrent NFS requests. The default is too low to get decent performance under load, and has been for years. This is one of the standard tuneables that everyone says you should change, but beware; the usual advice on the Internet is to set it to 1024, but on our fileservers having that many NFS server threads locked up my test system (running on reasonably beefy hardware).
(Apparently the NFS server threads are very high priority, and if you have too many of them they will happily eat all of your CPU.)
LOCKD_SERVERS, from 20 to 128
LOCKD_LISTEN_BACKLOG, from 32 to 256
The maximum number of simultaneous NFS lock requests. We saw NFS locking failures under production load that were cured by doing this. I believe that LOCKD_SERVERS is the important one, but we haven't tested this.
NFS_SERVER_VERSMAX, from 4 to 3
The maximum NFS protocol version that the server will use.
We're wimps. NFS v4 is peculiar and we've never tested it, and I have no desire to find out all the ways that Linux and Solaris don't get along about it. So even if machines think that they're capable of doing it, we don't want them to.
set nfssrv:nfs_portmon = 1, which is set in /etc/system
Require NFS requests to come from reserved ports. In theory you might be able to change this on a live system with
mdb -kw, but really, just schedule a reboot.
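Put together, the changed lines wind up looking roughly like this (a sketch of our settings with the stock surrounding comments left out, not a verbatim copy of our files):

    # in /etc/default/nfs:
    NFSD_SERVERS=512
    LOCKD_SERVERS=128
    LOCKD_LISTEN_BACKLOG=256
    NFS_SERVER_VERSMAX=3

    * in /etc/system:
    set nfssrv:nfs_portmon = 1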
As a cautionary note on Solaris 10 x86, remember to update the boot
archive with 'bootadm update-archive' every time
you change /etc/system. I don't think that changing /etc/default/nfs
requires updating the boot archive, but it can't hurt to run the
command anyways.
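In case the mechanics aren't obvious, the sequence after editing /etc/system is just (as root; the reboot itself can wait for your scheduled downtime):

    bootadm update-archive
    shutdown -y -g0 -i6    # later, at the scheduled reboot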
Necessary disclaimer: these work for us but may not work for you. Always test your system.
2009-06-12
What I know about Solaris 10 NFS server file lock limits
Suppose that you have a Solaris fileserver and a bunch of NFS clients
(in this case they're Linux machines, but I don't think it matters).
Perhaps one or two of them are Samba servers. Further suppose that your
client machines start reporting things like 'fcntl() failed: No locks
available'. What is limiting you, and what do you do about it?
(We'll assume that you use tools like lslk to rule out some process on
a client machine holding on to a lot of locks.)
Although it's dangerous to extrapolate from current OpenSolaris code (and the lack of documented tuneable parameters), it seems as if Solaris 10 has no intrinsic limit on how many file locks it supports; if you ask for too many for the system to handle, your processes just start going to sleep as they wait for kernel memory to be freed up.
However, there does appear to be a limit on how many concurrent NFS
locking requests the server will handle; this is the setting for
LOCKD_SERVERS in /etc/default/nfs, which defaults to 20. It
appears that if you run into this limit, clients start getting these 'no
locks available' messages. Since this is based on concurrent requests,
not the total number of locks used, it looks like you can run into
this if you merely have some clients doing a lot of short term locking
requests.
(It's also possible that you can run into this if the server is short of memory in general, so that the NFS locking requests start running into the 'sleep waiting for kernel memory' issue, clogging up this limited supply of threads.)
From our experience, using svcadm to disable and re-enable
the nlockmgr service is enough to pick up changes to
/etc/default/nfs. This did make file locking stuff on the clients
stall for thirty seconds or so, but that was still a lot less
drastic than, say, rebooting the fileserver.
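For the record, the actual sequence was more or less this (our lock manager is the stock svc:/network/nfs/nlockmgr:default service instance):

    svcadm disable network/nfs/nlockmgr
    svcadm enable network/nfs/nlockmgr
    svcs -l network/nfs/nlockmgr    # check that it's back online

(You could probably use 'svcadm restart' instead; we haven't tested whether it picks up /etc/default/nfs changes the same way.)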