OmniOS as a NFS server has problems with sustained write loads
We have been hunting an serious OmniOS problem for some time. Today we finally have enough data that I feel I can say something definitive:
An OmniOS NFS server will lock up under (some) sustained write loads if the write volume is higher than its disks can sustain.
I believe that this issue is not specific to OmniOS; it's likely Illumos in general, and was probably inherited from OpenSolaris and Solaris 10. We've reproduced a similar lockup on our old fileservers, running Solaris 10 update 8.
Our current minimal reproduction is the latest OmniOS (r151014) on our standard fileserver hardware, with 1G networking added and with a test pool of a single mirrored vdev on two (local) 7200 RPM 2TB SATA disks. With both 1G networks being driven at basically full wire speed by a collection of NFS client systems writing out a collection of different files on that test pool, the system will run okay for a while and then suddenly enter a situation where system free memory nosedives abruptly and the amount of kernel memory used for things other than the ARC jumps massively. This leads immediately to a total system hang when the free memory hits rock bottom.
(This is more write traffic than the disks can sustain due to mirroring. We have 200 MBytes/sec of incoming NFS writes, which implies 200 MBytes/sec of writes to each disk. These disks appear to top out at 150 MBytes/sec at most, and that's probably only a burst figure.)
Through a series of relatively obvious tests that are too long to detail here (eg running only one network's worth of NFS clients), we're pretty confident that this system is stable under a write load that it can sustain. Overload is clearly not immediate death (within a few seconds or the like), so we assume that the system can survive sufficiently short periods of overload if the load drops afterwards. However we have various indications that it does not fully recover from such overloads for a long time (if ever).
(Death under sustained overload would explain many of the symptoms we've seen of our various fileserver problems (eg). The common element in all of the trigger causes is that they cause (or could cause) IO slowdowns; backend disks with errors, backend disks that are just slow responding, full pools, or even apparently pools hitting their quota limits, even 10G networking problems. A slowdown of IO would take a fileserver that was just surviving a current high client write volume and push it over the edge.)
The memory exhaustion appears to be related to a high and increasing level of outstanding incomplete or unprocessed NFS requests. We have some indication that increasing the number of NFS server threads helps stave off the lockup for a while, but we've had our test server lock up (in somewhat different test scenarios) with widely varying numbers.
In theory this shouldn't happen. An NFS server that is being overloaded should push back on the clients in various ways, not enter a death spiral of accepting all of their traffic, eating all its memory, and then locking up. In practice, well, we have a serious problem in production.
PS: Yes, I'll write something for the OmniOS mailing lists at some point. In practice tweets are easier than blog entries, which are easier than useful mailing list reports.