Wandering Thoughts archives

2014-11-09

NFS hard mounts versus soft mounts

On most Unix systems NFS mounts come in your choice of two flavours, hard or soft. The Linux nfs manpage actually has a very good description of the difference; the short summary is that a hard NFS mount will keep trying NFS operations endlessly until the server responds while a soft NFS mount will give up and return errors after a while.
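If you want to know which flavour your existing mounts actually are, here is a minimal sketch that reads the mount options out of /proc/mounts-style text. It assumes a Linux client that lists 'hard', 'soft' or 'softerr' among the options of each NFS mount; the function name nfs_mount_modes is my own invention for illustration.

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to 'hard' or 'soft' from /proc/mounts-style text."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fstype, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        # Assumption: a mount with no explicit soft flag is a hard mount.
        is_soft = "soft" in options or "softerr" in options
        modes[fields[1]] = "soft" if is_soft else "hard"
    return modes
```

On a Linux client you would feed it the contents of /proc/mounts, for example nfs_mount_modes(open("/proc/mounts").read()).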

You can find people with very divergent opinions about which is better. My opinion of soft mounts is fairly strongly negative. The problem is that it is routine for a loaded NFS server to not respond to client requests within the client timeout interval, because the timeout is not for the NFS server to receive the request; it's for the server to fully process it. As you might imagine, a server under heavy IO and network load may not be able to finish your disk IO for some time, especially if it's write IO. This makes NFS timeouts that would trigger soft NFS mount errors a relatively routine event in many real world environments.

(On Linux, any time a client reports 'nfs: server X not responding, still trying' that would be an IO error on a soft NFS mount. In our fileserver environment, some of these happen nearly every day.)

Many Unix programs do not really expect their IO to fail. Even programs that do notice IO errors often don't and can't do anything more than print an error message and perhaps abort. This is not a helpful response to transient errors, but then Unix programs are generally not really designed for a world with routine transient IO errors. Even when programs report the situation, users may not notice or may not be prepared to do very much except, perhaps, retry the operation.

(Write errors are especially dangerous because they can easily cause you to permanently lose data, but even read errors will cause you plenty of heartburn.)
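To make the write case concrete, here is a rough sketch of what a careful program has to do if it wants to actually notice a failed write. The helper name write_and_confirm is hypothetical. It assumes that deferred write errors get reported by fsync() or close() rather than by write() itself, which is why it checks all three.

```python
import os


def write_and_confirm(path, data):
    """Write bytes to path; raise OSError if the write, fsync or close fails."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write less than requested, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Push the data out now so that a deferred write error shows up here
        # instead of being silently lost.
        os.fsync(fd)
    finally:
        # close() can also report an error; let it propagate.
        os.close(fd)
```

Even with this much care, all the caller learns is that the write failed. Deciding what to do next is still up to the program or the user.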

Soft NFS mounts primarily make sense when you have some system that absolutely must remain responsive and cannot delay for too long for any reason. In this case a random but potentially very long kernel-imposed delay is a really bad thing and you'd rather have the operation error out entirely so that your user-level code can take action and at least respond in some way. Some NFS clients (or just specific NFS mounts) are only used in this way, for a custom system, and are not exposed to general use and general users.
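As an illustration of user-level code taking action, here is a minimal sketch of a retry wrapper for such a custom system. Everything in it is an assumption made for the example: the name retry_io, the attempt count and delay, and the choice to treat EIO and ETIMEDOUT as the errors a soft mount timeout produces.

```python
import errno
import time

# Assumption: these are the errno values a soft mount timeout surfaces as.
TRANSIENT_ERRNOS = {errno.EIO, errno.ETIMEDOUT}


def retry_io(operation, attempts=3, delay=1.0):
    """Call operation() and retry it a few times on transient IO errors."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Give up on errors we don't consider transient, or when we
            # have run out of attempts.
            if exc.errno not in TRANSIENT_ERRNOS or attempt == attempts:
                raise
            time.sleep(delay)
```

This is only reasonable for operations that are safe to repeat, such as re-reading a file. Blindly retrying a partially completed write is a different matter.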

(IO to NFS hard mounts can still be interrupted if you've sensibly mounted them with the intr option; on Linux kernels after 2.6.25 the option is ignored and SIGKILL alone can interrupt a pending NFS operation. Either way, it requires an explicit decision at user level that the operation should be aborted, instead of the kernel deciding that all operations that have taken 'too long' should be aborted.)
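One way to make that explicit user-level decision is to do the IO in a child process and kill the child if it takes too long. This is only a sketch, with a made-up helper name (run_with_deadline), and it assumes that a process waiting on your hard mount can in fact be killed.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd; return its exit status, or None if it was killed at the deadline."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before
        # re-raising, so there is nothing left to clean up here.
        return None
```

Something like run_with_deadline(["cp", "/nfs/somefile", "/tmp/"], 60) would then abandon the copy after a minute, where the paths are placeholders. The important difference from a soft mount is that this program chose the deadline for this one operation; the kernel didn't impose it on everything.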

PS: My bias here is that I've always been involved in running general use NFS clients, ones where random people will be using the NFS mounts for random and varied things with random and varied programs of very varied quality. This is basically a worst case for NFS soft mounts.

unix/NFSHardVsSoft written at 00:41:15
