Solving an automounter timeout problem with brute force

January 9, 2007

Our central mail machine runs various cron jobs as part of its work. Recently, a cron job (or a command run from a mail alias) would occasionally die with an error like:

sh: /cs/foo/adm/script: cannot execute

(Where /cs/foo is NFS mounted through the automounter, and the cron entry just runs that script.)

I am pretty sure that this is a gift from the Solaris 8 automounter.

Our central mail machine is pretty old and pokey, and we recently switched to a new method of authenticating NFS mounts that requires an ssh callback. So my operating theory is that this is the charmingly non-specific error you get when the NFS mount reply is too slow in coming and the automounter just gives up.

My current brute force solution is a little script I call 'keepmounted':

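# Park one background shell in each directory so the filesystem
# always looks busy and the automounter cannot unmount it.
for i in "$@"; do
  nohup sh -c "cd \"$i\" && (while :; do sleep 604800; done)" >/dev/null 2>&1 </dev/null &
done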

(The sleep value is more or less arbitrary.)

Then I just ran it for every automounted filesystem that we saw problems with and moved on to other fires. (Yes, at some point I need a better solution, but the machine is rebooted only rarely and we're working on replacing it anyways.)

(This sort of cheap hack is a surprisingly common occurrence in system administration. Sometimes a bandaid is really the best solution.)


Comments on this page:

By Dan.Astoorian at 2007-01-10 15:46:44:

My current brute force solution is a little script I call 'keepmounted':

   for i in "$@"; do
     nohup sh -c "cd \"$i\" && (while :; do sleep 604800; done)" >/dev/null 2>&1 </dev/null &
   done

Another of those issues that rings all kinds of bells from when I was there. (You might want to look at /slocal/bin/holdopen :-) )

In particular, the refinements I used to make on that approach were:

  • instead of one process per directory, use one shell script which does "exec $fd < $i/." (incrementing $fd each time through the loop) so that the process table isn't unnecessarily cluttered; and
  • "pwait $$" or "pwait 1" is a little more elegant than an endless loop of sleep commands (but has the disadvantage of being Solaris-specific).

--Dan
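
Dan's first refinement can be sketched roughly as follows. This is only an illustration under stated assumptions: the name hold_open is made up here (it is not the contents of /slocal/bin/holdopen), it assumes a POSIX sh, and it stops at descriptor 9 because POSIX only guarantees single-digit descriptor numbers in redirections.

```shell
# hold_open DIR...: open each directory on its own file descriptor in the
# current shell, so that a single process keeps every mount busy.
# (Hypothetical helper; the name and the 3-9 range are assumptions.)
hold_open() {
  fd=3
  for dir in "$@"; do
    if [ "$fd" -gt 9 ]; then
      echo "hold_open: no descriptors left for $dir" >&2
      return 1
    fi
    if [ ! -d "$dir" ]; then
      echo "hold_open: $dir is not a directory" >&2
      return 1
    fi
    # A redirection needs a literal descriptor number, hence the eval;
    # after expansion this runs e.g.: exec 3< "$dir/."
    eval "exec $fd< \"\$dir/.\""
    fd=$((fd + 1))
  done
}
```

A script using this would call hold_open with its directories and then simply block, for example with the same sleep loop as keepmounted or, on Solaris, with pwait as Dan suggests.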

By cks at 2007-01-11 01:49:19:

The exec trick wouldn't have occurred to me because I have a twitch about assuming that I can just open() directories. (I have a memory that on some systems you have to pass a magic flag in to do it, although some testing now shows that it's not the case on Linux, Solaris, or FreeBSD.)

There turn out to be other problems with the exec approach if you want to hold open a lot of directories, which I turned into a whole entry since it was getting kind of long.
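
Whether a given system lets the shell open a directory with a plain redirection can be checked with a small probe. This is a sketch rather than the test actually used above; can_open_dir is a name invented here, and descriptor 3 is an arbitrary choice.

```shell
# can_open_dir DIR: succeed if this shell can open DIR read-only through
# an ordinary redirection. The subshell keeps the descriptor from leaking
# and keeps a failed exec from aborting the calling script.
can_open_dir() {
  ( exec 3< "$1/." ) 2>/dev/null
}
```

For example, `can_open_dir /cs/foo && echo ok` would print "ok" on a system where the exec trick is usable for that directory.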

Written on 09 January 2007.
Last modified: Tue Jan 9 23:35:19 2007