Solving an automounter timeout problem with brute force

January 9, 2007

Our central mail machine runs various cron jobs as part of its work. Recently, every now and then a cron job (or a command run out of a mail alias) would die at random with an error like:

sh: /cs/foo/adm/script: cannot execute

(Where /cs/foo is NFS mounted through the automounter, and the cron entry just runs that script.)

I am pretty sure that this is a gift from the Solaris 8 automounter.

Our central mail machine is pretty old and pokey, and we recently switched to a new method of authenticating NFS mounts that requires an ssh callback. So my operating theory is that this is the charmingly non-specific error you get when the NFS mount reply is too slow in coming and the automounter just gives up.
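One way to probe a theory like this is to time the first access to an automounted directory, since that access is what triggers the mount. The following is only a sketch: time_first_access is a made-up name, it is written for a POSIX-style shell, and it assumes a date that understands +%s, which older systems may lack.

```shell
#!/bin/sh
# Hypothetical helper: report how many whole seconds the first access to
# a directory takes, including any automount that the access triggers.
# Assumes date(1) supports +%s and the shell supports $(...) and $((...)).
time_first_access() {
    start=$(date +%s)
    # Looking at "dir/." forces the automounter to actually mount dir.
    ls -d "$1/." >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$1: $((end - start))s, ls exit status $status"
}
```

Run against a path such as /cs/foo while it is not mounted, a consistently long time (or a non-zero ls status) would be at least consistent with the slow-mount theory, although it would not prove it.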

My current brute force solution is a little script I call 'keepmounted':

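# Park a background shell in each directory; a process whose working
# directory is on the filesystem keeps it busy, so it stays mounted.
for i in "$@"; do
  nohup sh -c 'cd "$1" && while :; do sleep 604800; done' keepmounted "$i" >/dev/null 2>&1 </dev/null &
done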

(The sleep value is more or less arbitrary; 604800 seconds is one week.)

Then I just ran it for every automounted filesystem that we saw problems with and moved on to other fires. (Yes, at some point I need a better solution, but the machine is rebooted only rarely and we're working on replacing it anyway.)
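If a held filesystem ever has to be let go again, it helps to know which process is holding it. A variant of keepmounted that prints the holder's PID might look like the sketch below; hold_mount is a hypothetical name, and the sketch assumes nohup and a POSIX-style sh.

```shell
#!/bin/sh
# Sketch: start one holder for a single directory and print its PID.
# hold_mount is a hypothetical name, not part of the original script.
hold_mount() {
    # The directory is handed to the inner shell as $1 instead of being
    # pasted into the command string, so odd characters in it are harmless.
    nohup sh -c 'cd "$1" && while :; do sleep 604800; done' hold "$1" \
        >/dev/null 2>&1 </dev/null &
    # $! is the PID of the background job that was just started.
    echo "$!"
}
```

Something like pid=$(hold_mount /cs/foo) would record the holder. Killing only that PID is not quite enough to release the filesystem, because the sleep child it started has the same working directory and has to be killed as well (pgrep -P, where available, will find it).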

(This sort of cheap hack is a surprisingly common occurrence in system administration. Sometimes a bandaid is really the best solution.)

Last modified: Tue Jan 9 23:35:19 2007