Solving an automounter timeout problem with brute force

Our central mail machine runs various cron jobs as part of its work. Starting recently, every now and then a cron job (or a command run out of an alias) would randomly die with an error like:
I am pretty sure that this is a gift from the Solaris 8 automounter. Our central mail machine is pretty old and pokey, and we recently switched to a new method of authenticating NFS mounts that requires an ssh callback. So my operating theory is that this is the charmingly non-specific error you get when the NFS mount reply is too slow in coming and the automounter just gives up.

My current brute force solution is a little script I call 'keepmounted':
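The script itself is not reproduced here. As a rough sketch of the idea only, assuming it works by parking a process inside the automounted filesystem and touching it between sleeps so that the automounter never unmounts it, something along these lines would do. The function names, the argument handling, and the 120-second default are all assumptions made for illustration, not the original code:

```sh
#!/bin/sh
# keepmounted: hypothetical sketch, NOT the original script.
# Assumption: a process parked inside an automounted filesystem, poking
# it now and then, keeps the automounter from ever unmounting it, so no
# cron job has to wait on a slow remount.
#
# usage: keepmounted DIRECTORY [SECONDS]

# Touch a directory so that any pending automount is triggered.
poke() {
    ls "$1" >/dev/null 2>&1
}

keepmounted() {
    dir=$1
    interval=${2:-120}    # assumed default; any value under the idle timeout should do

    # Parking our working directory here keeps the filesystem busy.
    cd "$dir" || return 1

    # Loop forever, re-touching the directory between sleeps.
    while :; do
        poke .
        sleep "$interval"
    done
}

# Only start looping when a directory was actually given.
if [ $# -ge 1 ]; then
    keepmounted "$@"
fi
```

It would be started once per problem filesystem and left running in the background, for example `keepmounted /some/automounted/dir &` (the path is a placeholder).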
(The sleep value is more or less arbitrary.) Then I just ran it for every automounted filesystem that we saw problems with and moved on to other fires. (Yes, at some point I need a better solution, but the machine is rebooted only rarely and we're working on replacing it anyways.)

(This sort of cheap hack is a surprisingly common occurrence in system administration. Sometimes a bandaid is really the best solution.)
This is part of CSpace, and is written by ChrisSiebenmann.