Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.
2007-03-14

The problem of machine startup order dependencies

One of the tricky bits of organizing a sufficiently large group of machines is avoiding circular dependencies in the machine startup order, so that you can actually bring your systems up after things like a complete machine room power outage. (In our case it was planned; the electricians wanted the master breakers off before they played around in our breaker panel to give us more usable circuits.)

Startup order dependencies come in a variety of flavours. The simple one is a startup script that depends on another machine being up, for example trying to NFS mount filesystems; more advanced, more dangerous, and fortunately much rarer is the sort where a machine will start but malfunction (for example, bounce all email) unless another machine is already up. Things like NFS mounts are easy to see, but sometimes the dependency is more indirect and much less obvious.

Part of the problem is that it's easy for this sort of dependency to creep in unnoticed. Not only is a complete ground-up restart of all of your machines hopefully a rare event, but testing for this sort of thing is difficult to do, especially for machines in the middle of the startup order (where they depend on some other machines but not everything).

(You can always do a testing ground-up restart of everything, but this is sufficiently disruptive that you're probably not going to get to do it very often.)

The interesting case that we found recently was machines that try to
set their time on startup with (For bonus fun, what actually timed out on the console server was
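
As a rough illustration of the circular dependency problem described at the start of this entry, here is a minimal sketch of checking a hand-written dependency table for cycles before you need it in anger. Everything in it is an assumption for illustration: the `startup_order` function, the table format, and the machine names are all hypothetical, and it only knows about the dependencies you remembered to write down (which, as noted, is the hard part).

```python
from collections import deque


def startup_order(depends_on):
    """Return one workable boot order for the given machines.

    depends_on maps each machine name to the machines that must
    already be up before it can start properly. Raises ValueError
    if some machines can never be started because of a cycle.
    """
    # Collect every machine mentioned, including ones that only
    # ever appear as someone else's dependency.
    machines = set(depends_on)
    for deps in depends_on.values():
        machines.update(deps)

    # remaining[m]: dependencies of m that are not yet up.
    # dependents[m]: machines that are waiting on m.
    remaining = {m: set(depends_on.get(m, ())) for m in machines}
    dependents = {m: set() for m in machines}
    for m, deps in remaining.items():
        for d in deps:
            dependents[d].add(m)

    # Start with the machines that depend on nothing at all.
    ready = deque(sorted(m for m in machines if not remaining[m]))
    order = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for child in sorted(dependents[m]):
            remaining[child].discard(m)
            if not remaining[child]:
                ready.append(child)

    if len(order) != len(machines):
        # These machines are either in a cycle or depend on one.
        stuck = sorted(m for m in machines if m not in order)
        raise ValueError(
            "circular startup dependency among: " + ", ".join(stuck)
        )
    return order


# Hypothetical machines; the names are made up for this example.
hosts = {
    "fileserver": [],
    "mailserver": ["fileserver"],
    "login": ["fileserver", "mailserver"],
}
print(startup_order(hosts))
```

This is an ordinary topological sort (Kahn's algorithm), not anything specific to our setup. A table like `{"a": ["b"], "b": ["a"]}` makes it raise `ValueError` instead of returning an order, which is the situation you want to discover on paper rather than in a dark machine room.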
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.