How to accidentally reboot a server
This is a little war story.
To start out, determine that you need to add more memory to one of your login servers. This requires taking it down, which requires getting all of its users to log off first. Then go through the following steps:
- Keep new logins off the system by changing your automated password
management system to give all non-staff accounts a login shell
that just prints a 'this system is under maintenance, please
use another one' message and then quits.
(Otherwise people will keep logging in to the server and you'll never, ever get a moment where there is no one on. Especially during working hours.)
- Mail all of the current users to ask them to log off.
- Get email from one user saying 'you can kill all of my processes
on the machine'.
- Do this in the obvious way. From an existing root session:
    /bin/su user
    kill -1 -1
- Have all of your (staff) sessions to the machine suddenly disconnect. Get a sinking feeling.
Perhaps you can spot the mistake already. The mistake is that the su
to the user did not actually work. The user had a locked login shell, so
all the su did was print a message and then dump me back into the root
shell. Then I ran the 'kill -1 -1' as root and of course it SIGHUP'd
all processes on the machine, effectively rebooting it.
(It didn't actually reboot and in fact enough stayed up that I could
ssh back in, which surprised me a little bit.)
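The failure above comes down to running a mass kill without confirming which user the shell is actually running as. A minimal sketch of a guard, assuming a POSIX shell; the function names not_root and hup_all_mine are made up for illustration and are not something from our actual setup:

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the given uid (default: our own
# effective uid) is not 0, i.e. not root.
not_root() {
    uid="${1:-$(id -u)}"
    [ "$uid" -ne 0 ]
}

# Hypothetical wrapper: SIGHUP every process we are allowed to signal,
# but only after confirming that we are an unprivileged user.
hup_all_mine() {
    if not_root; then
        kill -1 -1
    else
        echo "refusing: still running as root" >&2
        return 1
    fi
}
```

Had the kill gone through something like hup_all_mine, the failed su would have produced a refusal message instead of a machine-wide SIGHUP.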
I should have used '/bin/su user -c "kill -1 -1"' or in fact one of
the ways we keep around to do 'run command as user no matter what their
shell is'. But I didn't take the time to do either of them.
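For illustration, here is one possible shape of a 'run command as user no matter what their shell is' helper. This is a sketch under assumptions, not the tooling we actually use: it assumes a Linux su that honours -s when invoked by root, and the run_as name and the DRY_RUN variable are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: run one command as USER with /bin/sh as the shell,
# ignoring whatever login shell the account has been given.
# Assumes a Linux su where root may override the shell with -s.
run_as() {
    user="$1"
    cmd="$2"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only show what would be executed; run nothing.
        printf '%s\n' "su -s /bin/sh -c '$cmd' $user"
        return 0
    fi
    # The command is handed to su itself, so if su fails nothing runs
    # as root afterwards.
    su -s /bin/sh -c "$cmd" "$user"
}
```

Usage would look like "run_as user 'kill -1 -1'"; setting DRY_RUN=1 first prints the su command line so it can be checked before running it for real. The important property is that the kill is an argument to su rather than a separate command typed into whatever shell you happen to end up in.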
(On the 'good' side, we got to immediately add more memory to that server.)