2014-03-20
Killing (almost) all processes on Linux is not recoverable
Suppose that you have at least a semi-hung system that you're taking
drastic measures to get at least semi-alive again; for example, you
might use Magic Sysrq's option to send a SIGTERM or SIGKILL
to all processes except init ('e' or 'i', per here). If you do this,
it's quite possible that your system will stagger dazedly around for a
bit and then seem to come back to life. Oh, sure, maybe you need to
restart a few daemons, but it can easily look like you can keep going
without actually rebooting the machine. You can, right?
Based on painful experience, let me answer the question simply: no.
In practice there is no even vaguely easy way to recover a modern Linux system to full functionality after you've killed almost all processes. You can get something back that looks like it's working, but what you really have is a partial zombie. You can spend quite literally months finding things in the corners that are not working; if you're lucky, they will be not working in some noisy way and diagnosing them will be obvious. It's quite possible to not be lucky.
So if you are ever in a situation like this with Magic Sysrq or the like, reboot your system after using drastic actions to wake it up even if it seems okay afterwards. Things like Sysrq-e and Sysrq-i are for temporary diagnostics (to answer questions like 'is this hang probably because of a user-level process doing bad things'), not for cures. The cure is a reboot.
Another way to do this is an accidental 'kill -SIGNAL -1' for some
signal that your init ignores. As an interesting example, it appears
that systemd ignores SIGHUP so the traditional accidental 'kill -1 -1'
as root might do this on a
systemd system. After something like this your system may look fine,
especially after you restart some daemons, but it is not. Reboot.
Really. It's simpler and much less painful over the long run and
you're going to wind up doing it sooner or later anyways.
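
To see why 'kill -1 -1' is so easy to get wrong, it helps to decode its arguments: the first '-1' is taken as the signal number (1 is SIGHUP on Linux) and the second is the target, where a PID of -1 means every process you are allowed to signal. Here is a minimal Python sketch that does only that decoding and never sends a signal. The name describe_kill is a hypothetical helper, and it handles only the simple 'kill PID' and 'kill -SIGNAL PID' forms, not everything the real kill accepts.

```python
import signal


def describe_kill(args):
    """Decode simple `kill` arguments without sending any signal.

    Returns (signal_name, is_broadcast), where is_broadcast is True when
    one of the targets is PID -1, i.e. "every process I may signal".
    Illustrative only: this is not a full reimplementation of kill(1).
    """
    args = list(args)
    sig = signal.SIGTERM  # the default signal when none is given
    if args and args[0].startswith("-"):
        # As with the real command, a leading dash argument is the signal.
        spec = args.pop(0)[1:]
        if spec.isdigit():
            sig = signal.Signals(int(spec))
        else:
            name = spec.upper()
            if not name.startswith("SIG"):
                name = "SIG" + name
            sig = signal.Signals[name]
    if not args:
        raise ValueError("no target PID given")
    targets = [int(a) for a in args]
    return sig.name, (-1 in targets)


if __name__ == "__main__":
    # The classic accident: meant "signal 1 to some PID", typed -1 twice.
    print(describe_kill(["-1", "-1"]))
```

The point of returning the broadcast flag separately is that it's the target, not the signal, that makes this unrecoverable; any signal your init ignores will do.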
PS: as I found out in the same incident, immediately turn up the log level when using Magic Sysrq.
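
As a rough illustration of what the log level controls: on Linux, /proc/sys/kernel/printk holds four numbers, the first being the current console log level, and a kernel message reaches the console only if its own level is numerically lower than that. The sketch below assumes that standard four-field layout. The names PrintkLevels, parse_printk and reaches_console are made up for illustration, and the code only parses text; it doesn't change any setting.

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    """The four fields of /proc/sys/kernel/printk, in order."""
    console: int          # messages below this level go to the console
    default_message: int  # level for messages without an explicit one
    minimum_console: int  # lowest value console may be set to
    default_console: int  # boot-time default for console


def parse_printk(text):
    """Parse the whitespace-separated contents of the printk file."""
    fields = [int(f) for f in text.split()]
    if len(fields) != 4:
        raise ValueError("expected four integers, got %d" % len(fields))
    return PrintkLevels(*fields)


def reaches_console(levels, message_level):
    """True if a message at message_level (0-7) would be shown on console."""
    return message_level < levels.console


if __name__ == "__main__":
    # Sample contents; a real system's values may differ.
    levels = parse_printk("4\t4\t1\t7\n")
    print(levels, reaches_console(levels, 6))
```

On a live system you would feed parse_printk the contents of /proc/sys/kernel/printk instead of the sample string.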