All your servers should have Linux's magic SysRq enabled

May 10, 2012

This is effectively another lesson learned from our recent building power shutdown. I will put it simply:

All of your servers should have magic SysRq enabled.

There are reasons not to do this on client machines (although not necessarily very good ones), but there are none for your servers (which certainly should have their hardware and consoles in a secure location).

What magic SysRq is good for on servers (above everything else) is giving you a last ditch chance to shut down or reboot the machine in something approaching an orderly way. I'm not just talking about the system going crazy, because it's also quite possible for ordinary system shutdowns to hang, especially if you're shutting down a group of systems that have complex NFS filesystem relationships and something went down out of order. If this happens and you don't have magic SysRq support available, you're plain out of luck; all you can do is pull the power and hope that nothing explodes because processes weren't killed cleanly, data wasn't synced to disk, or whatever.

With magic SysRq you have at least a chance of doing something about this. You can force a kernel level sync, a kernel level unmount of as many filesystems as possible, and even hit processes with signals if you think it's going to do any good. And then you can reboot the machine (and afterwards, possibly pull the power to keep the machine down).
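The same sync, unmount and reboot sequence can also be driven from software when you still have a working root shell, because the kernel accepts SysRq command keys written to /proc/sysrq-trigger. What follows is only a minimal sketch in Python under that assumption; the function names and the two second pause are my own illustrative choices, not anything the kernel requires. It is no substitute for the keyboard or serial console route when userspace is completely wedged.

```python
import time

# Command keys as listed in the kernel's SysRq documentation.
SYNC = "s"              # sync all mounted filesystems
REMOUNT_READONLY = "u"  # remount all mounted filesystems read-only
TERM_ALL = "e"          # send SIGTERM to all processes except init
KILL_ALL = "i"          # send SIGKILL to all processes except init
REBOOT = "b"            # reboot immediately, with no sync or unmount


def send_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command key to the trigger file (needs root)."""
    if len(key) != 1:
        raise ValueError("a SysRq command is a single character")
    with open(trigger_path, "w") as trigger:
        trigger.write(key)


def emergency_reboot(send=send_sysrq, pause=2.0):
    """Sync, remount read-only, then reboot.

    The pause between steps is an arbitrary guess at how long the
    kernel needs to finish each one; adjust it for your hardware.
    """
    send(SYNC)
    time.sleep(pause)
    send(REMOUNT_READONLY)
    time.sleep(pause)
    send(REBOOT)
```

Calling emergency_reboot() as root really does reboot the machine, so try it on something you can afford to lose first. The send parameter exists so that the sequence can be exercised without touching the real trigger file.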

PS: you should explicitly enable magic SysRq in your standard server install setup, even if your distribution normally defaults to leaving it on; distribution defaults can change over time. Also, note that if you have a serial console you generally need a getty listening on it in order to make magic SysRq work.
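As an illustration of what such an install step might do, here is a rough Python sketch that pins the setting with the kernel.sysrq sysctl. It assumes a system that reads drop-in files from /etc/sysctl.d/ at boot (on others, the same line would go in /etc/sysctl.conf); the file name 90-sysrq.conf and both function names are made up for this example.

```python
def persist_sysrq(value=1, conf_path="/etc/sysctl.d/90-sysrq.conf"):
    """Write a sysctl drop-in so kernel.sysrq is set at every boot.

    The default path is an assumption; use whatever sysctl
    configuration file your systems actually read.
    """
    with open(conf_path, "w") as conf:
        conf.write("# Keep magic SysRq on regardless of distribution defaults.\n")
        conf.write(f"kernel.sysrq = {value}\n")


def enable_sysrq_now(value=1, proc_path="/proc/sys/kernel/sysrq"):
    """Apply the setting to the running kernel (needs root)."""
    with open(proc_path, "w") as proc:
        proc.write(f"{value}\n")
```

Both steps are needed on a machine that is already up: the drop-in only takes effect at the next boot (or when sysctl settings are reloaded), and the /proc write does not survive a reboot.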

(You can check whether magic SysRq is enabled by looking at the value of /proc/sys/kernel/sysrq; a 1 means that it is fully enabled and a 0 means that it is disabled. Other values are a bitmask that enables only some of its functions.)
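If you want to check this across a pile of servers, something like the following Python sketch may help. Besides 0 and 1, the kernel also accepts larger values that act as a bitmask of allowed functions, so the sketch decodes those too; the bit meanings are taken from the kernel's SysRq documentation, while the function names are just illustrative.

```python
# Bitmask meanings for kernel.sysrq values greater than 1.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all real-time tasks",
}


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return the current kernel.sysrq value as an integer."""
    with open(path) as proc:
        return int(proc.read().strip())


def describe_sysrq(value):
    """Return a list describing what a kernel.sysrq value allows."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

For the purposes of this entry, what you want to see is either "all functions enabled" or at least the sync, remount read-only and reboot/poweroff bits.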


Comments on this page:

From 66.175.95.4 at 2012-05-10 20:15:22:

CentOS, SuSE et al. seem to have it disabled by default, which is annoying when you find a server has locked you out and you find you can't raise elephants any more.

From 71.197.244.244 at 2012-05-13 01:57:04:

@66.175.95.4:

A while ago SuSE/openSUSE changed the default to off and moved it into the miscellaneous section of "Security Center & Hardening". I understand the logic, but it's definitely frustrating. The upside of it being integrated into their security configs is that I believe it can now be set in an AutoYaST config file if you're doing unattended installs via that route. Hopefully Kickstart & such work the same.

Kate

Written on 10 May 2012.

Last modified: Thu May 10 16:28:49 2012