How to force a crash dump on Solaris 10 x86

November 14, 2008

Because SPARCs have OpenBoot, forcing Solaris SPARC to panic and do a crash dump is generally a pretty simple process. Because x86 hardware has no PROM environment (despite Sun trying to pretend otherwise), forcing a crash dump on Solaris 10 x86 is a little bit more intricate.

On x86, crash dumps are forced through the kernel debugger, which has to be loaded ahead of time, and loaded from the system console itself. (Technically, I think you can also load it over a serial connection that is not the system console, should you have one.)

You load the kernel debugger by running the command 'mdb -K' as root. This immediately drops you into the kernel debugger, halting the rest of the kernel and the system, so when you see the debugger prompt you want to use the ':c' command to continue everything. Once the kernel debugger is loaded, you can (theoretically) break into it at any time with either F1-A or Shift-Pause.

(In a stroke of what I can only describe as Sun's typical brilliance, neither key sequence can be set as a hotkey in the SunFire X2100 and X2200 ILOM KVM-over-IP environment. F1-A does work if your local machine will pass it to the ILOM console application.)

Once you are in the kernel debugger, the command to crash the system is:

$<systemdump

If all goes well, the crash dump will appear in /var/crash/<hostname>/ as usual after the machine reboots.
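
To check what actually landed there after the reboot, something like the following rough sketch may help. It assumes that savecore writes each dump as a unix.N and vmcore.N pair (the usual convention, but not something stated above), and list_crash_dumps is a name made up for illustration.

```shell
#!/bin/sh
# Sketch: report which crash dumps are present in a savecore directory.
# Assumes the conventional unix.N / vmcore.N pairing; the function name
# is hypothetical.
list_crash_dumps() {
    dir="$1"
    for core in "$dir"/vmcore.*; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -f "$core" ] || continue
        # Everything after the last dot is the dump number N.
        n="${core##*.}"
        if [ -f "$dir/unix.$n" ]; then
            echo "dump $n: complete"
        else
            echo "dump $n: missing unix.$n"
        fi
    done
}
```

On the machine in question you would call it as list_crash_dumps "/var/crash/$(uname -n)". It prints nothing at all if no dump was saved.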

When you are done with the kernel debugger (for example, because the system worked fine during testing instead of crashing), you can and should unload it again by running 'mdb -U'. Among other things, this makes F1-A and Shift-Pause on the console harmless again.
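
The load and unload steps can be wrapped up roughly as below. This is only a sketch: kmdb_load, kmdb_unload, run and the DRY_RUN variable are hypothetical names, and by default it only prints the commands rather than running them, since 'mdb -K' really will stop the machine at the debugger prompt.

```shell
#!/bin/sh
# Sketch: wrappers around loading and unloading the kernel debugger.
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0,
# as root and on the system console, to actually run it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

kmdb_load() {
    # Drops straight into the debugger; type ':c' at its prompt to let
    # the system continue running.
    run mdb -K
}

kmdb_unload() {
    # Unloads the debugger so the console break sequences stop working.
    run mdb -U
}
```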

Mdb, including the kernel debugging stuff, is mostly documented in the Solaris Modular Debugger Guide and in the mdb and kmdb manpages. Note that the SMDG has not been updated for recent updates of Solaris 10 (for example, its instructions on how to start the kernel debugger on boot are for the old x86 boot environment, not the new one).

If the machine is still running normally, you have two additional options:

  • 'reboot -d' will force a crash dump before/while rebooting.
    (However, speaking from personal experience, it is possible to get a Solaris 10 system into a state where it cannot reboot even though it is still running relatively normally.)

  • 'savecore -L' will take a 'crash dump' of a live system without interrupting it, provided that you have configured a dedicated dump device with dumpadm. You'll want the system to be as quiet as possible, and even then you may not get a usable dump.
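
The two options above might be scripted along these lines. It is a sketch under assumptions: take_dump, run and DRY_RUN are made-up names, it defaults to printing the commands instead of running them, and it runs dumpadm with no arguments first purely to show the current dump configuration.

```shell
#!/bin/sh
# Sketch: pick a way to get a dump from a machine that is still running.
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# take_dump live    -> savecore -L (no interruption; needs a dedicated dump device)
# take_dump reboot  -> reboot -d   (forces a crash dump, then reboots)
take_dump() {
    case "$1" in
        live)
            run dumpadm        # no arguments: show the current dump configuration
            run savecore -L
            ;;
        reboot)
            run reboot -d
            ;;
        *)
            echo "usage: take_dump live|reboot" >&2
            return 1
            ;;
    esac
}
```

With the default dry-run setting, 'take_dump live' just shows the two commands it would run, which makes it easy to check before doing anything to a real system.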

(This is one of those entries I write to have this information in an easily accessible place that I can remember.)

Written on 14 November 2008.

Last modified: Fri Nov 14 01:27:41 2008