Some views on more flexible (Illumos) kernel crash dumps

November 21, 2018

In my notes about Illumos kernel crash dumps, I mentioned that we've now turned them off on our OmniOS fileservers. One of the reasons for this is that we're running an unsupported version of OmniOS, including the kernel. But even if we were running the latest OmniOS CE and had commercial support, we'd do the same thing (at least by default, outside of special circumstances). The core problem is that our needs conflict with what Illumos crash dumps want to give us right now.

The current implementation of kernel crash dumps basically prioritizes capturing complete information. There are various manifestations of this in the implementation. One is that it assumes that if crash dumps are configured at all, you have set up enough disk space to hold the full crash dump level you've set in dumpadm; given that assumption, it's sensible to not bother checking if the dump will fit and to treat a failure to fit as an unusual situation that isn't worth doing much special about. Another is that there is no overall time limit on how long the crash dump will run, which is perfectly sensible if the most important thing is to capture the crash dump for diagnosis.

But, well, the most important thing is not always to capture complete diagnostic information. Sometimes you need to get things back into service before too long, so what you really want is to capture as much information as possible while still returning to service in a certain amount of time. Sometimes you only have so much disk space available for crash dumps, and you would like to capture whatever information can fit in that disk space, and if not everything fits it would be nice if the most important things were definitely captured.

All of this makes me wish that Illumos kernel crash dumps wrote certain critical information immediately, at the start of the crash dump, and then progressively extended the information in the crash dump until they either ran out of space or ran out of time. What do I consider critical? My first approximation would be, in order, the kernel panic, the recent kernel messages, the kernel stack of the panicking kernel process, and the kernel stacks of all processes. Probably you'd also want anything recent from the kernel fault manager.
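To make the idea concrete, here is a minimal sketch of 'write things in priority order until you run out of space or time'. This is a toy model in Python, not anything from the Illumos code: the Section type, the plan_dump function, and all of the sizes and time estimates are made up for illustration. It stops at the first section that doesn't fit, so what ends up in the dump is always a prefix of the priority list.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    size: int        # bytes this section would take on the dump device
    seconds: float   # rough guess at how long writing it would take


def plan_dump(sections, space_limit, time_limit):
    """Walk sections in priority order and stop at the first one that
    would exceed either the space budget or the time budget."""
    written = []
    used = 0
    elapsed = 0.0
    for s in sections:
        if used + s.size > space_limit or elapsed + s.seconds > time_limit:
            break
        written.append(s.name)
        used += s.size
        elapsed += s.seconds
    return written, used, elapsed


# Priority order from the text; the numbers are invented placeholders.
PRIORITY = [
    Section("panic message", 4096, 0.1),
    Section("recent kernel messages", 65536, 0.1),
    Section("panicking thread stack", 65536, 0.1),
    Section("all kernel stacks", 8 * 2**20, 2.0),
    Section("fault manager ereports", 2**20, 0.5),
    Section("remaining kernel pages", 64 * 2**30, 1800.0),
]

if __name__ == "__main__":
    # A 16 GB dump device and a ten minute limit on total dump time.
    written, used, elapsed = plan_dump(PRIORITY, 16 * 2**30, 600.0)
    print(written, used, elapsed)
```

With those made-up numbers, everything except the bulk page dump fits. A real implementation would presumably write as much of the final section as it could instead of skipping it outright, but the ordering is the important part.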

The current Illumos crash dump code does have an order for what gets written out, and it does put some of this stuff into the dump header, but as far as I can tell the dump header only gets written at the end. It's possible that you could create a version of this incremental dump approach by simply writing out the incomplete dump header every so often (appropriately marked with how it's incomplete). There's also a 'dump summary' that gets written at the end that appears to contain a bunch of this information; perhaps a preliminary copy could be written at the start of the dump, then overwritten at the end if the dump is complete. Generally what seems to take all the time (and space) with our dumps is writing out the main memory pages, not the preliminary material, so I think Illumos could definitely write at least one chunk of useful information before it bogs down. And if this needs extra space in the dump device, I would gladly sacrifice a few megabytes to have such useful information always present.
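As a rough illustration of the 'preliminary summary, overwritten at the end' idea, here is a small Python sketch that uses an in-memory buffer as a stand-in for the dump device. The on-disk layout (a reserved 4 KB slot with a magic value, a 'complete' flag, and a length) is entirely hypothetical and is not the actual Illumos dump header or dump summary format; the function names are invented too.

```python
import io
import struct

SUMMARY_SLOT = 4096                # hypothetical region reserved at the start
HEADER = struct.Struct("<4sBI")    # magic, complete flag, payload length


def write_summary(dev, text, complete):
    """Write (or overwrite) the summary slot at the start of the device."""
    payload = text.encode("utf-8")[: SUMMARY_SLOT - HEADER.size]
    dev.seek(0)
    dev.write(HEADER.pack(b"DSUM", int(complete), len(payload)) + payload)


def read_summary(dev):
    """Return (complete, text), or None if no summary was ever written."""
    dev.seek(0)
    raw = dev.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None
    magic, complete, length = HEADER.unpack(raw)
    if magic != b"DSUM":
        return None
    return bool(complete), dev.read(length).decode("utf-8", errors="replace")


def dump(dev, summary, pages, space_limit):
    """Write a preliminary summary first, then pages, then mark it complete.

    If the pages don't fit, the preliminary summary is left in place with
    its 'incomplete' marking so that it can still be recovered later."""
    write_summary(dev, summary, complete=False)
    offset = SUMMARY_SLOT
    for page in pages:
        if offset + len(page) > space_limit:
            return False
        dev.seek(offset)
        dev.write(page)
        offset += len(page)
    write_summary(dev, summary, complete=True)
    return True


if __name__ == "__main__":
    dev = io.BytesIO()
    ok = dump(dev, "panic: example", [b"\0" * 4096] * 3, space_limit=8192)
    print(ok, read_summary(dev))
```

The point of the sketch is only that a truncated dump still leaves something readable at a known location; the cost is the reserved slot at the front of the dump device.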

(It appears that the Illumos kernel already keeps a lot of ZFS data memory out of kernel crash dumps, both for the ARC and for in-flight ZFS IO, so I'm not sure what memory the kernel is spending all of its time dumping in our case. Possibly we have a lot of ZFS metadata, which apparently does go into crash dumps. See the comments about crash dumps in abd.c and zio.c. For the 'dump summary', see the dump_summary, dump_ereports, and dump_messages functions in dumpsubr.c.)

PS: All of this is sort of wishing from the sidelines, since our future is not with Illumos.



Written on 21 November 2018.

Last modified: Wed Nov 21 23:46:22 2018