2018-11-21
Some views on more flexible (Illumos) kernel crash dumps
In my notes about Illumos kernel crash dumps, I mentioned that we've now turned them off on our OmniOS fileservers. One of the reasons for this is that we're running an unsupported version of OmniOS, including the kernel. But even if we were running the latest OmniOS CE and had commercial support, we'd do the same thing (at least by default, outside of special circumstances). The core problem is that our needs conflict with what Illumos crash dumps want to give us right now.
The current implementation of kernel crash dumps basically prioritizes capturing complete information. There are various manifestations of this in the implementation, starting with how it assumes that if crash dumps are configured at all, you have set up enough disk space to hold the full crash dump level you've set in dumpadm, so it's sensible not to bother checking whether the dump will fit and to treat a dump that doesn't fit as an unusual situation that isn't worth doing much special about. Another manifestation is that there is no overall time limit on how long the crash dump will run, which is perfectly sensible if the most important thing is to capture the crash dump for diagnosis.
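(For concreteness, and going from my memory of the dumpadm manpage rather than a live system, you pick a dump content level and a dump device with something along these lines; the zvol path is just the usual ZFS-root default:

    dumpadm -c kernel                     # dump kernel memory pages only
    dumpadm -c curproc                    # kernel memory plus the panicking process
    dumpadm -c all                        # all of memory
    dumpadm -d /dev/zvol/dsk/rpool/dump   # use this zvol as the dump device

What you can't express is 'use at most this much space' or 'take at most this long'; you pick a content level and are expected to have provisioned enough space for it.)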
But, well, the most important thing is not always to capture complete diagnostic information. Sometimes you need to get things back into service before too long, so what you really want is to capture as much information as possible while still returning to service in a certain amount of time. Sometimes you only have so much disk space available for crash dumps, and you would like to capture whatever information can fit in that disk space, and if not everything fits it would be nice if the most important things were definitely captured.
All of this makes me wish that Illumos kernel crash dumps wrote certain critical information immediately, at the start of the crash dump, and then progressively extended the information in the crash dump until they either ran out of space or ran out of time. What do I consider critical? My first approximation would be, in order, the kernel panic, the recent kernel messages, the kernel stack of the panicking kernel process, and the kernel stacks of all processes. Probably you'd also want anything recent from the kernel fault manager.
The current Illumos crash dump code does have an order for what gets written out, and it does put some of this stuff into the dump header, but as far as I can tell the dump header only gets written at the end. It's possible that you could create a version of this incremental dump approach simply by writing out the incomplete dump header every so often (appropriately marked as incomplete). There's also a 'dump summary' that gets written at the end and that appears to contain a bunch of this information; perhaps a preliminary copy could be written at the start of the dump and then overwritten at the end if the dump completes. Generally what seems to take all the time (and space) with our dumps is the main page-writing work, not the preliminary material, so I think Illumos could definitely write at least one chunk of useful information before it bogs down. And if this needs extra space in the dump device, I would gladly sacrifice a few megabytes to have such useful information always present.
(It appears that the Illumos kernel already keeps a lot of ZFS data memory out of kernel crash dumps, both for the ARC and for in-flight ZFS IO, so I'm not sure what memory the kernel is spending all of its time dumping in our case. Possibly we have a lot of ZFS metadata, which apparently does go into crash dumps. See the comments about crash dumps in abd.c and zio.c. For the 'dump summary', see the dump_summary, dump_ereports, and dump_messages functions in dumpsubr.c.)
PS: All of this is sort of wishing from the sidelines, since our future is not with Illumos.
What I really miss when I don't have X across the network
For reasons beyond the scope of this entry, I spent a couple of days last week working from home. One of the big differences when I do this is that I don't have remote X; instead I wind up doing everything over SSH. At a nominal level the experience is much the same, partly because I've deliberately arranged it that way; using sshterm to start an SSH session to a host is very similar to using rxterm to start an xterm on it, for example. But at a deeper level there are two things I wound up really missing.
The obvious thing I missed was exmh, which is the core of how I efficiently deal with email at work. Exmh is text-based, so it works well within the limitations of modern X network transparency; at work I run it on one of our login servers, with direct access to my email, and it displays on my desktop. In theory the modern replacement for exmh and this style of working would be a local IMAP mail client, if I could find a Linux one that I liked.
(I mean, apart from the whole thing where I'm extremely attached to (N)MH and don't want to move to IMAP any sooner than I have to. An alternate approach would be to find and set up some good text-mode MH visual client, probably GNU Emacs' MH-E, which I used to use years ago.)
But the surprisingly subtle thing that I wound up missing was the ability to open up a new xterm on the remote machine from within my current session. While starting an xterm this way obviously skips logging in, the real advantage of doing this is that the new xterm completely inherits my current context, both my current directory and my current privileges (if I'm su'd to root, for example, which is when this is especially handy). It is in a way the Unix shell session equivalent of a browser's 'Open in New Tab/Window', and it's useful for much the same reasons; it gives you an additional view on what you're currently doing or about to do.
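(To make that concrete: if my current xterm is su'd to root and sitting somewhere deep in the filesystem, I can just do

    xterm &    # the new xterm inherits my cwd, my root privileges, and my $DISPLAY

and the new window pops up on my desktop already in that directory and already root. Getting the equivalent over plain SSH means another login, another cd, and another su.)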
There is no good replacement for this that I can see outside of remote X or something very similar to it. You can't get it with job control and you can't really get it with screen or tmux, and a remote windowing protocol that deals with entire desktops instead of individual windows would create a completely different environment in general. This makes me sad that in the brave future world of Wayland, there still doesn't seem to be much prospect of remote windows.
(This entry is sort of prompted by reading The X Network Transparency Myth.)
PS: If you want, you can consider this the flipside of my entry X's network transparency has wound up mostly being a failure. X's network transparency is not anywhere near complete, but within the domain of mostly text-focused programs running over 1G LANs it can still deliver very nice benefits. I take advantage of them every day that I'm at work, and miss them when I'm not.