Wandering Thoughts archives

2018-11-18

Old zombie Linux distribution versions aren't really doing you any favours

One bit of recent news in the Linux distribution world is Mark Shuttleworth's announcement that Ubuntu 18.04 LTS will get ten years of support (Slashdot, ServerWatch). As it happens, I have some views on this. First, before people start getting too excited, note that Shuttleworth hasn't said anything about what form this support will take, especially whether or not you'll have to pay for it. My own guess is that Canonical will be expanding their current paid Ubuntu 12.04 ESM (Extended Security Maintenance) offering to also cover 18.04 and apparently 16.04. This wouldn't be terribly surprising, since back in September they expanded it to cover 14.04.

More broadly, I've come to feel that keeping on running really old versions of distributions is generally not helping you, even if they have support. After a certain point, old distribution versions are basically zombies; they shamble on and they sort of look alive, but they are not, because time has moved past them. Their software versions are out of date and will increasingly lack features that you actively want, and even if you try to build your own versions of things, a steadily increasing number of programs just won't build on the versions of libraries, kernels, and so on that those old Linuxes have. Upgrading from very old versions is also an increasing problem as time goes by; often, so much has changed that what you do is less upgrading and more rebuilding the same functionality from scratch on a new, more modern base.

(Here I'm not just talking about the actual systems; I'm talking about things like configuration files for programs. You can often carry configuration files forward with only modest changes even if you reinstall systems from scratch, but that only works so far.)

You can run such zombie systems for a long time, but they have to essentially be closed and frozen appliances, where absolutely nothing on them needs to change. This is very hard to do on systems that are exposed directly or indirectly to the Internet, because Internet software decays and must be actively maintained. Even if you don't have systems that are exposed this way, you may find that you end up wanting to put new software on them, for example a metrics and monitoring system, except that your old systems are too old for that to work well (or perhaps at all).

(Beyond software you want to put on such old systems, you're also missing out on an increasing number of new features of things. Some of the time these are features that you could actively use and that will improve your life when you can finally deploy them and use them. I know it sounds crazy, but software on Linux really does improve over time in a lot of cases.)

Having run and used a certain number of ancient systems in my time (and we're running some now), my view is that I now want to avoid doing it if I can. I don't know what the exact boundary is for Linux today (and anyway it varies depending on what you're using the system for), but I think getting towards ten years is definitely too long. An eight-year-old Linux system is going to be painfully out of date on pretty much everything, and no one is going to be sympathetic about it.

So, even if Ubuntu 18.04 had ten years of free support (or at least security updates), I'm pretty certain that neither you nor we really want to be taking advantage of that. At least not for those full ten years. Periodic Linux distribution version updates may be somewhat of a pain at the time, but overall they're good for us.

linux/ZombieDistroVersions written at 22:43:43

Some notes about kernel crash dumps in Illumos

I tweeted:

On our OmniOS servers, we should probably turn off writing kernel crash dumps on panics. It takes far too long, it usually doesn't succeed, and even if it did the information isn't useful to us in practice (we're using a very outdated version & we're frozen on it).

We're already only saving kernel pages, which is the minimum setting in dumpadm, but our fileservers still take at least an hour+ to write dumps. On a panic, we need them back in service in minutes (as few as possible).

The resulting Twitter discussion got me to take a look into the current state of the code for this in Illumos, and I wound up discovering some potentially interesting things. First off, dump settings are not auto-loaded or auto-saved by the kernel in some magical way; instead dumpadm saves all of your configuration settings in /etc/dumpadm.conf and then sets them during boot through svc:/system/dumpadm:default. The dumpadm manual page will tell you all of this if you read its description of the -u argument.
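
As a concrete sketch of poking at this (the zvol name here is only an example; use whatever your actual dump device is): 'dumpadm' with no arguments reports the current settings, the -c and -d forms change them (updating /etc/dumpadm.conf in the process, as I understand it), and 'dumpadm -u' pushes the contents of /etc/dumpadm.conf back into the kernel, which is more or less what the dumpadm SMF service does for you at boot:

    # dumpadm
    # dumpadm -c kernel
    # dumpadm -d /dev/zvol/dsk/rpool/dump
    # dumpadm -u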

Next, the -z argument to dumpadm is inadequately described in the manual page. The 'crash dump compression' it's talking about is whether savecore will write compressed dumps; it has nothing to do with how the kernel writes out the crash dump to your configured crash device. In fact, dumpadm has no direct control over basically any of that process; if you want to change things about the kernel dump process, you need to set kernel variables through /etc/system (or 'mdb -k').
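
For illustration, here's roughly what both forms look like, using dump_plat_mincpu (which comes up later in the sidebar) with an arbitrary example value of 16. In /etc/system, to take effect on the next boot:

    set dump_plat_mincpu = 16

Or on the live kernel, with mdb in write mode ('0t' makes the value decimal; I'm assuming the variable is the 32-bit integer it appears to be in the source):

    # echo 'dump_plat_mincpu/W 0t16' | mdb -kw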

The kernel writes crash dumps in multiple steps. If your console shows the message 'dumping to <something>, offset NNN, contents: <...>', then you've at least reached the start of writing out the crash dump. If you see updates of the form 'dumping: MM:SS N% done', the kernel has reached the main writeout loop and is writing out pages of memory, perhaps excessively slowly. As far as I can tell from the code, crash dumps don't abort when they run out of space on the dump device; they keep processing things and just throw all of the work away.

As it turns out, the kernel always compresses memory as it writes it out, although this is obscured by the current state of the code. The short version is that unless you set non-default system parameters (which you probably don't want to), current Illumos systems will always do single-threaded lzjb compression of memory, where the CPU that is writing out the crash dump also compresses the buffers before writing them. Although you can change things to do dumps with multi-threaded compression using either lzjb or bzip2, you probably don't want to, because the multi-threaded code has been deliberately disabled and is going to be removed sometime. See Illumos issue 3314 and the related Illumos issue 1369.
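
If you want to double-check this on your own system, the threshold shows up as the dump_plat_mincpu kernel variable, so something like the following should print 0 on a stock machine (again assuming it's a 32-bit integer, as it appears to be in the source):

    # echo 'dump_plat_mincpu/D' | mdb -k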

(As a corollary of kernel panic dumps always being compressed with at least lzjb, you probably should not have compression turned on for your dump zvol (which I believe is the default).)
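
For instance, if your dump device is a zvol called rpool/dump (substitute whatever yours is actually called), checking and turning this off is just the usual ZFS property handling:

    # zfs get compression rpool/dump
    # zfs set compression=off rpool/dump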

I'm far from convinced that single-threaded lzjb compression can reach and sustain the full write speed of our system SSDs on our relatively slow CPUs, especially during a crash dump (when I believe there's relatively little write buffering going on), although for obvious reasons it's hard to test. People with NVMe drives might have problems even with modern, fast hardware.

If you examine the source of dumpsubr.c, you'll discover a tempting variable dump_timeout that's set to 120 (seconds) and described as 'timeout for dumping pages'. This comment is a little bit misleading, as usual; what it really means is 'timeout for dumping a single set of pages'. There is no limit on how long the kernel is willing to keep writing out pages for, provided that it keeps making enough progress every 120 seconds. In our case this is unfortunate, since we'd be willing to spend a few minutes gathering a bit of crash information, but not anything like the time a kernel dump appears to take on our machines.
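
If you're curious, you can check the current value on your own kernel; this again assumes dump_timeout is the 32-bit integer it appears to be in the source:

    # echo 'dump_timeout/D' | mdb -k

Note that changing it (through /etc/system or 'mdb -kw') only changes this per-set timeout, not any sort of overall cap on how long a dump can take.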

(The good news is that if you run out of space on your dump device, the dump code is at least smart enough to not spend any more time trying to compress pages; it just throws them away right away. You might run out of space because you're taking a panic dump from a ZFS fileserver with 128 GB of RAM and putting it on an 8 GB dump zvol that is part of an rpool that lives on 80 GB SSDs, a setup where a full-sized kernel dump almost certainly couldn't even be saved by savecore anyway.)

PS: To see that the default is still a single-threaded crash dump, you need to chase through the code to dumphdr.h and the various DUMP_PLAT_*_MINCPU definitions, all of which are set to 0. Due to how the code is structured, this disables multi-threaded dumps entirely.

Sidebar: The theoretical controls for multi-threaded dumps

If you set dump_plat_mincpu to something above 0, then having 'sufficiently more' CPUs than that gets you parallel bzip2 compression, while having fewer gets you parallel lzjb. Since parallel compression is disabled by default in Illumos, this may or may not still work, even if you don't run into the sort of actual bugs that caused it to be disabled in the first place. Note that bzip2 is not fast.

The actual threshold of 'enough' depends on the claimed maximum transfer size of your disks. For dumping to zvols, it appears that this maximum transfer size is always 128 KB, which uses a code path where the breakpoint between parallel lzjb and parallel bzip2 is just dump_plat_mincpu; if you have that many CPUs or more, you get bzip2. This implies that you may want to set dump_plat_mincpu to a nice high number so that you get parallel lzjb all the time.
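
As a concrete illustration of 'a nice high number' (with all of the caveats above about this code path being disabled by default and possibly not working at all), the /etc/system version would be something like the following, where 1024 is just an arbitrary value picked to be comfortably larger than the CPU count of any machine you have ('psrinfo | wc -l' will tell you what your actual count is):

    set dump_plat_mincpu = 1024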

solaris/IllumosCrashDumpNotes written at 01:37:56

