Wandering Thoughts archives

2015-07-31

Ubuntu once again fails at a good kernel security update announcement

Ubuntu just sent out USN-2700-1, a 14.04 LTS announcement about a kernel update for CVE-2015-3290, CVE-2015-3291, and CVE-2015-5157. People with good memories may at this point remember USN-2688-1, a 14.04 LTS announcement about a kernel update for CVE-2015-3290, CVE-2015-1333, CVE-2015-3291, and CVE-2015-5157. Gosh, that's a familiar list of CVEs, and it sort of looks like the 'repeated CVEs' thing Ubuntu has done before. If you already applied the USN-2688-1 kernel and rebooted everything, it certainly sounds like you can skip USN-2700-1.

That would be a mistake. What Ubuntu is not bothering to mention in USN-2700-1 is that the 64-bit x86 kernels from USN-2688-1 had a bad bug. In that kernel, if a 32-bit program forks and then execs a 64-bit program the 64-bit program segfaults on startup; for example, a 32-bit shell will be unable to run any 64-bit programs (which will be most of them). This bug is the sole reason USN-2700-1 was issued (literally).

The USN-2700-1 text should come with a prominent notification to the effect of 'the previous update introduced a serious bug on 64-bit systems; we are re-issuing corrected kernels without this problem'. Ubuntu has put such notices on updates in the past so the idea is not foreign to them; they just didn't bother doing it this time around. As a result, people who may be affected by this newly introduced kernel bug do not necessarily know that this is their problem and they should update to the USN-2700-1 kernel to fix it.

(At best they may start doing a launchpad bug search and find the bug report. But I don't think it's necessarily all that likely, because the bug's title is not particularly accurate about what it actually is; 'Segfault in ld-2.19.so while starting Steam after upgrade to 3.13.0-59.98' does not point clearly to a 32-bit on 64-bit issue. It doesn't even mention 'on 64-bit platforms' in the description.)
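
As a rough sketch of how you might check whether a machine is running the affected kernel: the bug title above names package version 3.13.0-59.98, so the check below assumes the buggy kernel reports a release starting with '3.13.0-59-' in uname -r and that only x86_64 machines are affected. Both the prefix and the function name looks_affected are assumptions made for illustration, not something taken from the USN text.

```shell
# Return 0 if the given kernel release and machine type look like the
# buggy 64-bit kernel. The "3.13.0-59-" prefix is an assumption derived
# from the package version in the Launchpad bug title.
looks_affected() {
    release=$1
    machine=$2
    case "$machine" in
        x86_64) ;;
        *) return 1 ;;
    esac
    case "$release" in
        3.13.0-59-*) return 0 ;;
        *) return 1 ;;
    esac
}

if looks_affected "$(uname -r)" "$(uname -m)"; then
    echo "running kernel looks like the buggy build; update and reboot"
fi
```

This only looks at the running kernel, which is the one that matters for deciding whether a reboot is needed.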

Kernel update notices matter because people use them to decide whether or not to go through the hassle of a system reboot. If a notice is misleading, this goes wrong; people don't update and reboot when they really should. When there are bugs in a kernel update, as there were here, not telling people about them causes them to try to troubleshoot a buggy system without realizing that there is a simple solution.

(Lucky people noticed failures on the USN-2688-1 kernel right away, and so were able to attribute them to the just-done kernel update. But unlucky people will only run into this once in a while, when they run a rare lingering 32-bit program that does this, and so they may not immediately realize that it was due to a kernel update that might now be a week or two in the past.)

(See also a previous Ubuntu kernel update failure, from 2011.)

UbuntuKernelUpdateNoticeFail written at 00:39:59

2015-07-24

Fedora 22's problem with my scroll wheel

Shortly after I upgraded to Fedora 22, I noticed that my scroll wheel was, for lack of a better description, 'stuttering' in some applications. I'd roll it in one direction and instead of scrolling smoothly, what the application was displaying would jerk around all over, both up and down. It didn't happen all of the time and fortunately it didn't happen in any of my main applications, but it happened often enough to be frustrating. As far as I can tell, this mostly happened in native Fedora GTK3 based applications. I saw it clearly in Evince and the stock Fedora Firefox that I sometimes use, but I think I saw it in a few other applications as well.

I don't know exactly what causes this, but I have managed to find a workaround. Running affected programs with the magic environment variable GDK_CORE_DEVICE_EVENTS set to '1' has made the problem go away (for me, so far). There are some Fedora and other bugs that are suggestive of this, such as Fedora bug #1226465, and that bug leads to an excellent KDE explanation of that specific GTK3 behavior. Since this Fedora bug is about scroll events going missing instead of scrolling things back and forth, it may not be exactly my issue.

(My issue is also definitely not fixed in the GTK3 update that supposedly fixes it for other people. On the other hand, updates for KDE and lightdm now appear to be setting GDK_CORE_DEVICE_EVENTS, so who knows what's going on here.)

Since this environment variable suppresses the bad behavior with no visible side effects I've seen, my current solution is to set it for my entire session. I haven't bothered reporting a Fedora bug for this so far because I use a very variant window manager and that seems likely to be a recipe for more argument than anything else. Perhaps I am too cynical.

(The issue is very reproducible for me; all I have to do is start Evince with that environment variable scrubbed out and my scroll wheel makes things jump around nicely again.)
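
A minimal sketch of both cases, the workaround and the reproduction, as two small shell functions. The function names are made up for illustration, and 'env -u' is assumed to be available (it is in GNU coreutils).

```shell
# Run one command with the workaround variable set.
with_core_events() {
    GDK_CORE_DEVICE_EVENTS=1 "$@"
}

# Run one command with the variable scrubbed out of its environment,
# which is how the stuttering can be reproduced on demand.
without_core_events() {
    env -u GDK_CORE_DEVICE_EVENTS "$@"
}

# Example use (Evince is the program named above; the file is hypothetical):
#   with_core_events evince some.pdf
#   without_core_events evince some.pdf
```

Setting the variable for a whole session amounts to an 'export GDK_CORE_DEVICE_EVENTS=1' in whatever file starts the session; which file that is depends on the setup.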

Sidebar: Additional links

There is this Firefox bug, especially comment 9, and this X server patch from 2013. You'd think a patch from 2013 would be incorporated by now, but who knows.

Fedora22ScrollWheelProblem written at 00:53:59

2015-07-22

Some thoughts on log rolling with date extensions

For a long time everyone renamed old logs in the same way; the most recent log got a .0 on the end, the next most recent got a .1 on the end, and so on. About the only confusion between systems was that some started from .0 and some from .1, and also whether or not your logs got gzip'd. These days, the Red Hat and Fedora derived Linuxes have switched to logrotate's dateext setting, where the extension that old logs get is date based, generally in the format -YYYYMMDD. I'm not entirely sure how I feel about this so far, and not just because it changes what I'm used to.

On the good side, this means that a rolled log has the same file name for as long as it exists. If I look at allmessages-20150718 today, I know that I can come back tomorrow or next week and find it with the same name; I don't have to remember that what was allmessages.3 today is allmessages.4 tomorrow (or next week). It also means that logs sort lexically in time order, which is not the case with numbered logs; .10 is lexically between .1 and .2, but is nowhere near them in time.

(The lexical order is also forward time order instead of reverse time order, which means that if you grep everything you get it in forward time order instead of things jumping around.)

On the down side, rolled logs having a date extension means that I can no longer look at the most recently rolled log just by using <name>.0 (or .1); instead I need to look at what log files there are (this is especially the case with logs that are rolled weekly). It also means that I lose the idiom of grep'ing or whatever through <name>.[0-6] to look through the last week's worth of logs; again, I need to look at the actual filenames or at least resort to something like 'grep ... $(/bin/ls -1t <name>.* | sed 7q)' (and I can do that with any log naming scheme).
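
With date extensions, one possible substitute for the <name>.[0-6] idiom leans on the lexical ordering itself instead of on modification times. The sketch below assumes rolled logs are named <name>-YYYYMMDD (optionally with something like .gz after the date); the function name recent_rolled is hypothetical.

```shell
# Print the N most recently rolled logs for a base name, oldest first.
# $1 = base name (for example /var/log/allmessages), $2 = how many.
recent_rolled() {
    base=$1
    count=$2
    # The glob expands in lexical order, which for -YYYYMMDD names is
    # also forward time order, so the newest files come last.
    for f in "$base"-[0-9]*; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | tail -n "$count"
}

# Example use, relying on word splitting of the output (so it assumes
# no whitespace in the log file names):
#   grep pattern $(recent_rolled /var/log/allmessages 7)
```

Unlike the 'ls -1t' version, this does not care if something has touched an old log file since it was rolled.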

I'm sure that Red Hat had its reasons to change the naming scheme around. It certainly makes a certain amount of things consistent and obvious. But on the whole I'm not sure I actually like it or if I'd rather have things be the old fashioned way that Ubuntu and others still follow.

(I don't care enough about this to change my Fedora machines or our CentOS 6 and 7 servers.)

LogrollingDateExtThoughts written at 01:21:45

2015-07-12

A Linux software RAID message flood

It's early Sunday morning, which is the time when Fedora kicks off its weekly checks of MD arrays. I know this time extremely well, not because it causes visible IO load but because I actually watch my kernel messages and software RAID floods them during this check. And by that I mean that roughly once every five seconds I can look forward to a burst of messages like this:

md: delaying data-check of md51 until md53 has finished (they share one or more physical units)
md: delaying data-check of md50 until md53 has finished (they share one or more physical units)

I was initially going to write yet again about kernel messages not being sensibly ratelimited (cf), but once I started looking it appears that this is more likely some sort of bug in the code. Only my home machine generates this flood of messages; my work machine, which also has software RAID arrays, generates these messages only at the start or end of a data check. The difference between the machines is that the work machine has fewer mirrored arrays (4 versus 8) and all of its arrays are on one pair of disks (at home, half are on one pair of disks and half on the other pair, because one pair is for system stuff and one pair is for my data).

Mind you, the work machine appears to have some message anomalies as well, but it doesn't emit floods of them. Particularly, take a look at this set of warnings emitted at the start of a data check:

[392229.258284] md: delaying data-check of md14 until md13 has finished (they share one or more physical units)
[392229.374681] md: delaying data-check of md13 until md15 has finished (they share one or more physical units)
[392229.374705] md: delaying data-check of md14 until md13 has finished (they share one or more physical units)
[392229.376020] md: delaying data-check of md14 until md13 has finished (they share one or more physical units)
[392229.376022] md: delaying data-check of md13 until md15 has finished (they share one or more physical units)

That's some interesting stuttering going on there, and also some recursion (md15 is the array that just started being checked).

As I was writing this entry, it took long enough that I've now discovered that this is wrong; my work machine sees this sort of thing too. When the md15 data check finished and the md13 one then started, my work machine also started repeating messages about md14 being delayed every few seconds. Fortunately md13 is small so this didn't last long. Something is clearly screwing up here.

(All of this means I should figure out how to send a bug report or two to the software RAID people.)

On the good news front, now that I've looked into this I've discovered that the number of concurrent data checks is controlled by the MAXCONCURRENT setting in /etc/sysconfig/raid-check. Setting this to '1' is not quite correct for my home machine (which could check one array on each pair of disks at the same time), but it does make the messages go away. I'll live with a somewhat more extended data check time in order to not have a message flood.
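
A sketch of making that change from a script instead of an editor. It assumes the file contains an uncommented line beginning with 'MAXCONCURRENT=' (if the line is commented out or missing, this changes nothing), and the function name set_maxconcurrent is made up for illustration.

```shell
# Rewrite the MAXCONCURRENT line in a raid-check style config file.
# $1 = path to the file, $2 = new value.
set_maxconcurrent() {
    file=$1
    value=$2
    tmp=$(mktemp) || return 1
    # Edit into a temporary copy first, then write it back in place so
    # the original file keeps its ownership and permissions.
    if sed "s/^MAXCONCURRENT=.*/MAXCONCURRENT=$value/" "$file" >"$tmp"; then
        cat "$tmp" >"$file"
    fi
    rm -f "$tmp"
}

# Example use, as root:
#   set_maxconcurrent /etc/sysconfig/raid-check 1
#   grep '^MAXCONCURRENT=' /etc/sysconfig/raid-check
```

While a check is running, /proc/mdstat shows which array is currently being checked and which ones are waiting.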

(I don't know if anyone besides Fedora and RHEL/CentOS does this periodic software RAID check, but if your system does and you see a message flood, you might want to look for a similar setting.)

SoftwareRaidMessageFlood written at 01:41:05

2015-07-08

Making GTK applications in alternate locations, settings edition

Suppose, not entirely hypothetically, that you build your own copy of some moderately complicated GTK application and install it in a non-standard location with the equivalent of 'configure --prefix /some/where'. In theory you can then just run the application as /some/where/bin/prog and everything will be great. As I found out today, in practice you may have a subtle problem (or if you're unlucky, a not so subtle one).

GTK applications use a system for handling their settings and preferences called 'GSettings', which uses XML schema files to define all of this stuff for a particular application. By default, an application only looks for its settings schema in the system directories, even if it knows full well that its settings schema file was actually installed elsewhere. If you don't have any version of your application installed as part of the system, your own version will probably fail because it can't find its schema at all. If you do have some system version of your application installed, your own version is probably actually using the settings file from that version.

To fix this, you need to set the $XDG_DATA_DIRS environment variable so it includes the right directory for your application. If you configured with a prefix of /some/where, your schema files get put into /some/where/share/glib-2.0/schemas and $XDG_DATA_DIRS must include /some/where/share. According to the XDG Base Directory Specification, the default value for it is /usr/local/share/:/usr/share (and this agrees with what I see in testing). So you want a cover script that does this:

#!/bin/sh
# Put our prefix's share directory ahead of the default XDG data directories.
export XDG_DATA_DIRS=/some/where/share:/usr/local/share:/usr/share
exec /some/where/bin/prog "$@"

(This behavior is documented in the glib-compile-schemas manpage.)
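
If the session already sets $XDG_DATA_DIRS to something, a fixed value in the cover script throws that away. A hedged variation keeps whatever is already set and only falls back to the default when the variable is empty or unset. The helper name data_dirs_with is hypothetical, and /some/where is the same stand-in prefix as above.

```shell
# Hypothetical install prefix, as in the example above.
prefix=/some/where

# Print a search path with the given prefix's share directory first,
# followed by the existing $XDG_DATA_DIRS, or by the default value if
# that variable is empty or unset.
data_dirs_with() {
    printf '%s:%s\n' "$1/share" "${XDG_DATA_DIRS:-/usr/local/share:/usr/share}"
}

# In a real cover script the last two lines would be:
#   export XDG_DATA_DIRS="$(data_dirs_with "$prefix")"
#   exec "$prefix/bin/prog" "$@"
data_dirs_with "$prefix"
```

If the application still can't find its schema after this, it may be worth checking that the schemas directory contains a gschemas.compiled file; running 'glib-compile-schemas /some/where/share/glib-2.0/schemas' generates one, although a normal 'make install' usually takes care of this.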

There may be other things that GTK applications only look for in system areas, but if so I haven't tripped across them yet for the couple of GTK based applications I compile myself.

(Before you're tempted to throw stones at GTK for not fully supporting installing and running applications from other prefixes, note that KDE is worse for this as far as I know. I had to resort to building and installing RPMs for my locally compiled KDE application. Maybe there's a magic environment variable for it too that I just didn't stumble over.)

GTKWithAltLocation written at 01:38:44

