Fedora 22's problem with my scroll wheel
Shortly after I upgraded to Fedora 22, I noticed that my scroll wheel was, for lack of a better description, 'stuttering' in some applications. I'd roll it in one direction and instead of scrolling smoothly, what the application was displaying would jerk around all over, both up and down. It didn't happen all of the time and fortunately it didn't happen in any of my main applications, but it happened often enough to be frustrating. As far as I can tell, this mostly happened in native Fedora GTK3 based applications. I saw it clearly in Evince and the stock Fedora Firefox that I sometimes use, but I think I saw it in a few other applications as well.
I don't know exactly what causes this, but I have managed to find a workaround. Running affected programs with the magic environment variable GDK_CORE_DEVICE_EVENTS set to '1' has made the problem go away (for me, so far). There are some Fedora and other bugs that are suggestive of this, such as Fedora bug #1226465, and that bug leads to an excellent KDE explanation of that specific GTK3 issue. Since this Fedora bug is about scroll events going missing instead of scrolling things back and forth, it may not be exactly my issue.
(My issue is also definitely not fixed in the GTK3 update that supposedly fixes it for other people. On the other hand, updates now appear to be setting GDK_CORE_DEVICE_EVENTS, so who knows what's going on here.)
Since this environment variable suppresses the bad behavior with no visible side effects I've seen, my current solution is to set it for my entire session. I haven't bothered reporting a Fedora bug for this so far because I use a very variant window manager and that seems likely to be a recipe for more argument than anything else. Perhaps I am too cynical.
(The issue is very reproducible for me; all I have to do is start Evince with that environment variable scrubbed out and my scroll wheel makes things jump around nicely again.)
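For reference, the workaround amounts to exporting the variable before starting an affected program. A minimal Python sketch (the wrapper function and the Evince invocation are just illustrations, not anything from the actual bug reports):

```python
import os
import subprocess

def gdk_env():
    # Copy the current environment and force GDK to use core (non-XInput2)
    # X events, which is what suppresses the stuttering scrolling for me.
    env = dict(os.environ)
    env["GDK_CORE_DEVICE_EVENTS"] = "1"
    return env

def run_with_core_events(cmd):
    # Launch an affected GTK3 program with the workaround applied.
    return subprocess.Popen(cmd, env=gdk_env())

# e.g. run_with_core_events(["evince", "somefile.pdf"])
```

In practice a shell 'export GDK_CORE_DEVICE_EVENTS=1' in your session startup does the same thing, which is what I actually do.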
A modest little change I'd like to see in bug reporting systems
It is my opinion that sometimes little elements of wording and culture matter. One of those little elements of culture that has been nagging at me lately is the specifics of how Bugzilla and probably other bug reporting systems deal with duplicate bug reports; they are set to 'closed as a duplicate of <other bug>'.
On the one hand, this is perfectly accurate. On the other hand, almost all of the time one of my bug reports is closed out this way I wind up feeling like I shouldn't have filed it at all, because I should have been sufficiently industrious to find the original bug report. I suspect that I am not alone in feeling this way in this situation. I further suspect that feeling this way serves as a quiet little disincentive to file bug reports; after all, it might be yet another duplicate.
Now, some projects certainly seem to not want bug reports in the first place. And probably some projects get enough duplicate bug reports that they want to apply pressure against them, especially against people who do it frequently (although I suspect that this isn't entirely going to work). But I suspect that this is not a globally desirable thing.
As a result, what I'd like to see bug reporting systems try out is simply renaming this status to the more neutral 'merged with <other bug>'.
Would it make any real difference? I honestly don't know; little cultural hacks are hard to predict. But I don't think it would hurt and who knows, something interesting could happen.
(In my view, 'closed as duplicate' is the kind of thing that makes perfect sense when your bug reporting system is an internal one fed by QA people who are paid to do this sort of stuff efficiently and accurately. In that situation, duplicate bugs often are someone kind of falling down on the job. But this is not the state of affairs with public bug reporting systems, where you are lucky if people even bother to jump through your hoops to file at all.)
Some thoughts on log rolling with date extensions
For a long time everyone renamed old logs in the same way; the most recent log got a .0 on the end, the next most recent got a .1 on the end, and so on. About the only confusion between systems was that some started from .0 and some from .1, and also whether or not your logs got gzip'd. These days, the Red Hat and Fedora derived Linuxes have switched to logrotate's dateext setting, where the extension that old logs get is date based, generally in the format -YYYYMMDD. I'm not entirely sure how I feel about this so far, and not just because it changes what I'm used to.
On the good side, this means that a rolled log has the same file name for as long as it exists. If I look at allmessages-20150718 today, I know that I can come back tomorrow or next week and find it with the same name; I don't have to remember that what is allmessages.3 today will be allmessages.4 tomorrow (or next week). It also means that logs sort lexically in time order, which is not the case with numbered logs; .10 is lexically between .1 and .2, but is nowhere near them in time.
(The lexical order is also forward time order instead of reverse time order, which means that if you grep everything you get it in forward time order instead of things jumping around.)
On the down side, rolled logs having a date extension means that I can no longer look at the most recently rolled log just by using <name>.0 (or .1); instead I need to look at what log files there are (this is especially the case with logs that are rolled weekly). It also means that I lose the idiom of grep'ing or whatever through <name>.[0-6] to look through the last week's worth of logs; again, I need to look at the actual filenames, or at least resort to something like 'grep ... $(/bin/ls -1t <name>.* | sed 7q)' (and I can do that with any log naming scheme).
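Scripting around the dateext scheme is at least easy, because the names sort lexically in time order. A small sketch of the 'last week of logs' idiom in Python (the log name and path are made up):

```python
import glob

def last_n_rolled(names, n=7):
    # Date-extension names like allmessages-20150718 sort lexically
    # in time order, so the newest n rolled logs are simply the tail
    # of a sorted list of names.
    return sorted(names)[-n:]

# e.g. last_n_rolled(glob.glob("/var/log/allmessages-*")) gives the
# last week's worth of weekly-rolled logs, however many exist.
```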
I'm sure that Red Hat had its reasons to change the naming scheme around. It certainly makes a certain amount of things consistent and obvious. But on the whole I'm not sure I actually like it or if I'd rather have things be the old fashioned way that Ubuntu and others still follow.
(I don't care enough about this to change my Fedora machines or our CentOS 6 and 7 servers.)
My brush with the increasing pervasiveness of smartphone GPS mapping
One of the things I do with my time is go bicycling with a local bike club. When you go on group bike rides, one of the things you generally want to have is directions for where the ride is going (if only to reassure yourself if you get separated from the group). When I started with the club back in 2006, these 'cue sheets' for rides were entirely a paper thing and entirely offline; you turned up at the start of the ride and the ride leader handed out a bunch of copies to anyone who wanted or needed one.
(By 2006 I believe that people were mostly creating new cue sheets in word processors and other tools, but some old ones existed only in scanned form that had been passed down through the years.)
Time rolled on and smartphones with GPS appeared. Various early adopters around the club started using smartphone apps to record their rides. People put these ride recordings online and other people started learning from them, spotting interesting new ways to get places and so on. Other people started taking these GPS traces and loading them on their own smartphones (and sometimes GPS devices) as informal guides to the route to supplement the official cue sheets. As time went on, some people started augmenting the normal online ride descriptions for upcoming rides with somewhat informal links to online GPS-based maps of the ride route.
Last year the club started a big push to put copies of the cue sheets online, and alongside the cue sheets it started digitizing many of the routes into GPS route files. For some of the rides, the GPS route files started being the primary authority for the ride's route; the printed cue sheet that the ride leader handed out at the start was generated from them. Finally, this year the club is really pushing people to print their own cue sheets instead of having the ride leader give them out at the start. It's not really hard to see why; even last year fewer and fewer people were asking for copies of the cue sheet at the start of rides and more and more people were saying 'I'm good, I've got the GPS information loaded into my smartphone'.
(This year, on the group rides I've led I could hardly give out more than a handful of cue sheets. And usually not because people had already printed their own.)
It doesn't take much extrapolation to see where this is going. The club is still officially using cue sheets for now, but it's definitely alongside the GPS route files, and more and more cue sheets are automatically generated from the GPS route files. It wouldn't surprise me if, five years from now, having a smartphone with good GPS and a route following app was basically necessary to go on our rides. There are various advantages to going to only GPS route files, and smartphones are clearly becoming increasingly pervasive. Just like the club assumes that you have a bike and a helmet and a few other things, we'll assume you have a reasonably capable smartphone too.
(By then it's unlikely to cost more than, say, your helmet.)
In one way there's nothing particularly surprising about this shift; smartphones with GPS have been taking over from manual maps in many areas. But this is a shift that I've seen happen in front of me and that makes it personally novel. Future shock is made real by being a personal experience.
(It also affects me in that I don't currently have a smartphone, so I'm looking at a future where I probably need to get one in order to really keep up with the club.)
The OmniOS kernel can hold major amounts of unused memory for a long time
The Illumos kernel (which means the kernels of OmniOS, SmartOS, and so on) has an oversight which can cause it to hold down a potentially large amount of unused memory in unproductive ways. We discovered this on our most heavily used NFS fileserver; on a server with 128 GB of RAM, over 70 GB of RAM was being held down by the kernel and left idle for an extended time. As you can imagine, this didn't help the ZFS ARC size, which got choked down to 20 GB or so.
The problem is in kmem, the kernel's general memory allocator. Kmem is what is called a slab allocator, which means that it divides kernel memory up into a bunch of arenas for different-sized objects. Like basically all sophisticated allocators, kmem works hard to optimize allocation and deallocation; for instance, it keeps a per-CPU cache of recently freed objects so that in the likely case that you need an object again you can just grab it in a basically lock free way. As part of these optimizations, kmem keeps a cache of fully empty slabs (ones that have no objects allocated out of them) that have been freed up; this means that it can avoid an expensive trip to the kernel page allocator when you next want some more objects from a particular arena.
The problem is that kmem does not bound the size of this cache of fully empty slabs and does not age slabs out of it. As a result, a temporary usage surge can leave a particular arena with a lot of unused objects and slab memory, especially if the objects in question are large. In our case, this happened to the arena for 'generic 128 KB allocations'; we spent a long time with around six in use but 613,033 allocated. Presumably at one time we needed that ~74 GB of 128 KB buffers (probably because of a NFS overload situation), but we certainly didn't any more.
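The failure mode is easy to model. The following is emphatically not the actual kmem code, just a toy sketch (with one object per slab, matching our 128 KB case) of what an unbounded, never-aged cache of empty slabs does to you after a surge:

```python
class Arena:
    # Toy model of a slab arena whose cache of fully empty slabs is
    # unbounded: freed slabs are kept forever unless explicitly reaped.
    def __init__(self, slab_bytes):
        self.slab_bytes = slab_bytes
        self.in_use = 0
        self.empty_slabs = 0

    def alloc(self):
        # Reuse a cached empty slab if we have one; otherwise the real
        # allocator would go to the kernel page allocator for memory.
        if self.empty_slabs:
            self.empty_slabs -= 1
        self.in_use += 1

    def free(self):
        # Freed slabs go into the cache and are never aged out.
        self.in_use -= 1
        self.empty_slabs += 1

    def held_bytes(self):
        # Memory held down, unused, by the cache of empty slabs.
        return self.empty_slabs * self.slab_bytes
```

Run a surge of 613,033 allocations of 128 KB through this, free almost all of them, and the arena sits on roughly 74 GB of idle memory until something forces a reap, which is essentially what we saw.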
Kmem can be made to free up these unused slabs, but in order to do so you must put the system under strong memory pressure by abruptly allocating enough memory to run the system basically out of what it thinks of as 'free memory'. In our experiments it was important to do this in one fast action; otherwise the system frees up memory through less abrupt methods and doesn't resort to what it considers extreme measures. The simplest way to do this is with Python; look at what 'top' reports as 'free mem' and then use up a bit more than that in one go.
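A sketch of what we actually type into interactive Python (the amount is whatever 'top' reports as free, plus a margin; the helper function is just my packaging of it):

```python
GB = 1024 * 1024 * 1024

def eat_bytes(nbytes):
    # A single big, fully written-out string: one allocation, filled
    # with real data, so it must occupy real RAM all at once.
    return "a" * nbytes

# e.g. if 'top' says about 12 GB free:
#   hog = eat_bytes(13 * GB)
# and then 'del hog' once the kernel has reaped its kmem caches.
```

The one-big-allocation part matters; dribbling the memory in lets the system free things up through gentler means and never trigger the full reap.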
(You can verify that the full freeing has triggered by using dtrace to look for calls to the kernel's kmem cache reaping code.)
Unfortunately triggering this panic freeing of memory will likely cause your system to stall significantly. When we did it on our production fileserver we saw NFS stall for a significant amount of time, ssh sessions stop for somewhat less time, and for a while the system wasn't even responding to pings. If you have this problem and can't tolerate your system going away for five or ten minutes until things fully recover, well, you're going to need a downtime (and at that point you might as well reboot the machine).
The simple sign that your system may need this is a persistently high 'Kernel' memory use in mdb's ::memstat but a low ZFS ARC size. We saw 95% or so Kernel but ARC sizes on the order of 20 GB, and of course the Kernel amount never shrank. The more complex sign is to look for caches in mdb's ::kmastat that have outsized space usage and a drastic mismatch between buffers in use and buffers allocated.
(Note that arenas for small buffers may be suffering from fragmentation instead of or in addition to this.)
I think that this isn't likely to happen on systems where you have user level programs with fluctuating overall memory usage, because sooner or later just the natural fluctuation of user level programs is likely to push the system to do this panic freeing of memory. And if you use a lot of memory at the user level, well, that limits how much memory the kernel can ever use, so you're probably less likely to get into this situation. Our NFS fileservers are kind of a worst case for this because they have almost nothing running at the user level and certainly nothing that abruptly wants several gigabytes of memory at once.
People who want more technical detail on this can see the illumos developer mailing list thread. Now that it's been raised to the developers, this issue is likely to be fixed at some point but I don't know when. Changes to kernel memory allocators rarely happen very fast.
'Retail' versus 'wholesale' spam
A while back I mentioned that the spam received by my spamtrap SMTP server is boring; it's mostly advance fee frauds, phishes, and the like. In light of that and that GMail based people keep trying to send me spam, I've been thinking about how one way to split up spam is between what I'll call retail spam and wholesale spam.
Wholesale spam is the high volume emitters, the people who are doing it in enough volume that they have real infrastructure and automation of some sort. These are the 'email marketing' people and the people who wind up on the SBL and so on and so forth. The modern problem for them is that their very volume makes them recognizable and thus blockable. We have DNS blocklists, we have spam feature recognition in filtering systems, and so on and so forth. As a result of this, I think that wholesale spam is a mostly solved problem for most systems.
Retail spam is the small volume and often hand entered stuff. It is people sitting in Internet cafes using stolen webmail credentials to send out more or less hand-written messages. This is the domain of a great deal of advance fee fraud and phish spam, and as a result of its comparatively small volume and hand done nature it's hard to do a really good job of blocking it today. It's probably always going to be hard to fully block this, and as a result I can unhappily look forward to GMail emitting this stuff in my direction for years to come.
(GMail is far from alone here, of course; any freemail service is a sending source for this stuff. I just notice GMail more than the others for various reasons.)
Maybe someday we'll figure out really effective tools against retail spam, but I doubt it. Stopping retail spam runs up against the fundamental problem of spam.
Some data on how long it is between fetches of my Atom feed
Recently I became interested in a relatively simple question: on average, how much time passes between two fetches of the Atom feed for Wandering Thoughts? Today I want to give some preliminary answers to that. To make life simple, I'm looking only at the blog's main feed and I'm taking 24 hours of data over Friday (local time). Excluding feed fetch attempts that are blocked for some reason, I get the following numbers:
- the straight average is one fetch every 12.9 seconds (with a standard deviation of 13.7).
- the median is one fetch every 9 seconds.
- the longest gap between two feed requests was 130 seconds.
- 90% of the inter-request gaps were 31 seconds or less, 75% were 18 seconds or less, and 25% were 3 seconds or less.
- 6% of the feed fetch requests came at the same time (to the second) as another request; the peak number of fetches in one second is four, which happened several times.
- 7.5% came one second after the previous request (and this is the mode, the most common gap), 6% two seconds, 6% three seconds, and 5.5% four seconds. I'm going to stop there.
Of course averages are misleading; a thorough workup here would involve gnuplot and peering at charts (and also more than just 24 hours of data).
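The numbers themselves are easy to produce once you have the fetch timestamps; a minimal sketch of the gap statistics (the function name and the example timestamps are mine, not real data):

```python
import statistics

def gap_stats(times):
    # times: feed fetch timestamps in seconds, in arrival order.
    # The inter-request gaps are just successive differences.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "mean": statistics.mean(gaps),
        "median": statistics.median(gaps),
        "max": max(gaps),
    }

# e.g. gap_stats([0, 9, 12, 25, 155]) over a day's worth of real
# timestamps pulled out of the web server logs.
```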
This is an interesting question partly because every so often people accidentally publish a blog entry and then want to retract it. Retraction is difficult in the face of syndication feeds; once an entry has started to appear in people's syndication feed fetches, you can no longer just remove it. My numbers suggest strongly that even moderately popular blogs have very little time before this starts happening.
Your standard input is a tty in a number of surprising cases
Every once in a while, someone writing a program decides that checking to see whether standard input is a tty (via isatty() or the equivalent) is a great way of determining 'am I being run interactively or not?'. This certainly sounds like a good way to do this check if you aren't familiar with Unix and don't actually test any number of situations, but in fact it is wrong almost all of the time.
For a start, this is wrong if your command is just being run in a shell script. Commands run from a shell script inherit the script's standard input; if you just ran the script itself from a shell command line, well, that's your tty. No Unix shell can behave differently, because passing stdin to script commands is what lets shell scripts work in the middle of pipelines. But plain commands are the obvious case, so let's go for an odder one: command substitution, as in 'avar=$(/some/command ...)'.
You guessed it: /some/command inherits the shell's standard input and thus may have its standard input connected to your tty. Its standard output is not a tty, of course; it's being collected by the shell instead.
Now let's talk about GNU Make. Plain commands in Makefiles are like plain commands in shell scripts; make gets your standard input and passes it to commands being run. In my opinion this is far less defensible than with shell scripts, although I'm sure someone has a setup that uses make and a Makefile in the middle of a pipeline and counts on the commands run from the Makefile being able to read standard input. Still, I suspect a certain number of people would be surprised by that.
GNU Make has a feature where it can run a shell command as it parses the Makefile in order to do things like set up the value of Makefile variables. This looks like (in the simple version):
AVAR := $(shell /some/command ...)
This too can have isatty(stdin) be true. Like the shell, GNU Make passes its standard input down even to things being run via command substitution.
The short form version of this is that almost anything that's run even indirectly by a user from their shell prompt may have standard input being a tty. Run from a shell script that's run from three levels of Makefiles (and makes) that are started from a shell script that's spawned from a C program that does a system()? Unless there's a pipeline somewhere in there, you probably still have standard input connected to the user's tty.
It follows that checking isatty(stdin) is a terrible way of seeing whether or not your program is being run interactively, unless the version of 'interactively' you care about is whether you're being run from something that's totally detached from the user, like a crontab or a ssh remote command execution (possibly an automated one). Standard input not being a tty doesn't guarantee this, of course, but if standard input is a tty you can be pretty sure that you aren't being run from crontab et al.
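The check itself is trivial, which is part of why it's so tempting. A minimal Python sketch of the heuristic (the function name and the messages are mine):

```python
import os
import sys

def interactive_guess(stream):
    # The heuristic the entry argues against: all isatty() really
    # tells you is whether there's a terminal on the far end of the
    # fd, which stays true deep inside scripts and Makefiles.
    if stream.isatty():
        return "stdin is a tty: somewhere up the chain is a terminal"
    return "stdin is not a tty: cron, ssh command, or a pipeline"

# e.g. print(interactive_guess(sys.stdin)) inside a shell script that
# you ran from your prompt will still report a tty.
```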
(The corollary of this is that if you're writing shell scripts and so on, you may sometimes want to deliberately disconnect standard input from what it normally would be. This doesn't totally stop people from talking to the user (they can always explicitly open /dev/tty), but at least it makes it less likely to happen by more or less accident.)
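Disconnecting stdin is a one-liner in the shell ('somecommand < /dev/null'); the same thing from Python looks like this (the choice of 'wc -c' is just a harmless command to demonstrate with):

```python
import subprocess

# Deliberately disconnect standard input, so the command can't quietly
# read from the user's terminal by accident; this is the Python
# equivalent of '< /dev/null' in a shell script.
result = subprocess.run(
    ["wc", "-c"],
    stdin=subprocess.DEVNULL,
    capture_output=True,
    text=True,
)
# wc -c reads nothing from the disconnected stdin and reports 0 bytes.
```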
Eating memory in Python the easy way
As a system administrator, every so often I need to put a machine under the stress of having a lot of its memory used. Sometimes this is for testing how things respond to this before it happens during live usage; sometimes this is because putting a system under memory stress can cause it to do important things it doesn't otherwise do (such as reclaim extra memory). The traditional way to do this is with a memory eater program, something that just allocates a controlled amount of memory and then (usually) puts actual data in it.
(If you merely allocate memory but don't use it, many systems don't consider themselves to be under memory stress. Generally you have to make them use up actual RAM.)
In the old days, memory eater programs tended to be one-off things written in C; you'd malloc() some amount of memory then carefully write data into it to force the system to give you RAM. People who needed this regularly might keep around a somewhat more general program for it. As it turns out, these days I don't need to go to all of that work because interactive Python will do just fine:
$ /usr/bin/amd64/python2.6
[...]
>>> GB = 1024*1024*1024
>>> a = "a" * (10 * GB)
Voila, 10 GB eaten. Doing this interactively gives me great flexibility; for instance, I can easily eat memory in smaller chunks, say 1 GB at a time, so that I have more control over exactly when the system gets pushed hard (instead of perhaps throwing it well out of its comfort zone all at once).
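The chunk-at-a-time version I use interactively can be packaged up as a small helper (the function and its sizes are just my sketch of the idea):

```python
GB = 1024 * 1024 * 1024

def eat_in_chunks(total_gb, chunk_gb=1):
    # Hold a reference to every chunk so none of them get garbage
    # collected; each chunk is a fully written-out string, so it
    # occupies real RAM the moment it's created.
    chunks = []
    for _ in range(int(total_gb / chunk_gb)):
        chunks.append("a" * int(chunk_gb * GB))
    return chunks

# e.g. hog = eat_in_chunks(10) eats 10 GB a gigabyte at a time;
# 'del hog' afterwards releases the whole lot.
```

Building the list chunk by chunk is what gives you the control; you can watch the system between chunks and stop before you push it past where you wanted it to go.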
There are some minor quibbles you can make here; for example I'm not using only exactly 10 GB of memory, since Python has some small overhead for objects and so on. And you probably want to specifically use bytestrings in Python 3, not the default Unicode strings.
In practice I don't care about the quibbles because this is close enough for me and it's really convenient (and flexible), far more so than writing a C program or re-finding the last one I wrote for this.
(If CPython allocates much additional internal memory to create this 10 GB string, it's not enough to be visible on the scale of GBytes of RAM usage. I tried a smaller test and didn't see more than perhaps a megabyte or two of surprising memory usage, but in general if you need really fine control over memory eating you're not going to want to use Python for it.)
PS: It makes me unreasonably happy to be able to use Python interactively for things like this, especially when they're things I might have had to write a C program for in the past. It's just so neat to be able to just type this stuff out on the fly, whether it's eating memory or testing UDP behavior.
Mdb is so close to being a great tool for introspecting the kernel
The mdb debugger is the standard debugger on Solaris and Illumos systems (including OmniOS). One very important aspect of mdb is that it has a lot of support for kernel 'debugging', which for ordinary people actually means 'getting detailed status information out of the kernel'. For instance, if you want to know a great deal about where your kernel memory is going, you're going to want the '::kmastat' mdb command.
Mdb is capable of some very powerful tricks because it lets you compose its commands together in 'pipelines'. Mdb has a large selection of things to report information (like ::kmastat) and things to let you build your own pipelines (eg walkers), but what it doesn't have is any sort of scripting, and without that some things require a huge amount of post-processing on your part. For instance, as far as I know a pipeline can't have conditions or filtering so that you further process only selected items that one stage of a pipeline produces. In the case of listing file locks, you're out of luck if you want to work on only selected files instead of all of them.
I understand (I think) where this limitation comes from. Part of it is probably simply the era mdb was written in (which was not yet a time when people shoved extension languages into everything that moved), and part of it is likely that the code of mdb is also much of the code of the embedded kernel debugger. But from my perspective it's also a big missed opportunity. A mdb with scripting would let you filter pipelines and write your own powerful information dumping and object traversal commands, significantly extending the scope of what you could conveniently extract from the kernel. And the presence of pipelines in mdb shows that its creators were quite aware of the power of flexibly processing and recombining things in a debugger.
(Custom scripting also has obvious uses for debugging user level programs, where a complex program may be full of its own idioms and data structures that cry out for the equivalent of kernel dcmds and walkers.)
PS: Technically you can extend mdb by writing new mdb modules in C, since they're just .sos that are loaded dynamically; there's even a more or less documented module API. In practice my reaction is 'good luck with that'.