Why tiling window managers are not really for me (on the desktop)
For years now, the Unix geek thing to do has been to use a tiling window manager. There are a relative ton of them, and many of them promise the twin goals of simplicity and (full) keyboard control out of the box. However, I've never been particularly interested in any of them, despite my broad view that fvwm is a fairly complex window manager as these things go and definitely shows its age in some ways.
The short version of a large part of why I don't feel attracted to tiling window managers is that I generally like empty space on my desktop. Tiling window managers seem to generally be built around the idea that you want to fill everything up to make maximal use of your screen real estate. I don't feel this way; unless I'm doing a lot, I actively want there to be empty space on my screen so that my overall environment feels uncluttered. If I only need one or two terminal windows active at the moment, I don't see any reason to have them take up the entire screen.
(Related to this is that I sometimes deliberately overlap windows in order to put certain windows in what I consider good or correct positions for what I'm doing without forcing others to shrink.)
I'm sure that at least some tiling window managers can be taught to leave empty space if you want it and not force the screen to be filled with windows all the time. And probably some can have overlapping, partially occluded windows as well. But it's not how people usually talk about using tiling window managers and so I've wound up feeling that it's not an entirely natural thing for them. I'd rather not try to force a new window manager to operate in a way that it's not really built for when I already have a perfectly good non-tiling window manager.
(There's also the issue of my use of spatial memory for finding windows, both active and especially inactive, which I also have the impression that tiling window managers are not so hot on.)
At the same time, I've seen tiling window layouts work very well in some circumstances when the layout is sufficiently smart; Rob Pike's Acme is the poster child for this for me. There are certainly situations where I would like to switch over to an Acme-style tiled approach to window layout (probably usually when my screen gets busy and cluttered with active windows). It's just that I don't want to live in that world all of the time (and, to be honest, there are issues with xterm that make it annoying to keep changing its width all the time).
It's a pity, though. Some of those tiling window managers definitely do look cool and sound interesting.
PS: All of this is relatively specific to my desktop, where I have a physically large display and so it's frequently uncluttered, with room to spare.
(I'm not particularly attracted by 'control everything from the keyboard', either. I like using the mouse for things it's good at.)
Spam and virus filtering on email is a risk (although likely not a big one)
If you have a decent-sized email system, you're probably running incoming email through some sort of anti-virus and anti-spam system. It may be a commercial product such as the one we use, or it may be a free one such as SpamAssassin or ClamAV. There are ways around needing such a system while still allowing a reasonable amount of incoming email, but they let some spam through and they require aggressively blocking attachments in order to try to exclude viruses.
These systems, commercial or free, are a potential security risk. We know that desktop anti-virus scanners have vulnerabilities (both in the engines and in things like their update mechanisms), so it's only prudent to assume that server-based systems do as well, especially for anti-virus systems. Modern AV systems are trying to parse and understand complicated file formats, almost certainly using code written in C and not aggressively hardened; it would be a miracle if they didn't have exploitable vulnerabilities somewhere.
(At least one commercial system definitely had vulnerabilities, although they may or may not have been exploitable.)
At one level, this is really quite alarming; your email AV system is completely exposed to inbound email from the outside world, since automatically checking that email is its entire job. An attacker who knows and can exploit a vulnerability in it can send you a malicious message and your system will be owned without any action on your part. It's not too much different from your web server having a remotely exploitable vulnerability. Yes, it's likely that coming up with a reliable attack against your AV system will be harder, but it's very likely it can still be done.
So should you abandon use of an AV system, and in fact of all content-scanning systems that look at your inbound email? As usual, this is a balance of risks question. In particular I think it's a question of how easily AV systems can be exploited generically and have something useful done with them.
The reality of life is that if an attacker is targeting you specifically, they're probably going to get in somehow. It's worth making sure that your AV system is not exceptionally vulnerable, but at the same time it is probably not the sole weak point in your environment, and not having an AV system or other content filtering has its own set of risks. For most sites, you are probably better off overall having an email AV system even if it provides an additional attack point for someone who is targeting you specifically.
But specific attackers aren't the only attackers we have to worry about; there are also mass attackers, people who find some broadly spread vulnerability and attack everyone they can find with it in order to do various sorts of nastiness (sending out spam, holding your files to ransom, selling access to other people, whatever). If a mass attack is possible at all, it is really the biggest risk, simply because mass attackers spray their attack widely in order to reach as many targets as possible.
(As a corollary, there probably will never be a mass attack against your custom local filtering, although there may be a mass attack against some common sub-component you're using in it, such as a MIME parsing library or a compression library.)
I'm wary of saying that there can't be a successful mass attack against an email AV or anti-spam scanner, but I think that the odds are against it. These systems are deployed on varied systems, in very varied environments, often in varied versions of the software itself, and there are a fair number of different software packages that mail systems use. Barring a glaring, trivial vulnerability, a would-be mass attacker probably can't develop a truly broad single exploit even for a broadly spread vulnerability; they might need a different one for different Linux releases, for example. Then they'd have to find enough mail systems on the Internet that were running the specific AV/anti-spam system on Debian X or CentOS Y in order to make a mass attack worth it. It just seems unlikely to me.
(Things like web servers are more exposed to mass attacks because they are easier to mass scan and assess.)
Thinking about how to add some SSDs on my home machine
It all started when I upgraded from Fedora 24 to Fedora 25 on my office workstation and then my home machine in close succession, and the work upgrade went much faster because my root filesystem was on SSDs. This finally pushed me over the edge to get a pair of SSDs for my home machine, as I've known I should do for a while. I now actually have the SSDs, but, well, I haven't put them into my home machine yet. You might wonder why, so let me put it this way: the next case I get will have at least six drive bays.
My current case has four drive bays (well, four conveniently usable 3.5" drive bays), and all four drive bays are used; two for the mirrored pair of system HDs, and two for the mirrored pair of data HDs. The SSDs will be replacing the system HDs (and pulling in things like my home directory filesystem from the data HDs), but I can't exactly unplug the HDs and put in the SSDs; I need to shift over, and to do that I need to temporarily have the SSDs in the system too. So I've been mulling over how best to do that, and in the mean time my SSDs have just been sitting there.
(If I had six drive bays it would be easy and I would have shoved in the SSDs almost immediately. And the delay is not just because I've been thinking; it's also because shuffling everything around is going to be kind of a hassle however I do it, and so I keep putting it off in favor of more interesting and pleasant things.)
I have a 3.5" to 2.5" dual SSD adaptor for the SSDs (I'm also using one at work), so a single open 3.5" drive bay will allow me to put both into the machine. A number of potential approaches have occurred to me:
- My case has some 5.25" drive bays, which I'm not using. Maybe I could
just temporarily rest the dual adaptor on the bottom of that area, run
cables to it, and have that work. (The deluxe version would be to put
the 3.5" to 2.5" adaptor in a 5.25" to 3.5" adaptor, but I don't have
one of the latter sitting around and that feels like a lot of work.)
- I could just temporarily run with the side of the case open and cables
running to the SSDs. Don't laugh, one co-worker has been running with
his machine opened up like this for years. It'd be awkward for me,
though, because of where everything is physically (my co-worker has his
open machine on his desk).
- I could deliberately break the mirror of my system disks, remove one, and put the two SSDs in the drive slot freed up by that. It's not very likely that the remaining system disk will fail while I'm shifting over, and if it does I have the other system disk to swap back in.
Breaking the system disk mirror and removing one of the disks strikes me as the least crazy plan. However, it means I get to find out if my Fedora system is set up so that it will actually boot when one of the system disks goes away, or if it will throw up its hands because the shape of the RAID array is not exactly what it wants (this has been known to happen under some circumstances, although that wasn't a disk going missing). Certainly I'd hope that my Fedora 25 system will boot without problems there, but between general issues and systemd I don't have complete confidence here, and I can imagine scenarios that end up with me having to boot a rescue environment and try to glue my system back together again by hand.
(My system disk mirror doesn't just have the root filesystem; it also has
/boot and swap, each as mirrored things. So systemd
needs to be willing to bring up several RAID arrays in degraded
mode in order to be able to get everything in
I expect that the easiest way to test this is to open the case up,
shut the system down, pull the power connector for one of my system
disks, and then try to boot the system. If it fails, I can shut
everything down, plug the power connector back in, and hopefully
everything will be back to being happy with the world. It would
probably be more proper to take the disk offline in software, but
that may be less easily reversed if things then explode.
(My plan for the SSDs is about a 100 GB ext4 root filesystem (which
will also get /boot), a bit of swap space, and then the rest of
the space in a ZFS pool. The pool will get my home directory and
various other things that fit where I care either about speed or
about having ZFS's checksums for the data.)
Exim, IPv6, and hosts that MX to localhost
For some time now, Exim on our external MX gateway has been logging messages like the following:
2017-01-17 14:14:55 H=... [...] sender verify defer for <firstname.lastname@example.org>: lowest numbered MX record points to local host
On the one hand, this is fair enough, because at the moment the MX for
azusa.us is indeed:
azusa.us. 3600 IN MX 0 localhost.
On the other hand, Exim has a router option that is intended to deal with this, called ignore_target_hosts; it lets you list any number of IP addresses which are supposed to be ignored if they show up in the process of looking up things. This allows you to ignore not just people who list MXs that resolve to 127.0.0.1 but also people who, say, list RFC 1918 IP addresses in public DNS; your Exim can laugh at their attempts to use these names as MAIL FROMs on the public Internet.
We have had an ignore_target_hosts setting for years:
ignore_target_hosts = 0.0.0.0 : 127.0.0.0/8 : 255.255.255.255 : 169.254.0.0/16
(We would like to ignore the remaining RFC 1918 address space, but we actually use it ourselves and disentangling the resulting mess has so far been too complicated.)
So our Exim configuration certainly looked like it should have ignored the azusa.us MX entry instead of temporarily deferring it as, basically, a DNS configuration error. It MX'd to localhost and Exim even recognized that it did, since it was reporting just that. After a bunch of flailing around, I worked out what was going on: Exim was looking up the IPv6 address of localhost. as well as the IPv4 one, and the IPv6 localhost address was not ignored.
So when Exim saw this MX entry it did both A and AAAA lookups on localhost., discarded the 127.0.0.1 A record because it matched an entry in ignore_target_hosts, accepted the ::1 AAAA record because it didn't, and then reported the 'lowest numbered MX record points to local host' error. The fix for this is straightforward; we added ::1 to ignore_target_hosts.
(I suspect that this started happening when we replaced Bind on our OpenBSD internal resolvers with Unbound, as Unbound internally provides A and AAAA records for localhost. by default. Before then, all queries for localhost. might have failed entirely. I have no opinion on whether providing a localhost. name in DNS is a good idea or not, because I haven't looked into the reasons for why this was done.)
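The address-family mismatch is easy to reproduce in a small Python sketch. This only illustrates the matching logic; it is not how Exim implements ignore_target_hosts, and the function name is_ignored and the list variable are made up for the example.

```python
import ipaddress

# The IPv4-only ignore list from the configuration shown earlier.
IGNORED_V4_ONLY = ["0.0.0.0", "127.0.0.0/8", "255.255.255.255", "169.254.0.0/16"]


def is_ignored(address, ignore_list):
    """Return True if address falls inside any entry of ignore_list.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(address)
    for entry in ignore_list:
        net = ipaddress.ip_network(entry)
        # An IPv6 address is never inside an IPv4 network (and vice versa),
        # so ::1 can only match an explicit IPv6 entry.
        if addr in net:
            return True
    return False


print(is_ignored("127.0.0.1", IGNORED_V4_ONLY))      # the A record for localhost.
print(is_ignored("::1", IGNORED_V4_ONLY))            # the AAAA record for localhost.
print(is_ignored("::1", IGNORED_V4_ONLY + ["::1"]))  # with ::1 added to the list
```

In a real Exim configuration the colons in ::1 need some care, because colon is Exim's default list separator; the sketch sidesteps that by using a Python list.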
This is an especially interesting issue for me partly because it's yet another illustration of the ripple of changes that adding IPv6 causes. We don't even use IPv6 (yet), but here we are being affected by it and having to include it in our configurations none the less.
Making my machine stay responsive when writing to USB drives
Yesterday I talked about how writing things to USB drives made my machine not very responsive, and in a comment Nolan pointed me to LWN's The pernicious USB-stick stall problem. According to LWN's article, the core problem is an excess accumulation of dirty write buffers, and they give some VM system sysctls that you can use to control this.
I was dubious that this was my problem, for two reasons. First, I have a 16 GB machine and I barely use all that memory, so I thought that allowing a process to grab a bit over 3 GB of it for dirty buffers wouldn't make much of a difference. Second, I had actually been running sync frequently (in a shell loop) during the entire process, because I have sometimes had it make a difference in these situations; I figured frequent syncs should limit the amount of dirty buffers accumulating in general. But I figured it couldn't hurt to try, so I used those sysctl settings to limit this to 256 MB and 512 MB respectively and tested.
It turns out that I was wrong. With these sysctls turned down, my machine stayed quite responsive for once, despite me doing various things to the USB flash drive (including things that had had a terrible effect just yesterday). I don't entirely understand why, though, which makes me feel as if I'm doing fragile magic instead of system tuning. I also don't know if setting these down is going to have a performance impact on other things that I do with my machine; intuitively I'd generally expect not, but clearly my intuition is suspect here.
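For a sense of scale, here is the arithmetic as a rough Python sketch. It makes two assumptions that are not stated in this entry: that the default limit comes from a vm.dirty_ratio of 20%, and that the two byte-based sysctls involved are vm.dirty_background_bytes and vm.dirty_bytes. The kernel applies the ratio to available memory rather than total RAM, so the first number is only an approximation.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ratio_limit_bytes(ram_bytes, dirty_ratio_percent):
    """Approximate dirty-buffer ceiling implied by a percentage ratio."""
    return ram_bytes * dirty_ratio_percent // 100


def sysctl_lines(background_mib, limit_mib):
    """Format byte-based limits as sysctl.conf style lines.

    The sysctl names are an assumption about which settings were used.
    """
    return [
        f"vm.dirty_background_bytes = {background_mib * MIB}",
        f"vm.dirty_bytes = {limit_mib * MIB}",
    ]


# Assumed default ratio of 20% on a 16 GB machine: a bit over 3 GB.
print(round(ratio_limit_bytes(16 * GIB, 20) / GIB, 2), "GiB")

# The 256 MB and 512 MB limits expressed in bytes.
for line in sysctl_lines(256, 512):
    print(line)
```

The gap between the two figures is roughly a factor of six, which may be part of why the lower limits made such a visible difference.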
(Per this Bob Plankers article, you can monitor the live state of your system by grepping for 'dirty|writeback' in /proc/vmstat. This will tell you the number of currently dirty pages and the thresholds (in pages, not bytes). I believe that nr_writeback is the number of pages actively being flushed out at the moment, so you can also monitor that.)
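The same monitoring can be sketched in Python, which also makes it easy to turn page counts into something more readable. The helper names are hypothetical, the sample text is made up rather than captured from a real system, and the 4 KiB page size is an assumption that holds on typical x86 Linux machines but not everywhere.

```python
import re


def dirty_writeback_counters(vmstat_text):
    """Pick out the 'dirty' and 'writeback' counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and re.search(r"dirty|writeback", parts[0]):
            counters[parts[0]] = int(parts[1])
    return counters


def pages_to_mib(pages, page_size=4096):
    """Convert a page count to MiB, assuming 4 KiB pages by default."""
    return pages * page_size / (1024 * 1024)


def read_live_counters(path="/proc/vmstat"):
    """Read the counters from a live Linux system."""
    with open(path) as f:
        return dirty_writeback_counters(f.read())


# Made-up sample in the format of /proc/vmstat, for illustration only.
SAMPLE = """nr_free_pages 12345
nr_dirty 420
nr_writeback 7
nr_dirty_threshold 131072
nr_dirty_background_threshold 65536
pgpgin 99
"""

for name, pages in dirty_writeback_counters(SAMPLE).items():
    print(name, pages, "pages,", pages_to_mib(pages), "MiB")
```

On a real machine you would call read_live_counters() in a loop while the copy is running and watch how close nr_dirty gets to the thresholds.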
PS: In a system with drives (and filesystems) of vastly different speeds, a global dirty limit or ratio is a crude tool. But it's the best we seem to have on Linux today, as far as I know.
(In theory, modern cgroups support the ability to have per-cgroup dirty_bytes settings, which would let you add extra limits to processes that you knew were going to do IO to slow devices. In practice this is only supported on a few filesystems and isn't exposed (as far as I know) through systemd.)
Linux is terrible at handling IO to USB drives on my machine
Normally I don't do much with USB disks on my machine, either flash drives or regular hard drives. When I do, it's mostly to do bulk read or write things such as blanking a disk or writing an installer image to a flash drive, and I've learned the hard way to force direct IO through dd when I'm doing this kind of thing.
Today, for reasons beyond the scope of this entry, I was copying a
directory of files to a USB flash drive, using USB 3.0 for once.
This simple operation absolutely murdered the responsiveness of my machine. Even things as simple as moving windows around could stutter (and fvwm doesn't exactly do elaborate things for that), never mind doing anything like navigating somewhere in a browser or scrolling the window of my Twitter client. It wasn't CPU load, because ssh sessions to remote machines were perfectly responsive; instead it seemed that anything that might vaguely come near doing filesystem IO was extensively delayed.
ionice was ineffective. I'm not really surprised,
since the last time I looked it didn't do anything for software
While hitting my local filesystems with a heavy IO load will slow other things down, it doesn't do it to this extent, and I wasn't doing anything particularly IO-heavy in the first place (especially since the USB flash drive was not going particularly fast). I also tried out copying a few (big) files by hand with dd so I could use oflag=direct, and that was significantly better, so I'm pretty confident that it was the USB IO specifically that was the problem.
I don't know what the Linux kernel is doing here to gum up its works so much, and I don't know if it's general or specific to my hardware, but it's been like this for years and I wish it would get better. Right now I'm not feeling very optimistic about the prospects of a USB 3.0 external drive helping solve things like my home backup headaches.
(I took a look with vmstat to see if I could spot something like a high amount of CPU time in interrupt handlers, but as far as I could see the kernel was just sitting around waiting for IO all the time.)
PS: We have more modern Linux machines with USB 3.0 ports at work, so I suppose I should do some tests with one just to see. If this Linux failure is specific to my hardware, it adds some more momentum for a hardware upgrade (cf).
(This elaborates on some tweets of mine.)
Link: Let's Stop Ascribing Meaning to Code Points
Manish Goregaokar's Let's Stop Ascribing Meaning to Code Points starts out with this:
I've seen misconceptions about Unicode crop up regularly in posts discussing it. One very common misconception I've seen is that code points have cross-language intrinsic meaning.
He goes on to explain the ways that this is dangerous and how tangled this area of Unicode is. I knew little bits of this already, but apparently combining characters are only the tip of the iceberg.
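As a tiny illustration of the combining character issue (only the tip of the iceberg mentioned above, and not an example taken from the linked article), here is a Python sketch in which two strings look identical on screen but are made of different code points:

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, two code points

# Both render as the same character, but their lengths differ.
print(len(composed), len(decomposed))

# A naive comparison treats them as different strings.
print(composed == decomposed)

# Normalizing to NFC composes the pair, so the comparison then succeeds.
print(unicodedata.normalize("NFC", decomposed) == composed)
```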
Some notes on 4K monitors and connecting to them
For reasons beyond the scope of this entry, I'm probably going to build a new home machine this year, finally replacing my current vintage 2011 machine. As part of this (and part of motivating me into doing it), I'm going to persuade myself to finally get a high-resolution display, probably a 27" 4K monitor such as the Dell P2715Q. Now, I would like this hypothetical new machine to drive this hypothetical 4K+ monitor using (Intel) motherboard graphics, which means that I need a motherboard that supports 4K at 60 Hz through, well, whatever connector I should have. Which has sent me off on a quest to understand just how modern monitors connect to modern computers.
(It would be simple if all motherboards supported 4K at 60 Hz on all the various options, but they don't. Just among the modest subset I've already looked at, some motherboards do DisplayPort, some do HDMI, and some have both but not at 4K @ 60 Hz for both.)
As far as I can tell so far, the answer is 'DisplayPort 1.2' or better. If I wanted to go all the way to a 5K display at 60 Hz, I would need DisplayPort 1.3, but 5K displays appear to still be too expensive. Every 4K monitor I've looked at has DisplayPort, generally 1.2 or 1.2a. HDMI 2.0 will also do 4K at 60 Hz and some monitors have that as well.
(That 4K monitors mostly don't go past DisplayPort 1.2 is apparently not a great thing. DisplayPort allows you to daisy-chain displays but you have to stay within the total bandwidth limit, so a 4K monitor that wants to let you daisy-chain to a second 4K monitor needs at least one DP 1.3+ port. Of course you'd also need DisplayPort 1.3+ on your motherboard or graphics card.)
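The bandwidth reasoning here can be checked with some back-of-the-envelope arithmetic. The Python sketch below is an approximation under stated assumptions: it counts only visible pixels at 24 bits per pixel and ignores blanking intervals, and the link figures are the commonly quoted usable data rates after line-coding overhead, not numbers from this entry.

```python
# Approximate usable data rates in Gbit/s (assumed, commonly quoted figures).
LINK_GBPS = {
    "HDMI 1.4": 8.16,
    "HDMI 2.0": 14.4,
    "DisplayPort 1.2": 17.28,
    "DisplayPort 1.3": 25.92,
}


def pixel_rate_gbps(width, height, hz=60, bits_per_pixel=24):
    """Raw pixel data rate in Gbit/s, ignoring blanking overhead."""
    return width * height * hz * bits_per_pixel / 1e9


def links_that_fit(required_gbps):
    """Names of the links whose usable rate covers the requirement."""
    return [name for name, cap in LINK_GBPS.items() if cap >= required_gbps]


four_k = pixel_rate_gbps(3840, 2160)
five_k = pixel_rate_gbps(5120, 2880)

print("4K at 60 Hz:", round(four_k, 2), links_that_fit(four_k))
print("5K at 60 Hz:", round(five_k, 2), links_that_fit(five_k))
print("two 4K displays:", round(2 * four_k, 2), links_that_fit(2 * four_k))
```

Under these assumptions a single 4K display fits in DisplayPort 1.2 or HDMI 2.0, while both a 5K display and a daisy-chained pair of 4K displays need DisplayPort 1.3.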
Adding to the momentum of DisplayPort as the right choice is that there are also converters from DisplayPort 1.2 to HDMI 2.0 (and apparently not really any that go the other way). So a motherboard with DisplayPort 1.2 and support for 4K at 60 Hz over it can be used to drive an HDMI 2.0-only monitor, if such a thing even exists (there are probably HDMI 2.0 only TVs, but I'm not interested in them).
I assume that having HDMI 2.0 on motherboards helps if you want to drive a TV, and that having both DisplayPort 1.2 and HDMI 2.0 (both with 4K at 60 Hz support) might let you drive two 4K displays if one of them has HDMI 2.0. The latter feature is not interesting to me at the moment, as one 27" display is going to take up enough desk space at home all on its own.
(As usual, searching for and comparing PC motherboards seems to be a pain in the rear. You'd think vendors would let you easily search on 'I want the following features ...', but apparently not.)
My picks for mind-blowing Git features
It started on Twitter:
@tobyhede: What git feature would you show someone who has used source control (but not git) that would blow their mind?
@thatcks: Sysadmins: git bisect. People w/ local changes: rebase. Devs: partial/selective commits & commit reordering.
Given that at different times I fall into all three of these groups, I kind of cheated in my answer. But I'll stand by it anyways, and since Twitter forces a distinct terseness on things, I'm going to expand on why these things are mind-blowing.
If you use some open source package and you can compile it, git bisect (plus some time and work) generally gives you the superpower of being able to tell the developers 'this specific change broke a thing that matters to me', instead of having to tell them just 'it broke somewhere between vN and vN+1'. Being able to be this specific to developers drastically increases the chances that your bug will actually get fixed. You don't have to know how to program to narrow down your bug report, just be able to use git bisect, compile the package, and run it to test it.
(If what broke is 'it doesn't compile any more', you can even automate this.)
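The reason bisection is practical even across a large range of commits is that it is a binary search. Here is a minimal Python sketch of the idea, assuming a simple linear history; real git bisect also copes with branches and merges, and the function and commit names here are invented for the example.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. is_bad stands in for 'compile it and test it'.
    Returns the first bad commit and how many commits had to be tested.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # the breakage is at mid or earlier
        else:
            good = mid  # the breakage is after mid
    return commits[bad], tests


# A made-up history of 1000 commits where everything from c613 on is broken.
history = [f"c{i}" for i in range(1000)]
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 613)
print(culprit, "found after", tests, "builds")
```

A thousand commits between vN and vN+1 take at most ten build-and-test rounds. When the test can be expressed as a command's exit status (such as whether the build succeeds), git bisect run can drive the whole loop for you.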
If you carry local modifications in your copy of an upstream project, changes that will never be integrated and that you have no intention of feeding upstream, git rebase is so much your friend that I wrote an entire entry about how and why. In the pre-git world, at best you wound up with a messy tangle of branches and merges that left the history of your local repository increasingly different from the upstream one; at worst your local changes weren't even committed to version control, just thrown on top of the upstream as patches and changes that your tools attempted to automatically merge into new upstream commits when you updated.
If you're developing changes, well, in theory you're disciplined and you use feature branches and do one thing at a time and your diffs are always pure. In practice I think that a lot of the time this is not true, and at that point git's ability to do selective commits, reorder commits, and so on will come along and save your bacon; you can use them to sort out the mess and create a series of clean commits. In the pre-git, pre-selective-commit era things were at least a bunch more work and perhaps more messy. Certainly for casual development people probably just made big commits with random additional changes in them; I know that I certainly did (and I kept doing it even in git until recently because I didn't have the right tools to make this easy).
(Of course this wasn't necessarily important for keeping track of your local changes, because before git you probably weren't committing them in the first place.)
PS: There is one git feature that blows my mind on a technical level because it is just so neat and so clever. But that's going to be another entry, and also it's technically not an official git feature.
(My line between 'official git feature' and 'neat addon hack' is whether the hack in question ships with git releases as an official command.)
The ZFS pool history log that's used by 'zpool history' has a size limit
I have an awkward confession. Until Aneurin Price mentioned it in his comment on my entry on 'zpool history -i', I had no idea that the internal, per-pool history log that zpool history uses has a size limit. I thought that perhaps the size and volume of events was small enough that ZFS just kept everything, which is silly in retrospect. This unfortunately means that the long-term 'strategic' use of zpool history that I talked about in my first entry has potentially significant limits, because you can only go back so far in history. How far depends on a number of factors, including how many snapshots and so on you take.
(If you're just inspecting the output of 'zpool history', it's easy to overlook that it's gotten truncated, because it always starts with the pool's creation. This is because the ZFS code that maintains the log goes out of its way to make sure that the initial pool creation record is kept forever.)
The ZFS code that creates and maintains the log is in spa_history.c. As far as the log's size goes, let me quote the comment in the code:
/*
 * Figure out maximum size of history log.  We set it at
 * 0.1% of pool size, with a max of 1G and min of 128KB.
 */
Now, there is a complication, which is that the pool history log is only sized and set up once, at initial pool creation. So that size is not 0.1% of the current pool size, it is 0.1% of the initial pool size, whatever that was. If your pool has been expanded since its creation and started out smaller than 1000 GB, its history log is smaller (possibly much smaller) than it would be if you recreated the pool at 1000 GB or more now. Unfortunately, based on the code, I don't think ZFS can easily resize the history log after creation (and it certainly doesn't attempt to now).
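The sizing rule from that comment is simple enough to write down. This Python sketch is just a restatement of the comment, not the ZFS code itself; the function name is made up, and exactly which measure of 'pool size' ZFS uses here is an assumption left to the caller.

```python
KIB = 1024
GIB = 1024 ** 3


def history_log_size(initial_pool_bytes):
    """Size of the history log for a pool created at the given size.

    0.1% of the pool size, capped at 1 GiB and never smaller than 128 KiB.
    """
    size = initial_pool_bytes // 1000  # 0.1% of the pool size
    size = min(size, 1 * GIB)          # maximum of 1G
    size = max(size, 128 * KIB)        # minimum of 128KB
    return size


# Sizes are fixed at creation time, so these are initial pool sizes.
for pool_gib in (10, 100, 1000, 4000):
    print(pool_gib, "GiB pool ->", history_log_size(pool_gib * GIB), "bytes of log")
```

This makes the consequence of growing a pool concrete: a pool created at 100 GiB and later expanded to several TB keeps a log roughly a tenth the size of the one it would get if it were created at full size today.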
The ZFS code does maintain some information about how many records have been lost and how many total bytes have been written to the log, but these don't seem to be exposed in any way to user-level code; they're simply there in the on-disk and in-memory data structures. You'd have to dig them out of the depths of the kernel with DTrace or the like, or you can use zdb to read them off disk.
(It turns out that our most actively snapshotted pool, which probably has the most records in its log, only has an 11% full history log at the moment.)
Using zdb to see history log information
These are brief notes, in the style of using zdb to see the ZFS delete queue. First we need to find out the object ID of the SPA history information, which is always going to be in the pool's root dataset (as far as I know):
# zdb -dddd rpool 1
Dataset mos [META], [...]

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         1    1    16K    16K  24.0K    32K  100.00  object directory
[...]
        history = 32
[...]
The history log is stored in a ZFS object; here that is object number 32. Since it was object 32 in three pools that I checked, it may almost always be that.
# zdb -dddd rpool 32
Dataset [...]

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        32    1    16K   128K  36.0K   128K  100.00  SPA history
                                         40   bonus  SPA history offsets
        dnode flags: USED_BYTES
        dnode maxblkid: 0
                pool_create_len = 536
                phys_max_off = 79993765
                bof = 536
                eof = 77080
                records_lost = 0
The bof and eof values are logical byte positions in the ring buffer, and so at least eof will be larger than phys_max_off if you've started losing records. For more details, see the comments in spa_history.c.
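If you want to check several pools, the interesting fields can be pulled out of the zdb output with a few lines of Python. This is a sketch under assumptions: it compares eof against phys_max_off to decide whether the log has wrapped, which is my reading of the field names rather than something confirmed from the ZFS source, and the helper names are invented.

```python
# The 'SPA history offsets' fields from the zdb output shown above.
ZDB_SAMPLE = """
        dnode flags: USED_BYTES
        dnode maxblkid: 0
                pool_create_len = 536
                phys_max_off = 79993765
                bof = 536
                eof = 77080
                records_lost = 0
"""


def parse_history_offsets(zdb_text):
    """Collect 'name = number' lines from zdb output into a dict."""
    fields = {}
    for line in zdb_text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and value.strip().isdigit():
            fields[name.strip()] = int(value)
    return fields


def has_wrapped(fields):
    """Assumed test: the logical end has passed the physical size of the log."""
    return fields["eof"] > fields["phys_max_off"]


fields = parse_history_offsets(ZDB_SAMPLE)
print(fields)
print("wrapped:", has_wrapped(fields), "records lost:", fields["records_lost"])
```

For the pool above the log has clearly not wrapped, which agrees with records_lost being 0.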