Wandering Thoughts


Why tiling window managers are not really for me (on the desktop)

For years now, the Unix geek thing to do has been to use a tiling window manager. There are a relative ton of them, and many of them promise the twin goals of simplicity and (full) keyboard control out of the box. However, I've never been particularly interested in any of them, despite my broad view that fvwm is a fairly complex window manager as these things go and definitely shows its age in some ways.

The short version of a large part of why I don't feel attracted to tiling window managers is that I generally like empty space on my desktop. Tiling window managers seem to generally be built around the idea that you want to fill everything up to make maximal use of your screen real estate. I don't feel this way; unless I'm doing a lot, I actively want there to be empty space on my screen so that my overall environment feels uncluttered. If I only need one or two terminal windows active at the moment, I don't see any reason to have them take up the entire screen.

(Related to this is that I sometimes deliberately overlap windows in order to put certain windows in what I consider good or correct positions for what I'm doing without forcing others to shrink.)

I'm sure that at least some tiling window managers can be taught to leave empty space if you want it and not force the screen to be filled with windows all the time. And probably some can have overlapping, partially occluded windows as well. But it's not how people usually talk about using tiling window managers and so I've wound up feeling that it's not an entirely natural thing for them. I'd rather not try to force a new window manager to operate in a way that it's not really built for when I already have a perfectly good non-tiling window manager.

(There's also the issue of my use of spatial memory for finding windows, both active and especially inactive, which I also have the impression that tiling window managers are not so hot on.)

At the same time, I've seen tiling window layouts work very well in some circumstances when the layout is sufficiently smart; Rob Pike's Acme is the poster child for this for me. There are certainly situations where I would like to switch over to an Acme-style tiled approach to window layout (probably usually when my screen gets busy and cluttered with active windows). It's just that I don't want to live in that world all of the time (and, to be honest, there are issues with xterm that make it annoying to keep changing its width all the time).

It's a pity, though. Some of those tiling window managers definitely do look cool and sound interesting.

PS: All of this is relatively specific to my desktop, where I have a physically large display and so it's frequently uncluttered, with room to spare.

(I'm not particularly attracted by 'control everything from the keyboard', either. I like using the mouse for things it's good at.)

unix/TilingWMNotReallyForMe written at 20:38:42

Spam and virus filtering on email is a risk (although likely not a big one)

If you have a decent-sized email system, you're probably running incoming email through some sort of anti-virus and anti-spam system. It may be a commercial product such as the one we use, or it may be a free one such as SpamAssassin or ClamAV. There are ways around needing such a system while still allowing a reasonable amount of incoming email, but they let some spam through and they require aggressively blocking attachments in order to try to exclude viruses.

These systems, commercial or free, are a potential security risk. We know that desktop anti-virus scanners have vulnerabilities (both in the engines and in things like their update mechanisms), so it's only prudent to assume that server-based systems do as well, especially for anti-virus systems. Modern AV systems are trying to parse and understand complicated file formats, almost certainly using code written in C and not aggressively hardened; it would be a miracle if they didn't have exploitable vulnerabilities somewhere.

(At least one commercial system definitely had vulnerabilities, although they may or may not have been exploitable.)

At one level, this is really quite alarming; your email AV system is completely exposed to inbound email from the outside world, since automatically checking that email is its entire job. An attacker who knows and can exploit a vulnerability in it can send you a malicious message and your system will be owned without any action on your part. It's not too much different from your web server having a remotely exploitable vulnerability. Yes, it's likely that coming up with a reliable attack against your AV system will be harder, but it's very likely it can still be done.

So should you abandon use of an AV system, and in fact of all content-scanning systems that look at your inbound email? As usual, this is a balance of risks question. In particular I think it's a question of how easily AV systems can be exploited generically and have something useful done with them.

The reality of life is that if an attacker is targeting you specifically, they're probably going to get in somehow. It's worth making sure that your AV system is not exceptionally vulnerable, but at the same time it is probably not the sole weak point in your environment, and not having an AV system or other content filtering has its own set of risks. For most sites, you are probably better off overall having an email AV system even if it provides an additional attack point for someone who is targeting you specifically.

But specific attackers aren't the only attackers we have to worry about; there are also mass attackers, people who find some broadly spread vulnerability and attack everyone they can find with it in order to do various sorts of nastiness (sending out spam, holding your files to ransom, selling access to other people, whatever). If a mass attack is possible at all, it is really the biggest risk, simply because mass attackers spray their attack widely in order to reach as many targets as possible.

(As a corollary, there probably will never be a mass attack against your custom local filtering, although there may be a mass attack against some common sub-component you're using in it, such as a MIME parsing library or a compression library.)

I'm wary of saying that there can't be a successful mass attack against an email AV or anti-spam scanner, but I think that the odds are against it. These systems are deployed on varied systems, in very varied environments, often in varied versions of the software itself, and there are a fair number of different software packages that mail systems use. Barring a glaring, trivial vulnerability, a would-be mass attacker probably can't develop a truly broad single exploit even for a broadly spread vulnerability; they might need a different one for different Linux releases, for example. Then they'd have to find enough mail systems on the Internet that were running the specific AV/anti-spam system on Debian X or CentOS Y in order to make a mass attack worth it. It just seems unlikely to me.

(Things like web servers are more exposed to mass attacks because they are easier to mass scan and assess.)

spam/SpamAndVirusFilteringRisk written at 01:25:39


Thinking about how to add some SSDs on my home machine

It all started when I upgraded from Fedora 24 to Fedora 25 on my office workstation and then my home machine in close succession, and the work upgrade went much faster because my root filesystem was on SSDs. This finally pushed me over the edge to get a pair of SSDs for my home machine, as I've known I should do for a while. I now actually have the SSDs, but, well, I haven't put them into my home machine yet. You might wonder why, so let me put it this way: the next case I get will have at least six drive bays.

My current case has four drive bays (well, four conveniently usable 3.5" drive bays), and all four drive bays are used; two for the mirrored pair of system HDs, and two for the mirrored pair of data HDs. The SSDs will be replacing the system HDs (and pulling in things like my home directory filesystem from the data HDs), but I can't exactly unplug the HDs and put in the SSDs; I need to shift over, and to do that I need to temporarily have the SSDs in the system too. So I've been mulling over how best to do that, and in the mean time my SSDs have just been sitting there.

(If I had six drive bays it would be easy and I would have shoved in the SSDs almost immediately. And the delay is not just because I've been thinking; it's also because shuffling everything around is going to be kind of a hassle however I do it, and so I keep putting it off in favor of more interesting and pleasant things.)

I have a 3.5" to 2.5" dual SSD adaptor for the SSDs (I'm also using one at work), so a single open 3.5" drive bay will allow me to put both into the machine. A number of potential approaches have occurred to me:

  • My case has some 5.25" drive bays, which I'm not using. Maybe I could just temporarily rest the dual adaptor on the bottom of that area, run cables to it, and have that work. (The deluxe version would be to put the 3.5" to 2.5" adaptor in a 5.25" to 3.5" adaptor, but I don't have one of the latter sitting around and that feels like a lot of work.)

  • I could just temporarily run with the side of the case open and cables running to the SSDs. Don't laugh, one co-worker has been running with his machine opened up like this for years. It'd be awkward for me, though, because of where everything is physically (my co-worker has his open machine on his desk).

  • I could deliberately break the mirror of my system disks, remove one, and put the two SSDs in the drive slot freed up by that. It's not very likely that the remaining system disk will fail while I'm shifting over, and if it does I have the other system disk to swap back in.

Breaking the system disk mirror and removing one of the disks strikes me as the least crazy plan. However, it means I get to find out if my Fedora system is set up so that it will actually boot when one of the system disks goes away, or if it will throw up its hands because the shape of the RAID array is not exactly what it wants (this has been known to happen under some circumstances, although that wasn't a disk going missing). Certainly I'd hope that my Fedora 25 system will boot without problems there, but between general issues and systemd I don't have complete confidence here, and I can imagine scenarios that end up with me having to boot a rescue environment and try to glue my system back together again by hand.

(My system disk mirror doesn't just have the root filesystem; it also has /boot and swap, each as mirrored things. So systemd needs to be willing to bring up several RAID arrays in degraded mode in order to be able to get everything in /etc/fstab up.)

I expect that the easiest way to test this is to open the case up, shut the system down, pull the power connector for one of my system disks, and then try to boot the system. If it fails, I can shut everything down, plug the power connector back in, and hopefully everything will be back to being happy with the world. It would probably be more proper to take the disk offline in mdadm, but that may be less easily reversed if things then explode.
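The 'more proper' mdadm route would look something like the following sketch; the array and partition names here are examples, not necessarily what my system uses:

```shell
# Mark one half of the mirror as failed, then remove it from the array:
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
# To reverse it later, re-add the disk and let the mirror resync:
mdadm /dev/md0 --add /dev/sdb1
```

The pulled-power-connector approach has the advantage that the removed disk's RAID metadata is untouched, which is exactly why the mdadm route may be harder to back out of.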

(My plan for the SSDs is about a 100 GB ext4 root filesystem (which will also get /boot), a bit of swap space, and then the rest of the space in a ZFS pool. The pool will get my home directory and various other things that fit, where I care either about speed or about having ZFS's checksums for the data.)

linux/PlanningHomeSSDShuffle written at 00:18:35


Exim, IPv6, and hosts that MX to localhost

For some time now, Exim on our external MX gateway has been logging messages like the following:

2017-01-17 14:14:55 H=... [...] sender verify defer for <qifgukejapwgau@azusa.us>: lowest numbered MX record points to local host

On the one hand, this is fair enough, because at the moment the MX entry for azusa.us is indeed:

azusa.us.    3600  IN  MX    0 localhost.

On the other hand, Exim has a router option that is intended to deal with this, called ignore_target_hosts; it lets you list any number of IP addresses which are supposed to be ignored if they show up in the process of looking up things. This allows you to ignore not just people who list MXs that resolve to 127.0.0.1 but also people who, say, list RFC 1918 IP addresses in public DNS; your Exim can laugh at their attempts to use these names as MAIL FROMs on the public Internet.

We have had an ignore_target_hosts setting for years:

ignore_target_hosts = : : :

(We would like to ignore the remaining RFC 1918 address space, but we actually use it ourselves and disentangling the resulting mess has so far been too complicated.)

So our Exim configuration certainly looked like it should have rejected that azusa.us MX entry instead of temporarily deferring it as, basically, a DNS configuration error. It MX'd to localhost and Exim even recognized that it did, since it was reporting just that. After a bunch of flailing around, I worked out what was going on: Exim was looking up the IPv6 address of localhost. as well as the IPv4 one, and the IPv6 localhost address was not ignored.

So when Exim saw this MX entry it did both A and AAAA lookups on localhost., discarded the A record because it matched an entry in ignore_target_hosts, accepted the ::1 AAAA record because it didn't, and then reported the 'lowest numbered MX record points to local host' error. The fix for this is straightforward; we added ::1 to ignore_target_hosts.
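You can watch the same lookups Exim does with dig; the MX answer below is as of the time of this entry, and the localhost. answers are what our Unbound resolvers return (a resolver without such local data may answer differently):

```shell
dig +short MX azusa.us      # 0 localhost.   (the problem MX entry)
dig +short A localhost.     # 127.0.0.1, matched by ignore_target_hosts
dig +short AAAA localhost.  # ::1, the record Exim was quietly accepting
```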

(I suspect that this started happening when we replaced Bind on our OpenBSD internal resolvers with Unbound, as Unbound internally provides A and AAAA records for localhost. by default. Before then, all queries for localhost. might have failed entirely. I have no opinion on whether providing a localhost. name in DNS is a good idea or not, because I haven't looked into the reasons for why this was done.)

This is an especially interesting issue for me partly because it's yet another illustration of the ripple of changes that adding IPv6 causes. We don't even use IPv6 (yet), but here we are being affected by it and having to include it in our configurations none the less.

sysadmin/EximIPv6Localhost written at 00:02:50


Making my machine stay responsive when writing to USB drives

Yesterday I talked about how writing things to USB drives made my machine not very responsive, and in a comment Nolan pointed me to LWN's The pernicious USB-stick stall problem. According to LWN's article, the core problem is an excess accumulation of dirty write buffers, and they give some VM system sysctls that you can use to control this.

I was dubious that this was my problem, for two reasons. First, I have a 16 GB machine and I barely use all that memory, so I thought that allowing a process to grab a bit over 3 GB of them for dirty buffers wouldn't make much of a difference. Second, I had actually been running sync frequently (in a shell loop) during the entire process, because I have sometimes had it make a difference in these situations; I figured frequent syncs should limit the amount of dirty buffers accumulating in general. But I figured it couldn't hurt to try, so I used the dirty_background_bytes and dirty_bytes settings to limit this to 256 MB and 512 MB respectively and tested things again.
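Concretely, the settings can be applied like this (the 256 MB and 512 MB values are the ones from my testing, not universal recommendations; this needs root):

```shell
# Byte-based dirty limits; setting these automatically zeroes the
# corresponding vm.dirty_background_ratio / vm.dirty_ratio knobs.
sysctl -w vm.dirty_background_bytes=$((256 * 1024 * 1024))  # 268435456
sysctl -w vm.dirty_bytes=$((512 * 1024 * 1024))             # 536870912
```

To make them persistent you'd put the equivalent lines in a file under /etc/sysctl.d/.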

It turns out that I was wrong. With these sysctls turned down, my machine stayed quite responsive for once, despite me doing various things to the USB flash drive (including things that had had a terrible effect just yesterday). I don't entirely understand why, though, which makes me feel as if I'm doing fragile magic instead of system tuning. I also don't know if setting these down is going to have a performance impact on other things that I do with my machine; intuitively I'd generally expect not, but clearly my intuition is suspect here.

(Per this Bob Plankers article, you can monitor the live state of your system with egrep 'dirty|writeback' /proc/vmstat. This will tell you the number of currently dirty pages and the thresholds (in pages, not bytes). I believe that nr_writeback is the number of pages actively being flushed out at the moment, so you can also monitor that.)

PS: In a system with drives (and filesystems) of vastly different speeds, a global dirty limit or ratio is a crude tool. But it's the best we seem to have on Linux today, as far as I know.

(In theory, modern cgroups support the ability to have per-cgroup dirty_bytes settings, which would let you add extra limits to processes that you knew were going to do IO to slow devices. In practice this is only supported on a few filesystems and isn't exposed (as far as I know) through systemd's cgroups mechanisms.)

linux/FixingUSBDriveResponsiveness written at 00:36:09


Linux is terrible at handling IO to USB drives on my machine

Normally I don't do much with USB disks on my machine, either flash drives or regular hard drives. When I do, it's mostly to do bulk read or write things such as blanking a disk or writing an installer image to a flash drive, and I've learned the hard way to force direct IO through dd when I'm doing this kind of thing. Today, for reasons beyond the scope of this entry, I was copying a directory of files to a USB flash drive, using USB 3.0 for once.

This simple operation absolutely murdered the responsiveness of my machine. Even things as simple as moving windows around could stutter (and fvwm doesn't exactly do elaborate things for that), never mind doing anything like navigating somewhere in a browser or scrolling the window of my Twitter client. It wasn't CPU load, because ssh sessions to remote machines were perfectly responsive; instead it seemed that anything that might vaguely come near doing filesystem IO was extensively delayed.

(As usual, ionice was ineffective. I'm not really surprised, since the last time I looked it didn't do anything for software RAID arrays.)

While hitting my local filesystems with a heavy IO load will slow other things down, it doesn't do it to this extent, and I wasn't doing anything particularly IO-heavy in the first place (especially since the USB flash drive was not going particularly fast). I also tried out copying a few (big) files by hand with dd so I could force oflag=direct, and that was significantly better, so I'm pretty confident that it was the USB IO specifically that was the problem.

I don't know what the Linux kernel is doing here to gum up its works so much, and I don't know if it's general or specific to my hardware, but it's been like this for years and I wish it would get better. Right now I'm not feeling very optimistic about the prospects of a USB 3.0 external drive helping solve things like my home backup headaches.

(I took a look with vmstat to see if I could spot something like a high amount of CPU time in interrupt handlers, but as far as I could see the kernel was just sitting around waiting for IO all the time.)

PS: We have more modern Linux machines with USB 3.0 ports at work, so I suppose I should do some tests with one just to see. If this Linux failure is specific to my hardware, it adds some more momentum for a hardware upgrade (cf).

(This elaborates on some tweets of mine.)

linux/USBDrivesKillMyPerformance written at 01:32:16


Link: Let's Stop Ascribing Meaning to Code Points

Manish Goregaokar's Let's Stop Ascribing Meaning to Code Points starts out with this:

I've seen misconceptions about Unicode crop up regularly in posts discussing it. One very common misconception I've seen is that code points have cross-language intrinsic meaning.

He goes on to explain the ways that this is dangerous and how tangled this area of Unicode is. I knew little bits of this already, but apparently combining characters are only the tip of the iceberg.

(via, and see also.)

links/UnicodeCodePointsNoMeaning written at 16:54:18

Some notes on 4K monitors and connecting to them

For reasons beyond the scope of this entry, I'm probably going to build a new home machine this year, finally replacing my current vintage 2011 machine. As part of this (and part of motivating me into doing it), I'm going to persuade myself to finally get a high-resolution display, probably a 27" 4K monitor such as the Dell P2715Q. Now, I would like this hypothetical new machine to drive this hypothetical 4K+ monitor using (Intel) motherboard graphics, which means that I need a motherboard that supports 4K at 60 Hz through, well, whatever connector I should have. Which has sent me off on a quest to understand just how modern monitors connect to modern computers.

(It would be simple if all motherboards supported 4K at 60 Hz on all the various options, but they don't. Just among the modest subset I've already looked at, some motherboards do DisplayPort, some do HDMI, and some have both but not at 4K @ 60 Hz for both.)

As far as I can tell so far, the answer is 'DisplayPort 1.2' or better. If I wanted to go all the way to a 5K display at 60 Hz, I would need DisplayPort 1.3, but 5K displays appear to still be too expensive. Every 4K monitor I've looked at has DisplayPort, generally 1.2 or 1.2a. HDMI 2.0 will also do 4K at 60 Hz and some monitors have that as well.

(That 4K monitors mostly don't go past DisplayPort 1.2 is apparently not a great thing. DisplayPort allows you to daisy-chain displays but you have to stay within the total bandwidth limit, so a 4K monitor that wants to let you daisy-chain to a second 4K monitor needs at least one DP 1.3+ port. Of course you'd also need DisplayPort 1.3+ on your motherboard or graphics card.)
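As a back-of-the-envelope illustration of the bandwidth involved (this ignores blanking intervals and other overhead; DisplayPort 1.2's usable data rate is about 17.28 Gbit/s):

```shell
# Raw pixel bandwidth for 4K at 60 Hz with 24-bit color:
awk 'BEGIN { printf "%.1f Gbit/s\n", 3840 * 2160 * 60 * 24 / 1e9 }'
# 11.9 Gbit/s, comfortably under DP 1.2's limit; the same arithmetic for
# 5K (5120x2880) gives about 21.2 Gbit/s, which is why DP 1.3 is needed.
```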

Adding to the momentum of DisplayPort as the right choice is that there are also converters from DisplayPort 1.2 to HDMI 2.0 (and apparently not really any that go the other way). So a motherboard with DisplayPort 1.2 and support for 4K at 60 Hz over it can be used to drive a HDMI 2.0-only monitor, if such a thing even exists (there are probably HDMI 2.0 only TVs, but I'm not interested in them).

I assume that having HDMI 2.0 on motherboards helps if you want to drive a TV, and that having both DisplayPort 1.2 and HDMI 2.0 (both with 4K at 60 Hz support) might let you drive two 4K displays if one of them has HDMI 2.0. The latter feature is not interesting to me at the moment, as one 27" display is going to take up enough desk space at home all on its own.

(As usual, searching for and comparing PC motherboards seems to be a pain in the rear. You'd think vendors would let you easily search on 'I want the following features ...', but apparently not.)

tech/Driving4KMonitorsNotes written at 03:04:18


My picks for mind-blowing Git features

It started on Twitter:

@tobyhede: What git feature would you show someone who has used source control (but not git) that would blow their mind?

@thatcks: Sysadmins: git bisect. People w/ local changes: rebase. Devs: partial/selective commits & commit reordering.

Given that at different times I fall into all three of these groups, I kind of cheated in my answer. But I'll stand by it anyways, and since Twitter forces a distinct terseness on things, I'm going to expand on why these things are mind-blowing.

If you use some open source package and you can compile it, git bisect (plus some time and work) generally gives you the superpower of being able to tell the developers 'this specific change broke a thing that matters to me', instead of having to tell them just 'it broke somewhere between vN and vN+1'. Being able to be this specific to developers drastically increases the chances that your bug will actually get fixed. You don't have to know how to program to narrow down your bug report, just be able to use git bisect, compile the package, and run it to test it.

(If what broke is 'it doesn't compile any more', you can even automate this.)
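A minimal sketch of that automation, with an example version tag; git bisect run uses the command's exit status as the verdict on each commit it checks out:

```shell
git bisect start
git bisect bad HEAD      # the current version doesn't compile
git bisect good v2.1     # the last release that did (example tag)
git bisect run make      # a non-zero exit from make marks a commit as bad
git bisect reset         # return to where you started when finished
```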

If you carry local modifications in your copy of an upstream project, changes that will never be integrated and that you have no intention of feeding upstream, git rebase is so much your friend that I wrote an entire entry about how and why. In the pre-git world, at best you wound up with a messy tangle of branches and merges that left the history of your local repository increasingly different from the upstream one; at worst your local changes weren't even committed to version control, just thrown on top of the upstream as patches and changes that tools like svn attempted to automatically merge into new upstream commits when you did svn up.
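The rebase workflow for such local changes is short; the remote and branch names here are the common defaults, not necessarily yours:

```shell
# With your local changes committed on your own branch:
git fetch origin
git rebase origin/master   # replay your local commits on the new upstream tip
# Fix any conflicts, 'git add' the results, then 'git rebase --continue'.
```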

If you're developing changes, well, in theory you're disciplined and you use feature branches and do one thing at a time and your diffs are always pure. In practice I think that a lot of the time this is not true, and at that point git's ability to do selective commits, reorder commits, and so on will come along and save your bacon; you can use them to sort out the mess and create a series of clean commits. In the pre-git, pre-selective-commit era things were at least a bunch more work and perhaps more messy. Certainly for casual development people probably just made big commits with random additional changes in them; I know that I certainly did (and I kept doing it even in git until recently, because I didn't have the right tools to make this easy).

(Of course this wasn't necessarily important for keeping track of your local changes, because before git you probably weren't committing them in the first place.)
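The concrete commands behind 'selective commits and commit reordering' are things like the following (the HEAD~3 depth is just an example):

```shell
git add -p            # interactively pick which hunks to stage
git commit            # commit only the staged hunks, leaving the rest alone
git rebase -i HEAD~3  # reorder, squash, or edit the last three commits
```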

PS: There is one git feature that blows my mind on a technical level because it is just so neat and so clever. But that's going to be another entry, and also it's technically not an official git feature.

(My line between 'official git feature' and 'neat addon hack' is whether the hack in question ships with git releases as an official command.)

programming/GitMindblowingFeatures written at 01:06:55


The ZFS pool history log that's used by 'zpool history' has a size limit

I have an awkward confession. Until Aneurin Price mentioned it in his comment on my entry on 'zpool history -i', I had no idea that the internal, per-pool history log that zpool history uses has a size limit. I thought that perhaps the size and volume of events was small enough that ZFS just kept everything, which is silly in retrospect. This unfortunately means that the long-term 'strategic' use of zpool history that I talked about in my first entry has potentially significant limits, because you can only go back so far in history. How far depends on a number of factors, including how many snapshots and so on you take.

(If you're just inspecting the output of 'zpool history', it's easy to overlook that it's gotten truncated, because it always starts with the pool's creation. This is because the ZFS code that maintains the log goes out of its way to make sure that the initial pool creation record is kept forever.)

The ZFS code that creates and maintains the log is in spa_history.c. As far as the log's size goes, let me quote the comment in spa_history_create_obj:

 * Figure out maximum size of history log.  We set it at
 * 0.1% of pool size, with a max of 1G and min of 128KB.

Now, there is a complication, which is that the pool history log is only sized and set up once, at initial pool creation. So that size is not 0.1% of the current pool size, it is 0.1% of the initial pool size, whatever that was. If your pool has been expanded since its creation and started out smaller than 1000 GB, its history log is smaller (possibly much smaller) than it would be if you recreated the pool at 1000 GB or more now. Unfortunately, based on the code, I don't think ZFS can easily resize the history log after creation (and it certainly doesn't attempt to now).
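The sizing rule from that comment can be sketched in shell arithmetic; the 200 GiB initial pool size here is my hypothetical example, not anything from the code:

```shell
poolsize=$((200 * 1024 * 1024 * 1024))   # initial pool size in bytes
histsize=$((poolsize / 1000))            # 0.1% of pool size
min=$((128 * 1024)); max=$((1024 * 1024 * 1024))
if [ "$histsize" -lt "$min" ]; then histsize=$min; fi
if [ "$histsize" -gt "$max" ]; then histsize=$max; fi
echo "$histsize"   # about 205 MiB of history log for this pool
```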

The ZFS code does maintain some information about how many records have been lost and how many total bytes have been written to the log, but these don't seem to be exposed in any way to user-level code; they're simply there in the on-disk and in-memory data structures. You'd have to dig them out of the depths of the kernel with DTrace or the like, or you can use zdb to read them off disk.

(It turns out that our most actively snapshotted pool, which probably has the most records in its log, only has an 11% full history log at the moment.)

Sidebar: Using zdb to see history log information

This is brief notes, in the style of using zdb to see the ZFS delete queue. First we need to find out the object ID of the SPA history information, which is always going to be in the pool's root dataset (as far as I know):

# zdb -dddd rpool 1
Dataset mos [META], [...]

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         1    1    16K    16K  24.0K    32K  100.00  object directory
               history = 32 

The history log is stored in a ZFS object; here that is object number 32. Since it was object 32 in three pools that I checked, it may almost always be that.

# zdb -dddd rpool 32
Dataset [...]
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        32    1    16K   128K  36.0K   128K  100.00  SPA history
                                         40   bonus  SPA history offsets
        dnode flags: USED_BYTES 
        dnode maxblkid: 0
                pool_create_len = 536
                phys_max_off = 79993765
                bof = 536
                eof = 77080
                records_lost = 0

The bof and eof values are logical byte positions in the ring buffer, and so at least eof will be larger than phys_max_off if you've started losing records. For more details, see the comments in spa_history.c.
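From those numbers you can work out how full a pool's history log is; for the rpool values above:

```shell
# Live record bytes are (eof - bof); phys_max_off is the log's capacity.
awk 'BEGIN { printf "%.2f%% full\n", (77080 - 536) * 100 / 79993765 }'
```

(So this rpool's history log is barely used; the 11% figure for our most actively snapshotted pool comes from the same arithmetic.)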

solaris/ZFSZpoolHistorySizeLimit written at 01:28:05
