Modern ad networks are why adblockers are so effective
My entry on web adblockers and the Usenet killfile problem sparked a discussion on lobste.rs, and as part of that discussion I came to an obvious-in-retrospect realization about why adblockers are so effective and are going to stay that way. Namely, it's because of the needs of modern ad networks.
If an ad network wants to be big, it generally has to have a lot of websites that will display its ads. It can't afford to focus all its efforts on a small number of sites and work deeply with them; instead it needs to spread widely. If you want to spread widely, especially to small sites, you need to make it easy for those websites to put your ads on their pages, something simple for both them and you. Having the ads added to web pages by the browser instead of the web server is by far the easiest and most reliable approach for both the ad network and the websites.
(Adding the ads in the client is also kind of forced if you want both sophisticated real time ad bidding systems and web servers that perform well. If the web server's page assembly time budget is, say, 30 milliseconds, it isn't going to be calling out to even a single ad network API to do a 100 millisecond 'bid this ad slot out and give me an ad to serve' operation.)
Big sites can do artisanal content management systems that integrate ads into their native page content in their backends, serving the entire thing to you as one mostly indistinguishable mass (and they're big enough to make it worthwhile for ad networks to do custom things for them). But this is never going to happen for a lot of smaller sites, which leaves ad networks creating the environment that allows adblockers to flourish.
ZFS's potentially very useful 'zpool history -i' option
I recently wrote a little thing praising zpool history. At the time I wrote that, I hadn't really read the manpage carefully enough to have noticed an important additional feature, which is the -i argument (and -l as well, sometimes). To quote the manpage, -i 'displays internally logged ZFS events in addition to user initiated events'. What this means in plain language is that 'zpool history -i' shows you a lot of what happened to your pool no matter how it was done.
This may sound irrelevant and abstract, so let me give you a concrete example. Did you know that you can create and delete snapshots in a filesystem with mkdir and rmdir in the <filesystem>/.zfs/snapshot directory? If you have sufficient privileges (root is normally required), this works both locally and over NFS to a ZFS fileserver. Snapshots created and deleted this way don't show up in plain 'zpool history' because of course they weren't created with a 'zfs' command, but they do show up in 'zpool history -i'.
When you're looking at the output at this level, you will typically see three log events for a single command:
    <time> [txg:12227245] snapshot fs0-core-01/cs/mail@2017_01_10 (4350)
    <time> ioctl snapshot input: snaps: fs0-core-01/cs/mail@2017_01_10 props:
    <time> zfs snapshot fs0-core-01/cs/mail@2017_01_10
The [txg:NNN] first line is the low-level internal log and is apparently the only log entry that's guaranteed to be there, I assume because it's written as part of the transaction; the remaining records can be lost if the machine fails at the right time or the program crashes, and they're written after the TXG record (as we see here). The ioctl entry tells us that this was a snapshot operation initiated from user level through a ZFS ioctl. And the final line tells us that this snapshot creation was done with the zfs command.
(Much of this is from Matthew Ahrens of Delphix in the ZFS developers mailing list, and his message is (indirectly) how I found out about all of this.)
If this was a snapshot creation or deletion that had been done through mkdir or rmdir, there would only be the [txg:NNN] log entries (because obviously they use neither user-level ioctls nor the zfs command).
There seem to be any number of interesting internally logged ZFS events, but at this point I haven't really gone looking into this in any depth. I encourage people to look at this themselves for their own pools.
Picking FreeType CJK fonts for xterm on a modern Linux system
Once I worked out how to make
xterm show Chinese, Japanese, and
Korean characters, I had to figure
out what font to use. I discussed the general details of using
FontConfig to hunt for CJK fonts in that entry, so now let's get down to details.
The Arch Linux
xterm example uses 'WenQuanYi
Bitmap Song' as its example CJK font. This is from the Wen Quan
Yi font collection, and
they're available for Fedora in a collection of wqy-*-fonts packages.
So I started out with 'WenQuanYi Zen Hei Mono' as the closest thing
that I already had installed on my system.
(Descriptions of Chinese fonts often talk about them being an 'X style' font. It turns out that Chinese has different styles of typography, analogous to how Latin fonts have serif and sans-serif styles; see here or here or here for three somewhat random links that talk about eg Heiti vs Mingti. Japanese apparently has a similar but simpler split, per here, with the major divisions being called 'gothic' and 'Mincho'. Learning this has suddenly made some Japanese font names make a lot more sense.)
Fedora itself has a Localization fonts requirements
wiki page. The important and useful bit of this page is a matrix
of language and the default and additional fonts Fedora apparently
prefers for it. Note that each of Chinese, Japanese, and Korean
pick different fonts here; there isn't one CJK font that's the
first or even second preference for all of them. Since you have to
pick only one font for
xterm's CJK font, you may want to think
about which language you care most about.
In Ubuntu, apparently some CJK default fonts have changed to
Google's Noto CJK family.
A discussion in that bug suggests that Fedora may also have changed
its defaults to the Noto CJK fonts, contrary to what its wiki sort of
implies. The Arch Wiki has its usual comprehensive list of CJK fonts, and there's also Wikipedia's general list. Neither particularly mentions monospaced fonts, though, assuming that this is even something that one has to consider in CJK fonts for xterm.
All of this led me to peer into the depths of the FontConfig configuration files on my Fedora machines to look for mentions of monospace. Here I found interesting configuration file snippets that said things like:
    <match>
      <test name="lang">
        <string>ja</string>
      </test>
      <test name="family">
        <string>monospace</string>
      </test>
      <edit name="family" mode="prepend">
        <string>Noto Sans Mono CJK JP</string>
      </edit>
    </match>

    <alias>
      <family>Noto Sans Mono CJK JP</family>
      <default>
        <family>monospace</family>
      </default>
    </alias>
I'm not really up on FontConfig magic, but this sure looked like it was setting up a 'Noto Sans Mono CJK JP' font as a monospace font if you wanted things in Japanese. There are also KR, SC (Simplified Chinese), and TC (Traditional Chinese) variants of Noto Sans Mono CJK lurking in the depths of my Fedora system.
After looking at an
xterm using WenQuanYi Zen Hei Mono side by
side with one using Noto Sans Mono CJK JP, I decided that the Noto
version was probably better looking (on my very limited sample of
CJK text, mostly in file names and font names) and also I felt
slightly more confident in picking it, since it seemed more likely
to be closer to how eg
gnome-terminal was operating and also the
general trend of CJK font choices in various Linuxes. I wish I could find out what CJK font(s) gnome-terminal was using, but the design of current versions makes that difficult.
(Some experimentation suggests that in my setup, gnome-terminal may be using VL Gothic here. I guess I can live with all of this,
however it comes out; mostly I just want CJK characters to show up
as something other than boxes or especially spaces.)
Making modern FreeType-using versions of xterm display CJK characters
For a long time, my setup of
xterm has not displayed Chinese and
Japanese characters (or Korean ones, although I encounter those
less often). Until recently it displayed the Unicode 'no such
character' empty box in their place, which was okay and told me
that there were problems, but after my upgrade to Fedora 25 it
started showing spaces instead (or at least some form of whitespace).
This is just enough extra irritation that I've been pushed into
figuring out how to fix it.
I switched my
xterm setup from old style bitmapped fonts to new
style XFT/FreeType several years ago. It turns out
that enabling CJK fonts in this environment is actually quite simple,
as I found out from the Arch Linux wiki. All you
need to do is to tell
xterm what font to use for these characters
with either the -fd command line argument or the faceNameDoublesize X resource (I recommend the latter, unless you already have a frontend script for xterm).
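For concreteness, this is roughly what the X resources version looks like (a sketch only; the specific fonts here are illustrations, so substitute your own choices):

```
! Main xterm face plus the double-width face used for CJK characters.
XTerm*faceName: DejaVu Sans Mono
XTerm*faceSize: 10
XTerm*faceNameDoublesize: Noto Sans Mono CJK JP
```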
Well, that elides a small but important detail, namely finding such
a font. Modern fonts tend to have a lot more glyphs and language
coverage than old fonts did, but common fonts like the monospaced
font I'm using for
xterm don't go quite as far as covering the
CJK glyphs; instead this seems to be reserved for special fonts
with extended ranges. Sophisticated systems like Gnome come magically
set up to pick the right font(s) in
gnome-terminal, but in
we're on our own to dig up a suitable font and I'm not quite sure
what the right way to do that is.
As far as I know, fontconfig can be used to show us a list of fonts that claim to support a language, for example with 'fc-list :lang=zh-cn family'. A full list of things you can query for is in the fontconfig documentation; a more useful query may be 'fc-list :lang=zh-cn:style=Regular family', which excludes all of the bold and italic and so on variants.
(It looks like you can find out fonts that include a specific Unicode character by querying for ':charset=<hex codepoint>'.)
What I don't know is whether
xterm requires its CJK font to be
monospaced (I suspect it does if you want completely correct
rendering) and if so, how you tell if any specific font is monospaced
in its CJK glyphs. When I ask for monospaced fonts that cover Chinese, with 'fc-list :lang=zh-cn:spacing=mono family', I get no fonts, although there are CJK fonts with 'Mono' in their names on my system and I'm using one of them in xterm without explosions so far. It may be that CJK fonts with 'Mono' in their name are monospaced in their CJK glyphs even if they are not monospaced in all glyphs. But then there is eevee's exploration of this area, which suggests that 'monospace' in fontconfig is actually kind of arbitrary.
(The other thing I don't know how to do for
xterm is set things
up if you need multiple fonts in order to get full coverage of the
CJK glyphs, possibly in a genuinely monospaced font. This is
especially interesting because Google's Noto Sans fonts have a
collection of 'Noto Sans Mono CJK <language>' fonts. There appears
to be overlap between them, but it's not clear if you need to stitch up one (double-width) font for xterm out of them all or if some single one will do.)
One downside of a queued IO model is memory consumption for idle connections
One of the common models for handling asynchronous IO is what I'll call the queued IO model, where you put all of your IO operations in some sort of a queue and then as things become ready, the OS completes various ones and hands them back to you. Sometimes this queue is explicitly exposed and sometimes, as in Go, the queue is implicit in a collection of threads all doing (what they see as) blocking IO operations. The queued IO model is generally simple and attractive, either in threaded form (in Go) or in explicit form where you pass operations that you'd like to do to the OS and it notifies you when various ones finish.
Recently I wound up reading Evan Klitzke's Goroutines, Nonblocking
I/O, And Memory Usage, which
pointed out a drawback to this model that hadn't been obvious to
me before. That drawback is memory usage for pending operations,
especially reads, in a situation where you have a significant number
of idle connections. Suppose that you have 1,000 connections where
you're waiting for the client to send you something. In a queued
IO model the normal way to operate is to queue 1,000 read operations,
and each of these queued read operations must come with an allocated
buffer for the operating system to write the read data into. If
only (say) 5% of those connections are active at any one time, you
have quite a lot of memory tied up in buffers that are just sitting
around inactive. In a
select() style model that exposes readiness
before you perform the IO, you can only allocate buffers when you're
actually about to read data.
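To make the difference concrete, here's a small Python sketch (illustrative only, using a socketpair to stand in for a client connection): in a readiness-based model the read buffer is allocated only once a connection is actually readable, while the queued model has to allocate one buffer per pending read up front.

```python
import selectors
import socket

BUF_SIZE = 64 * 1024  # per-read buffer size

# Queued model: every one of N pending reads needs its buffer now.
idle_connections = 1000
queued_model_memory = idle_connections * BUF_SIZE
print(queued_model_memory // (1024 * 1024), "MiB tied up in idle read buffers")

# Readiness model: allocate a buffer only for a connection that is
# actually ready to be read.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
sel.register(a, selectors.EVENT_READ)

b.send(b"hello")  # make this one connection 'active'

for key, _events in sel.select(timeout=1):
    buf = bytearray(BUF_SIZE)  # allocated only now, for a ready connection
    n = key.fileobj.recv_into(buf)
    print("read", n, "bytes")

sel.close()
a.close()
b.close()
```

The arithmetic is the point: a thousand parked reads at 64 KiB each is over 60 MiB of buffers doing nothing, while the readiness version's buffer count tracks only the active connections.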
Writes often pre-compute and pre-allocate the data to be written, in which case this isn't much of an issue for them; the buffer for the data to be written has to be allocated beforehand either way. But in situations where the data to be written could be generated lazily on the fly, the queued IO model can once again force extra memory allocations where you have to allocate and fill buffers for everything, not just the connections that are ready to have more data pushed to them.
All of this may be obvious to people already, but it was surprising to me so I feel like writing it down, especially how it extends from Go style 'blocking IO with threads' to the general model of queuing up asynchronous IO operations for the kernel to complete for you as it can.
(Of course there are reasons to want a
select() like interface
beyond this issue, such as the cancellation problem.)
How ready my Firefox extensions are for Firefox Electrolysis
Firefox Electrolysis is Mozilla's push to improve Firefox by making it multiprocess, but this requires a fundamental change in how Firefox extensions interact with Firefox. Mozilla is strongly pushing Electrolysis in 2017 and as part of that is strongly working on deprecating the old (current) Firefox extensions API. Their current schedule entirely abandons old extensions by roughly December of this year (2017), with Firefox 57. Mozilla has made available an extension, Add-on Compatibility Reporter, that can tell you if your extensions are compatible with the modern way of doing things. This is a lot more convenient than going through arewee10syet, so I've decided to write down the state of my regular set of extensions (note that I now use uBlock Origin) and my essential set for basic profiles.
In my essential set, things are simple. FireGestures and uBlock Origin are compatible, but Self-Destructing Cookies is not. That's not great news; SDC is an excellent low-friction way of defending myself against the obnoxious underbrush of cookies. I can't see any sign on SDC's page that an update to add Electrolysis compatibility is in progress, although it might be something that's quietly being worked on.
In my main browser with my regular set of extensions, well, things get mixed:
- NoScript is compatible (as are FireGestures and uBlock Origin). In less core extensions, so are HTTPS Everywhere and even CipherFox (which I could really uninstall or disable at this point without missing much).
- my current cookie management extension, CS Lite Mod, is not
compatible but I already knew it was obsolete, not getting updates,
and going to have to be replaced someday. It's not clear if there's
a good Electrolysis compatible cookie blocking extension yet,
though (especially one that doesn't leak memory, which has been
a problem in my earlier attempts to find a replacement).
- FlashStopper is not compatible. Stopping video autoplay on Youtube
is not really something I consider negotiable, but in theory
NoScript might start working for this. In addition, the
'development channel' part of the addon's page
suggests that a compatible version is in progress (see also).
- It's All Text
is not compatible. That's going to hurt a bunch, especially for
writing comments on Wandering Thoughts. There's an
open issue for it
but it's been open since 2015 and apparently the original developer
doesn't do much with Firefox any more (see also).
- Open in Browser is not compatible. I like OiB but it's not a core part of my browser experience the way some of my extensions are. There's an open issue in the project about this.
Without a cookie management extension and with uncertainty about others updating in time (especially since I generally follow Firefox's development version, effectively Firefox Nightly), my most likely action is to simply not update to a version of Firefox that drops support for old extensions.
(The Firefox release calendar suggests that the development version will stop supporting old extensions sometime in June or July, so I really don't have all that much time left.)
Web adblockers and the potential for recreating the Usenet killfile problem
Here is a rambling thought.
Back in the days of Usenet, most Usenet readers supported 'killfiles' for filtering your view of a newsgroup. As newsgroup after newsgroup descended into noise, the common reaction of people was to get more and more elaborate killfiles so they could preserve what they could. The long term problem with this was that new readers of a newsgroup generally had no killfiles, so they generally took one look at the unfiltered version and left.
If you've recently compared the versions of the web you see with and without your adblocker, you may be thinking that this last bit sounds familiar. Increasingly, the raw web is simply an unpleasant place, with more and more things shoving their way in front of your face. Although there are other reasons to block ads, such as keeping your machine safe and reducing data usage, my belief is that a lot of people turn to adblockers in large part to get this clutter out of their face.
So, what happens if adblocking becomes more and more common over time? I suspect that one response from websites will be to run more ads than ever before in an attempt to generate more revenue from the steadily decreasing number of people who are actually seeing ads. If this happens, the result will be to make the raw, adblocker-free Internet an increasingly shitty place. Generally this will be the version of the Internet that new people are exposed to, since new people are quite likely to start out without an adblocker in their browser.
(Browser vendors or system vendors preinstalling adblockers would be a drastic change and would probably provoke lawsuits and other explosions.)
At this point I run out of sensible speculation, so I'm writing this mostly to note the similarity I see in the mechanisms involved. In the spirit of fairness, here's some differences as well:
- people don't necessarily have good alternatives to ad-laden
websites, so maybe they'll just live with the terrible experience
(certainly plenty of websites seem to be betting on this).
- it's getting so that everyone knows about adblockers and it's generally quite easy to install one and start getting a good Internet experience (unlike the experience with Usenet killfiles, which were as if everyone had to write their own adblocker rules).
And, of course, the web could always explode, rendering the whole issue moot.
Mail submission by users versus by your machines
Generally the simplest way to handle making sure that system email works is to set your machines up with a 'null client' mailer configuration that dumps all of their outgoing mail on a single designated smarthost machine. When you are setting this up, it is natural to use the same machine and mailer configuration that you have for users, with the exception that you don't require your machines to do SMTP authentication before they send email.
That 'with the exception' I threw in there is the sign of the important way that mail submission by users is significantly different than mail submission by your fleet of machines. To put it simply, you want to accept all mail from your fleet of machines while rejecting at least a certain amount of email from your users, because they are different sorts of senders.
If one of your machines tries to send out an email message, you almost certainly want to see it. It doesn't matter what envelope sender address it has (or even if that envelope sender is kind of broken) and it mostly doesn't matter what the destination is, including whether or not it exists in your mail system; it should all get accepted and then generally all wind up in your mailbox or perhaps your monitoring system.
By contrast, if one of your users tries to send email with many sorts of broken configurations (such as an invalid or wrong sender address), you want it to not get accepted. Indeed, certain sorts of user problems can only be handled by rejecting the message at SMTP time. And often you'll want to partially or completely force the use of SMTP authentication, so that you know it actually is one of your users that's sending email and not some random person on the Internet.
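As a rough illustration of how little the machine side needs, a Postfix 'null client' that dumps everything on a smarthost is only a few lines of main.cf (a sketch; the hostname is made up and your details will differ):

```
# main.cf fragment for a 'null client': no local delivery,
# everything is relayed to the designated smarthost.
relayhost = [smarthost.example.com]
inet_interfaces = loopback-only
mydestination =
local_transport = error:local delivery is disabled
```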
If you have some system that ensures logins and passwords are reasonably in sync on all of your machines, you can often get away with using your normal (user) mail submission machine as the mail submission machine for your fleet as well. Most of the time machines
will send email from addresses (such as
root) that do exist in
your global environment, and will be sending to system addresses
that you can intercept. When they aren't and you know about it,
you can deploy hacks like manually adding valid
addresses to your global system (so that eg
postgres is a valid
address that goes to you). But it takes some attention and the
result is not perfect.
It's possible to do better, but you need more complicated mailer configurations on one or both of your fleet of client machines and your central mail collection point. What I currently think of as an ideal setup would likely require special configuration on both sides.
(We use the 'single mail submission machine' version of this approach currently, partly because almost all of the machines that send out email are part of our password distribution system. Also, when a single server mixes user-generated email and machine-generated email, you generally have to treat it as all user-generated email.)
Make sure that (system) email works on every machine
We have a central log server, which as you
might imagine is a machine we care about a fair bit. Today we
discovered that one of the disks in its software RAID mirror for
/var had failed. Perhaps you are thinking that it failed over the
university's just-ended winter break, so let me be more honest and
precise here: it had failed in late October. And we didn't so much
'find' that the disk had failed as stumble over the fact more or
less through luck.
We didn't miss the news because the machine wasn't sending out
notifications of it. The machine's mdadm setup had dutifully sent
out email about it several times. It's just that the email hadn't
gone anywhere except to the local root mailbox on the machine itself, because we hadn't set up a null-client Postfix configuration on it. There
are multiple causes for that, but I'm sure that one of them is that
it simply slipped our mind that the machine might generate important email.
(The central log server is a deliberately isolated one-off CentOS 7 machine, instead of one of our standard Ubuntu installs. Our standard Ubuntu machines automatically get a null-client Postfix configuration that sends all locally generated email to our mail submission machine and to us, but there's nothing that automatically sets that up for one-off CentOS 7 machines so it dropped through the cracks.)
There are a number of lessons here. The most immediately useful is make sure that the mail system is configured on all your machines. All of them. If something generates email on a machine, however unlikely that may seem to you, that email should not effectively vanish into the void; it should go somewhere where you'll at least have a record of it.
(There is an argument that you should have a better monitoring
system for problems like this than reading email. Sure, in an ideal
world, but systems come out of the box set up to send email to
root right now. And even with a better monitoring system there
are still unusual problems that will be reported by email, such as
cron jobs exploding spectacularly. Handling email as a backup is
just the simplest way.)
We aren't perfect on this, but at least now our central syslog server (and a couple of other similar systems) will have its mail get through to us.
(There are some tricky parts about doing this really well that we aren't currently doing. To do it perfectly you need a separate submission configuration from your regular machines, but that's sufficiently complicated that it's another entry.)
Sidebar: How we found this
We were applying the recent CentOS 7 updates to the machine, and
after the 'yum update' finished, the shell gave us that classic notification:

    You have new mail in /var/spool/mail/root
We wondered what on the machine would be sending email, so we took
a look at root's mailbox. It turned out that one of the updated packages was mdadm, and updating it had restarted its monitoring, which had caused it to send out another email about the degraded mirror.
A lot of things had to go right in order for us to be lucky here. One moral to draw is take a look at oddities, like surprise notices about new mail. They may be the surface manifestation of something quite important.
Software should support configuring overall time limits
It's pretty common for software to support setting various sorts
of time limits on operations, often in extensive detail. You can
often set retry counts and so on as well. All of this is natural
because it generally maps quite well to the low level operations
that the software itself can set internal limits on, so you get
things like the OpenSSH client
ConnectTimeout setting, which
basically controls how long
ssh will wait for its connect() system call to succeed.
More and more, I have come to feel that this way of configuring time limits is not as helpful in real life as you might think, and yesterday's events provide a convenient example for why. There are several problems. First, low level detailed time limits, retry counts, and so on don't particularly correspond to what you often really want, namely a limit on how long the entire high level operation can take. We now want to put a limit on the total maximum IO delay that ZFS can ever see, but there's no direct control for that, only low-level ones that might do this indirectly if we can find them and sort through all of the layers involved.
Second, the low level limits can interact with each other in ways that are hard to see in advance and your actual timeouts can wind up (much) higher than you think. This is especially easy to have happen if you have multiple layers and there are retries involved. People who deal with disk subsystems have probably seen many cases where the physical disk retries a few times and then gives up, then the OS software tries a few times (each of which provokes another round of physical disk retries), and so on. Each of these layers might be perfectly sensible if it was the only layer in action, but put them all together and things go out to lunch and don't come back.
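The arithmetic of this layering is worth making explicit. As a sketch with made-up numbers (real drives and drivers vary), each layer's limit multiplies the ones below it:

```python
# Hypothetical per-layer settings; each looks reasonable in isolation.
disk_op_timeout = 30   # seconds a single physical disk attempt may take
disk_retries = 3       # attempts the drive itself makes
os_retries = 5         # attempts the OS driver layer makes

per_os_attempt = disk_op_timeout * disk_retries  # 90 seconds per OS-level try
worst_case = per_os_attempt * os_retries         # 450 seconds total

print(f"worst case delay: {worst_case} seconds ({worst_case / 60:.1f} minutes)")
```

Thirty seconds sounds fine, three retries sound fine, five retries sound fine; multiplied together they are seven and a half minutes of stall that nobody explicitly chose.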
Third, it can actually be impossible to put together systems that are reliable and that also have reliable high level time limits given only low level time and retry controls. HDs are a good example of this. Disk IO operations, especially writes, can take substantial amounts of time to complete under load in normal operation (over 30 seconds). And some of the time retrying a failed operation at least once will cause it to succeed, because the failure was purely a temporary fluctuation and glitch. But the combination of these settings, each individually necessary, can give you a too-high total timeout, leaving you with no good choice.
(Generally you wind up allowing too high total timeouts, because the other option is to risk a system that falls apart explosively under load as everything slows down.)
Real support for overall time limits requires more code, since you actually have to track and limit the total time operations take (and you may need to abort operations in mid flight when their timer expires). But it is often quite useful for system administrators, since it lets us control what we often really care about and need to limit. Life would probably be easier right now if I could just tell the OmniOS scsi_vhci multipathing driver to time out any IO that takes more than, say, five minutes and return an error for it.
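The core of that extra code is small; the point is that the retry loop consults a single overall deadline and gives each attempt only the time remaining, rather than handing each attempt its own independent timeout. A Python sketch (all names invented for illustration):

```python
import time

class OverallTimeout(Exception):
    pass

def with_overall_limit(operation, overall_limit, max_attempts=5):
    """Retry operation(timeout=...) until it succeeds, but never let
    the total elapsed time exceed overall_limit seconds."""
    deadline = time.monotonic() + overall_limit
    last_error = None
    for _ in range(max_attempts):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OverallTimeout(f"gave up after {overall_limit}s") from last_error
        try:
            # Each attempt gets only what's left of the overall budget.
            return operation(timeout=remaining)
        except TimeoutError as e:
            last_error = e
    raise OverallTimeout("out of attempts") from last_error

# A fake operation that always times out, to show the limit being enforced.
def flaky(timeout):
    time.sleep(min(timeout, 0.05))
    raise TimeoutError("operation timed out")

start = time.monotonic()
try:
    with_overall_limit(flaky, overall_limit=0.2)
except OverallTimeout:
    pass
print(f"total elapsed: {time.monotonic() - start:.2f}s (limit was 0.2s)")
```

However many retries happen inside, the caller's wall-clock exposure is bounded by the one number they actually care about.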
(Of course this points out that you may also want the low level limits too, either exposed externally or implemented internally with sensible values. If I'm going to tell a multipathing driver that it should time out IO after five minutes, I probably want to time out IOs to individual paths faster than that so the driver has time to retry an IO over an alternate path.)
PS: Extension of this idea to other sorts of low level limits is left as an exercise for the reader.