Wandering Thoughts


A little surprise with Prometheus scrape intervals, timeouts, and alerts

Prometheus pulls metrics from metric sources or, to put it in Prometheus terms, scrapes targets. Every scrape configuration and thus every target has a scrape interval and a scrape timeout as part of its settings; these can be specified explicitly or inherited from global values. In a perfect world where scraping targets either completes or fails in zero time, this results in simple timing; a target is scraped at time T, then T + interval, then T + interval + interval, and so on. However, the real world is not simple and scraping a target can take a non-zero amount of time, possibly quite a lot if you time out. You might sensibly wonder if the next scrape is pushed back by the non-zero scrape time.

The answer is that it is not, or at least it is sort of not. Regardless of the amount of time a scrape at time T takes, the next scrape is scheduled for T + interval and will normally happen then. Scrapes are driven by a ticker, which runs independently of how long each scrape takes and adjusts things as necessary to keep ticking exactly on time.

So far, so good. But this means that slow scrapes can have an interesting and surprising interaction with alerting rules and Alertmanager group_wait settings. The short version is that you can get a failing check and then a successful one in close succession, close enough to suppress an Alertmanager alert that you would normally expect to fire.

To make this concrete, suppose that you perform SSH blackbox checks every 90 seconds, time out at 60 seconds, trigger a Prometheus alert rule the moment an SSH check fails, and have a one minute group_wait in Alertmanager. Then if an SSH check times out instead of failing rapidly, you can have a sequence where you start the check at T, have it fail via timeout at T + 60, send a firing alert to Alertmanager shortly afterward, have the next check succeed at T + 90, and withdraw the alert from Alertmanager shortly afterward, before the one minute group_wait is up. The net result is that your 'alert immediately' SSH alert rule has not sent you an alert despite an SSH check failing.
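The arithmetic of that sequence can be written down as a toy calculation (this uses the example's numbers; none of it is Prometheus API, just timing):

```python
# Timeline of the example, in seconds from the start of the failing scrape.
interval = 90    # scrape interval
timeout = 60     # scrape timeout
group_wait = 60  # Alertmanager group_wait

alert_fires = timeout                       # T+60: the check times out
alert_resolves = interval                   # T+90: the next scrape succeeds
would_notify_at = alert_fires + group_wait  # T+120 at the earliest

# The alert resolves before group_wait runs out, so nothing is sent.
# In general this happens whenever interval - timeout < group_wait.
suppressed = alert_resolves < would_notify_at
```
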

It's natural to expect this result if your scrape interval is less than your group_wait, because then it's obvious that you can get a second scrape in before Alertmanager makes the alert active. It's not as obvious when the second scrape is possible only because the difference between the scrape interval and the scrape timeout is less than group_wait.

(If nothing else, this is going to make me take another look at our scrape timeout settings. I'm going to have to think carefully about just what all of the interactions are here, especially given all of the other alert delays. Note that a resolved alert is immediately sent to Alertmanager.)

PS: It's a pity that there's no straightforward way that I know of to get either Prometheus or Alertmanager to write a log record of pending, firing, and cleared alerts (with timestamps and details). The information is more or less captured in Prometheus metrics, but getting the times when things happened is a huge pain; being able to write optional logs of this would make some things much easier.

(I believe both report this if you set their log level to 'debug', but of course then you get a flood of other information that you probably don't want.)

Sidebar: How Prometheus picks the start time T of scrapes

If you've paid attention to your logs from things like SSH blackbox checks, you'll have noticed that Prometheus does not hit all of your scrape targets at exactly the same time, even if they have the same scrape interval. How Prometheus picks the start time for each scrape target is not based on when it learns about the scrape target, as you might expect; instead, well, let me quote the code:

base   = interval - now%interval
offset = t.hash() % interval
next   = base + offset

if next > interval {
   next -= interval
}

All of these values are in nanoseconds, and t.hash() is a 64-bit hash value, hopefully randomly distributed. The next result value is an offset to wait before starting the scrape interval ticker.

In short, Prometheus randomly smears the start time for scrape targets across the entire interval, hopefully resulting in a more or less even distribution.
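A Python rendition of the same computation may make it easier to play with (the hash values here are made-up stand-ins for t.hash()):

```python
# How Prometheus smears scrape start times, transcribed into Python.
# All values are in nanoseconds; target_hash stands in for the 64-bit
# t.hash() value of a scrape target.
def scrape_offset(now, interval, target_hash):
    base = interval - now % interval
    offset = target_hash % interval
    nxt = base + offset
    if nxt > interval:
        nxt -= interval
    return nxt  # how long to wait before starting this target's ticker

interval = 90 * 10**9  # a 90 second scrape interval
now = 12345
# Three different (made-up) target hashes land at three different
# points within the interval.
offsets = {scrape_offset(now, interval, h)
           for h in (1000, 50 * 10**9, 89 * 10**9)}
```
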

sysadmin/PrometheusScrapeIntervalBit written at 01:36:30; Add Comment


Things you can do to make your Linux servers reboot on kernel problems

One of the Linux kernel's unusual behaviors is that it often doesn't reboot after it hits an internal problem, what is normally called a kernel panic. Sometimes this is a reasonable thing and sometimes this is not what you want and you'd like to change it. Fortunately Linux lets you more or less control this through kernel sysctl settings.

(The Linux kernel differentiates between things like OOPSes and RCU stalls, which it thinks it can maybe continue on from, and kernel panics, which immediately freeze the machine.)

What you need to do is twofold. First, you need to make it so that the kernel reboots when it considers itself to have panicked. This is set through the kernel.panic sysctl, which is a number of seconds to wait after a panic before rebooting. Some sources recommend setting this to 60 seconds under various circumstances, but in limited experience we haven't found that to do anything for us except delay reboots, so we now use 10 seconds. Setting kernel.panic to 0 restores the default state, where panics simply hang the machine.

Second, you need to arrange for various kernel problems to trigger panics. The most important thing here is usually for kernel OOPS messages or BUG messages to trigger panics; the kernel considers these nominally recoverable, except that they mostly aren't and will often leave your machine effectively hung. Panicking on OOPS is turned on by setting kernel.panic_on_oops to 1.

Another likely important sign of trouble is RCU stalls; you can panic on these with kernel.panic_on_rcu_stall. Note that I'm biased about RCU stalls. The kernel documentation in sysctl/kernel.txt mentions some other ones as well, currently panic_on_io_nmi, panic_on_stackoverflow, panic_on_unrecovered_nmi, and panic_on_warn. Of these, I would definitely be wary about turning on panic_on_warn; our systems appear to see a certain number of them in reasonably routine operation.

(You can detect these warnings by searching your kernel logs for the text 'WARNING: CPU: <..> PID: <...>'. One of our WARNs was for a network device transmit queue timeout, which recovered almost immediately. Rebooting the server due to this would have been entirely the wrong reaction in practice.)
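Collected together, the settings discussed above could live in a sysctl.d fragment (the file name here is arbitrary; the values are the ones from this entry, and panic_on_rcu_stall is the optional extra):

```
# /etc/sysctl.d/90-panic-reboot.conf
# Reboot 10 seconds after a kernel panic instead of hanging forever.
kernel.panic = 10
# Turn OOPSes and BUGs into panics (and thus into reboots).
kernel.panic_on_oops = 1
# Also panic on RCU stalls.
kernel.panic_on_rcu_stall = 1
```

These can be applied immediately with 'sysctl -p' on the file, or individually with 'sysctl -w'.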

Note that you can turn on any or all of the various panic_on_* settings while still having kernel.panic set to 0. If you do this, you convert OOPSes, RCU stalls, or whatever into things that are guaranteed to hang the whole machine when they happen, instead of perhaps having it continue on in partial operating order. There are systems where this may be desirable behavior.

PS: If you want to be as sure as possible that the machine reboots after hitting problems, you probably want to enable a hardware watchdog as well if you can. The kernel panic() function tries hard to reboot the machine, but things can still go wrong. Unfortunately not all machines have hardware watchdogs available, although many Intel ones do.

Sidebar: The problem with kernel OOPSes

When a kernel oops happens, the kernel kills one or more processes. These processes were generally in kernel code at the time (that's usually what generated the oops), and they may have been holding locks or have been in the middle of modifying data structures, submitting IO operations, or doing other kernel things. However, the kernel has no idea what exactly needs to be done to safely release these locks, revert the data structure modifications, and so on; instead it just drops everything on the floor and hopes for the best.

Sometimes this works out, or at least the damage done is relatively contained (perhaps only access to one mounted filesystem starts hanging because of a lock held by the now-dead process that will never be unlocked). Often it doesn't, and more or less everything grinds to an immediate halt. If you're lucky, enough of the system survives long enough for the kernel oops message to be written to disk or sent out to your central syslog server.

linux/RebootOnPanicSettings written at 00:44:25; Add Comment


Two annoyances I have with Python's imaplib module

As I mentioned yesterday, I recently wrote some code that uses the imaplib module. In the process of doing this, I wound up experiencing some annoyances, one of them a traditional one and one a new one that I've only come to appreciate recently.

The traditional annoyance is that the imaplib module doesn't wrap errors from other modules that it uses. This leaves you with at least two problems. The first is that you get to try to catch a bunch of exception classes to handle errors:

try:
  c = ssl.create_default_context()
  m = imaplib.IMAP4_SSL(host=host, ssl_context=c)
except (imaplib.IMAP4.error, ssl.SSLError, OSError) as e:

The second is that, well, I'm not sure I'm actually catching all of the errors that calling the imaplib module can raise. The module doesn't document them, and so this list is merely the ones that I've been able to provoke in testing. This is the fundamental flaw of not wrapping exceptions that I wrote about many years ago; by not wrapping exceptions, you make what modules you call an implicit part of your API. Then you usually don't document it.

I award the imaplib module bonus points for having its error exception class accessed via an attribute on another class. I'm sure there's a historical reason for this, but I really wish it had been cleaned up as part of the Python 3 migration. In the current Python 3 source, these exception classes are actually literally classes inside the IMAP4 class:

class IMAP4:
  class error(Exception): pass
  class abort(error): pass
  class readonly(abort): pass

The other annoyance is that the imaplib module doesn't implement any sort of timeouts, either on individual operations or on a whole sequence of them. If you aren't prepared to wait for potentially very long amounts of time (if the IMAP server has something go wrong with it), you need to add some sort of timeout yourself through means outside of imaplib, either something like signal.setitimer() with a SIGALRM handler or through manipulating the underlying socket to set timeouts on it (although I've read that this causes problems, and anyway you're normally going to be trying to work through SSL as well). For my own program I opted to go the SIGALRM route, but I have the advantage that the only thing I'm doing is IMAP. A more sophisticated program might not want to blow itself up with a SIGALRM just because the IMAP side of things was too slow.

Timeouts aren't something that I used to think about when I wrote programs that were mostly run interactively and did only one thing, where the timeout is most sensibly imposed by the user hitting Ctrl-C to kill the entire program. Automated testing programs and other, similar things care a lot about timeouts, because they don't want to hang if something goes wrong with the server. And in fact it is possible to cause imaplib to hang for a quite long time in a very simple way:

m = imaplib.IMAP4_SSL(host=host, port=443)

You don't even need something that actually responds and gets as far as establishing a TLS session; it's enough for the TCP connection to be accepted. This is reasonably dangerous, because 'accept the connection and then hang' is more or less the expected behavior for a system under sufficiently high load (accepting the connection is handled in the kernel, and then the system is too loaded for the IMAP server to run).

Overall I've wound up feeling that the imaplib module is okay for simple, straightforward uses but it's not really a solid base for anything more. Sure, you can probably use it, but you're also probably going to be patching things and working around issues. For us, using imaplib and papering over these issues is the easiest way forward, but if I wanted to do more I'd probably look for a third party module (or think about switching languages).

python/ImaplibTwoAnnoyances written at 00:33:00; Add Comment


A few notes on using SSL in Python 3 client programs

I was recently writing a Python program to check whether a test account could log into our IMAP servers and to time how long it took (as part of our new Prometheus monitoring). I used Python because it's one of our standard languages and because it includes the imaplib module, which did all of the hard work for me. As is my usual habit, I read as little of the detailed module documentation as possible and used brute force, which means that my first code looked kind of like this:

try:
  m = imaplib.IMAP4_SSL(host=host)
  m.login(user, pw)
except ....:

When I tried out this code, I discovered that it was perfectly willing to connect to our IMAP servers using the wrong host name. At one level this is sort of okay (we're verifying that the IMAP TLS certificates are good through other checks), but at another it's wrong. So I went and read the module documentation with a bit more care, where it pointed me to the ssl module's "Security considerations" section, which told me that in modern Python, you want to supply a SSL context and you should normally get that context from ssl.create_default_context().

The default SSL context is good for a client connecting to a server. It does certificate verification, including hostname verification, and has officially reasonable defaults, some of which you can see in ctx.options of a created context, and also ctx.get_ciphers() (although the latter is rather verbose). Based on the module documentation, Python 3 is not entirely relying on the defaults of the underlying TLS library. However, the underlying TLS library (and its version) affects what module features are available; you need OpenSSL 1.1.0g or later to get SSLContext.minimum_version, for example.

It's good that people who care can carefully select ciphers, TLS versions, and so on, but it's better that this seems to have good defaults (especially if we want to move away from the server dictating cipher order). I considered explicitly disabling TLSv1 in my checker, but decided that I didn't care enough to tune the settings here (and especially to keep them tuned). Note that explicitly setting a minimum version is a dangerous operation over the long term, because it means that someday you're lowering the minimum version instead of raising it.

(Today, for example, you might set the minimum version to TLS v1.2 and increase your security over the defaults. Then in five years, the default version could change to TLS v1.3 and now your unchanged code is worse than the defaults. Fortunately the TLS version constants do compare properly so far, so you can write code that uses max() to do it more or less right.)
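For example, here is a way to set a minimum version that can only ever raise the floor, never lower it (this assumes a Python and OpenSSL new enough to have SSLContext.minimum_version):

```python
import ssl

ctx = ssl.create_default_context()
# Insist on at least TLS 1.2, but if a future default is already
# stricter (say TLS 1.3), max() leaves that stricter default alone.
ctx.minimum_version = max(ctx.minimum_version, ssl.TLSVersion.TLSv1_2)
```
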

Python 2.7 also has SSL contexts and ssl.create_default_context(), starting in 2.7.9. However, use of SSL contexts is less widespread than it is in Python 3 (for instance the Python 2 imaplib doesn't seem to support them), so I think it's clear you want to use Python 3 here if you have a choice.

(It seems a little bit odd to still be thinking about Python 2 now that it's less than a year to it being officially unsupported by the Python developers, but it's not going away any time soon and there are probably people writing new code in it.)

python/Python3SSLInClients written at 01:53:36; Add Comment


A surprise potential gotcha with sharenfs in ZFS on Linux

In Solaris and Illumos, the standard and well supported way to set and update NFS sharing options for ZFS filesystems is through the sharenfs ZFS filesystem property. ZFS on Linux sort of supports sharenfs, but it attempts to be compatible with Solaris and in practice that doesn't work well, partly because there are Solaris options that cannot be easily translated to Linux. When we faced this issue for our Linux ZFS fileservers, we decided that we would build an entirely separate system to handle NFS exports that directly invokes exportfs, which has worked well. This turns out to have been lucky, because there is an additional and somewhat subtle problem with how sharenfs is currently implemented in ZFS on Linux.

On both Illumos and Linux, ZFS actually implements sharenfs by calling the existing normal command to manipulate NFS exports; on Illumos this uses share_nfs and on Linux, exportfs. By itself this is not a problem and actually makes a lot of sense (especially since there's no official public API for this on either Linux or Illumos). On Linux, the specific functions involved are found in lib/libshare/nfs.c. When you initially share a NFS filesystem, ZFS will wind up running the following command for each client:

exportfs -i -o <options> <client>:<path>

When you entirely unshare a NFS filesystem, ZFS will wind up running:

exportfs -u <client>:<path>

The potential problem comes in when you change an existing sharenfs setting, either to modify what clients the filesystem is exported to or to alter what options you're exporting it with. ZFS on Linux implements this by entirely unexporting the filesystem to all clients, then re-exporting it with whatever options and to whatever clients your new sharenfs settings call for.

(The code for this is in nfs_update_shareopts() in lib/libshare/nfs.c.)

On the one hand this is a sensible if brute force implementation, and computing the difference in sharing (for both clients and options) and how to transform one to the other is not an easy problem. On the other hand, this means that clients that are actually doing NFS traffic during the time when you change sharenfs may be unlucky enough to try a NFS operation in the window of time between when the filesystem was unshared (to them) and when it was reshared (to them). If they hit this window, they'll get various forms of NFS permission denied messages, and with some clients this may produce highly undesirable consequences, such as libvirt guests having their root filesystems go read-only.
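A sketch of what a reconciling implementation would compute instead (the helper here is hypothetical; real code would then run exportfs only for the changed entries, so unchanged clients never see a window where the export is missing):

```python
def reconcile_exports(current, wanted):
    """Given {client: options} mappings for the current and desired
    NFS exports of a filesystem, work out the minimal set of changes
    rather than unsharing everything and resharing it."""
    to_remove = [c for c in current if c not in wanted]
    to_add = [c for c in wanted if c not in current]
    to_change = [c for c in wanted
                 if c in current and current[c] != wanted[c]]
    return to_remove, to_add, to_change

cur = {"nfs-clients": "rw,no_root_squash", "oldhost": "ro"}
want = {"nfs-clients": "rw", "newhost": "ro"}
# -> remove 'oldhost', add 'newhost', re-export 'nfs-clients'
```
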

(The zfs-discuss re-query from Todd Pfaff today is what got several people to go digging and figure out this issue. I was one of them, but only because I rushed into exploring the code before reading the entire email thread.)

I would like to say that our system for ZFS NFS export permissions avoids this issue, but it has exactly the same problem. Rather than try to reconcile the current NFS export settings and the desired new ones, it just does a brute force 'exportfs -u' for all current clients and then reshares things. Fortunately we only very rarely change the NFS exports for a filesystem because we export to netgroups instead of individual clients, so adding and removing individual clients is almost entirely done by changing netgroup membership. The actual exportfs setting only has to change if we add or remove entire netgroups.

(Exportfs has a tempting '-r' option to just resynchronize everything, but our current system doesn't use it and I don't know why. I know that I poked around with exportfs when I was developing it but I don't seem to have written down notes about my exploration, so I don't know if I ran into problems with -r, didn't notice it, or had some other reason I rejected it. If I didn't overlook it, this is definitely a case where I should have documented why I wasn't doing an attractive thing.)

linux/ZFSOnLinuxSharenfsGotcha written at 00:08:23; Add Comment


Linux CPU numbers are not necessarily contiguous

In Linux, the kernel gives all CPUs a number; you can see this number in, for example, /proc/stat:

cpu0 [...]
cpu1 [...]
cpu2 [...]
cpu3 [...]

Under normal circumstances, Linux has contiguous CPU numbers that start at 0 and go up to however many CPUs the system has. However, this is not guaranteed, and it's not the case on some live configurations. It's perfectly possible to have a configuration where, for example, you have sixteen CPUs that are numbered 0 to 7 and 16 to 23, with 8 to 15 missing. In this situation, /proc/stat will match the kernel's numbering, with lines for cpu0 through cpu7 and cpu16 through cpu23. If your code sees this and decides to fill in the missing CPUs 8 through 15, it will be wrong.

You might think that no code could possibly make this mistake, but it's not quite that simple. If, for example, you make a straightforward array to hold CPU status, read in information from various sources, and then print out your accumulated data for CPUs 0 through the highest CPU you saw, you will invent those missing CPUs 8 through 15 (possibly with random unset data for them). In situations like this, you need to actively keep track of what CPUs in your array are valid and what ones aren't, or you need a more sophisticated data structure.
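One way to avoid the trap is to key your data by CPU number instead of indexing a plain array; a sketch:

```python
def parse_cpu_lines(stat_text):
    """Parse the per-CPU lines of /proc/stat into {cpu_number: fields},
    so that holes in the numbering stay holes instead of being
    silently invented by array indexing."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            cpus[int(fields[0][3:])] = fields[1:]
    return cpus

# An (abbreviated) sample from a machine with non-contiguous numbering:
sample = "cpu 1 2 3\ncpu0 10 20 30\ncpu7 11 21 31\ncpu16 12 22 32\n"
present = parse_cpu_lines(sample)
# sorted(present) is [0, 7, 16]; CPU 8 is simply absent, not zero-filled.
```
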

(If you've created an API that says 'I return an array of CPU information for CPUs 0 through N', well, you have a problem. You're probably going to need an API change; if this is in a structure, at least an API addition of a new field to tell people which CPUs are valid.)

I can see why people make this mistake. It's tempting to have simple code, displays, and so on, and almost all Linux machines have contiguous CPU numbering so your code will work almost everywhere (we only wound up with non-contiguous numbering through bad luck). But, sadly, it is a mistake and sooner or later it will bite either you or someone who uses your code.

(It's unfortunate that doing this right is more complicated. Life certainly would be simpler if Linux guaranteed that CPU numbers were always contiguous, but given that CPUs can come and go, that could cause CPU numbers to not always refer to the same actual CPU over time, which is worse.)

Sidebar: How we have non-contiguous CPU numbers

We have one dual-socket machine with hyperthreading where one socket has cooling problems and we've shut it down by offlining the CPUs. Each socket has eight cores, and Linux enumerated one side of the HT pairs for both sockets before starting on the other side of the HT pairs. CPUs 0 through 7 and 16 through 23 are the two HTs for the eight cores on the first socket; CPUs 8-15 would be the first set of CPUs for the second socket, if they were online, and then CPUs 24-31 would be the other side of the HT pairs.

In general, HT pairing is unpredictable. Some machines will pair adjacent CPU numbers (so CPU 0 and CPU 1 are a HT pair) and some machines will enumerate all of one side before they enumerate all of the other. My Ryzen-based office workstation enumerates HT pairs as adjacent CPU numbers, so CPU 0 and 1 are a pair, while my Intel-based home machine enumerates all of one HT side before flipping over to enumerate all of the other, so CPU 0 and CPU 6 are a pair.

(I prefer the Ryzen ordering because it makes life simpler.)
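Rather than guessing, on Linux you can read the pairing out of /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, which is a CPU list in forms such as '0-1' or '0,6'. A sketch of parsing that format:

```python
def parse_siblings(text):
    """Parse a sysfs CPU list such as '0-1' or '0,6' (the format of
    thread_siblings_list) into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

parse_siblings("0-1")  # adjacent pairing, as on my Ryzen workstation
parse_siblings("0,6")  # split pairing, as on my Intel home machine
```
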

It's possible that we should be doing something less or other than offlining all of the CPUs for the socket with the cooling problem (perhaps the BIOS has an option to disable one socket entirely). But offlining them all seemed like the most thorough and sure option, and it certainly was simple.

linux/CPUNumbersNotContiguous written at 00:50:14; Add Comment


Why C uninitialized global variables have an initial value of zero

In C, uninitialized local variables are undefined but uninitialized global variables (whether static or not) are defined to start out as zero. This difference periodically strikes people as peculiar and you might wonder why C is this way. As it happens, there is a fairly simple answer.

One answer is certainly 'because the ANSI C standard says that global variables behave that way', and in some ways this is the right answer (but we'll get to that). Another answer is 'because C was documented to behave that way in "The C Programming Language" and so ANSI C had no choice but to adopt that behavior'. But the real answer is that C behaves this way because it was the most straightforward way for it to behave in Unix on PDP-11s, which was its original home.

In a straightforward compiled language like the early versions of C, all global variables have a storage location, which is to say that they have a fixed permanent address in memory. This memory comes from the operating system and when operating systems give you memory, they don't give it to you with random contents; for good reasons they have to set it to something and they tend to fill it with zero bytes. Early Unix was no exception, so the memory locations for uninitialized global variables were known to start out as all zero bytes. Hence early K&R C could easily and naturally declare that uninitialized global variables were zero, as they were located in memory that had been zero-filled by the operating system.

(Programs did not explicitly ask Unix for this memory. Instead, executable files simply had a field that said 'I have <X> bytes of bss', and the kernel set things up when it loaded the executable.)

The fly in the ointment for this simple situation is that there are some uncommon architectures where zero-filled memory doesn't give you zero valued variables for all types and instead the 0 value for some types has some of its bits turned on in memory. When this came up, people decided that C meant what it said; uninitialized values of these types were still zero, even though you could no longer implement this with no effort by just putting these variables in zero-filled memory. This is where 'the ANSI C standard says so' is basically the answer, although it is also really the only good answer since any other answer would make the initial value of uninitialized global variables non-portable.

(You can read more careful discussion of this on Wikipedia, and probably in many C FAQs. The comp.lang.c FAQ section 5.17 lists some architectures where null pointers are not all-bits-zero values. I suspect that there have been C compilers on architectures where floating point 0 is not all-bits-zero, although it is in IEEE 754 floating point, which pretty much everyone uses today.)

As a side note, the reason that this logic doesn't work for uninitialized local variables is that in a straightforward C implementation, they go on the stack and the stack is reused. The very first time you use a new section of stack, it's fresh memory from the operating system, so it's been zero-filled for you and your uninitialized local variables are zero, just like globals. But after that the memory has 'random' values left over from its previous use. And for various reasons you can't be sure when a section of the stack is being used for the first time.

(In a modern C environment, even completely untouched sections of the stack may not be zero. For security reasons, they may have been filled with random values or with specific 'poison' ones.)

programming/CWhyGlobalsZeroDefault written at 00:42:22; Add Comment


Perhaps you no longer want to force a server-preferred TLS cipher order on clients

To simplify a great deal, when you set up a TLS connection one of the things that happens in the TLS handshake is that the client sends the server a list of the cipher suites it supports in preference order, and then the server picks which one to use. One of the questions when configuring a TLS server is whether you will tell the server to respect the client's preference order or whether you will override it and use the server's preference order. Most TLS configuration resources, such as Mozilla's guidelines, will implicitly tell you to prefer the server's preference order instead of the client's.

(I say 'implicitly' here because the Mozilla discussion doesn't explicitly talk about it, but the Mozilla configuration generator consistently picks server options to prefer the server's order.)

In the original world where I learned 'always prefer the server's cipher order', the server was almost always more up to date and better curated than clients were. You might have all sorts of old web browsers and so on calling in, with all sorts of questionable cipher ordering choices, and you mostly didn't trust them to be doing a good job of modern TLS. Forcing everyone to use the order from your server fixed all of this, and it put the situation under your control; you could make sure that every client got the strongest cipher that it supported.

That doesn't describe today's world, which is different in at least two important ways. First, today many browsers update every six weeks or so, which is probably far more often than most people are re-checking their TLS best practices (certainly it's far more frequently than we are). As a result, it's easy for browsers to be the more up to date party on TLS best practices. Second, browsers are running on increasingly varied hardware where different ciphers may have quite different performance and power characteristics. An AES GCM cipher is probably the fastest on x86 hardware (it can make a dramatic difference), but may not be the best on, say, ARM based devices such as mobile phones and tablets (and it depends on what CPUs those have, too, since people use a wide variety of ARM cores, although by now all of them may be modern enough to have ARMv8-A AES-NI crypto instructions).

If you're going to consistently stay up to date on the latest TLS developments and always carefully curate your TLS cipher list and order, as Mozilla is, then I think it still potentially makes sense to prefer your server's cipher order. But the more I think about it, the more I'm not sure it makes sense for most people to try to do this. Given that I'm not a TLS expert and I'm not going to spend the time to constantly keep on top of this, it feels like perhaps once we let Mozilla restrict our configuration to ciphers that are all strong enough, we should let clients pick the one they think is best for them. The result is unlikely to do anything much to security and it may help clients perform better.
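As an illustration, in Python's ssl module the server-side preference is a single option bit that you can clear (assuming your Python exposes OP_CIPHER_SERVER_PREFERENCE; this is a sketch of the mechanism, not a recommendation for any particular server):

```python
import ssl

# A server-side context with the module's curated defaults.
ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
# Clear the 'server's cipher order wins' bit so that clients get to
# pick their preferred cipher from among the ones the server allows.
ctx.options &= ~ssl.OP_CIPHER_SERVER_PREFERENCE
```
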

(If you're CPU-constrained on the server, then you certainly want to pick the cheapest cipher for you and never mind what the clients would like. But again, this is probably not most people's situation.)

PS: As you might guess, the trigger for this thought was looking at a server TLS configuration that we probably haven't touched for four years, and perhaps more. In theory perhaps we should schedule periodic re-examinations and updates of our TLS configurations; in practice we're unlikely to actually do that, so I'm starting to think that the more hands-off they are, the better.

tech/TLSServerCipherPriority written at 01:11:18; Add Comment


Why your fresh new memory pages are zero-filled

When you (or your programs) obtain memory directly from the operating system, you pretty much invariably get memory that is filled with zero bytes. The same thing is true if you ask for fresh empty disk space, on systems where you can do this (Unix, for example); by specification for Unix, if you extend a file without writing data, the 'empty space' is all 0 bytes. You might wonder why this is. The answer is pretty straightforward; the operating system has to put some specific value into the new memory and disk space, and people have historically picked all 0 bytes as that value.

(I am not dedicated enough to try to research very old operating system history to see if I can find the first OSes to do this. For reasons we're about to cover, it probably started no later than the 1960s.)

There is a story in this, although it is a short one. Once upon a time, when you asked the operating system for some memory or some disk space, the operating system didn't fill it with any defined value; instead it gave it to you with whatever random values it had had before. Since you were about to write to the memory (or disk space), the operating system setting it to a specific value before you overwrote it with your data was just a waste of CPU. This worked fine for a while, and then people on multi-user systems noticed that you could allocate a bunch of RAM or disk space, not write to it, and search through it to see if the previous users had left anything interesting there. Not infrequently they had. Very soon after people started doing this, operating systems stopped giving you new memory or disk space without clearing its old contents. The simplest way to clear the old contents is to overwrite them with some constant value, and apparently the simplest constant value (or at least the one everyone settled on) is 0.
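You can see this directly from user space; an anonymous mmap gets you fresh pages straight from the operating system, and they arrive as all zero bytes:

```python
import mmap

# Ask the OS for a fresh anonymous page (not recycled allocator memory).
m = mmap.mmap(-1, 4096)
all_zero = m[:] == b"\x00" * 4096
m.close()
print(all_zero)  # True: the kernel zero-filled the page before handing it over
```
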

(Since then, hardware and software have developed all sorts of high speed ways of setting memory to 0, partly because it's become such a common operation as a result of this operating system behavior. Some operating systems even zero memory in the background when idle so they can immediately hand out memory instead of having to pause to clear it.)

This behavior of clearing (new) memory to 0 bytes has had some inobvious consequences in places you might not think of immediately, but that's another entry.

Note that this is only what happens when you get memory directly from the operating system, generally with some form of system call. Most language environments don't return memory to the operating system when your code frees it (either explicitly or, in garbage collected languages, implicitly); instead they keep holding on to the now-free memory and recycle it when your code asks for more. This reallocated memory normally has the previous contents that your own code wrote into it. Although this can be a security issue too, it's not something the operating system deals with; it's your problem (or at least a problem for the language runtime).

tech/WhyZeroMemoryPages written at 23:06:16; Add Comment


Two views of ZFS's GPL-incompatibility and the Linux kernel

As part of a thread on linux-kernel where ZFS on Linux's problem with a recent Linux kernel change in exported symbols was brought up, Greg Kroah-Hartman wrote in part in this message:

My tolerance for ZFS is pretty non-existant. Sun explicitly did not want their code to work on Linux, so why would we do extra work to get their code to work properly?

If one frames the issue this way, my answer would be that in today's world, Sun (now Oracle) is no longer involved at all in what is affected here. It stopped being 'Sun's code' years ago, when Oracle Solaris and OpenSolaris split apart, and it's now in practice the code of the people who use ZFS on Linux, with a side digression into FreeBSD and Illumos. The people affected by ZoL not working are completely disconnected from Oracle, and anything the Linux kernel does to make ZoL work will help Oracle only to a tiny degree, if at all.

In short, the reason to do extra work here is that the people affected are Linux users who are using their best option for a good modern filesystem, not giant corporations taking advantage of Linux.

(I suspect that the kernel developers are not happy that people would much rather use ZFS on Linux than Btrfs, but I assure them that it is still true. I am not at all interested in participating in a great experiment to make Btrfs sufficiently stable, reliable, and featureful, and I am especially not interested in having work participate in this for our new fileservers.)

However, there is a different way to frame this issue. If you take it as given that Sun did not want their code to be used with Linux (and Oracle has given no sign of feeling otherwise), then fundamental social respect for the original copyright holder and license means respecting their choice. If Sun didn't want ZFS to work on Linux, it's hostile to them for the kernel community to go to extra work to enable it to work on Linux. If people outside the kernel community hack it up so that it works anyway, that's one thing. But if the kernel community goes out of its way to enable these hacks, well, then the kernel community becomes involved and is violating the golden rule as applied to software licenses.

As a result, I can reluctantly and unhappily support or at least accept 'no extra work for ZFS' as a matter of principle for Linux kernel development. But if your concern is not principle but practical effects, then I think you are mistaken.

(And if Oracle actually wanted to take advantage of the Linux kernel for ZFS, they could easily do so. Whether they ever will or not is something I have no idea about, although I can speculate wildly and their relicensing of DTrace is potentially suggestive.)

linux/ZFSLicenseTwoViews written at 23:50:46; Add Comment

The risk that comes from ZFS on Linux not being GPL-compatible

A couple of years ago I wrote about the harm of ZFS not being GPL-compatible, which was that this kept ZFS from being bundled into most Linux distributions. License compatibility is both a legal and a social thing, and the social side is quite clear; most people who matter consider ZFS's CDDL license to be incompatible with the kernel. However, it turns out that there is another issue and another side of this that I didn't realize back then. This issue surfaced recently with the 5.0 kernel release candidates, as I first saw in Phoronix's ZFS On Linux Runs Into A Snag With Linux 5.0.

The Linux kernel doesn't allow kernel modules to use just any internal kernel symbols; instead they must be officially exported symbols. Some symbols (often, although not exclusively, old ones) are exported to all kernel modules regardless of the module's license, while others are exported in a way that marks them as restricted to GPL'd kernel modules. At the same time, the kernel does not have a stable API of these exported symbols; previously exported ones can be removed as code is revised. A removed symbol may have no replacement at all, or its replacement may be a GPL-only symbol where the previous one was generally available.
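As a C-like pseudocode sketch of the mechanism (the helper names here are invented, and real exports live inside the kernel tree, so this won't compile standalone), the kernel marks each exported symbol as either generally available or GPL-only:

```
/* In some kernel source file: */
int some_old_helper(void) { ... }
EXPORT_SYMBOL(some_old_helper);      /* available to all kernel modules */

int some_new_helper(void) { ... }
EXPORT_SYMBOL_GPL(some_new_helper);  /* only for modules declaring a
                                        GPL-compatible MODULE_LICENSE() */
```

A module whose declared license isn't GPL-compatible can link against the first but not the second, so if some_old_helper is later removed in favor of a GPL-only replacement, that module is stuck.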

Modules that are part of the Linux kernel source are always going to work, so the kernel always exports enough symbols for them (although possibly as GPL-only symbols, since in-tree kernel modules are all GPL'd). Out of kernel modules that do the same sort of thing as in-kernel ones are also always going to work, at least if they're GPL'd; you're always going to be able to have out of kernel modules for device drivers in general, for example. But out of kernel modules for less common things are more or less at the mercy of what symbols the kernel exports, especially if they're not GPL'd modules. If you're an out of kernel module with a GPL-compatible license, you might get the kernel developers to export some symbols you need. If your module has a license that is seen as not GPL-compatible, well, the kernel developers may not be very sympathetic.

This is what has happened with ZFS on Linux as of the 5.0 pre-release, as covered in the Phoronix story and ZoL issue #8259. This specific problem will probably be worked around, but it shows a systemic risk for ZFS on Linux (and for any unusual non-GPL'd module): whether you keep working at all is at the mercy of the Linux kernel people. If they ever decide to be hostile, they can systematically start making your life hard, and they may well make your life hard simply as a side effect of other changes.

Is it likely that ZFS on Linux will someday be unable to work at all with new kernels, because crucial symbols it needs are not available at all? I think it's unlikely, but it's certainly possible and that makes it a risk for long term usage of ZFS on Linux. If it happened (hopefully far in the future), at work our answer would be to replace our current Linux-based ZFS fileservers with FreeBSD ones. On my own machines, well, I'd have to figure out some way of migrating all of my data around and what I'd put it on, and it would definitely be a pain and make me unhappy.

(It wouldn't be BTRFS, unless things change a lot by that point.)

linux/ZFSNonGPLRisk written at 23:17:29; Add Comment

Even thinking about spam makes me angry

It isn't news to me that dealing with spam makes me irritated and angry. I resent the intrusion into my email, and then I resent the time I spend dealing with it, and in fact I resent its very existence. This is not a rational irritation and hatred; I viscerally dislike spam and people and organizations who spam me. Sensible people would resent spammers only for the time and effort they take to deal with, but I am angry all out of proportion with that.

(This anger is part of what pushes me to think about and try to design elaborate potential anti-spam measures, even when this isn't necessarily wise. It is not that I enjoy the challenge of it all or the like, it is that I want to frustrate spammers.)

What I've recently clued in to is that even thinking about spam often makes me angry, not merely dealing with it. Perhaps this shouldn't surprise me, since I know my reaction is a visceral one and just being reminded of things will set off that sort of reaction, but it kind of does. I am a happier person when I can spend as long as possible paying as little attention as possible to all things involving spam; the less I think of it at all, the better it is for me.

That sounds awfully abstract, so let me make it concrete. I have yet another case of Google being a spammer mailing list provider, and I considered writing it up for Wandering Thoughts. Then I realized that even thinking about it was making me grumpy and soaking in the situation for long enough to write an entry would be even worse, since I can't write an entry about a spam incident without having the spam incident on my mind for the entire time I write.

So, I have decided that I will probably not write that entry. I am angry about the spam and angry at Google and I would like to hold them up to the light (again), but it is not worth it. I would rather be non-angry. Since any reminder about Google's culpability will probably not help, it would also be sensible for me to entirely block email from Google to my spamtrap addresses so I'm completely unaware of any future cases.

It's possible that this will cause me to write less about spam in general on Wandering Thoughts, although I'm going to have to see about that. I lump sort-of-spam-related issues like DKIM into my spam category, and I likely still have things to talk about there.

(DMARC as a whole is not necessarily an anti-spam feature. As commonly used, it may be more of an anti-phish one, although I'm not sure that works as well as you'd like. That's another entry, though.)

spam/SpamThinkingAnger written at 02:22:21; Add Comment
