Wandering Thoughts

2018-08-26

A little bit of the one-time MacOS version still lingers in ZFS

Once upon a time, Apple came very close to releasing ZFS as part of MacOS. Apple did this work in its own copy of the ZFS source base (as far as I know), but the people in Sun knew about it and it turns out that even today there is one little lingering sign of this hoped-for and perhaps prepared-for ZFS port in the ZFS source code. Well, sort of, because it's not quite in code.

Lurking in the function that reads ZFS directories to turn (ZFS) directory entries into the filesystem independent format that the kernel wants is the following comment:

 objnum = ZFS_DIRENT_OBJ(zap.za_first_integer);
 /*
  * MacOS X can extract the object type here such as:
  * uint8_t type = ZFS_DIRENT_TYPE(zap.za_first_integer);
  */

(Specifically, this is in zfs_readdir in zfs_vnops.c .)

ZFS maintains file type information in directories. This information can't be used on Solaris (and thus Illumos), where the overall kernel doesn't have this in its filesystem independent directory entry format, but it could have been on MacOS ('Darwin'), because MacOS is among the Unixes that support d_type. The comment itself dates all the way back to this 2007 commit, which includes the change 'reserve bits in directory entry for file type', which created the whole setup for this.

I don't know if this file type support was added specifically to help out Apple's MacOS X port of ZFS, but it's certainly possible, and in 2007 it seems likely that this port was at least on the minds of ZFS developers. It's interesting but understandable that FreeBSD didn't seem to have influenced them in the same way, at least as far as comments in the source code go; this file type support is equally useful for FreeBSD, and the FreeBSD ZFS port dates to 2007 too (per this announcement).

Regardless of the exact reason that ZFS picked up maintaining file type information in directory entries, it's quite useful for people on both FreeBSD and Linux that it does so. File type information is useful for any number of things and ZFS filesystems can (and do) provide this information on those Unixes, which helps make ZFS feel like a truly first class filesystem, one that supports all of the expected general system features.

solaris/ZFSDTypeAndMacOS written at 21:24:29; Add Comment

How ZFS maintains file type information in directories

As an aside in yesterday's history of file type information being available in Unix directories, I mentioned that it was possible for a filesystem to support this even though its Unix didn't. By supporting it, I mean that the filesystem maintains this information in its on disk format for directories, even though the rest of the kernel will never ask for it. This is what ZFS does.

(One reason to do this in a filesystem is future-proofing it against a day when your Unix might decide to support this in general; another is if you ever might want the filesystem to be a first class filesystem in another Unix that does support this stuff. In ZFS's case, I suspect that the first motivation was larger than the second one.)

The easiest way to see that ZFS does this is to use zdb to dump a directory. I'm going to do this on an OmniOS machine, to make it more convincing, and it turns out that this has some interesting results. Since this is OmniOS, we don't have the convenience of just naming a directory in zdb, so let's find the root directory of a filesystem, starting from dnode 1 (as seen before).

# zdb -dddd fs3-corestaff-01/h/281 1
Dataset [....]
[...]
    microzap: 512 bytes, 4 entries
[...]
         ROOT = 3 

# zdb -dddd fs3-corestaff-01/h/281 3
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        3    1    16K     1K     8K     1K  100.00  ZFS directory
[...]
    microzap: 1024 bytes, 8 entries

         RESTORED = 4396504 (type: Directory)
         ckstst = 12017 (type: not specified)
         ckstst3 = 25069 (type: Directory)
         .demo-file = 5832188 (type: Regular File)
         .peergroup = 12590 (type: not specified)
         cks = 5 (type: not specified)
         cksimap1 = 5247832 (type: Directory)
         .diskuse = 12016 (type: not specified)
         ckstst2 = 12535 (type: not specified)

This is actually an old filesystem (it dates from Solaris 10 and has been transferred around with 'zfs send | zfs recv' since then), but various home directories for real and test users have been created in it over time (you can probably guess which one is the oldest one). Sufficiently old directories and files have no file type information, but more recent ones have this information, including .demo-file, which I made just now so this would have an entry that was a regular file with type information.

Once I dug into it, this turned out to be a change introduced (or activated) in ZFS filesystem version 2, which is described in 'zfs upgrade -v' as 'enhanced directory entries'. As an actual change in (Open)Solaris, it dates from mid 2007, although I'm not sure what Solaris release it made it into. The upshot is that if you made your ZFS filesystem any time in the last decade, you'll have this file type information in your directories.

How ZFS stores this file type information is interesting and clever, especially when it comes to backwards compatibility. I'll start by quoting the comment from zfs_znode.h:

/*
 * The directory entry has the type (currently unused on
 * Solaris) in the top 4 bits, and the object number in
 * the low 48 bits.  The "middle" 12 bits are unused.
 */

In yesterday's entry I said that Unix directory entries need to store at least the filename and the inode number of the file. What ZFS is doing here is reusing the 64 bit field used for the 'inode' (the ZFS dnode number) to also store the file type, because it knows that object numbers have only a limited range. This also makes old directory entries compatible, by making type 0 (all 4 bits 0) mean 'not specified'. Since old directory entries only stored the object number and the object number is 48 bits or less, the higher bits are guaranteed to be all zero.

(It seems common to define DT_UNKNOWN to be 0; both FreeBSD and Linux do it.)

The reason this needed a new ZFS filesystem version is now clear. If you tried to read directory entries with file type information on a version of ZFS that didn't know about them, the old version would likely see crazy (and non-existent) object numbers and nothing would work. In order to even read a 'file type in directory entries' filesystem, you need to know to only look at the low 48 bits of the object number field in directory entries.

(As before, I consider this a neat hack that cleverly uses some properties of ZFS and the filesystem to its advantage.)

solaris/ZFSAndDirectoryDType written at 00:43:13; Add Comment

2018-08-25

The history of file type information being available in Unix directories

The two things that Unix directory entries absolutely have to have are the name of the directory entry and its 'inode', by which we generically mean some stable kernel identifier for the file that will persist if it gets renamed, linked to other directories, and so on. Unsurprisingly, directory entries have had these since the days when you read the raw bytes of directories with read(), and for a long time that was all they had; if you wanted more than the name and the inode number, you had to stat() the file, not just read the directory. Then, well, I'll quote myself from an old entry on a find optimization:

[...], Unix filesystem developers realized that it was very common for programs reading directories to need to know a bit more about directory entries than just their names, especially their file types (find is the obvious case, but also consider things like 'ls -F'). Given that the type of an active inode never changes, it's possible to embed this information straight in the directory entry and then return this to user level, and that's what developers did; on some systems, readdir(3) will now return directory entries with an additional d_type field that has the directory entry's type.

On Twitter, I recently grumbled about Illumos not having this d_type field. The ensuing conversation wound up with me curious about exactly where d_type came from and how far back it went. The answer turns out to be a bit surprising due to there being two sides of d_type.

On the kernel side, d_type appears to have shown up in 4.4 BSD. The 4.4 BSD /usr/src/sys/dirent.h has a struct dirent that has a d_type field, but the field isn't documented in either the comments in the file or in the getdirentries(2) manpage; both of those admit only to the traditional BSD dirent fields. This 4.4 BSD d_type was carried through to things that inherited from 4.4 BSD (Lite), specifically FreeBSD, but it continued to be undocumented for at least a while.

(In FreeBSD, the most convenient history I can find is here, and the d_type field is present in sys/dirent.h as far back as FreeBSD 2.0, which seems to be as far as the repo goes for releases.)

Documentation for d_type appeared in the getdirentries(2) manpage in FreeBSD 2.2.0, where the manpage itself claims to have been updated on May 3rd 1995 (cf). In FreeBSD, this appears to have been part of merging 4.4 BSD 'Lite2', which seems to have been done in 1997. I stumbled over a repo of UCB BSD commit history, and in it the documentation appears in this May 3rd 1995 change, which at least has the same date. It appears that FreeBSD 2.2.0 was released some time in 1997, which is when this would have appeared in an official release.

In Linux, it seems that a dirent structure with a d_type member appeared only just before 2.4.0, which was released at the start of 2001. Linux took this long because the d_type field only appeared in the 64-bit 'large file support' version of the dirent structure, and so was only return by the new 64-bit getdents64() system call. This would have been a few years after FreeBSD officially documented d_type, and probably many years after it was actually available if you peeked at the structure definition.

(See here for an overview of where to get ancient Linux kernel history from.)

As far as I can tell, d_type is present on Linux, FreeBSD, OpenBSD, NetBSD, Dragonfly BSD, and Darwin (aka MacOS or OS X). It's not present on Solaris and thus Illumos. As far as other commercial Unixes go, you're on your own; all the links to manpages for things like AIX from my old entry on the remaining Unixes appear to have rotted away.

Sidebar: The filesystem also matters on modern Unixes

Even if your Unix supports d_type in directory entries, it doesn't mean that it's supported by the filesystem of any specific directory. As far as I know, every Unix with d_type support has support for it in their normal local filesystems, but it's not guaranteed to be in all filesystems, especially non-Unix ones like FAT32. Your code should always be prepared to deal with a file type of DT_UNKNOWN.

(Filesystems can implement support for file type information in directory entries in a number of different ways. The actual on disk format of directory entries is filesystem specific.)

It's also possible to have things the other way around, where you have a filesystem with support for file type information in directories that's on a Unix that doesn't support it. There are a number of plausible reasons for this to happen, but they're either obvious or beyond the scope of this entry.

unix/DirectoryDTypeHistory written at 00:31:39; Add Comment

2018-08-24

Incremental development in Python versus actual tests

After writing yesterday's entry on how I need to add tests to our Djanog app, I've been thinking about why it doesn't have them already. One way to describe the situation is that I didn't bother to write any tests when I wrote the app, but another view is that I didn't write tests because I didn't need to. So let me explain what I mean by that.

When I ran into the startup overhead of small Python programs, my eventual solution was to write a second implementation in Go, which was kind of an odd experience (as noted). One of the interesting differences between the two versions is that the Go version has a fair number of tests and the Python one doesn't have any. There are a number of reasons for this, but one of them is that in Go, tests are often how you interact with your code. I don't mean that philosophically; I mean that concretely.

In Python, if you've written some code and you want to try it out to see if it at least sort of works, you fire up the Python interpreter, do 'import whatever' (even for your main program), and start poking away. In Go, you have no REPL, so often the easiest way to poke at some new code is to open up your editor, write some minimal code that you officially call a 'test', and run 'go test' to invoke it (and everything else you have as a test). This is more work than running an interactive Python session and it's much slower to iterate on 'what happens if ...' questions about the code, but it has the quiet advantage that it's naturally persistent (since it's already in a file).

This is the sense in which I didn't need tests to write our Django app. As I was coding, I could use the Python REPL and then later all of Django's ready to go framework itself to see if my code worked. I didn't have to actually write tests in order to test my code, not in the way that you can really need to in Go. In Python, incremental development can easily be done with all of your 'tests' being ad hoc manual work that isn't captured or repeatable.

(Even in Go, my testing often trails off dramatically the moment I have enough code written that I can start running a command to exercise things. In the Go version of my Python program, basically all of the tests are for low-level things and I 'tested' to see if higher level things worked by running the program.)

PS: Django helps this incremental development along by making it easy to connect bits of your code up to a 'user interface' in the form of a web page. You need somewhat more than a function call in the REPL but not much more, and then you can use 'manage runserver ...' to give you a URL you can use to poke your code, both to see things rendered and to test form handling. And sometimes you can check various pieces of code out just through the Django admin interface.

PPS: Of course it's better to do incremental development by writing actual tests. But it takes longer, especially if you don't already know how to test things in the framework you're using, as I didn't when I was putting the app together (cf).

python/PythonREPLAndTests written at 01:11:54; Add Comment

2018-08-23

It's time for me to buckle down and add tests to our Django web app

Our Django web app is the Python 2 code that I'm most concerned about in a world where people are trying to get rid of Python 2, for two reasons. First, not only do we have to worry about Python 2 itself being available but also the Django people have been quite explicit that Django 1.11 is the last version that supports Python 2 and the Django people will stop supporting it in 2020. We probably don't want to be using an unsupported web framework. Second, it's probably the program that's most exposed to character set conversion issues, simply because that seems to be in the nature of things that deal with the web, databases, and so on. In short, we've got to convert it to Python 3 sometime, probably relatively soon, and it's likely to be more challenging than other conversions we've done.

One of the things that would make a Python 3 conversion less challenging is if we had (automated) tests for the app, ideally fairly comprehensive ones. Having solid tests for your code is best practices for a Python 3 conversion for good reasons, and they'd also probably help with things like our Django upgrade pains. Unfortunately we've never had them, which was something I regretted in 2014 and hasn't gotten any better since then, because there's never been a time when adding tests was either a high enough priority or something that I was at all enthused about doing.

(One of my reasons for not feeling enthusiastic is that I suspect that trying to test the current code would lead me to have to do significant revisions on it in order to make it decently testable.)

Looking at our situation, I've wound up feeling that it's time for this to change. Our path forward with the Django app should start with adding tests, which will make both Python 3 and future Django upgrades (including to 1.11 itself) less risky, less work, and less tedious (since right now I do all testing by hand).

(Hopefully adding tests will have other benefits for future development and so on, but some of these are contingent on additional factors beyond the scope of this entry.)

Unfortunately, adding tests to this code is likely to feel like make-work to me, and in a sense it is; the code already works (yes, as far as we know), so all that tests do is confirm that it does. I have no features to add, so I can't add tests to cover the new features as I add them; instead, this is going to have to be just grinding out tests for existing code. Still, I think it needs to be done, and the first step for doing it is for me to learn how to test Django code, starting by reading the documentation.

(This entry is one of the ones that I'm writing in large part as a marker in the ground for myself, to make it more likely that I'll actually carry through on things. This doesn't always work; I still haven't actually studied dpkg and apt, despite declaring I was going to two years ago and having various tabs open on documentation since then. I've even read bits of the documentation from time to time, and then all of the stuff I've read quietly falls out of my mind again. The useful bits of dpkg I've picked up since then have come despite this documentation, not because of it. Generally they come from me having some problem and stumbling over a way to fix it. Unfortunately, our problems with our Django app, while real, are also big and diffuse and not immediate, so it's easy to put them off.)

python/DjangoWeNeedTests written at 01:38:11; Add Comment

2018-08-22

Why ed(1) is not a good editor today

I'll start with my tweet:

Heretical Unix opinion time: ed(1) may be the 'standard Unix editor', but it is not a particularly good editor outside of a limited environment that almost never applies today.

There is a certain portion of Unixdom that really likes ed(1), the 'standard Unix editor'. Having actually used ed for a not insignificant amount of time (although it was the friendlier 'UofT ed' variant), I have some reactions to what I feel is sometimes overzealous praise of it. One of these is what I tweeted.

The fundamental limitation of ed is that it is what I call an indirect manipulation interface, in contrast to the explicit manipulation interfaces of screen editors like vi and graphical editors like sam (which are generally lumped together as 'visual' editors, so called because they actually show you the text you're editing). When you edit text in ed, you have some problems that you don't have in visual editors; you have to maintain in your head the context of what the text looks like (and where you are in it), you have to figure out how to address portions of that text in order to modify them, and finally you have to think about how your edit commands will change the context. Copious use of ed's p command can help with the first problem, but nothing really deals with the other two. In order to use ed, you basically have to simulate parts of ed in your head.

Ed is a great editor in situations where the editor explicitly presenting this context is a very expensive or outright impossible operation. Ed works great on real teletypes, for example, or over extremely slow links where you want to send and receive as little data as possible (and on real teletypes you have some amount of context in the form of an actual printout that you can look back at). Back in the old days of Unix, this described a fairly large number of situations; you had actual teletypes, you had slow dialup links (and later slow, high latency network links), and you had slow and heavily overloaded systems.

However, that's no longer the situation today (at least almost all of the time). Modern systems and links can easily support visual editors that continually show you the context of the text and generally let you more or less directly manipulate it (whether that is through cursoring around it or using a mouse). Such editors are easier and faster to use, and they leave you with more brainpower free to think about things like the program you're writing (which is the important thing).

If you can use a visual editor, ed is not a particularly good editor to use instead; you will probably spend a lot of effort (and some amount of time) on doing by hand something that the visual editor will do for you. If you are very practiced at ed, maybe this partly goes away, but I maintain that you are still working harder than you need to be.

The people who say that ed is a quite powerful editor are correct; ed is quite capable (although sadly limited by only editing a single file). It's just that it's also a pain to use.

(They're also correct that ed is the foundation of many other things in Unix, including sed and vi. But that doesn't mean that the best way to learn or understand those things is to learn and use ed.)

This doesn't make ed a useless, vestigial thing on modern Unix, though. There are uses for ed in non-interactive editing, for example. But on modern Unix, ed is a specialized tool, much like dc. It's worth knowing that ed is there and roughly what it can do, but it's probably not worth learning how to use it before you need it. And you're unlikely to ever be in a situation where it's the best choice for interactive editing (and if you are, something has generally gone wrong).

(But if you enjoy exploring the obscure corners of Unix, sure, go for it. Learn dc too, because it's interesting in its own way and, like ed, it's one of those classical old Unix programs.)

unix/EdNoLongerGoodEditor written at 01:17:53; Add Comment

2018-08-20

Explicit manipulation versus indirect manipulation UIs

One of the broad splits in user interfaces in general is the spectrum between what I'll call explicit manipulation and indirect manipulation. Put simply, in a explicit manipulation interface you see what you're working on and you specify it directly, and in an indirect manipulation interface you don't; you specify it indirectly. The archetypal explicit manipulation interface is the classical GUI mouse-based text selection and operations on it; you directly select the text with the mouse cursor and you can directly see your selection.

(This directness starts to slip away once your selection is large enough that you can no longer see it all on the screen at once.)

An example of an indirect manipulation interface is the common interactive Unix shell feature of !-n, for repeating (or getting access to) the Nth previous command line. You aren't directly pointing to the command line and you may not even still have it visible on the screen; instead you're using it indirectly, through knowledge of what relative command number it is.

A common advantage of indirect manipulation is that indirect manipulation is compact and powerful, and often fast; you can do a lot very concisely with indirect manipulation. Typing '!-7 CR' is unquestionably a lot faster than scrolling back up through a bunch of output to select and then copy/paste a command line. Even the intermediate version of hitting cursor up a few times until the desired command appears and then CR is faster than the full scale GUI text selection.

(Unix shell command line editing features span the spectrum of strong indirect manipulation through strong explicit manipulation; there's the !-n notation, cursor up/down, interactive search, and once you have a command line you can edit it in basically an explicit manipulation interface where you move the cursor around in the line to delete or retype or alter various bits.)

Indirect manipulation also scales and automates well; it's generally clear how to logically extend it to some sort of bulk operation that doesn't require any particular interaction. You specify what you want to operate on and what you want to do, and there you go. Abstraction more or less requires the use of indirect manipulation at some level.

The downside of indirect manipulation is that it requires you to maintain context in order to use it, in contrast to explicit manipulation where it's visible right in front of you. You can't type '!-7' without the context that the command you want is that one, not -6 or -8 or some other number. You need to construct and maintain this context in order to really use indirect manipulation effectively, and if you get the context wrong, bad things happen. I have accidentally shut down a system by being confidently wrong about what shell command line a cursor-up would retrieve, for example, and mistakes about context are a frequent source of production accidents like 'oops we just mangled the live database, not the test one' (or 'oops we modified much more of the database than we thought this operation would apply to').

My guess is that in much the same way that custom interfaces can be a benefit for people who use them a lot, indirect manipulation interfaces work best for frequent and ongoing users, because these are the people who will have the most experience at maintaining the necessary context in their head. Conveniently, these are the people who can often gain the most from using the compact, rapid power of indirect manipulation, simply because they spend so much time doing things with the system. By corollary, people who only infrequently use a thing are not necessarily going to remember context or be good at constructing it in their head and keeping track of it as they work (see also).

(The really great trick is to figure out some way to provide the power and compactness of indirect manipulation along with the low need for context of explicit manipulation. This is generally not easy to pull off, but in my view incremental search shows one path toward it.)

PS: I'm using 'user interface' very broadly here, in a sense that goes well beyond graphical UIs. Unix shells have a UI, programs have a UI in their command line arguments, sed and awk have a UI in the form of their little languages, programming languages and APIs have and are UIs, and so on. If people use it, it's in some sense a user interface.

(I'd like to use the term 'direct manipulation' for what I'm calling 'explicit manipulation' here, but the term has an established, narrower definition. GUI direct manipulation interfaces are a subset of what I'm calling explicit manipulation interfaces.)

tech/ExplicitVsIndirectManipulation written at 22:12:16; Add Comment

It's worth testing that obvious things actually do work

We've reached the point in putting together our future ZFS on Linux NFS fileservers where we believe we have everything built and now we're testing it to make sure that it works and to do our best to verify that there are no hidden surprises. In addition to the expected barrage of NFS client load tests and so on, my co-worker decided to verify that NFS locks worked. I would not have bothered, because of course NFS locks work, they are a well solved problem, and it has been many years since NFS locks (on Linux or elsewhere) had any chance of not working. This goes to show that my co-worker is smarter than I am, because when he actually tried it (using a little lock testing program that I wrote years ago), well:

$ ./locktest plopfile
Press <RETURN> to try to get a flock shared lock on plopfile:
Trying to get lock...
  flock lock failure: No locks available

With some digging we were able to determine that this was caused by rpc.statd not being started on our (Linux) fileserver. We're using NFS v3, which requires some extra daemons to handle aspects of the (separate) locking protocol, and presumably NFSv3 is unfashionable enough these days that systems no longer bother to start them by default.

(Perhaps I'm making excuses for Ubuntu 18.04 here.)

Had we taken the fileserver into production without discovering this, the good news is that important things like our mail system would probably have failed safe by refusing the proceed without locks. But we would certainly have had a fun debugging experience, and under more stress than we did in testing. So I'm very glad that my co-worker was carefully thorough here.

The obvious moral I take from this is that it's worth testing that the obvious things do work. The obvious things are probably not broken in general (otherwise you would hopefully have heard about it during system research and design), but there's always the possibility of setup or configuration mistakes, or that you have a sufficiently odd system that you're falling into a corner case. You may not want to test truly everything, but it's certainly worth testing important but obvious things, such as NFS locking.

(There's also the unpleasant possibility that you've wound up with some fundamental misunderstanding about how the system is designed to work. This is going to force some big changes, but it's better to find this out before you try to take your mistake into production, rather than afterward as things are exploding.)

How much and how thoroughly you test in general depends on your resources and the importance of what you're doing. Some places might find and run a test suite that verified that their new NFS fileservers were delivering full POSIX compatibility (or as much as you can on NFS in general), for example. Making a point of testing the obvious is only an issue if you're only going to do partial tests, and so you might otherwise be tempted to skip the 'it's so obvious it must work' bits in the interests of time.

You may also want to skip explicitly testing the obvious in favour of doing end to end tests that will depend on the obvious working. For example, we might set up an end to end test of mail delivery and (IMAP) mail reading, and if we had, that would almost certainly have discovered the locking issue. There are trade-offs involved in each level of testing, of course.

(The short version is that end to end testing can tell you that it works but it can't tell you why, and it can be dangerous to infer that why yourself. If you actually want a low level functionality test, do the test directly.)

Sidebar: The smoking gun symptom

The fileserver's kernel logs had a bunch of messages reporting:

lockd: cannot monitor <host>

This comes from kernel code that attempts to make an upcall to rpc.statd, which led us to look at ps to make sure that rpc.statd was there before we went digging further.

sysadmin/TestTheObvious written at 01:17:56; Add Comment

2018-08-19

Why I'm mostly not interest in exploring new fonts (on Unix)

Every so often, an enthusiasm for some new font or set of fonts goes around the corners of the Internet that I pay attention to. It's especially common for monospaced fonts (programmers seem to have a lot of opinions here), but people get enthused by proportional ones as well. I used to be reasonably interested in this area and customized my fonts and sometimes my font rendering, but these days I find that I generally take at most a cursory look at new fonts and tune out.

The first reason I tune out is limited Unicode glyph coverage. As I found out when I dug into it, xterm doesn't require your primary font to have CJK glyphs, but as far as I know it does require your font to have most anything else, including Cyrillic, Arabic, and Hebrew glyphs, if you're not going to see just 'glyph missing' boxes. I don't necessarily look at stuff in these languages in my xterms all that often, but it comes up every so often and I'd like to have things available in any new font, since I pretty much do currently.

(I have to admit that the most common place these character sets come up is when I'm looking at spam messages, email and otherwise.)

For proportional fonts, things are less clear to me. Firefox apparently still uses glyphs from multiple fonts as necessary (based on eg this bug), even with modern XFT/FreeType fonts. I believe that Gnome and probably KDE have some sort of system for this, which sometimes matters for programs like Liferea (syndication feeds) and Corebird (Twitter), but I have no idea how it all works or is controlled. If all of this works, it means that my nominal main proportional font wouldn't need complete glyph coverage, although I might have to do a bunch of configuration tweaking.

(Back in the pre-XFT days, Firefox used to have a quite complex and arcane system for finding glyphs to render, since X fonts often had very small coverage. If things went wrong this could produce rather bizarre results; Firefox was perfectly capable of switching fonts around in the middle of lines or words, even in nominally European text, if it decided that for some reason your primary font didn't have the necessary character and it had to fish it out from some other, very different font. All of this is fortunately behind us and I've mercifully forgotten most of the details.)

All of this matters because most alternate fonts don't have a very wide glyph coverage. This isn't surprising; drawing a lot of glyphs is a lot of work, and creating glyphs for non-Latin characters requires its own set of type design expertise in how those characters are supposed to look. In practice most artisanal new fonts are likely to be pretty much confined to Latin and near-Latin characters.

The bigger reason that I tune out these days is that standard Unix XFT fonts and font display are now pretty good. Back in the old pre-XFT days, X fonts (bitmapped or TrueType) were generally pretty limited and often not of particularly high quality, and rendering decisions could be questionable or oriented at hardware that you didn't have. This could make it very worthwhile to seek out an alternate font, because it could significantly improve your reading experience. I used an alternate bitmapped font in Firefox for years, and even after that stopped being a good idea I switched to an alternate TrueType font (one or the other is visible in this view of my desktop, although I no longer remember if it was the bitmapped 'Garamond' font or TTF Georgia).

The modern XFT world is much different. The current fonts that are mostly used and shipped by major desktop environments and Linux distributions are pretty well designed and have increasingly good glyph coverage (primarily this is DejaVu), and if you don't like them, there are alternates (eg Google's Noto). I've also come around to finding that the rendering decisions are generally good, which is a change from the past for me.

Given that the default fonts are already pretty good, the incremental improvement I might gain through fiddling around with my fonts doesn't appear to be worth the effort and the uncertainty. I could spend a lot of time tinkering, trying out various fonts, and basically wind up with something that was subtly worse because, to be honest, I don't know what I'm actually doing here (and I know enough to know that type design and typography is complex). So the easy path is just to use the defaults.

(This entry is sparked by reading vermaden's FreeBSD Desktop – Part 15 – Configuration – Fonts & Frameworks (via), and thinking about why I had no interest in doing something similar.)

Sidebar: Why Unix XFT fonts and font rendering got good

My impression is that part of it is simply the progress of time creating a slow but steady improvement, but a lot of it is that Unix/Linux font rendering became important enough that people spent money on it. Google obviously spent the money to develop Noto, but they're far from the only people. Sometimes this funding came explicitly, by commissioning font work and so on, and sometimes it came passively, where companies like Red Hat and Canonical hired people and had them spend (work) time on Unix font rendering.

(Android probably helped motivate Google and other parties here.)

unix/MyFontDisinterest written at 01:09:47; Add Comment

2018-08-17

Some malware apparently believes in covering its bases

Today our system for logging email attachment type information caught something interesting. Here's the important log messages:

<MSGID> attachment application/rtf; MIME file ext: .rtf
<MSGID> attachment application/zip; MIME file ext: .zip; zip exts: .pdf .rtf .xlsx
rejected <MSGID> from 185.185.25.104/luizhenrique@vencetudo.ind.br to <redacted>: identified virus: CXmail/Rtf-E, Exp/20180802-B

Exp/20180802-B is apparently an OLE2 based exploit using CVE-2017-11882, which appears to often be RTF-based (cf). This opens up the interesting and amusing possibility that both attachments are RTF based attacks (with the .pdf and .xlsx included in the .zip as either cover or supporting elements), and perhaps that they're the same RTF file. At the very least, this malware seems to believe in covering its bases; maybe you'll open a direct RTF attachment, or maybe you'll unzip the ZIP archive and use something in that.

We actually got several copies of this to various different local addresses, all apparently coming directly from this IP address (ie with no additional Received: headers) and all with the same 'Subject: Payment Advice'. The IP address in question isn't currently in the CBL or in Spamhaus ZEN, although it is in b.barracudacentral.org.

In a further interesting development, looking at our logs in more detail showed that there's actually a second run from the same IP an hour or so earlier, with a HELO of '163.com', a MAIL FROM of 'changlimachine101@163.com', and a Subject of 'Purchase Inquiry RG LLC'. This run was detected as the same two types of malware, but it has a different mix of attachment types:

attachment application/pdf; MIME file ext: .pdf
attachment application/octet-stream; MIME file ext: .xlsx; zip exts: .bin[8] .png[2] .rels[10] .vml[3] .xml[21] none

This may mean that the first attachment is basically a cover letter and it's the second attachment where all the malware lurks.

Sidebar: More spammers covering their bases

In the past nine days or so, we've also seen:

attachment application/msword; MIME file ext: .doc; zip exts: .rels .xml[3] none
attachment application/vnd.ms-excel; MIME file ext: .xls; zip exts: .rels .xml[3] none
rejected [...] identified virus: CXmail/OleDl-AD, CXmail/OleDl-AQ

(with the Subject of 'Re: August PO #20180911000'.)

The idea of putting together two different OLE-based attacks in two different documents amuses me. It's kind of brute force, and also optimistic (since you're hoping that neither is recognized and thus blocks your email).

Then there's:

attachment application/msword; MIME file ext: .doc
attachment application/pdf; MIME file ext: .pdf
rejected [...] identified virus: CXmail/RTF-F, Troj/20170199-P

And then there's what is probably a case of 'let's throw two phish attempts into one email':

attachment text/html; MIME file ext: .html
attachment text/html; MIME file ext: .html
rejected [...] identified virus: Troj/Phish-CZV, Troj/Phish-DAG

As I discovered once we started logging attachment types, our commercial anti-spam system identifying something as having phish 'malware' probably means it's in the attachments. This one had a Subject of 'Details Attached'. I bet they were.

spam/MalwareCoveringTheBases written at 15:21:37; Add Comment

Some Firefox addons I'm experimenting with (as of Firefox 62 or so)

One of the interesting things that's happened as a consequence of my switch to Firefox Quantum (ie 57+) is that I've become much more willing to experiment with addons. My pre-Quantum Firefox setup seemed prone to memory leaks due to addons, which made me fairly nervous about adding more; resigned to leaks or not, I didn't really enjoy the experience. My Firefox Quantum setup seems to be clearly better on all aspects of this (both initial memory usage and growth over time), and this has made me more willing to try addons.

Technically I'm getting most of my exposure to these addons through the latest Firefox master tree ('mozilla-central'), which I compile from source every week or so. But I don't think they do anything different in Firefox 61 or 62, and I have set up some of them there.

Make Medium Readable Again (also, via) is basically what it says on the tin. I'm not affected by as much of the Medium plague as most because I disable most JavaScript and cookies through uMatrix, and I tried dealing with the remaining annoyances by blocking and re-blocking various HTML elements to try to eliminate their top bar and bottom bar, but eventually I got tired of it all. MMRA is a big hammer but it appears to be a reliable one so far, and it works across all of the very many sites and blogs and so on that use Medium, and it has some useful additional effects during the times when I have to turn on JavaScript in order to see important pictures or embedded Github gists of code or the like.

(Medium really is a plague and there is going to be a lot of carnage whenever it winds up shutting down, which I expect it to do within five or ten years at most. A lot of writing is going to disappear from the Internet and that bums me out.)

Certainly Something was pointed out to me on Twitter by @AleXgTorres. It's a quite thorough HTTPS connection information and certificate viewer. I don't use it very often but I care enough about TLS certificate stuff to keep it around in case (I have a history of having some such addon lying around), and it's not particularly obtrusive when I'm not using it. I could pick nits with the interface, but it's not that important in something that I only look at infrequently and CS's presentation of the certificate is traditional.

Finally I've recently added Link Cleaner (via the Mozilla blog entry) because I have to admit that I'm more than a bit tired of all of those utm_ fragments and other things. I sort of wish that it worked like my Google URL fixer addon and fixed the links in place, so that copying a link into some other program also gave me the de-utm'd version, but that's a minor thing. If I cared enough, well, LC's code is GPL3 and I could easily drop it into a version of my addon.

(The LC addon page is clear about how it works and there are probably benefits to cleaning the URL when it's actually used. Ultimately I don't care enough to go out of my way to deal with this; I barely care enough to use the addon when Mozilla basically shoved it under my nose.)

I've considered using 'Stylish' again (these days I'd use Stylus, since the actual 'Stylish' browser extension went bad), but I just don't seem to have much of a desire for re-styling websites these days. Most of what I want to do today is make annoying bits of websites go away entirely, and that's part of what I use uBlock Origin for. Possibly I could use some clever style override to deal with the header and footer plague, but my current answer is often to close the window instead.

web/Firefox61MoreAddons written at 01:16:38; Add Comment

(Previous 11 or go back to August 2018 at 2018/08/16)

Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.