Wandering Thoughts archives

2020-09-28

My likely path away from spinning hard drives on my home desktop

One of my goals for my home desktop is to move entirely to solid state storage. Well, it's a goal for both my home and work machine, and I originally expected to get there first at home, but then work had spare money and suddenly my work machine has been all solid state for some time (which is great except for the bit where I'm not at work to enjoy it).

Moving to all solid state at work was relatively straightforward because all of my old storage on my work machine was relatively small; I had a mirrored pair of 250 GB SSDs, a mirrored pair of 1 TB HDs, and a third 500 GB HD for less important things, and none of them were all too full. This was easily all replaced with a pair of reasonably sized NVMe drives and a pair of 2 TB SSDs, which weren't that expensive even in late 2019. Unfortunately my home machine is better configured; I currently have a mirrored pair of 750 GB SSDs and a mirrored pair of '3 TB' HDs (one of them is a 4 TB HD, but since it's mirrored the extra TB is wasted). The HDs are used for an LVM volume that has only about 1.4 TiB allocated, so in theory I could get away with a pair of 2 TB SSDs as the replacement for these HDs. However, that would leave me relatively short of extra space for things like digital photography (those RAW files add up fast).
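The 'in theory' capacity arithmetic can be checked with a small sketch. It assumes that a two-way mirror holds only as much as its smaller member, that drive sizes are decimal terabytes as vendors quote them, and that the 1.4 TiB allocation is in binary units. It ignores filesystem and LVM or ZFS overhead.

```python
TB = 10**12   # decimal terabyte, as drive vendors quote sizes
TiB = 2**40   # binary tebibyte, the unit used for the LVM allocation

def mirror_usable(*member_sizes):
    """A mirror can hold only as much as its smallest member."""
    return min(member_sizes)

current = mirror_usable(3 * TB, 4 * TB)      # the mixed '3 TB' HD pair
candidate = mirror_usable(2 * TB, 2 * TB)    # a pair of 2 TB SSDs
allocated = 1.4 * TiB                        # approximate current LVM allocation

print(f"current mirror:   {current / TiB:.2f} TiB usable")                # 2.73
print(f"2 TB SSD mirror:  {candidate / TiB:.2f} TiB usable")              # 1.82
print(f"headroom on 2 TB: {(candidate - allocated) / TiB:.2f} TiB free")  # 0.42
```

A pair of 2 TB SSDs would leave only about 0.4 TiB of room beyond what is already allocated, which is why it feels tight for growing things like photo collections.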

The obvious replacement and supplement for my current 750 GB SSDs is a pair of decent 1 TB NVMe drives, which seem to be not too expensive these days. Unfortunately there is not as good a replacement for my pair of 3 TB HDs. While 4 TB SSDs are available, they cost noticeably more per GB than 2 TB SSDs do (as I write this, one large Canadian online retailer lists WD Blue 2 TB SSDs for $304 and the 4 TB version for $709). One option would be to shrug and pay the premium for future proofing things; another would be to buy a pair of 2 TB SSDs and rely on a combination of the extra space on the NVMe drives, reusing my current 750 GB SSDs, and rationalizing space usage when I migrate from my old LVM setup to ZFS on the new SSDs.
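To put a number on 'noticeably more per GB', here is a quick calculation from the two listed prices. It treats a TB as 1,000 GB and uses the dollar figures exactly as quoted.

```python
# (capacity in TB, listed price in dollars) for the WD Blue SSDs quoted above
wd_blue = {"2 TB": (2, 304), "4 TB": (4, 709)}

for name, (tb, price) in wd_blue.items():
    print(f"{name}: ${price / (tb * 1000):.3f} per GB")  # $0.152 and $0.177

# Extra cost of buying 4 TB as one drive instead of as two 2 TB drives
premium = wd_blue["4 TB"][1] / (2 * wd_blue["2 TB"][1]) - 1
print(f"per-GB premium for the 4 TB model: {premium:.0%}")  # 17%
```

Since each mirror needs two drives, that roughly 17% premium applies to the whole pair, about $202 extra for a pair of 4 TB drives compared to the same capacity bought as 2 TB drives.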

A complication is that now is not necessarily the right time to buy new NVMe drives, especially relatively expensive ones. The NVMe world is just starting to move from PCIe 3.0 to PCIe 4.0, which offers various improvements once everything is working. My current home motherboard has no PCIe 4.0 support, of course, but based on past experience I'll be keeping any NVMe drives that I buy now for at least half a decade, which means that they'll likely wind up in a PCIe 4.0 capable system within their lifetime.
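The headline improvement in PCIe 4.0 is that it doubles the per-lane transfer rate, from 8 GT/s to 16 GT/s, while keeping the same 128b/130b line encoding as PCIe 3.0. A rough sketch of the raw link bandwidth, assuming the x4 link that M.2 NVMe drives typically use; real drives deliver less once protocol and packet overhead are included.

```python
# PCIe 3.0 and 4.0 both use 128b/130b line encoding;
# each lane carries one bit per transfer.
ENCODING_EFFICIENCY = 128 / 130

def link_gbytes_per_s(gigatransfers_per_s: float, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, before packet overhead."""
    return gigatransfers_per_s * ENCODING_EFFICIENCY * lanes / 8

for generation, rate in (("3.0", 8), ("4.0", 16)):
    print(f"PCIe {generation} x4: {link_gbytes_per_s(rate, 4):.2f} GB/s")
# PCIe 3.0 x4: 3.94 GB/s
# PCIe 4.0 x4: 7.88 GB/s
```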

(On the one hand, PCIe 4.0 will probably not make a particularly visible performance difference on my home machine on typical or even somewhat atypical tasks, like compiling Firefox from source. On the other hand, I don't like leaving potential performance on the table.)

So despite all of what I've written, I'm probably going to do my usual thing and sit on my hands for a while. Perhaps various end of the year sale prices will get me to finally move forward.

(This is one of the entries that I write partly to try to motivate myself.)

PS: I have a mixed pair of 3 TB and 4 TB HDs for the usual reason, which is that I used to have a pair of 3 TB HDs and then one of them died and I needed to replace it. My LVM array has migrated up from smaller sizes of HDs over time this way.

(Waiting for a warranty replacement is never an option, because I want my redundancy back much sooner than a replacement would get to me.)

tech/HomePCAllSolidStatePath written at 20:46:56

Making product names of what you use visible to people is generally a mistake

For years, we've used Sophos PureMessage as the major part of our overall spam filtering. I don't mention specific product names very often for various reasons, but it's now harmless because Sophos is dropping PureMessage. We were already planning to almost certainly replace PureMessage for reasons other than this, but Sophos's decision to move to a cloud based service model forces our hand.

(We actually have the replacement more or less planned out and will likely start switching away from PureMessage very soon.)

As part of our overall filtering (and as standard in a lot of environments), we've set it up so that messages that are considered sufficiently spammy have a tag at the start of the Subject: header. People can then do their own filtering (in procmail or these days in their IMAP mail client) based on that tag, and various other pieces of our mail system also change their behavior if a message's Subject: has been marked this way. The specific tag we use is thus a well known and fundamentally fixed part of our overall mail environment; changing it would require configuration changes across our systems and force people to change their own mail setups, to their annoyance.

The tag we chose, almost fifteen years ago, was (and is) '[PMX:SPAM]'. This was chosen because 'PMX' is the common abbreviation for 'PureMessage' (used in Sophos's documentation, among other places), and we thought that '[SPAM]' was a bit too generic and likely to be added to Subject: headers by other places before the messages got to us.
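Filtering on the tag only needs a prefix check on the Subject: header. Here is a minimal Python sketch of that check; it is an illustration, not the procmail recipe or IMAP client rule anyone actually uses. It also shows why the tag is so hard to change: every such rule, wherever it lives, has the literal string baked into it.

```python
from email import message_from_string
from email.message import Message

SPAM_TAG = "[PMX:SPAM]"  # the fixed Subject: tag described above

def is_tagged_spam(msg: Message) -> bool:
    """True if the message's Subject: header starts with the spam tag."""
    subject = str(msg.get("Subject", ""))
    return subject.lstrip().startswith(SPAM_TAG)

raw = "From: someone@example.org\nSubject: [PMX:SPAM] Cheap watches\n\nBody text\n"
print(is_tagged_spam(message_from_string(raw)))  # True
```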

If things go as expected, in a few months we won't be using Sophos PureMessage any more, and 'PMX' will mean nothing. But I can confidently predict that in ten years, our mail system will still be tagging sufficiently spammy email with '[PMX:SPAM]' (if we still have a mail system at all, and we probably will).

This is not the first time I've made the mistake of burning product names (software or hardware) into things that are visible to people, and it probably won't be the last time, either. Doing this is even a famous sysadmin mistake for hostnames (many '<x>vax' hosts lived on for years past when they were in fact DEC VAXes). But still, hopefully I can learn something from this and maybe do better for the next time around.

PS: There are clever transition plans like adding a second, more generic tag and then deprecating the first one over the course of many years, but they're not worth it. The other lesson is that sometimes you just shrug and live with the odd name long after you're using a software product or a particular type of hardware. It can even become a part of local folklore.

sysadmin/VisibleProductNamesBad written at 17:02:48

Looking at DKIM information for our 'good' email (September 2020 edition)

I was recently asked (in another context) if DKIM (DomainKeys Identified Mail) was sufficiently common that one could start to require it on email, or at least use it as a strong part of spam scoring. Well, let's get some statistics from our mail system. This has a different focus than my October 2018 statistics, where I was looking at total overall stats; here I'm going to try to look only at email that seems good, based on being accepted by us with a sufficiently low rspamd score.

All of the following statistics are from the past ten days of full logs. Over that period we accepted 82,878 email messages at our external MX, and about 12,400 of these messages got high enough rspamd scores that I'm going to call them 'not good' email, leaving 70,473 messages in my sample set. Out of these, only 52,984 had any DKIM information at all, which means that about 25% of my sample set of probably good email has no DKIM information.
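The percentages follow directly from the raw counts; a quick check using only the numbers given above:

```python
accepted = 82_878    # accepted at the external MX over ten days
sample = 70_473      # left after dropping messages with high rspamd scores
with_dkim = 52_984   # sample messages with any DKIM information

not_good = accepted - sample
no_dkim = sample - with_dkim
print(not_good)                   # 12405
print(no_dkim)                    # 17489
print(f"{no_dkim / sample:.1%}")  # 24.8%
```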

So the first answer is that we definitely cannot use the lack of DKIM signatures as a spam scoring signal with any real weight. A quarter of our presumed good incoming email would be marked down by such a check, which is very likely a bad false positive rate (if it's not, rspamd is doing a terrible job, passing more spam messages than it recognizes).

However, almost all of the messages with DKIM signatures (50,754 out of the 52,984) passed at least one DKIM signature check (remember that messages can have more than one DKIM signature). Speaking of multiple signatures, 9,875 of the messages had more than one, which is an appreciable quantity, and Exim reported different results from the different signatures for 538 of them. I cannot readily generate numbers on how many messages with multiple DKIM signatures failed all of them, but there were 2,737 messages with one or more DKIM signature failures.

(Two messages had three different results. The more amusing one had three signatures for the same domain; one passed, one failed with a body hash mismatch, and one failed signature verification, with Exim suggesting that the headers had probably been modified. It looks like the message came from a mailing list.)
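The categories above overlap in ways that are easy to mix up, so here is a sketch of how they relate for a single message. It assumes each message has already been reduced to a list of per-signature result labels; the labels used here ("pass", "fail", "invalid") are hypothetical stand-ins, since the real numbers come from Exim's logs, whose format isn't shown here.

```python
from collections import Counter

def summarize(per_message_results):
    """Tally DKIM outcomes. Each item is one message's list of per-signature
    result labels, e.g. ["pass", "fail"]; an empty list means no signature."""
    tally = Counter()
    for results in per_message_results:
        if not results:
            tally["no DKIM"] += 1
            continue
        tally["has DKIM"] += 1
        if "pass" in results:
            tally["at least one pass"] += 1
        else:
            tally["nothing verifies"] += 1
        if len(results) > 1:
            tally["multiple signatures"] += 1
        if len(set(results)) > 1:
            tally["signatures disagree"] += 1
        if any(r != "pass" for r in results):
            tally["one or more failures"] += 1
    return tally

example = [[], ["pass"], ["pass", "fail"], ["fail"], ["pass", "fail", "invalid"]]
print(summarize(example))
```

Note that a message can count as both 'at least one pass' and 'one or more failures', which is why the 2,737 messages with some failure is larger than the number of messages where nothing verified.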

Given that about 4% of the presumed good messages with DKIM information failed all of their DKIM checks, I think that using 'has DKIM but nothing verifies' as a strong spam signal is probably a mistake. Manual inspection suggests that a lot of such email comes from mailing lists. Some of it certainly appears to come directly from the domain generating the DKIM signature, so some people probably have mail systems that are screwing things up, too.
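The 4% figure comes from the counts given earlier: messages with DKIM information minus those that passed at least one check.

```python
with_dkim = 52_984    # presumed-good messages with any DKIM information
any_pass = 50_754     # of those, messages passing at least one signature check
good_sample = 70_473  # all presumed-good messages

nothing_verifies = with_dkim - any_pass
print(nothing_verifies)                         # 2230
print(f"{nothing_verifies / with_dkim:.1%}")    # 4.2% of messages with DKIM
print(f"{nothing_verifies / good_sample:.1%}")  # 3.2% of all presumed-good mail
```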

(Some of these 'comes directly from the domain of the DKIM signature but it doesn't verify' cases are DKIM public key lookup failures, but some of them are signature verification failures. Apparently some places are generating DKIM signatures and then having their mail system mangle their messages.)

You might think that we could at least rely on email claiming to have our own DKIM signatures to either verify or be not legitimate. I'm afraid I have to tell you otherwise; there's an appreciable number of incoming messages that we generated, sent somewhere else, and that came back mangled so they failed DKIM checks. These are not all from mailing lists, either, which makes me sigh. We see this happening to email from other University of Toronto subdomains as well.

We saw a total of 522 DKIM failures for 'public key record unavailable', from 89 different DKIM 'd=' domain values, including a number of places that really should know better. It's quite possible that some of these are transient DNS failures. I haven't investigated this further because I don't care that much.
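For context, 'public key record unavailable' means the verifier could not get a key from DNS at the name derived from the signature's s= (selector) and d= (domain) tags; RFC 6376 puts the key in a TXT record at `<selector>._domainkey.<domain>`. A minimal sketch of building that name follows. The tag parser is deliberately simplified (no validation), and the actual DNS query is left out because it would need a resolver library.

```python
def dkim_tags(header_value: str) -> dict:
    """Split a DKIM-Signature header value into its tag=value pairs.

    Simplified: no validation, and all whitespace inside values is removed
    (RFC 6376 allows folding whitespace in values such as b=).
    """
    tags = {}
    for part in header_value.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = "".join(value.split())
    return tags

def dkim_key_name(selector: str, domain: str) -> str:
    """The DNS name whose TXT record holds the signer's public key."""
    return f"{selector}._domainkey.{domain}"

sig = "v=1; a=rsa-sha256; d=example.com; s=foo.bar; h=from:to:subject; bh=Zm9v; b=YmFy"
tags = dkim_tags(sig)
print(dkim_key_name(tags["s"], tags["d"]))  # foo.bar._domainkey.example.com
```

A missing or misconfigured record at that name, or a DNS timeout while fetching it, produces this kind of failure, which is why some of the 522 could be transient.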

My overall conclusion is that for us, DKIM is not a useful spam scoring signal, even if we restrict ourselves to scoring on the results of DKIM checks rather than on whether a message has DKIM signatures at all. Your results may vary depending on your mail patterns, and that's really the other conclusion; right now, you need to evaluate the DKIM results for your own specific incoming email patterns before making any decisions here.

spam/DKIMVersusGoodMail-2020-09 written at 01:19:52

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.