spam/DMARCRejectLocalEffects written at 23:11:23; Add Comment
How Yahoo's and AOL's DMARC 'reject' policies affect us
My whole interest in understanding DMARC started with the simple question of how Yahoo's and AOL's change to a DMARC 'reject' policy would affect us and our users, and how much of an effect it would have. The answer turns out to be that it will have some effects but nothing major.
The most important thing is that this change doesn't significantly affect either our users forwarding their email to places that pay attention to DMARC or our simple mailing lists because neither of them normally modify email on the way through (which means the DKIM signatures stay intact, which means that email really from Yahoo or AOL will still pass DMARC at the eventual destination). Of course it's possible that some people are forwarding email in ways that modify the message and thus may have problems, but if so they're doing something out of the ordinary; our simple mail forwarding doesn't do this.
(We allow users to run programs from their
There is one exception to this. Email that our commercial anti-spam
system detects as being either spam or a virus has its
People who forward their email to DMARC-respecting places will be affected in one additional way. The simple way to put it is that our forwarding is now imperfect, in that we'll accept some legitimate messages but can't forward them successfully. These would be emails from legitimate Yahoo or AOL users that were either sent from outside those places or that got modified in transit by, eg, mailing lists. A user who forwards their email to GMail is now losing these emails more or less silently (to the user). In extreme cases it's possible that they'll get unsubscribed from a mailing list due to these bounces.
This also affects any local user who was sending email out through
our local mail gateway using their AOL or Yahoo
(I'd like to monitor the amount of forwarding rejections but i can't think of a good way to dig the information out of our Exim logs, since mailing lists generally change the envelope sender address. This makes it tempting to have our inbound SMTP gateway do DMARC checks purely so I can see how many incoming messages fail them.)
PS: writing this entry has been a useful exercise in thinking through the full implications of our setup, as I initially forgot that our anti-spam filtering would invalidate DKIM signatures under some circumstances.
spam/UnderstandingDMARC written at 01:12:32; Add Comment
At least partially understanding DMARC
DMARC is suddenly on my mind because of the
news that AOL changed its DMARC policy to 'reject',
following the lead of Yahoo which did this a couple of weeks ago.
The short version is that a DMARC 'reject' policy is what I
originally thought DKIM was doing: it locks
down email with a
(I think that DMARC can also be used to say 'yes, really, pay attention to my strict SPF settings' if you're sufficiently crazy to break all email forwarding.)
This directly affects anyone who wants to send email with a
(Of course this power grab is not the primary goal of the exercise;
the primary goal is to cut off all of the spammers and other bad
actors that are attaching Yahoo and AOL
This indirectly affects anyone who has, for example, a mailing list
(or a mail forwarding setup) that modifies the message
DMARC by itself does not break simple mail relaying and forwarding (including for simple mailing lists), ie all things where the message and its headers are unmodified. An unmodified message's DKIM signature is still valid even if it doesn't come directly from Yahoo or AOL (or whoever) so everything is good as far as DMARC is concerned (assuming SPF sanity).
Note that Yahoo and AOL are not the only people with a DMARC 'reject'
policy. Twitter has one, for example. You can check a domain's DMARC
policy (if any) by looking at the
PS: I suspect that more big free email providers are going to move to publishing DMARC 'reject' policies, assuming that things don't blow up spectacularly for Yahoo and AOL. Which I doubt they will.
programming/NewLanguageLongevity written at 23:45:05; Add Comment
The question of language longevity for new languages
Every so often I feel a temptation to rewrite DWiki (the engine behind this blog) in Go. While there are all sorts of reasons not to (so many that it's at best a passing whimsy), one concern that immediately surfaces is the question of Go's likely longevity. I'd like the blog to still be here in, say, ten years, and if the engine is written in Go that needs Go to be a viable language in ten years (and on whatever platform I want to host the blog on).
Of course this isn't just a concern for Go; it's a concern for any new language and there's a number of aspects to it. To start with there's the issue of the base language. There are lots of languages that have come and gone, or come and not really caught on very much so that they're still around but not really more than a relatively niche language (even though people often love them very much and are very passionate about them). Even when a language is still reasonably popular there's the question of whether it's popular enough to be well supported on anything besides the leading few OS platforms.
(Of course the leading few OS platforms are exactly the ones that I'm most likely to be using. But that's not always the case; this blog is currently hosted on FreeBSD, for example, not Linux, and until recently it was on a relatively old FreeBSD.)
But you'd really like more than just the base language to still be around, because these days the base language is an increasingly small part of the big picture of packages and libraries and modules that you can use. We also want a healthy ecology of addons for the language, so that if you need support for, say, a new protocol or a new database binding or whatever you probably don't have to write it yourself. The less you have to do to evolve your program the more likely it is to evolve.
Finally there's a personal question: will the language catch on with you so that you'll still be working with it in ten years? Speaking from my own experience I can say that it's no fun to be stuck with a program in a language that you've basically walked away from, even if the language and its ecology is perfectly healthy.
Of course, all of this is much easier if you're writing things that you know will be superseded and replaced before they get anywhere near ten years old. Alternately you could be writing an implementation of a standard so that you could easily swap it out for something written in another language. In this sense a dynamically rendered blog with a custom wikitext dialect is kind of a worst case.
(For Go specifically I think it's pretty likely to be around and fully viable in ten years, although I have less of a sense of my own interest in programming in it. Of course ten years can be long time in computing and some other language could take over from it. I suspect that Rust would like to, for example.)
programming/SplittingLogging written at 02:39:55; Add Comment
Thinking about how to split logging up in multiple categories et al
I've used programs that do logging (both well and badly) and I've also written programs that did logging (also both reasonably well and badly) and the whole experience has given me some views on how I like logging split up to make it more controllable.
It's tempting to say that controlling logging is only for exceptional cases, like debugging programs. This is not quite true. Certainly this is the dominant case, but there are times when people have different interests about what to log even in routine circumstances. For example, on this blog I log detailed information about conditional GETs for syndication feeds because I like tracking down why (or why not) feed fetchers succeed at this. However this information isn't necessarily of interest to someone else running a DWiki instance so it shouldn't be part of the always-on mandatory logging; you should be able to control it.
The basic breakdown of full featured logging in a large system is to give all messages both a category and a level. The category is generally going to be the subsystem that they involve, while the level is the general sort of information that they have (informational, warnings, progress information, debugging details, whatever). You should be able to control the two together and separately, to say that you want only progress reports from all systems or almost everything from only one system and all the way through.
My personal view is that this breakdown is not quite sufficient by itself and there are a bunch of cases where you'll also want a verbosity level. Even if verbosity could in theory be represented by adding more categories and levels, in practice it's much easier for people to crank up the verbosity (or crank it down) rather than try to do more complex manipulations of categories and levels. As part of making life easier on people, I'd also have a single option that means 'turn on all logging options and log all debugging information (and possibly everything)'; this gives people a simple big stick to hit a problem with when they're desperate.
If your language and programming environment doesn't already have a log system that makes at least the category plus level breakdown easy to do, I wouldn't worry about this for relatively small programs. It's only fairly large and complex programs with a lot of subsystems where you start to really want this sort of control.
Sidebar: the two purposes of specific control
There are two reasons to offer people specific control over logging. The first is what I mentioned: sometimes not all information is interesting to a particular setup. I may want information on syndication feed conditional GETs while you may want 'time taken' information for all requests. Control over logging allows the program to support both of us (and the person who doesn't care about either) without cluttering up logs with stuff that we don't want. This is log control for routine logs, stuff that you're going to use during normal program operation.
The second reason is that a big system can produce too much information at full logging flow when you're trying to troubleshoot it, so much that useful problem indicators are lost in the overall noise. Here categorization and levels are a way of cutting down on the log volume so that people can see the important things. This is log control for debugging messages.
(There is an overlap between these two categories. You might log all SQL queries that a system does and the time they take for routine metrics, even though this was originally designed for debugging purposes.)
tech/PasswordMemorabilityHeresy written at 02:11:26; Add Comment
A heresy about memorable passwords
In the wake of Heartbleed, we've been writing some password guidelines at work. A large part of the discussion in them is about how to create memorable passwords. In the process of all of this, I realized that I have a heresy about memorable passwords. I'll put this way:
I will tell you a secret: I don't know what my Unix passwords are. Oh, I can type them and I do so often, but I don't know exactly what they are any more. If for some reason I had to recover what one of them was in order to write it out, the fastest way to do so would be to sit down in front of a computer and type it in. Give me just a pen and paper and I'm not sure I could actually do it. My fingers and reflexes know them far better better than my conscious mind.
If you pick a new password based purely at random with absolutely no scheme involved, you'll probably have to write it down on a piece of paper and keep referring to that piece of paper for a while, perhaps a week or so. After the week I'm pretty confidant that you'll be able to shred the piece of paper without any risk at all, except perhaps if you go on vacation for a month and have it fall out of your mind. Even then I wouldn't be surprised if you could type it by reflex when you come back. The truth is that people are very good at pushing repetitive things down into reflex actions, things that we do automatically without much conscious thought. My guess is that short, simple things can remain in conscious memory (this is at least my experience with some things I deal with); longer and more complex things, like a ten character password that involves your hands flying all over the keyboard, those go down into reflexes.
Thus, where memorable passwords really matter is not passwords you use frequently but passwords you use infrequently (and which you're not so worried about that you've seared into your mind anyways).
(Of course, in the real world people may not type their important passwords very often. I try not to think about that very often.)
PS: This neglects threat models entirely, which is a giant morass. But for what it's worth I think we still need to worry about password guessing attacks and so reasonably complex passwords are worth it.
unix/NFSLockingCanBeSlow written at 00:37:21; Add Comment
Cross-system NFS locking and unlocking is not necessarily fast
If you're faced with a problem of coordinating reads and writes on an NFS filesystem between several machines, you may be tempted to use NFS locking to communicate between process A (on machine 1) and process B (on machine 2). The attraction of this is that all they have to do is contend for a write lock on a particular file; you don't have to write network communication code and then configure A and B to find each other.
The good news is that this works, in that cross system NFS locking and unlocking actually works right (at least most of the time). The bad news is that this doesn't necessarily work fast. In practice, it can take a fairly significant amount of time for process B on machine 2 to find out that process A on machine 1 has unlocked the coordination file, time that can be measured in tens of seconds. In short, NFS locking works but it can require patience and this makes it not necessarily the best option in cases like this.
(The corollary of this is that when you're testing this part of NFS locking to see if it actually works you need to wait for quite a while before declaring things a failure. Based on my experiences I'd wait at least a minute before declaring an NFS lock to be 'stuck'. Implications for impatient programs with lock timeouts are left as an exercise for the reader.)
I don't know if acquiring an NFS lock on a file after a delay normally causes your machine's kernel to flush cached information about the file. In an ideal world it would, but NFS implementations are often not ideal worlds and the NFS locking protocol is a sidecar thing that's not necessarily closely integrated with the NFS client. Certainly I wouldn't count on NFS locking to flush cached information on, say, the directory that the locked file is in.
In short: you want to test this stuff if you need it.
PS: Possibly this is obvious but when I started testing NFS locking to make sure it worked in our environment I was a little bit surprised by how slow it could be in cross-client cases.
tech/ModernFSAndVolumeManagement written at 02:17:29; Add Comment
What modern filesystems need from volume management
One of the things said about modern filesystems like btrfs and ZFS is that their volume management functionality is a layering violation; this view holds that filesystems should stick to filesystem stuff and volume managers should stick to that. For the moment let's not open that can of worms and just talk about what (theoretical) modern filesystems need from an underlying volume management layer.
Arguably the crucial defining aspect of modern filesystems like ZFS and btrfs is a focus on resilience against disk problems. A modern filesystem no longer trusts disks not to have silent errors; instead it checksums everything so that it can at least detect data faults and it often tries to create some internal resilience by duplicating metadata or at least spreading it around (copy on write is also common, partly because it gives resilience a boost).
In order to make checksums useful for healing data instead of just simply detecting when it's been corrupted, a modern filesystem needs an additional operation from any underlying volume management layer. Since the filesystem can actually identify the correct block from a number of copies, it needs to be able to get all copies or variations of a set of data blocks from the underlying volume manager (and then be able to tell the volume manager which is the correct copy). In mirroring this is straightforward; in RAID 5 and RAID 6 it gets a little more complex. This 'all variants' operation will be used both during regular reads if a corrupt block is detected and during a full verification check where the filesystem will deliberately read every copy to check that they're all intact.
(I'm not sure what the right primitive operation here should be for RAID 5 and RAID 6. On RAID 5 you basically need the ability to try all possible reconstructions of a stripe in order to see which one generates the correct block checksum. Things get even more convoluted if the filesystem level block that you're checksumming spans multiple stripes.)
Modern filesystems generally also want some way of saying 'put A and B on different devices or redundancy clusters' in situations where they're dealing with stripes of things. This enables them to create multiple copies of (important) metadata on different devices for even more protection against read errors. This is not as crucial if the volume manager is already providing redundancy.
This level of volume manager support is a minimum level, as it still leaves a modern filesystem with the RAID-5+ rewrite hole and a potentially inefficient resynchronization process. But it gets you the really important stuff, namely redundancy that will actually help you against disk corruption.
unix/NFSWritePlusReadProblemII written at 00:10:33; Add Comment
Partly getting around NFS's concurrent write problem
In a comment on my entry about NFS's problem with concurrent writes, a commentator asked this very good question:
(Let's assume that we've got 'close to open' consistency to start with, where A fully writes the file before B processes it.)
If I was faced with this problem and I had a free hand with A and
B, I would make A create the file with some non-repeating name and
then send an explicit message to B with 'look at file <X>' (using eg
a TCP connection between the two). A should probably
Note that it's important to not reuse filenames. If A ever reuses a filename, B's kernel may have stale information about the old version of the file cached; at the best this will get B a stale filehandle error and at the worst B will read old information from the old version of the file.
If you can't communicate between A and B directly and B operates by scanning the directory to look for new files, you have a moderate caching problem. B's kernel will normally cache information about the contents of the directory for a while and this caching can delay B noticing that there is a new file in the directory. Your only option is to force B's kernel to cache as little as possible. Note that if B is scanning it will presumably only be scanning, say, once a second and so there's always going to be at least a little processing lag (and this processing lag would happen even if A and B were on the same machine); if you really want immediately, you need A to explicitly poke B in some way no matter what.
(I don't think it matters what A's kernel caches about the directory, unless there's communication that runs the other way such as B removing files when it's done with them and A needing to know about this.)
linux/BtrfsCoreMistake written at 01:27:57; Add Comment
Where I feel that btrfs went wrong
I recently finished reading this LWN series on btrfs, which was the most in-depth exposure at the details of using btrfs that I've had so far. While I'm sure that LWN intended the series to make people enthused about btrfs, I came away with a rather different reaction; I've wound up feeling that btrfs has made a significant misstep along its way that's resulted in a number of design mistakes. To explain why I feel this way I need to contrast it with ZFS.
Btrfs and ZFS are each both volume managers and filesystems merged together. One of the fundamental interface differences between them is that ZFS has decided that it is a volume manager first and a filesystem second, while btrfs has decided that it is a filesystem first and a volume manager second. This is what I see as btrfs's core mistake.
(Overall I've been left with the strong impression that btrfs basically considers volume management to be icky and tries to have as little to do with it as possible. If correct, this is a terrible mistake.)
Since it's a volume manager first, ZFS places volume management front
and center in operation. Before you do anything ZFS-related, you need
to create a ZFS volume (which ZFS calls a pool); only once this is done
do you really start dealing with ZFS filesystems. ZFS even puts the two
jobs in two different commands (
Because btrfs puts the filesystem first it wedges volume creation in
as a side effect of filesystem creation, not a separate activity,
and then it carries a series of lies and uselessly physical details
through to filesystem level operations like
I think that this also leads to the asinine design decision that subvolumes have magic flat numeric IDs instead of useful names. Something that's willing to admit it's a volume manager, such as LVM or ZFS, has a name for the volume and can then hang sub-names off that name in a sensible way, even if where those sub-objects appear in the filesystem hierarchy (and under what names) gets shuffled around. But btrfs has no name for the volume to start with and there you go (the filesystem-volume has a mount point, but that's a different thing).
All of this really matters for how easily you can manage and keep track
Btrfs's 'I am not a volume manager' approach also leads it to drastically limit the physical shape of a btrfs RAID array in a way that is actually painfully limiting. In ZFS, a pool stripes its data over a number of vdevs and each vdev can be any RAID type with any number of devices. Because ZFS allows multi-way mirrors this creates a straightforward way to create a three-way or four-way RAID 10 array; you just make all of the vdevs be three or four way mirrors. You can also change the mirror count on the fly, which is handy for all sorts of operations. In btrfs, the shape 'raid10' is a top level property of the overall btrfs 'filesystem' and, well, that's all you get. There is no easy place to put in multi-way mirroring; because of btrfs's model of not being a volume manager it would require changes in any number of places.
(And while I'm here, that btrfs requires you to specify both your data and your metadata RAID levels is crazy and gives people a great way to accidentally blow their own foot off.)
As a side note, I believe that btrfs's lack of allocation guarantees in a raid10 setup makes it impossible to create a btrfs filesystem split evenly across two controllers that is guaranteed to survive the loss of one entire controller. In ZFS this is trivial because of the explicit structure of vdevs in the pool.
PS: ZFS is too permissive in how you can assemble vdevs, because there is almost no point of a pool with, say, a mirror vdev plus a RAID-6 vdev. That configuration is all but guaranteed to be a mistake in some way.
sysadmin/SSLChasingCertChains written at 22:02:04; Add Comment
Chasing SSL certificate chains to build a chain file
Supposes that you have some shiny new SSL certificates for some reason. These new certificates need a chain of intermediate certificates in order to work with everything, but for some reason you don't have the right set. In ideal circumstances you'll be able to easily find the right intermediate certificates on your SSL CA's website and won't need the rest of this entry.
Okay, let's assume that your SSL CA's website is an unhelpful swamp pit. Fortunately all is not lost, because these days at least some SSL certificates come with the information needed to find the intermediate certificates. First we need to dump out our certificate, following my OpenSSL basics:
openssl x509 -text -noout -in WHAT.crt
This will print out a bunch of information. If you're in luck (or possibly always), down at the bottom there will be a 'Authority Information Access' section with a 'CA Issuers - URI' bit. That is the URL of the next certificate up the chain, so we fetch it:
(In case it's not obvious: for this purpose you don't have to worry if this URL is being fetched over HTTP instead of HTTPS. Either your certificate is signed by this public key or it isn't.)
Generally or perhaps always this will not be a plain text file like your certificate is, but instead a binary blob. The plain text format is called PEM; your fetched binary blob of a certificate is probably in the binary DER encoding. To convert from DER to PEM we do:
openssl x509 -inform DER -in <WGOT-FILE>.crt -outform PEM -out intermediate-01.crt
Now you can inspect intermediate-01.crt in the same to see if it needs a further intermediate certificate; if it does, iterate this process. When you have a suitable collection of PEM format intermediate certificates, simply concatenate them together in order (from the first you fetched to the last, per here) to create your chain file.
PS: The Qualys SSL Server Test is a good way to see how correct your certificate chain is. If it reports that it had to download any certificates, your chain of intermediate certificates is not complete. Similarly it may report that some entries in your chain are not necessary, although in practice this rarely hurts.
Sidebar: Browsers and certificate chains
As you might guess, some but not all browsers appear to use this embedded intermediate certificate URL to automatically fetch any necessary intermediate certificates during certificate validation (as mentioned eg here). Relatedly, browsers will probably not tell you about unnecessary intermediate certificates they received from your website. The upshot of this can be a HTTPS website that works in some browsers but fails in others, and in the failing browser it may appear that you sent no additional certificates as part of a certificate chain. Always test with a tool that will tell you the low-level details.
(Doing otherwise can cause a great deal of head scratching and frustration. Don't ask how I came to know this.)
python/WarningsModuleReactions written at 01:19:06; Add Comment
My reactions to Python's warnings module
A commentator on my entry on the warnings problem pointed out the existence of the warnings module as a possible solution to my issue. I've now played around with it and I don't think it fits my needs here, for two somewhat related reasons.
The first reason is that it simply makes me nervous to use or even take over the same infrastructure that Python itself uses for things like deprecation warnings. Warnings produced about Python code and warnings that my code produces are completely separate things and I don't like mingling them together, partly because they have significantly different needs.
The second reason is that the default formatting that the warnings module uses is completely wrong for the 'warnings produced from my program' case. I want my program warnings to produce standard Unix format (warning) messages and to, for example, not include the Python code snippet that generated them. Based on playing around with the warnings module briefly it's fairly clear that I would have to significantly reformat standard warnings to do what I want. At that point I'm not getting much out of the warnings module itself.
All of this is a sign of a fundamental decision in the warnings module: the warnings module is only designed to produce warnings about Python code. This core design purpose is reflected in many ways throughout the module, such as in the various sorts of filtering it offers and how you can't actually change the output format as far as I can see. I think that this makes it a bad fit for anything except that core purpose.
In short, if I want to log warnings I'm better off using general logging and general log filtering to control what warnings get printed. What features I want there are another entry.
* * *