2014-09-15
I want my signed email to work a lot like SSH does
PGP and similar technologies have been in the news lately, and as a result of this I added the Enigmail extension to my testing Thunderbird instance. Dealing with PGP through Enigmail reminded me of why I'm not fond of PGP. I'm aware that people have all sorts of good reasons and that PGP itself has decent reasons for working the way it does, but for me the real sticking point is not the interface but fundamentally how PGP wants me to work. Today I want to talk just about signed email, or rather about how I want to deal with signed email.
To put it simply, I want people's keys for signed email to mostly work like SSH host keys. For most people the core of using SSH is not about specifically extending trust to specific, carefully validated host keys but instead about noticing if things change. In practical use you accept a host's SSH key the first time you're offered it and then SSH will scream loudly and violently if it ever changes. This is weaker than full verification but is far easier to use, and it complicates the job of an active attacker (especially one that wants to get away with it undetected). Similarly, in casual use of signed email I'm not going to bother carefully verifying keys; I'm instead going to trust that the key I fetched the first time for the Ubuntu or Red Hat or whatever security team is in fact their key. If I suddenly start getting alerts about a key mismatch, then I'm going to worry and start digging. A similar thing applies to personal correspondents; for the most part I'm going to passively acquire their keys from keyservers or other methods and, well, that's it.
(I'd also like this to extend to things like DKIM signatures of email, because frankly it would be really great if my email client noticed that this email is not DKIM-signed when all previous email from a given address had been.)
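(To make the model concrete, here is a minimal sketch of the sort of SSH-style 'accept on first use, scream on change' checking I have in mind, written as if a mail client kept a simple local state file. The file name, the function name, and the assumption that you already have the signing key's fingerprint in hand are all inventions for illustration.)

    import json, os

    PIN_FILE = os.path.expanduser("~/.signing-key-pins.json")  # hypothetical state file

    def check_signing_key(sender, fingerprint):
        # Load the fingerprints we've previously seen (trust on first use).
        pins = {}
        if os.path.exists(PIN_FILE):
            with open(PIN_FILE) as f:
                pins = json.load(f)
        if sender not in pins:
            # First key we've ever seen for this sender: quietly accept and remember it.
            pins[sender] = fingerprint
            with open(PIN_FILE, "w") as f:
                json.dump(pins, f)
            return "accepted on first use"
        if pins[sender] == fingerprint:
            return "matches the pinned key"
        # The SSH-style loud warning: the key has changed since we first saw it.
        return "WARNING: signing key for %s changed (was %s, now %s)" % (
            sender, pins[sender], fingerprint)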
On the other hand, I don't know how much sense it makes to even think about general MUA interfaces for casual, opportunistic signed email. There is a part of me that thinks signed email is a sexy and easy application (which is why people keep doing it) that actually doesn't have much point most of the time. Humans are terrible at checking authentication, which is why we mostly delegate that job to computers, yet casual signed email in MUAs is almost entirely human-checked. Quick, are you going to notice that the email announcement of a new update from your vendor's security team is not signed? Are you going to even care if the update system itself insists on signed updates downloaded from secure mirrors?
(My answers are probably not and no, respectively.)
For all that it's nice to think about the problem (and to grumble about the annoyances of PGP), a part of me thinks that opportunistic signed email is not so much the wrong problem as an uninteresting problem that protects almost nothing that will ever be attacked.
(This also ties into the problem of false positives in security. The reality is that for casual message signatures, almost all missing or failed signatures are likely to have entirely innocent explanations. Or at least I think that this is the likely explanation today; perhaps mail gets attacked more often than I think on today's Internet.)
2014-08-24
10G Ethernet is a sea change for my assumptions
We're soon going to migrate a bunch of filesystems to a new SSD-based fileserver, all at once. Such migrations force us to do full backups of the migrated filesystems (to the backup system they appear as new filesystems), so a big move means a sudden surge in backup volume. As part of working out how to handle this surge, I had the obvious thought: we should upgrade the backup server that will handle the migrated filesystems to 10G Ethernet now. The 10G transfer speeds plus the source data being on SSDs would make it relatively simple to back up even this big migration overnight during our regular backup period.
Except I realized that this probably wasn't going to be the case. Our backup system writes backups to disk, specifically to ordinary SATA disks that are not aggregated together in any sort of striped setup, and an ordinary SATA disk might write at 160 Mbytes per second on a good day. This is only slightly faster than 1G Ethernet and certainly nowhere near the reasonable speeds of 10G Ethernet in our environment. We can read data off the SSD-based fileserver and send it over the network to the backup server very fast, but that does us nowhere near as much good as it looks when the whole process is then brought to a screeching halt by the inconvenient need to write the data to disk on the backup server. 10G will probably help the backup servers a bit, but it isn't going to be anywhere near a great speedup.
What this points out is that my reflexive assumptions are calibrated all wrong for 10G Ethernet. I'm used to thinking of the network as slower than the disks, often drastically, but this is no longer even vaguely true. Even so-so 10G Ethernet performance (say 400 to 500 Mbytes/sec) utterly crushes single disk bandwidth for anything except SSDs. If we get good 10G speeds, we'll be crushing even moderate multi-disk bandwidth (and that's assuming we get full speed streaming IO rates and we're not seek limited). Suddenly the disks are the clear limiting factor, not the network. In fact even a single SSD can't keep up with a 10G Ethernet at full speed; we can see this from the mere fact that SATA interfaces themselves currently max out at 6 Gbits/sec on any system we're likely to use.
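(The back of the envelope arithmetic here is simple, but to spell it out as a sketch:)

    # Rough line rates in Mbytes/sec, using decimal units and ignoring
    # protocol overhead, so these are optimistic ceilings.
    def mbytes_per_sec(gbits):
        return gbits * 1000 / 8.0

    print(mbytes_per_sec(1))    # 1G Ethernet:   ~125 Mbytes/sec
    print(mbytes_per_sec(10))   # 10G Ethernet: ~1250 Mbytes/sec
    print(mbytes_per_sec(6))    # SATA 6 Gbits/sec link: ~750 (before 8b/10b overhead)

    # A single SATA disk writing at ~160 Mbytes/sec is barely ahead of 1G
    # Ethernet and nowhere near 10G; even the SATA interface's own ceiling
    # is well below 10G line rate.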
(I'd run into this before even for 1G Ethernet, eg here, but it evidently hadn't really sunk into my head.)
PS: I don't know what this means for our backup servers and any possible 10G networking in their future. 10G is likely to improve things somewhat, but the dual 10G-T Intel cards we use don't grow on trees and maybe it's not quite cost effective for them right now. Or maybe the real answer is working out how to give them striped staging disks for faster write speeds.
2014-08-09
Intel has screwed up their DC S3500 SSDs
I ranted about this on Twitter a few days ago when we discovered it the hard way but I want to write it down here and then cover why what Intel did is a terrible idea. The simple and short version is this:
Intel switched one of their 'datacenter' SSDs from reporting 512 byte 'physical sectors' to reporting 4096 byte physical sectors in a firmware update.
Specifically, we have the Intel DC S3500 80 GB model in firmware versions D2010355 (512b sectors) and D2010370 (4K sectors). Nothing in the part labeling changed other than the firmware version. Some investigation since our initial discovery has turned up that the 0370 firmware apparently supports both sector sizes and is theoretically switchable between them, and this apparently applies to both the DC S3500 and DC S3700 series SSDs.
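(On Linux you can see what a drive claims without any vendor tools, because the kernel exposes it in sysfs. Here's a minimal sketch; the device name is a placeholder and I'm assuming the logical sector size stays at 512 bytes under both firmwares.)

    # Minimal Linux-only sketch: read the sector sizes a disk reports, via sysfs.
    def sector_sizes(dev="sda"):          # 'sda' is just a placeholder
        base = "/sys/block/%s/queue" % dev
        with open(base + "/logical_block_size") as f:
            logical = int(f.read())
        with open(base + "/physical_block_size") as f:
            physical = int(f.read())
        return logical, physical

    # Presumably (512, 512) on the 0355 firmware versus (512, 4096) on 0370,
    # if the logical size really is unchanged.
    print(sector_sizes("sda"))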
This is a really terrible idea that should never have passed even a basic smell test in a product that is theoretically aimed at 'datacenter' server operations. There are applications where 512b drives and 4K drives are not compatible; for example, in some ZFS pools you can't replace a 512b SSD with a 4K SSD. Creating incompatible drives with the same detailed part number is something that irritates system administrators a great deal and of course it completely ruins the day of people who are trying to have and maintain a spares pool.
This Intel decision is especially asinine because the 'physical sector size' that these SSDs are reporting is essentially arbitrary (as we see here, it is apparently firmware-settable). The actual flash memory itself is clumped together in much larger units in ways that are not well represented by 'physical sector size', which is one reason that all SSDs report whatever number is convenient here.
There may well be good reasons to make SSDs report as 4K sector drives instead of 512b drives; if nothing else it is a small bit closer to reality. But having started out with the DC S3500 series reporting 512b sectors, Intel should have kept them that way in their out-of-box state (and made available a utility to switch them to 4K). If Intel felt it absolutely had to change that for some unfathomable reason, it should have at least changed the detailed part number when it updated the firmware; then people maintaining a spares stock would at least have some sign that something was up.
(Hopefully other SSD vendors are not going to get it into their heads to do something this irritating and stupid.)
In related news we now have a number of OmniOS fileservers which we literally have no direct spare system disks for, because their current system SSDs are the 0355 firmware ones.
(Yes, we are working on fixing that situation.)
2014-08-08
Hardware can be weird, Intel 10G-T X540-AT2 edition
Every so often I get a pointed reminder that hardware can be very weird. As I mentioned on Twitter today, we've been having one of those incidents recently. The story starts with the hardware for our new fileservers and iSCSI backends, which is built around SuperMicro X9SRH-7TF motherboards. These have an onboard Intel X540-AT2 chipset that provides two 10G-T ports. The SuperMicro motherboard and BIOS light up these ports no later than when you power the machine on and leave it sitting in the BIOS, and maybe earlier (I haven't tested).
On some but not all of our motherboards, the first 10G-T port lights up (in the BIOS) at 1G instead of 10G. When we first saw this on a board we thought we had a failed board and RMA'd it; the replacement board behaved the same way, but when we booted an OS (I believe it was Linux) the port came up at 10G and we assumed that all was well. Then we noticed that some but not all of our newly installed OmniOS fileservers had their first port (still) coming up at 1G. At first we thought we had cable issues, but the cables were good.
In the process of testing the situation out, we rebooted one OmniOS fileserver off a CentOS 7 live cd to see if Linux could somehow get 10G out of the hardware. Somewhat to my surprise it could (and a real full 10G at that). More surprising, the port stayed at 10G when we rebooted into OmniOS. It stayed at 10G in OmniOS over a power cycle and it even stayed at 10G after a full power off where we cut power to the entire case for several minutes. Further testing showed that it was sufficient merely to boot the CentOS 7 live cd on an affected server without ever configuring the interface (although it's possible that the live cd configures the interface up to try DHCP and then brings it down again).
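(For the record, checking this sort of thing under a Linux live cd doesn't need anything fancier than sysfs. A minimal sketch, with a placeholder interface name:)

    # Read the negotiated link speed the Linux kernel reports, in Mbits/sec.
    def link_speed(iface="eth0"):          # interface name is a placeholder
        with open("/sys/class/net/%s/speed" % iface) as f:
            return int(f.read().strip())

    print(link_speed("eth0"))   # 1000 for a 1G link, 10000 for a 10G link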
There's a lot of weirdness here. It'd be one thing for the Linux driver to bring up 10G where the OmniOS one didn't; then it could be that the Linux driver was more comprehensive about setting up the chipset properly. For it to be so firmly persistent is another thing, though; it suggests that Linux is reprogramming something that stays programmed in nonvolatile storage. And then there's the matter of this happening only on some motherboards and only to one port out of two that are driven by the same chipset.
Ultimately, who knows. We're happy because we apparently have a full solution to the problem, one we've actually carried out on all of the machines now because we needed to get them into production.
(As far as we can easily tell, all of the motherboards and the motherboard BIOSes are the same. We haven't opened up the cases to check the screen printing for changes and aren't going to; these machines are already installed and in production.)
2014-08-02
The benchmarking problems with potentially too-smart SSDs
We've reached the point in building out our new fileservers and iSCSI backends where we're building the one SSD-based fileserver and its backends. Naturally we want to see what sort of IO performance we get on SSDs, partly to make sure that everything is okay, so I fired up my standard basic testing tool for sequential IO. It gave me some numbers, the numbers looked good (in fact pretty excellent), and then I unfortunately started thinking about the fact that we're doing this with SSDs.
Testing basic IO speed on spinning rust is relatively easy because spinning rust is in a sense relatively simple and predictable. Oh, sure, you have different zones and remapped sectors and so on, but you can be all but sure that when you write arbitrary data to disk, it is actually going all the way down to the platters unaltered (well, unless your filesystem does something excessively clever). This matters for my testing because my usual source of test data to write to disk is /dev/zero, and data from /dev/zero is what you could call 'embarrassingly compressible' (and easily deduplicated too).
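(A quick sketch of just how embarrassingly compressible that is:)

    import os, zlib

    block = 128 * 1024
    print(len(zlib.compress(b"\0" * block)))      # tiny (a few hundred bytes at most)
    print(len(zlib.compress(os.urandom(block))))  # roughly 128K; random data doesn't compress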
The thing is, SSDs are not spinning rust and thus are nowhere near as predictable. SSDs contain a lot of magic, and increasingly some of that magic apparently involves internal compression on the data you feed them. When I was writing lots of zeros to the SSDs and then reading them back, was I actually testing the SSD read and write speeds or was I actually testing how fast the embedded processors in the SSDs could recognize zero blocks and recreate them in RAM?
(What matters to our users is the real IO speeds, because they are not likely to read and write zeros.)
Once you start going down the road of increasingly smart devices, the creeping madness starts rolling in remarkably fast. I started out thinking that I could generate a relatively small block of random data (say 4K or something reasonable) and repeatedly write that. But wait, SSDs actually use much larger internal block sizes and they may compress over that larger block size (which would contain several identical copies of my 4K 'simple' block). So I increased the randomness block size to 128K, but now I'm worrying about internal SSD deduplication since I'm writing a lot of copies of this.
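(Here's a sketch of why the block size I worry over matters: whether repeated random blocks get collapsed depends entirely on how big a window whatever is doing the compressing looks at. I'm using zlib and lzma purely as stand-ins; I have no idea what any given SSD actually does internally.)

    import os, zlib, lzma

    chunk = os.urandom(128 * 1024)
    data = chunk * 8                    # 1 MB made of eight identical 128K random blocks

    # zlib's 32K window can't see matches 128K apart, so this barely shrinks.
    print(len(zlib.compress(data)))
    # lzma's much larger dictionary spots the repeats, so this collapses to about one chunk.
    print(len(lzma.compress(data)))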
The short version of my conclusion is that once I start down this road, the only sensible approach is to generate fully random data. But if I'm testing high speed IO in an environment of SSDs and multiple 10G iSCSI networks, I need to generate this random data at a pretty high speed in order to be sure that it's not the rate-limiting step.
(By the way, /dev/urandom may be a good and easy source of random data but it is very much not a high speed source. In fact it's an amazingly slow source, especially on Linux. This was why my initial approach was basically 'read N bytes from /dev/urandom and then repeatedly write them out'.)
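(A trivial benchmark sketch to check whether your random data source can keep up; if the number it prints is well below your network and SSD bandwidth, the data generation is going to be your rate limit.)

    import time

    def urandom_rate(total=64 * 1024 * 1024, chunk=1024 * 1024):
        # Read 'total' bytes from the kernel's random pool and time it.
        start = time.time()
        done = 0
        with open("/dev/urandom", "rb") as f:
            while done < total:
                f.read(chunk)
                done += chunk
        return total / (time.time() - start) / 1e6   # rough Mbytes/sec

    print("%.0f Mbytes/sec from /dev/urandom" % urandom_rate())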
PS: I know that I'm ignoring all sorts of things that might affect SSD write speeds over time. Right now I'm assuming that they're going to be relatively immaterial in our environment for hand-waving reasons, including that we can't do anything about them. Of course it's possible that SSDs detect you writing large blocks of zeros and treat them as the equivalent of TRIM commands, but who knows.
2014-07-30
Why I like ZFS in general
Over time I've come around to really liking ZFS, not just in the context of our fileservers and our experiences with them but in general. There are two reasons for this.
The first is that when left to myself, the data storage model I gravitate to is a changeable collection of filesystems without permanently fixed sizes, layered on top of a chunk of mirrored storage. I believe I've been doing this since before I ran into ZFS because it's just the simple and right way to do things: I don't have to try to predict in advance how many filesystems I need or how big they all have to be, and managing my mirrored storage in one big chunk instead of a chunk per filesystem is just easier. ZFS is far from the only implementation of this abstract model but it's an extremely simple and easy-to-use take on it, probably about as simple to use as you can get. And it's one set of software to deal with the whole stack of operations, instead of two or three.
(In Linux, for example, I do this with filesystems in LVM on top of software RAID mirrors. Each of these bits works well but there are three different sets of software involved and any number of multi-step operations to, say, start using larger replacement disks.)
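(To make the contrast concrete, here is a hedged sketch of the two stacks with made-up disk, pool, and volume names. The ZFS side is one tool and a couple of commands; the Linux side is three tools stacked on each other, with more multi-step work later if you ever want to grow or migrate things.)

    # ZFS: one pool of mirrored storage, filesystems carved out at will.
    zpool create tank mirror /dev/sda /dev/sdb
    zfs create tank/home
    zfs create tank/data

    # Linux equivalent: three layers, each with its own tool.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 100G -n home vg0
    mkfs.ext4 /dev/vg0/home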
The second is that as time goes by I've become increasingly concerned about both the possibility of quiet bitrot in data that I personally care about and the possibility of losing all of my data from a combination of a drive failure plus bad spots on the remaining drive. Noticing quiet bitrot takes filesystem checksums; recovering from various failure modes takes deep hooks into the RAID layer. ZFS has both and thus deals solidly with both failure possibilities, which is quite reassuring.
(ZFS also has its own fragilities, but let's pretend that software is perfect for the moment. And any theoretical fragilities have not bit me yet.)
As far as I know there is no practical competitor for ZFS in this space today (for simple single-machine setups), especially if you require it to be free or open source. The closest is btrfs but I've come to think that btrfs is doing it wrong on top of its immaturity.
2014-07-18
In practice, 10G-T today can be finicky
Not all that long ago I wrote an entry about why I think that 10G-T will be the dominant form of 10G Ethernet. While I still believe in the fundamental premise of that entry, since then I've also learned that 10G-T today can be kind of finicky in practice (regardless of what the theory says), and this can potentially make 10G-T deployments harder to deploy and to make reliable than SFP+-based ones.
So far we've had two noteworthy incidents. In the most severe one, a new firewall refused to recognize link on either 10G-T interface when they were plugged into existing 1G switches. We have no idea why and haven't been able to reproduce the problem; as far as we can tell everything should work. But it didn't. Our on-the-spot remediation was to switch out the 10G-T card for a dual-1G card and continue on.
(Our tests afterwards included putting the actual card that had the problem into another server of the exact same model and connecting it up to test switches of the exact same model; everything worked.)
A less severe recent issue was finding that one 10G-T cable either had never worked or had stopped working (it was on a pre-wired but uninstalled machine, so we can't be sure). This was an unexceptional short cable from a reputable supplier, and apparently it still works if you seat both ends really firmly (which makes it unsuitable for machine room use, where cables may well get tugged out of that sort of careful seating). At one level I'm not hugely surprised by this; the reddit discussion of my previous entry had a bunch of people who commented that 10G-T could be quite sensitive to cabling issues. But it's still disconcerting to have it actually happen to us (and not with a long cable either).
To be clear, I don't regret our decision to go with 10G-T. Almost all of our 10G-T stuff is working and I don't think we could have afforded to do 10G at all if we'd had to use SFP+ modules. These teething problems are mild by comparison and I have no reason to think that they won't get better over time.
(But if you gave me buckets of money, well, I think that an all SFP+ solution is going to be more reliable today if you can afford it. And it clearly dissipates less power at the moment.)
2014-07-16
My (somewhat silly) SSD dilemma
The world has reached the point where I want to move my home machine from using spinning rust to using SSDs; in fact it's starting to reach the point where sticking with spinning rust seems dowdy and decidedly behind the times. I certainly would like extremely fast IO and no seek overheads and so on, especially when I do crazy things like rebuild Firefox from source on a regular basis. Unfortunately I have a dilemma because of a combination of three things:
- I insist on mirrored disks for anything I value, for good reason.
- I want the OS on different physical disks than my data because that makes it much easier to do drastic things like full OS reinstalls (my current system is set up this way).
- SSDs are not big enough for me to fit all of my data on one SSD (and a bunch of my data is stuff that doesn't need SSD speeds but does need to stay online for good long-term reasons).
(As a hobbyist photographer who shoots in RAW format, the last is probably going to be the case for a long time to come. Photography is capable of eating disk space like popcorn, and it gets worse if you're tempted to get into video too.)
Even if I was willing to accept a non-mirrored system disk (which I'm reluctant to do), satisfying all of this in one machine requires five drives: three SSDs plus two HDs (a mirrored pair of data SSDs, a single system SSD, and a mirrored pair of bulk hard drives). Six drives would be better. That's a lot of drives to put in one case and to connect to one motherboard (especially given that an optical drive will require a SATA port these days and yes, I probably still want one).
(I think that relatively small SSDs are now cheap enough that I'd put the OS on SSDs for both speed and lower power. This is contrary to my view two years ago, but times change.)
There are various ways to make all of this fit, such as pushing the optical drive off to an external USB drive and giving up on separate system disk(s), but a good part of my dilemma is that I don't really like any of them. In part it feels like I'm trying to force a system design that is not actually ready yet and what I should be doing is, say, waiting for SSD capacities to go up another factor of two and the prices to drop a bunch more.
(I also suspect that we're going to see more and more mid-tower cases that are primarily designed for 2.5" SSDs, although casual checking suggests that one can get cases that will take a bunch of them even as it stands.)
In short: however tempting SSDs seem, right now it looks like we're in the middle of an incomplete technology transition. However much I'd vaguely like some, I'm probably better off waiting for another year or two or three. How fortunate that this matches my general lethargy about hardware purchases (although what happened with my last computer upgrade makes me wonder).
(My impression is that we're actually in the middle of several PC technology transitions. 4K monitors and 'retina' displays seem like another current one, for example, one that I'm quite looking forward to.)
2014-07-13
An obvious reminder: disks can and do die abruptly
Modern disks have a fearsome array of monitoring features in the form of all of their SMART attributes, and hopefully you are running something that monitors them and alerts you to trouble. In an ideal world, disks would decay gradually and give you plenty of advance warning about an impending death, letting you make backups and prepare the replacement and so on. And sometimes this does happen (and you get warnings from your SMART monitoring software about 'impending failure, back up your data now').
Sometimes, though, it doesn't. As an illustration of this, a disk on my home machine just went from apparently fine to 'very slow IO' to SMART warnings about 8 unreadable sectors to very dead in the space of less than 24 hours. If I had acted very fast I might have been able to make a backup of it before it died, but only because I both noticed the suddenly slow system and was able to diagnose it. Otherwise, well, the time between getting the SMART warnings and the death was about half an hour.
As it happened I did not leap to get a backup of it right away because it's only one half of a mirror pair (I did make a backup once it had actively failed). The possibility of abrupt disk failure is one large reason that I personally insist on RAID protection for any data that I care about; there may not be enough time to save data off a dying disk and having to restore from backups is disruptive (and backups are almost always incomplete).
I'm sure that everyone who runs a decent number of disks is well aware of the possibility of abrupt disk death already, and certainly we've had it happen to us at work. But it never hurts to have a pointed reminder of it smack me in the forehead every so often, even if it's a bit annoying.
(The brave future of SSDs instead of spinning mechanical disks may generally do better than this, although we'll have to see. We have experienced some abrupt SSD deaths, although that was with moderately early hardware. It's possible that SSDs will turn out to mostly have really long service lifetimes, especially if they're not written to particularly heavily.)
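(As a sketch of the 'something that monitors them' part from the start of this entry: assuming smartmontools' smartctl is available, even a crude check of a few of the scarier raw counters is better than nothing. The attribute names and the output parsing here are rough assumptions about smartctl's output, not a polished monitoring system.)

    import subprocess

    WORRYING = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

    def smart_worries(dev="/dev/sda"):                 # device name is a placeholder
        out = subprocess.check_output(["smartctl", "-A", dev]).decode("ascii", "replace")
        worries = []
        for line in out.splitlines():
            fields = line.split()
            # Attribute rows roughly look like:
            # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
            if len(fields) >= 10 and fields[1] in WORRYING and fields[9] != "0":
                worries.append((fields[1], fields[9]))
        return worries

    print(smart_worries("/dev/sda"))   # [] if nothing looks bad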
2014-06-06
On the Internet, weirdness is generally uncommon
One of the things that my exposure to SMTP daemons and SMTP's oddities has shown me vividly is that, perhaps surprisingly, weirdness is uncommon on the practical Internet. Most clients and servers do the usual, common thing. Perhaps 'almost all'. For example, SMTP may contain very dark corners, but these corners are also dank and unused, so dank and unused that your MTA may never encounter them.
(I can't find any trace of route addresses in 90 days of our mail gateway's logs of incoming traffic. Senders present quoted local parts infrequently, but they all appear to be spam; we block them all and have never had any reports of problems.)
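(For reference, since these corners really are obscure: a route address and a quoted local part look roughly like this, with made-up domains.)

    MAIL FROM:<@relay1.example.com,@relay2.example.net:user@example.org>   (a route address)
    MAIL FROM:<"jane doe"@example.org>                                     (a quoted local part)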
This practical conservatism is in my view essential for keeping the Internet humming along. The Internet has a certain amount of carefully written software that was programmed by people who had assiduously read all of the relevant standards, and then it has a lot more software that was slapped together by people with various amounts of ignorance. If people used the obscure corners very much, much of the latter software would explode spectacularly. Worse, the burden of implementing Internet software would go up a lot in practice because you could no longer get away with just handling the easy, common cases.
(I'm a pragmatist. An Internet with less software would almost certainly be a smaller Internet. A non-compliant SMTP sender is annoying, but it usually gets the job done for people who are using it.)
The corollary of this is that a lot of Internet software out there probably doesn't handle corner cases or unusual situations very well, either through conscious choice or just because the authors weren't aware of them. There are consequences here both for security and for pragmatic interoperability.
Of course every so often you will stumble over someone who is sending you something from the dark depths. That the Internet is very big means that very uncommon things do happen every so often just through the law of large numbers. I'm sure that somewhere out on the net there are systems exchanging email with route addresses and maybe someday one of them will email us.
(Another corollary is that sooner or later you will see unusual errors, too. For example, we reject a certain amount of email from senders who have accented characters in unquoted local parts of MAIL FROM addresses. This is very RFC non-compliant but not surprising.)