Wandering Thoughts archives


What I know about the different types of SSH keys (and some opinions)

Modern versions of SSH support up to four different types of SSH keys (both for host keys to identify servers and for personal keys): RSA, DSA, ECDSA, and as of OpenSSH 6.5 we have ED25519 keys as well. Both ECDSA and ED25519 use elliptic curve cryptography, DSA uses finite fields, and RSA is based on integer factorization. EC cryptography is said to have a number of advantages, particularly in that it uses smaller key sizes (and thus needs smaller exchanges on the wire to pass public keys back and forth).

(One discussion of this is this cloudflare blog post.)
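As a quick way to see the key-size difference for yourself, you can generate one key of each type and compare the public halves; this assumes an OpenSSH 6.5+ ssh-keygen so that the ed25519 type is available:

```shell
# Generate a 4096-bit RSA key and an ED25519 key in a scratch
# directory, then compare the sizes of the public key files.
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N '' -f "$tmp/rsa_key"
ssh-keygen -q -t ed25519 -N '' -f "$tmp/ed25519_key"
wc -c "$tmp/rsa_key.pub" "$tmp/ed25519_key.pub"
rm -rf "$tmp"
```

The ED25519 public key comes out at roughly a hundred bytes against several hundred for the RSA one.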

RSA and DSA keys are supported by all SSH implementations (well, all SSH v2 implementations which is in practice 'all implementations' these days). ECDSA keys are supported primarily by reasonably recent versions of OpenSSH (from OpenSSH 5.7 onwards); they may not be in other versions, such as the SSH that you find on Solaris and OmniOS or on a Red Hat Enterprise 5 machine. ED25519 is only supported in OpenSSH 6.5 and later, which right now is very recent; of our main machines, only the Ubuntu 14.04 ones have it (especially note that it's not supported by the RHEL 7/CentOS 7 version of OpenSSH).

(I think ED25519 is also supported on Debian test (but not stable) and on up to date current FreeBSD and OpenBSD versions.)
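One way to check what a particular OpenSSH client supports is 'ssh -Q key', which lists the key types it was built with (the -Q option itself needs a reasonably recent OpenSSH, around 6.3 or so):

```shell
# List the key and certificate algorithms this client supports;
# an ED25519-capable client will include ssh-ed25519 in the output.
ssh -Q key
```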

SSH servers can offer multiple host keys in different key types (this is controlled by what HostKey files you have configured). The order that OpenSSH clients will try host keys in is controlled by two things: the setting of HostKeyAlgorithms (see 'man ssh_config' for the default) and what host keys are already known for the target host. If no host keys are known, I believe that the current order is ECDSA, ED25519, RSA, and then DSA; once there are known keys, they're tried first. What this really means is that for an otherwise unknown host you will be prompted to save the first of these key types that the host has and thereafter the host will be verified against it. If you already know an 'inferior' key (eg an RSA key when the host also advertises an ECDSA key), you will verify the host against the key you know and, as far as I can tell, not even save its 'better' key in .ssh/known_hosts.

(If you have a mixture of SSH client versions, people can wind up with a real mixture of your server key types in their known_hosts files or equivalent. This may mean that you need to preserve and restore multiple types of SSH host keys over server reinstalls, and probably add saving and restoring ED25519 keys when you start adding Ubuntu 14.04 servers to your mix.)
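If you want to control this ordering explicitly rather than relying on the defaults, you can set HostKeyAlgorithms yourself. A minimal ssh_config sketch (the algorithm names here are the OpenSSH ones, and which are available varies by version):

```
Host *
    # Prefer ED25519, then RSA; skip ECDSA and DSA entirely.
    HostKeyAlgorithms ssh-ed25519,ssh-rsa
```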

In terms of which key type is 'better', some people distrust ECDSA because the elliptic curve parameters are magic numbers from NIST and so could have secret backdoors, as appears to be all but certain for another NIST elliptic curve based cryptography standard (see also and also and more). I reflexively dislike both DSA and ECDSA because DSA implementation mistakes can be explosively fatal, as in 'trivially disclose your private keys'. While ED25519 is also a DSA-style algorithm, it takes specific steps to avoid at least some of the explosive failures of plain DSA and ECDSA, failures that have led to eg the compromise of Sony's Playstation 3 signing keys.

(RFC 6979 discusses how to avoid this particular problem for DSA and ECDSA but it's not clear to me if OpenSSH implements it. I would assume not until explicitly stated otherwise.)

As a result of all of this I believe that the conservative choice is to advertise and use only RSA keys (both host keys and personal keys) with good bit sizes. The slightly daring choice is to use ED25519 when you have it available. I would not use either ECDSA or DSA although I wouldn't go out of my way to disable server ECDSA or DSA host keys except in a very high security environment.

(I call ED25519 'slightly daring' only because I don't believe it's undergone as much outside scrutiny as RSA, and I could be wrong about that. See here and here for a discussion of ECC and ED25519 security properties and general security issues. ED25519 is part of Dan Bernstein's work and in general he has a pretty good reputation on these issues. Certainly the OpenSSH people were willing to adopt it.)
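In sshd_config terms, the conservative choice sketched above amounts to only offering RSA (and perhaps ED25519) host keys. A hypothetical fragment, using the stock key paths:

```
# Offer only an RSA host key, plus ED25519 where OpenSSH is 6.5+;
# comment out or omit the ecdsa and dsa HostKey lines.
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
```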

PS: If you want to have your eyebrows raised about the choice of elliptic curve parameters, see here.

PPS: I don't know what types of keys non-Unix SSH clients support over and above basic RSA and DSA support. Some casual Internet searches suggest that PuTTY doesn't support ECDSA yet, for example. And even some Unix software may have difficulties; for example, currently GNOME Keyring apparently doesn't support ECDSA keys (via archlinux).

sysadmin/SSHKeyTypes written at 23:12:48; Add Comment


The CBL has a real false positive problem

As I write this, a number of our IP addresses are listed in the CBL, and various of them have been listed for some time. There is a problem with this: these CBL-listed IP addresses don't exist. I don't mean 'they aren't supposed to exist'; I mean 'they could only theoretically exist on a secure subnet in our machine room and even if they did exist our firewall wouldn't allow them to pass traffic'. So these IP addresses don't exist in a very strong sense. Yet the CBL lists them and has for some time.

The first false positive problem the CBL has is that they are listing this traffic at all. We have corresponded with the CBL about this and these listings (along with listings on other of our subnets) all come from traffic observed at a single one of their monitoring points. Unlike what I assumed in the past, these observations are not coming from parsing Received: headers but from real TCP traffic. However they are not connections from our network, and the university is the legitimate owner and router of 128.100/16. A CBL observation point that is using false routing (and is clearly using false routing over a significant period of time) is an actively dangerous thing; as we can see here, false routing can cause the CBL to list anything.

The second false positive problem the CBL has is that, as mentioned, we have corresponded with the CBL over this. In that correspondence the CBL spokesperson agreed that the CBL was incorrect in this listing and would get it fixed. That was a couple of months ago, yet a revolving cast of IP addresses still gets listed and relisted in the CBL. As a corollary of this, we can be confident that the CBL observation point(s) involved are still using false routes for some of their traffic. You can apply charitable or less charitable assumptions for this lack of actual action on the CBL's part; at a minimum it is clear that some acknowledged false positive problems go unfixed for whatever reason.

I don't particularly have a better option than the CBL these days. But I no longer trust it anywhere near as much as I used to and I don't particularly like its conduct here.

(And I feel like saying something about it so that other people can know and make their own decisions. And yes, the situation irritates me.)

(As mentioned, we've seen similar issues in the past, cf my original 2012 entry on the issue. This time around we've seen it on significantly more IP addresses, we have extremely strong confidence that it is a false positive problem, and most of all we've corresponded with the CBL people about it.)

spam/CBLFalsePositiveProblemII written at 23:03:23; Add Comment

HTTPS should remain genuinely optional on the web

I recently ran across Mozilla Bug 1041087 (via HN), which has the sort of harmless-sounding title of 'Switch generic icon to negative feedback for non-https sites'. Let me translate this to English: 'try to scare users if they're connecting to a non-https site'. For anyone who finds this attractive, let me say it flat out: this is a stupid idea on today's web.

(For the record, I don't think it's very likely that Mozilla will take this wishlist request seriously. I just think that there are people out there who wish that they would.)

I used to be very down on SSL Certificate Authorities, basically considering the whole thing a racket. It remains a racket but in today's environment of pervasive eavesdropping it is now a useful one; one might as well make the work of those eavesdroppers somewhat harder. I would be very enthusiastic for pervasive encryption if we could deploy that across the web.

Unfortunately we can't, exactly because of the SSL CA racket. Today having a SSL certificate means either scaring users and doing things that are terrible for security overall or being beholden to a SSL CA (and often although not always forking over money for this dubious privilege). Never mind the lack of true security due to the core SSL problem, this is not an attractive solution in general. Forcing mandatory HTTPS today means giving far too much power and influence to SSL CAs, often including the ability to turn off your website at their whim or mistake.

You might say that this proposal doesn't force mandatory HTTPS. That's disingenuous. Scaring users of a major browser when they visit a non-HTTPS site is effectively forcing HTTPS for the same reason that scary warnings about self-signed certificates force the use of official CA certificates. Very few websites can afford to scare users.

The time to force people towards HTTPS is when we've solved all of these problems. In other words, when absolutely any website can make itself a certificate and then securely advertise and use that certificate. We are nowhere near this ideal world in today's SSL CA environment (and we may or may not ever get there).

(By the way, I really mean any website here, including a hypothetical one run by anonymous people and hosted in some place either that no one likes or that generates a lot of fraud or both. There are a lot of proposals that basically work primarily for people in the West who are willing to be officially identified and can provide money; depart from this and you can find availability going downhill rapidly. Read up on the problems of genuine Nigerian entrepreneurs someday.)

web/HTTPSOptional written at 00:11:42; Add Comment


Some consequences of widespread use of OCSP for HTTPS

OCSP is an attempt to solve some of the problems of certificate revocation. The simple version of how it works is that when your browser contacts a HTTPS website, it asks the issuer of the site's certificate if the certificate is both known and still valid. One important advantage of OCSP over CRLs is that the CA now has an avenue to 'revoke' certificates that it doesn't know about. If the CA doesn't have a certificate in its database, it can assert 'unknown certificate' in reply to your query and the certificate doesn't work.

The straightforward implication of OCSP is that the CA knows that you're trying to talk to a particular website at a particular time. Often third parties can also know this, because OCSP queries may well be done over HTTP instead of HTTPS. OCSP stapling attempts to work around the privacy implications by having the website include a pre-signed, limited duration current attestation about their certificate from the CA, but it may not be widely supported.

(Website operators have to have software that supports OCSP stapling and specifically configure it. OCSP checking in general simply needs a field set in the certificate, which the CA generally forces on your SSL certificates if it supports OCSP.)
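As an illustration of the server-side configuration involved, here is a hypothetical nginx fragment for OCSP stapling (nginx 1.3.7 or later; the certificate path and resolver are examples):

```
# Enable OCSP stapling and have nginx verify the stapled response
# against the CA chain; a resolver is needed to reach the OCSP server.
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/ssl/certs/ca-chain.pem;
resolver 127.0.0.1;
```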

The less obvious implication of OCSP is that your CA can now turn off your HTTPS website any time it either wants to, is legally required to, or simply screws up something in its OCSP database. If your browser checks OCSP status and the OCSP server says 'I do not know this certificate', your browser is going to hard-fail the HTTPS connection. In fact it really has to, because this is exactly the response that it would get if the CA had been subverted into issuing an imposter certificate in some way that was off the books.

You may be saying 'a CA would never do this'. I regret to inform my readers that I've already seen this happen. The blunt fact is that keeping high volume services running is not trivial and systems suffer database glitches all the time. It's just that with OCSP someone else's service glitch can take down your website, my website, or in fact a whole lot of websites all at once.

As they say, this is not really a good way to run a railroad.

(See also Adam Langley on why revocation checking doesn't really help with security. This means that OCSP is both dangerous and significantly useless. Oh, and of course it often adds extra latency to your HTTPS connections since it needs to do extra requests to check the OCSP status.)

PS: note that OCSP stapling doesn't protect you from your CA here. It can protect you from temporary short-term glitches that fix themselves automatically (because you can just hold on to a valid OCSP response while the glitch fixes itself), but that's it. If the CA refuses to validate your certificate for long enough (either deliberately or through a long-term problem), your cached OCSP response expires and you're up the creek.

web/OCSPConsequences written at 00:22:06; Add Comment


In practice, 10G-T today can be finicky

Not all that long ago I wrote an entry about why I think that 10G-T will be the dominant form of 10G Ethernet. While I still believe in the fundamental premise of that entry, since then I've also learned that 10G-T today can be kind of finicky in practice (regardless of what the theory says) and this can potentially make 10G-T deployments harder to do and to get reliable than SFP-based ones.

So far we've had two noteworthy incidents. In the most severe one a new firewall refused to recognize link on either 10G-T interface when they were plugged into existing 1G switches. We have no idea why and haven't been able to reproduce the problem; as far as we can tell everything should work. But it didn't. Our on the spot remediation was to switch out the 10G-T card for a dual-1G card and continue on.

(Our tests afterwards included putting the actual card that had the problem into another server of the exact same model and connecting it up to test switches of the exact same model; everything worked.)

A less severe recent issue was finding that one 10G-T cable either had never worked or had stopped working (it was on a pre-wired but uninstalled machine, so we can't be sure). This was an unexceptional short cable from a reputable supplier and apparently it still works if you seat both ends really firmly (which makes it unsuitable for machine room use, where cables may well get tugged and that sort of thing). At one level I'm not hugely surprised by this; the reddit discussion of my previous entry had a bunch of people who commented that 10G-T could be quite sensitive to cabling issues. But it's still disconcerting to have it actually happen to us (and not with a long cable either).

To be clear, I don't regret our decision to go with 10G-T. Almost all of our 10G-T stuff is working and I don't think we could have afforded to do 10G at all if we'd had to use SFP+ modules. These teething problems are mild by comparison and I have no reason to think that they won't get better over time.

(But if you gave me buckets of money, well, I think that an all SFP+ solution is going to be more reliable today if you can afford it. And it clearly dissipates less power at the moment.)

tech/Finicky10GT written at 01:27:28; Add Comment


My (somewhat silly) SSD dilemma

The world has reached the point where I want to move my home machine from using spinning rust to using SSDs; in fact it's starting to reach the point where sticking with spinning rust seems dowdy and decidedly behind the times. I certainly would like extremely fast IO and no seek overheads and so on, especially when I do crazy things like rebuild Firefox from source on a regular basis. Unfortunately I have a dilemma because of a combination of three things:

  • I insist on mirrored disks for anything I value, for good reason.

  • I want the OS on different physical disks than my data because that makes it much easier to do drastic things like full OS reinstalls (my current system is set up this way).

  • SSDs are not big enough for me to fit all of my data on one SSD (and a bunch of my data is stuff that doesn't need SSD speeds but does need to stay online for good long-term reasons).

(As a hobbyist photographer who shoots in RAW format, the last is probably going to be the case for a long time to come. Photography is capable of eating disk space like popcorn, and it gets worse if you're tempted to get into video too.)

Even if I was willing to accept a non-mirrored system disk (which I'm reluctant to do), satisfying all of this in one machine requires five drives (three SSDs plus two HDs). Six drives would be better. That's a lot of drives to put in one case and to connect to one motherboard (especially given that an optical drive will require a SATA port these days and yes, I probably still want one).

(I think that relatively small SSDs are now cheap enough that I'd put the OS on SSDs for both speed and lower power. This is contrary to my view two years ago, but times change.)

There are various ways to make all of this fit, such as pushing the optical drive off to an external USB drive and giving up on separate system disk(s), but a good part of my dilemma is that I don't really like any of them. In part it feels like I'm trying to force a system design that is not actually ready yet and what I should be doing is, say, waiting for SSD capacities to go up another factor of two and the prices to drop a bunch more.

(I also suspect that we're going to see more and more mid-tower cases that are primarily designed for 2.5" SSDs, although casual checking suggests that one can get cases that will take a bunch of them even as it stands.)

In short: however tempting SSDs seem, right now it seems like we're in the middle of an incomplete technology transition. However much I'd vaguely like some, I'm probably better off waiting for another year or two or three. How fortunate that this matches my general lethargy about hardware purchases (although there's what happened with my last computer upgrade to make me wonder).

(My impression is that we're actually in the middle of several PC technology transitions. 4K monitors and 'retina' displays seem like another current one, for example, one that I'm quite looking forward to.)

tech/MySSDDilemma written at 23:51:13; Add Comment


A data point on how rapidly spammers pick up addresses from the web

On June 15, what is almost exactly a month ago now, I wrote an entry on a weird non-relaying relay attempt I saw. In the entry I quoted a SMTP conversation, including a local address handled by my sinkhole SMTP server. As I was writing the entry I decided to change the local part of the address to an obviously bogus 'XXXX' and then see if spammers picked up that address and started trying to deliver things to that new address.

I am now able to report that it took less than a month. On July 11th I saw the first delivery attempt; July 14th saw the second and third ones. The first and the third 'succeeded' in getting all the way to a DATA submission (which was 5xx'd but had the message captured for my inspection). The resulting spam is a little bit interesting.

The first spam message looks like a serious attempt by what seems like a Chinese-affiliated spam gang to sell me some e-mail address databases, based on what geographic area I wanted to target, and maybe hawk their spamming services too. It uses a forged envelope sender and comes from a US hosting/cloud provider, with replies directed to 163.com and an image in its HTML being fetched from a tagged URL on a Chinese IP address.

The second spam message (from the third delivery attempt) comes from what is probably a compromised mail server in the UK. It is plain and straightforward advance fee fraud, and not a particularly sophisticated one; apart from the destination address there is absolutely nothing unusual about it. It was probably ultimately sent from Malaysia, perhaps from a compromised machine of some sort (the likely source IP is currently in the CBL).

(The second delivery attempt had sufficiently many signs of being ordinary advance fee fraud that my sinkhole SMTP server rejected it before DATA. Now that I look it comes from an IP address in the same /24 as the first delivery attempt; it got rejected early because the envelope sender address claimed to be from qq.com. I've switched my sinkhole SMTP server to early rejection of stuff that's likely to be boring spam because I've already collected enough samples of it. Maybe someday I'll change my mind and do a completely raw 'one week in spam', but not right now.)

There is an obvious theory about what happened with my address here: scraped by a spammer who briefly attempted to market services to me and then started selling the address and/or their spamming services to other spammers. I can't know if this story is right, of course. I may learn more if more spam arrives for that address.

(And if no more spam arrives for the address I'll also learn something. At this point I do expect it to get more spam, though, since it's in the hands of advance fee fraud spammers.)

spam/SpammerAddressPickup written at 23:29:48; Add Comment


Unmounting recoverable stale NFS mounts on Linux

Suppose that you have NFS mounts go stale on your Linux clients by accident; perhaps you have disabled sharing of some filesystem on the fileserver without quite unmounting it on all the clients first. Now you try to unmount them on the clients and you get the cheerful error message:

# umount /cs/dtr
/cs/dtr was not found in /proc/mounts

You have two problems here. The first problem is in umount.nfs and it is producing the error message you see here. This error message happens because at least some versions of the kernel helpfully change the /proc/mounts output for a NFS mount that they have detected as stale. Instead of the normal output, you get:

fs8:/cs/4/dtr /cs/dtr\040(deleted) nfs rw,nosuid,....

(Instead of plain '/cs/dtr' as the mount point.)
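As a small illustration of what the kernel is doing here, the '\040' is an octal escape for a space; /proc/mounts escapes awkward characters in mount points this way. A short sketch that decodes the sample line above and spots the stale marker:

```python
# /proc/mounts octal-escapes awkward characters in mount points; the
# kernel's stale-NFS marker shows up as a literal '\040(deleted)'
# appended to the mount point field.
line = r"fs8:/cs/4/dtr /cs/dtr\040(deleted) nfs rw,nosuid 0 0"
device, raw_mountpoint = line.split()[:2]
# Undo the \ooo escapes; \040 decodes to a space.
mountpoint = raw_mountpoint.encode().decode("unicode_escape")
print(mountpoint)                          # /cs/dtr (deleted)
print(mountpoint.endswith(" (deleted)"))   # True
```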

This of course does not match what is in /etc/mtab and umount.nfs errors out with the above error message. As far as I can tell from our brief experience with this today, there is no way to cause the kernel to reverse its addition of this '\040(deleted)' marker. You can make the filesystem non-stale (by re-exporting it or the like), have IO to the NFS mount on the client work fine, and the kernel will still keep it there. You are screwed. To get around this you need to build a patched version of nfs-utils (see also). You want to modify utils/mount/nfsumount.c; search for the error message to find where.

(Note that compiling nfs-utils only gets you a mount.nfs binary. This is actually the same program as umount.nfs; it checks its name when invoked to decide what to do, so you need to get it invoked under the right name in some way.)

Unfortunately you're not out of the woods because as far as I can tell many versions of the Linux kernel absolutely refuse to let you unmount a stale NFS mountpoint. The actual umount() system calls will fail with ESTALE even when you can persuade or patch umount.nfs to make them. As far as I know the only way to recover from this is to somehow make the NFS mount non-stale; at this point a patched umount.nfs can make a umount() system call that will succeed. Otherwise you get to reboot the NFS client.

(I have tested and both the umount() and umount2() system calls will fail.)

The kernel side of this problem has apparently been fixed in 3.12 via this commit (found via here), so on really modern distributions such as Ubuntu 14.04 all you may need to do is build a patched umount.nfs. It is very much an issue on older ones such as Ubuntu 12.04 (and perhaps even CentOS 7, although maybe this fix got backported). In the mean time try not to let your NFS mounts become stale, or at least don't let the client kernels notice that they are stale.

(If an NFS mount is stale on the server but nothing on the client has noticed yet, you can still successfully unmount it without problems. But the first df or whatever that gets an ESTALE back from the server blows everything up.)

For additional information on this see eg Ubuntu bug 974374 or Fedora bug 980088 or this linux-nfs mailing list message and thread.

linux/NFSStaleUnmounting written at 22:26:37; Add Comment


An obvious reminder: disks can and do die abruptly

Modern disks have a fearsome array of monitoring features in the form of all of their SMART attributes, and hopefully you are running something that monitors them and alerts you to trouble. In an ideal world, disks would decay gradually and give you plenty of advance warning about an impending death, letting you make backups and prepare the replacement and so on. And sometimes this does happen (and you get warnings from your SMART monitoring software about 'impending failure, back up your data now').

Sometimes, though, it doesn't. As an illustration of this, a disk on my home machine just went from apparently fine to 'very slow IO' to SMART warnings about 8 unreadable sectors to very dead in the space of less than 24 hours. If I had acted very fast I might have been able to make a backup of it before it died, but only because I both noticed the suddenly slow system and was able to diagnose it. Otherwise, well, the time between getting the SMART warnings and the death was about half an hour.

As it happened I did not leap to get a backup of it right away because it's only one half of a mirror pair (I did make a backup once it had actively failed). The possibility of abrupt disk failure is one large reason that I personally insist on RAID protection for any data that I care about; there may not be enough time to save data off a dying disk and having to restore from backups is disruptive (and backups are almost always incomplete).

I'm sure that everyone who runs decent-sized amounts of disks is well aware of the possibility of abrupt disk death already, and certainly we've had it happen to us at work. But it never hurts to have a pointed reminder of it smack me in the forehead every so often, even if it's a bit annoying.

(The brave future of SSDs instead of spinning mechanical disks may generally do better than this, although we'll have to see. We have experienced some abrupt SSD deaths, although that was with moderately early hardware. It's possible that SSDs will turn out to mostly have really long service lifetimes, especially if they're not written to particularly heavily.)
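For what it's worth, the sort of SMART monitoring mentioned at the start of this entry is commonly done with smartmontools' smartd. A hypothetical smartd.conf line (the device name and mail target are examples):

```
# Monitor /dev/sda: track all attributes (-a), keep offline testing and
# attribute autosave on, run a short self-test weekly, and mail root
# when problems (including newly unreadable sectors) show up.
/dev/sda -a -o on -S on -s (S/../../7/02) -m root
```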

tech/AbruptDiskDeath written at 22:37:52; Add Comment

Early impressions of CentOS 7

For reasons involving us being unimpressed with Ubuntu 14.04, we're building our second generation iSCSI backends on top of CentOS 7 (basically because it just came out in time). We have recently put the first couple of them into production so now seems a good time to report my early impressions of CentOS 7.

I'll start with the installation, which has impressed me in two different ways. The first is that it does RAID setup the right way: you define filesystems (or swap areas), tell the installer that you want them to be RAID-1, and it magically figures everything out and does it right. The second is that it is the first installer I've ever used that can reliably and cleanly reinstall itself over an already-installed system (and it's even easy to tell it how to do this). You would think that this would be trivial, but I've seen any number of installers explode; a common failure point in Linux installers is assembling existing RAID arrays on the disks then failing to completely disassemble them before it tries to repartition the disks. CentOS 7 has no problems, which is something that I really appreciate.

(Some installers are so bad that one set of build instructions I wrote recently started out with 'if these disks have been used before, completely blank them out with dd beforehand using a live CD'.)

Some people will react badly to the installer being a graphical one and also perhaps somewhat confusing. I find it okay but I don't think it's perfect. It is kind of nice to be able to do steps in basically whatever order works for you instead of being forced into a linear order, but on the other hand it's possible to overlook some things.

After installation, everything has been trouble free so far. While I think CentOS 7 still uses NetworkManager, it does so far better than Red Hat Enterprise 6 did; in other words the networking works and I don't particularly notice that it's using NetworkManager behind the scenes. We can (and do) set things up in /etc/sysconfig/network-scripts in the traditional manner. CentOS 7 defaults to 'consistent network device naming' but unlike Ubuntu 14.04 it works and the names are generally sane. On our hardware we get Ethernet device names of enp1s0f0, enp1s0f1, and enp7s0; the first two are the onboard 10G-T ports and the third is the add-on 1G card. We can live with that.

(The specific naming scheme that CentOS 7 normally uses is described in the Red Hat documentation here, which I am sad to note needs JavaScript to really see anything.)

CentOS 7 uses systemd and has mostly converted things away from /etc/init.d startup scripts. Some people may have an explosive reaction to this shift but I don't; I've been using systemd on my Fedora systems for some time and I actually like it and think it's a pretty good init system (see also the second sidebar here). Everything seems to work in the usual systemd way and I didn't have any particular problems adding, eg, a serial getty. I did quite appreciate that systemd automatically activated a serial getty based on a serial console being configured in the kernel command line.

Overall I guess the good news is that I don't have anything much to say because stuff just works and I haven't run into any unpleasant surprises. The one thing that stands out is how nice the installer is.

linux/CentOS7EarlyImpressions written at 01:00:20; Add Comment


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.