Wandering Thoughts


What influences SSH's bulk transfer speeds

A number of years ago I wrote How fast various ssh ciphers are because I was curious about just how fast you could do bulk SSH transfers and how to get them to go fast under various circumstances. Since then I have learned somewhat more about SSH speed and what controls what things you have available and can get.

To start with, my entry from years ago was naively incomplete because SSH encryption has two components: it has both a cipher and a cryptographic hash used as the MAC. The choice of both of them can matter, especially if you're willing to deliberately weaken the MAC. As an example of how much of an impact this might make, in my testing on a Linux machine I could almost double SSH bandwidth by switching from the default MAC to 'umac-64-etm@openssh.com'.

(At the same time, no other MAC choice made much of a difference within a particular cipher, although hmac-sha1 was sometimes a bit faster than hmac-md5.)

Clients set the cipher list with -c and the MAC with -m, or with the Ciphers and MACs options in your SSH configuration file (either a personal one or a global one). However, what the client wants to use has to be both supported by the server and accepted by it; this is set in the server's Ciphers and MACs configuration options. The manpages for ssh_config and sshd_config on your system will hopefully document both what your system supports at all and what it's set to accept by default. Note that this is not necessarily the same thing; I've seen systems where sshd knows about ciphers that it will not accept by default.

(Some modern versions of OpenSSH also report this information through 'ssh -Q <option>'; see the ssh manpage for details. Note that such lists are not necessarily reported in preference order.)

At least some SSH clients will tell you what the server's list of acceptable ciphers (and MACs) is if you tell the client to use options that the server doesn't support. If you wanted to, I suspect that you could write a program in some language with SSH protocol libraries that dumped all of this information for you for an arbitrary server (without the fuss of having to find a cipher and MAC that your client knew about but your server didn't accept).

Running 'ssh -v' will report the negotiated cipher and MAC that are being used for the connection. Technically there are two sets of them, one for the client to server and one for the server back to the client, but I believe that under all but really exceptional situations you'll use the same cipher and MAC in both directions.
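If you want to pull the negotiated cipher and MAC out of 'ssh -v' output mechanically, a small parser is enough. What follows is a minimal sketch in Python. The 'debug1: kex:' line format it matches is an assumption based on what recent OpenSSH versions print (older versions format these lines differently), and the function name and sample text are made up for illustration.

```python
import re

# Assumed line format (recent OpenSSH; older versions differ):
#   debug1: kex: server->client cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none
KEX_LINE = re.compile(
    r"debug1: kex: (?P<direction>server->client|client->server) "
    r"cipher: (?P<cipher>\S+) MAC: (?P<mac>\S+)"
)

def negotiated_algorithms(verbose_output):
    """Return {direction: (cipher, mac)} parsed from 'ssh -v' debug text."""
    found = {}
    for line in verbose_output.splitlines():
        m = KEX_LINE.search(line)
        if m:
            found[m.group("direction")] = (m.group("cipher"), m.group("mac"))
    return found

# Hypothetical sample output, not captured from a real session.
SAMPLE = """\
debug1: kex: server->client cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none
debug1: kex: client->server cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none
"""

algos = negotiated_algorithms(SAMPLE)
# True when both directions ended up with the same cipher and MAC.
same_both_ways = algos["server->client"] == algos["client->server"]
```

In real use you would feed it the stderr of an 'ssh -v' run instead of the sample text.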

Different Unix OSes may differ significantly in their support for both ciphers and MACs. In particular Solaris effectively forked a relatively old version of OpenSSH and so modern versions of Illumos (and Illumos distributions such as OmniOS) do not offer you anywhere near a modern list of choices here. How recent your distribution is will also matter; our Ubuntu 14.04 machines naturally offer us a lot more choice than our Ubuntu 10.04 ones.

PS: helpfully the latest OpenSSH manpages are online (cf), so the current manpage for ssh_config will tell you the latest set of ciphers and MACs supported by the official OpenSSH and also show the current preference order. To my interest it appears that OpenSSH now defaults to the very fast umac-64-etm MAC.

sysadmin/SshBulkSpeed written at 23:21:12

One of SELinux's important limits

People occasionally push SELinux as the cure for security problems and look down on people who routinely disable it (as we do). I have some previously expressed views on this general attitude, but what I feel like pointing out today is that SELinux's security has some important intrinsic limits. One big one is that SELinux only acts at process boundaries.

By its nature, SELinux exists to stop a process (or a collection of them) from doing 'bad things' to the rest of the system and to the outside environment. But there are any number of dangerous exploits that do not cross a process's boundaries this way; the most infamous recent one is Heartbleed. SELinux can do nothing to stop these exploits because they happen entirely inside the process, in spheres fully outside its domain. SELinux can only act if the exploit seeks to exfiltrate data (or influence the outside world) through some new channel that the process does not normally use, and in many cases the exploit doesn't need to do that (and often doesn't bother).

Or in short, SELinux cannot stop your web server or your web browser from getting compromised, only from doing new stuff afterwards. Sending all of the secrets that your browser or server already has access to to someone in the outside world? There's nothing SELinux can do about that (assuming that the attacker is competent). This is a large and damaging territory that SELinux doesn't help with.

(Yes, yes, privilege separation. There are a number of ways in which this is the mathematical security answer instead of the real one, including that most network related programs today are not privilege separated. Chrome exploits also have demonstrated that privilege separation is very hard to make leak-proof.)

linux/SELinuxProgramBoundaries written at 00:24:23


What I know about the different types of SSH keys (and some opinions)

Modern versions of SSH support up to four different types of SSH keys (both for host keys to identify servers and for personal keys): RSA, DSA, ECDSA, and as of OpenSSH 6.5 we have ED25519 keys as well. Both ECDSA and ED25519 use elliptic curve cryptography, DSA uses finite fields, and RSA is based on integer factorization. EC cryptography is said to have a number of advantages, particularly in that it uses smaller key sizes (and thus needs smaller exchanges on the wire to pass public keys back and forth).

(One discussion of this is this cloudflare blog post.)

RSA and DSA keys are supported by all SSH implementations (well, all SSH v2 implementations which is in practice 'all implementations' these days). ECDSA keys are supported primarily by reasonably recent versions of OpenSSH (from OpenSSH 5.7 onwards); they may not be in other versions, such as the SSH that you find on Solaris and OmniOS or on a Red Hat Enterprise 5 machine. ED25519 is only supported in OpenSSH 6.5 and later, which right now is very recent; of our main machines, only the Ubuntu 14.04 ones have it (especially note that it's not supported by the RHEL 7/CentOS 7 version of OpenSSH).

(I think ED25519 is also supported on Debian test (but not stable) and on up to date current FreeBSD and OpenBSD versions.)

SSH servers can offer multiple host keys in different key types (this is controlled by what HostKey files you have configured). The order that OpenSSH clients will try host keys in is controlled by two things: the setting of HostKeyAlgorithms (see 'man ssh_config' for the default) and what host keys are already known for the target host. If no host keys are known, I believe that the current order is ECDSA, ED25519, RSA, and then DSA; once there are known keys, they're tried first. What this really means is that for an otherwise unknown host you will be prompted to save the first of these key types that the host has and thereafter the host will be verified against it. If you already know an 'inferior' key (eg a RSA key when the host also advertises an ECDSA key), you will verify the host against the key you know and, as far as I can tell, not even save its 'better' key in .ssh/known_hosts.
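The selection behaviour described above can be modelled in a few lines. This is only a sketch of the described behaviour in Python, not OpenSSH's actual code; the default order is the one stated above as a belief, and the function names and short type labels are invented for illustration.

```python
# Default preference order as described in the text (a belief, not taken
# from the OpenSSH source). The short labels are illustrative only.
DEFAULT_ORDER = ["ecdsa", "ed25519", "rsa", "dsa"]

def host_key_try_order(known_types, default_order=DEFAULT_ORDER):
    """Key types already in known_hosts come first, then the remaining ones."""
    known = [t for t in default_order if t in known_types]
    unknown = [t for t in default_order if t not in known_types]
    return known + unknown

def key_type_used(server_types, known_types):
    """Return the first type in the try order that the server offers."""
    for t in host_key_try_order(known_types):
        if t in server_types:
            return t
    return None

# An unknown host offering RSA and ECDSA gets verified by (and saved with) ECDSA.
fresh_host = key_type_used({"rsa", "ecdsa"}, set())
# If only the RSA key is already known, the client sticks with RSA.
known_rsa_host = key_type_used({"rsa", "ecdsa"}, {"rsa"})
```

The second case is the 'inferior key' situation: the known RSA key wins even though the server also offers ECDSA.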

(If you have a mixture of SSH client versions, people can wind up with a real mixture of your server key types in their known_hosts files or equivalent. This may mean that you need to preserve and restore multiple types of SSH host keys over server reinstalls, and probably add saving and restoring ED25519 keys when you start adding Ubuntu 14.04 servers to your mix.)

In terms of which key type is 'better', some people distrust ECDSA because the elliptic curve parameters are magic numbers from NIST and so could have secret backdoors, as appears to be all but certain for another NIST elliptic curve based cryptography standard (see also and also and more). I reflexively dislike both DSA and ECDSA because DSA implementation mistakes can be explosively fatal, as in 'trivially disclose your private keys'. While ED25519 uses a related signature scheme (EdDSA), it takes specific steps to avoid at least some of the explosive failures of plain DSA and ECDSA, failures that have led to eg the compromise of Sony's Playstation 3 signing keys.

(RFC 6979 discusses how to avoid this particular problem for DSA and ECDSA but it's not clear to me if OpenSSH implements it. I would assume not until explicitly stated otherwise.)
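To illustrate why this class of mistake is so fatal, here is a toy sketch in Python of the best known one: signing two different messages with the same per-signature random number (nonce). It uses textbook DSA with tiny made-up parameters (p=23, q=11, g=4) that are nowhere near real key sizes, and all names and values are chosen purely for illustration. The same algebra applies to ECDSA.

```python
# Toy DSA parameters: q divides p - 1 and g has order q modulo p.
# Real keys use enormously larger numbers; these are for illustration only.
p, q, g = 23, 11, 4

def sign(x, k, h):
    """Textbook DSA signature of message hash h with private key x and nonce k."""
    r = pow(g, k, p) % q
    s = (pow(k, -1, q) * (h + x * r)) % q
    return r, s

def recover_private_key(h1, sig1, h2, sig2):
    """Recover (k, x) from two signatures that were made with the same nonce k."""
    r, s1 = sig1
    _, s2 = sig2
    # s1 - s2 = k^-1 * (h1 - h2) mod q, so k = (h1 - h2) / (s1 - s2) mod q.
    k = ((h1 - h2) * pow(s1 - s2, -1, q)) % q
    # s1 = k^-1 * (h1 + x*r) mod q, so x = (s1*k - h1) / r mod q.
    x = ((s1 * k - h1) * pow(r, -1, q)) % q
    return k, x

secret_x, reused_k = 7, 3           # hypothetical private key and reused nonce
sig1 = sign(secret_x, reused_k, 5)  # message hash 5
sig2 = sign(secret_x, reused_k, 9)  # message hash 9, same nonce
recovered_k, recovered_x = recover_private_key(5, sig1, 9, sig2)
```

An attacker only needs the two public signatures and the message hashes; the matching r values give the reuse away. Deterministic nonce generation of the RFC 6979 kind exists to rule this out. (The three-argument pow() with a negative exponent needs Python 3.8 or later.)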

As a result of all of this I believe that the conservative choice is to advertise and use only RSA keys (both host keys and personal keys) with good bit sizes. The slightly daring choice is to use ED25519 when you have it available. I would not use either ECDSA or DSA although I wouldn't go out of my way to disable server ECDSA or DSA host keys except in a very high security environment.

(I call ED25519 'slightly daring' only because I don't believe it's undergone as much outside scrutiny as RSA, and I could be wrong about that. See here and here for a discussion of ECC and ED25519 security properties and general security issues. ED25519 is part of Dan Bernstein's work and in general he has a pretty good reputation on these issues. Certainly the OpenSSH people were willing to adopt it.)

PS: If you want to have your eyebrows raised about the choice of elliptic curve parameters, see here.

PPS: I don't know what types of keys non-Unix SSH clients support over and above basic RSA and DSA support. Some casual Internet searches suggest that PuTTY doesn't support ECDSA yet, for example. And even some Unix software may have difficulties; for example, currently GNOME Keyring apparently doesn't support ECDSA keys (via archlinux).

sysadmin/SSHKeyTypes written at 23:12:48


The CBL has a real false positive problem

As I write this, a number of IP addresses in one of our subnets are listed in the CBL, and various of them have been listed for some time. There is a problem with this: these CBL-listed IP addresses don't exist. I don't mean 'they aren't supposed to exist'; I mean 'they could only theoretically exist on a secure subnet in our machine room and even if they did exist our firewall wouldn't allow them to pass traffic'. So these IP addresses don't exist in a very strong sense. Yet the CBL lists them and has for some time.

The first false positive problem the CBL has is that they are listing this traffic at all. We have corresponded with the CBL about this and these listings (along with listings on other of our subnets) all come from traffic observed at a single one of their monitoring points. Unlike what I assumed in the past, these observations are not coming from parsing Received: headers but from real TCP traffic. However they are not connections from our network, and the university is the legitimate owner and router of 128.100/16. A CBL observation point that is using false routing (and is clearly using false routing over a significant period of time) is an actively dangerous thing; as we can see here, false routing can cause the CBL to list anything.

The second false positive problem the CBL has is that, as mentioned, we have corresponded with the CBL over this. In that correspondence the CBL spokesperson agreed that the CBL was incorrect in this listing and would get it fixed. That was a couple of months ago, yet a revolving cast of IP addresses still gets listed and relisted in the CBL. As a corollary of this, we can be confident that the CBL listening point(s) involved are still using false routes for some of their traffic. You can apply charitable or less charitable assumptions for this lack of actual action on the CBL's part; at a minimum it is clear that some acknowledged false positive problems go unfixed for whatever reason.

I don't particularly have a better option than the CBL these days. But I no longer trust it anywhere near as much as I used to and I don't particularly like its conduct here.

(And I feel like saying something about it so that other people can know and make their own decisions. And yes, the situation irritates me.)

(As mentioned, we've seen similar issues in the past, cf my original 2012 entry on the issue. This time around we've seen it on significantly more IP addresses, we have extremely strong confidence that it is a false positive problem, and most of all we've corresponded with the CBL people about it.)

spam/CBLFalsePositiveProblemII written at 23:03:23

HTTPS should remain genuinely optional on the web

I recently ran across Mozilla Bug 1041087 (via HN), which has the sort of harmless sounding title of 'Switch generic icon to negative feedback for non-https sites'. Let me translate this to English: 'try to scare users if they're connecting to a non-https site'. For anyone who finds this attractive, let me say it flat out; this is a stupid idea on today's web.

(For the record, I don't think it's very likely that Mozilla will take this wishlist request seriously. I just think that there are people out there who wish that they would.)

I used to be very down on SSL Certificate Authorities, basically considering the whole thing a racket. It remains a racket but in today's environment of pervasive eavesdropping it is now a useful one; one might as well make the work of those eavesdroppers somewhat harder. I would be very enthusiastic for pervasive encryption if we could deploy that across the web.

Unfortunately we can't, exactly because of the SSL CA racket. Today having a SSL certificate means either scaring users and doing things that are terrible for security overall or being beholden to a SSL CA (and often although not always forking over money for this dubious privilege). Never mind the lack of true security due to the core SSL problem, this is not an attractive solution in general. Forcing mandatory HTTPS today means giving far too much power and influence to SSL CAs, often including the ability to turn off your website at their whim or mistake.

You might say that this proposal doesn't force mandatory HTTPS. That's disingenuous. Scaring users of a major browser when they visit a non-HTTPS site is effectively forcing HTTPS for the same reason that scary warnings about self-signed certificates force the use of official CA certificates. Very few websites can afford to scare users.

The time to force people towards HTTPS is when we've solved all of these problems. In other words, when absolutely any website can make itself a certificate and then securely advertise and use that certificate. We are nowhere near this ideal world in today's SSL CA environment (and we may or may not ever get there).

(By the way, I really mean any website here, including a hypothetical one run by anonymous people and hosted in some place either that no one likes or that generates a lot of fraud or both. There are a lot of proposals that basically work primarily for people in the West who are willing to be officially identified and can provide money; depart from this and you can find availability going downhill rapidly. Read up on the problems of genuine Nigerian entrepreneurs someday.)

web/HTTPSOptional written at 00:11:42


Some consequences of widespread use of OCSP for HTTPS

OCSP is an attempt to solve some of the problems of certificate revocation. The simple version of how it works is that when your browser contacts a HTTPS website, it asks the issuer of the site's certificate if the certificate is both known and still valid. One important advantage of OCSP over CRLs is that the CA now has an avenue to 'revoke' certificates that it doesn't know about. If the CA doesn't have a certificate in its database, it can assert 'unknown certificate' in reply to your query and the certificate doesn't work.

The straightforward implication of OCSP is that the CA knows that you're trying to talk to a particular website at a particular time. Often third parties can also know this, because OCSP queries may well be done over HTTP instead of HTTPS. OCSP stapling attempts to work around the privacy implications by having the website include a pre-signed, limited duration current attestation about their certificate from the CA, but it may not be widely supported.

(Website operators have to have software that supports OCSP stapling and specifically configure it. OCSP checking in general simply needs a field set in the certificate, which the CA generally forces on your SSL certificates if it supports OCSP.)

The less obvious implication of OCSP is that your CA can now turn off your HTTPS website any time it either wants to, is legally required to, or simply screws up something in its OCSP database. If your browser checks OCSP status and the OCSP server says 'I do not know this certificate', your browser is going to hard-fail the HTTPS connection. In fact it really has to, because this is exactly the response that it would get if the CA had been subverted into issuing an imposter certificate in some way that was off the books.

You may be saying 'a CA would never do this'. I regret to inform my readers that I've already seen this happen. The blunt fact is that keeping high volume services running is not trivial and systems suffer database glitches all the time. It's just that with OCSP someone else's service glitch can take down your website, my website, or in fact a whole lot of websites all at once.

As they say, this is not really a good way to run a railroad.

(See also Adam Langley on why revocation checking doesn't really help with security. This means that OCSP is both dangerous and significantly useless. Oh, and of course it often adds extra latency to your HTTPS connections since it needs to do extra requests to check the OCSP status.)

PS: note that OCSP stapling doesn't protect you from your CA here. It can protect you from temporary short-term glitches that fix themselves automatically (because you can just hold on to a valid OCSP response while the glitch fixes itself), but that's it. If the CA refuses to validate your certificate for long enough (either deliberately or through a long-term problem), your cached OCSP response expires and you're up the creek.

web/OCSPConsequences written at 00:22:06


In practice, 10G-T today can be finicky

Not all that long ago I wrote an entry about why I think that 10G-T will be the dominant form of 10G Ethernet. While I still believe in the fundamental premise of that entry, since then I've also learned that 10G-T today can be kind of finicky in practice (regardless of what the theory says) and this can potentially make 10G-T deployments harder to do and to get reliable than SFP-based ones.

So far we've had two noteworthy incidents. In the most severe one a new firewall refused to recognize link on either 10G-T interface when they were plugged into existing 1G switches. We have no idea why and haven't been able to reproduce the problem; as far as we can tell everything should work. But it didn't. Our on the spot remediation was to switch out the 10G-T card for a dual-1G card and continue on.

(Our tests afterwards included putting the actual card that had the problem into another server of the exact same model and connecting it up to test switches of the exact same model; everything worked.)

A less severe recent issue was finding that one 10G-T cable either had never worked or had stopped working (it was on a pre-wired but uninstalled machine, so we can't be sure). This was an unexceptional short cable from a reputable supplier and apparently it still works if you seat both ends really firmly (which makes it unsuitable for machine room use, where cables may well get tugged out of that sort of thing). At one level I'm not hugely surprised by this; the reddit discussion of my previous entry had a bunch of people who commented that 10G-T could be quite sensitive to cabling issues. But it's still disconcerting to have it actually happen to us (and not with a long cable either).

To be clear, I don't regret our decision to go with 10G-T. Almost all of our 10G-T stuff is working and I don't think we could have afforded to do 10G at all if we'd had to use SFP+ modules. These teething problems are mild by comparison and I have no reason to think that they won't get better over time.

(But if you gave me buckets of money, well, I think that an all SFP+ solution is going to be more reliable today if you can afford it. And it clearly dissipates less power at the moment.)

tech/Finicky10GT written at 01:27:28


My (somewhat silly) SSD dilemma

The world has reached the point where I want to move my home machine from using spinning rust to using SSDs; in fact it's starting to reach the point where sticking on spinning rust seems dowdy and decidedly behind the times. I certainly would like extremely fast IO and no seek overheads and so on, especially when I do crazy things like rebuild Firefox from source on a regular basis. Unfortunately I have a dilemma because of a combination of three things:

  • I insist on mirrored disks for anything I value, for good reason.

  • I want the OS on different physical disks than my data because that makes it much easier to do drastic things like full OS reinstalls (my current system is set up this way).

  • SSDs are not big enough for me to fit all of my data on one SSD (and a bunch of my data is stuff that doesn't need SSD speeds but does need to stay online for good long-term reasons).

(As a hobbyist photographer who shoots in RAW format, the last is probably going to be the case for a long time to come. Photography is capable of eating disk space like popcorn, and it gets worse if you're tempted to get into video too.)

Even if I was willing to accept a non-mirrored system disk (which I'm reluctant to do), satisfying all of this in one machine requires five drives (three SSDs plus two HDs). Six drives would be better. That's a lot of drives to put in one case and to connect to one motherboard (especially given that an optical drive will require a SATA port these days and yes, I probably still want one).

(I think that relatively small SSDs are now cheap enough that I'd put the OS on SSDs for both speed and lower power. This is contrary to my view two years ago, but times change.)

There are various ways to make all of this fit, such as pushing the optical drive off to an external USB drive and giving up on separate system disk(s), but a good part of my dilemma is that I don't really like any of them. In part it feels like I'm trying to force a system design that is not actually ready yet and what I should be doing is, say, waiting for SSD capacities to go up another factor of two and the prices to drop a bunch more.

(I also suspect that we're going to see more and more mid-tower cases that are primarily designed for 2.5" SSDs, although casual checking suggests that one can get cases that will take a bunch of them even as it stands.)

In short: however tempting SSDs seem, right now it seems like we're in the middle of an incomplete technology transition. However much I'd vaguely like some, I'm probably better off waiting for another year or two or three. How fortunate that this matches my general lethargy about hardware purchases (although there's what happened with my last computer upgrade to make me wonder).

(My impression is that we're actually in the middle of several PC technology transitions. 4K monitors and 'retina' displays seem like another current one, for example, one that I'm quite looking forward to.)

tech/MySSDDilemma written at 23:51:13


A data point on how rapidly spammers pick up addresses from the web

On June 15, what is almost exactly a month ago now, I wrote an entry on a weird non-relaying relay attempt I saw. In the entry I quoted a SMTP conversation, including a local address handled by my sinkhole SMTP server. As I was writing the entry I decided to change the local part of the address to an obviously bogus 'XXXX' and then see if spammers picked up that address and started trying to deliver things to that new address.

I am now able to report that it took less than a month. On July 11th I saw the first delivery attempt; July 14th saw the second and third ones. The first and the third 'succeeded' in getting all the way to a DATA submission (which was 5xx'd but had the message captured for my inspection). The resulting spam is a little bit interesting.

The first spam message looks like a serious attempt by what seems like a Chinese-affiliated spam gang to sell me some e-mail address databases, based on what geographic area I wanted to target, and maybe hawk their spamming services too. It uses a forged envelope sender and comes from a US hosting/cloud provider, with replies directed to 163.com and an image in its HTML being fetched from a tagged URL on a Chinese IP address.

The second spam message (from the third delivery attempt) comes from what is probably a compromised mail server in the UK. It is plain and straightforward advance fee fraud, and not a particularly sophisticated one; apart from the destination address there is absolutely nothing unusual about it. It was probably ultimately sent from Malaysia, perhaps from a compromised machine of some sort (the likely source IP is currently in the CBL).

(The second delivery attempt had sufficiently many signs of being ordinary advance fee fraud that my sinkhole SMTP server rejected it before DATA. Now that I look it comes from an IP address in the same /24 as the first delivery attempt; it got rejected early because the envelope sender address claimed to be from qq.com. I've switched my sinkhole SMTP server to early rejection of stuff that's likely to be boring spam because I've already collected enough samples of it. Maybe someday I'll change my mind and do a completely raw 'one week in spam', but not right now.)

There is an obvious theory about what happened with my address here: scraped by a spammer who briefly attempted to market services to me and then started selling the address and/or their spamming services to other spammers. I can't know if this story is right, of course. I may learn more if more spam arrives for that address.

(And if no more spam arrives for the address I'll also learn something. At this point I do expect it to get more spam, though, since it's in the hands of advance fee fraud spammers.)

spam/SpammerAddressPickup written at 23:29:48


Unmounting recoverable stale NFS mounts on Linux

Suppose that you have NFS mounts go stale on your Linux clients by accident; perhaps you have disabled sharing of some filesystem on the fileserver without quite unmounting it on all the clients first. Now you try to unmount them on the clients and you get the cheerful error message:

# umount /cs/dtr
/cs/dtr was not found in /proc/mounts

You have two problems here. The first problem is in umount.nfs and it is producing the error message you see here. This error message happens because at least some versions of the kernel helpfully change the /proc/mounts output for a NFS mount that they have detected as stale. Instead of the normal output, you get:

fs8:/cs/4/dtr /cs/dtr\040(deleted) nfs rw,nosuid,....

(Instead of plain '/cs/dtr' as the mount point.)
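If you want to find which mounts the kernel has marked this way, you can scan /proc/mounts for the marker. A minimal sketch in Python follows; the function name is made up and the sample text is a shortened, hypothetical version of the line shown above (the mount options are filled in with placeholder values).

```python
# The kernel writes the marker with an escaped space, so in /proc/mounts it
# appears literally as backslash-040 followed by "(deleted)".
DELETED_MARKER = "\\040(deleted)"

def stale_marked_nfs_mounts(proc_mounts_text):
    """Return (device, real mount point) for NFS mounts carrying the marker."""
    result = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[0], fields[1], fields[2]
        if fstype.startswith("nfs") and mount_point.endswith(DELETED_MARKER):
            # Strip the marker to get back the mount point that /etc/mtab has.
            result.append((device, mount_point[: -len(DELETED_MARKER)]))
    return result

# Hypothetical sample; real lines carry the full option list.
SAMPLE = (
    "fs8:/cs/4/dtr /cs/dtr\\040(deleted) nfs rw,nosuid 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)
stale = stale_marked_nfs_mounts(SAMPLE)
```

On a real client you would pass in the contents of /proc/mounts, for example open("/proc/mounts").read().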

This of course does not match what is in /etc/mtab and umount.nfs errors out with the above error message. As far as I can tell from our brief experience with this today, there is no way to cause the kernel to reverse its addition of this '\040(deleted)' marker. You can make the filesystem non-stale (by re-exporting it or the like), have IO to the NFS mount on the client work fine, and the kernel will still keep it there. You are screwed. To get around this you need to build a patched version of nfs-utils (see also). You want to modify utils/mount/nfsumount.c; search for the error message to find where.

(Note that compiling nfs-utils only gets you a mount.nfs binary. This is actually the same program as umount.nfs; it checks its name when invoked to decide what to do, so you need to get it invoked under the right name in some way.)
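The general 'decide what to do from the name you were invoked under' pattern looks roughly like this. This is a generic Python sketch of the idea, not the actual nfs-utils logic (which is in C), and the function name and the exact name test are assumptions made for illustration.

```python
import os
import sys

def action_for(argv0):
    """Pick the behaviour from the name the program was invoked under."""
    name = os.path.basename(argv0)
    # Hypothetical rule: anything starting with "umount" means unmount.
    if name.startswith("umount"):
        return "unmount"
    return "mount"

if __name__ == "__main__":
    # sys.argv[0] is the path the program was started as.
    print(action_for(sys.argv[0]))
```

With this pattern the same file behaves differently depending on what it is called, which is why the freshly built binary has to be run under the umount.nfs name, for instance via a copy or a symlink under that name (one plausible way to do it, not something tested here).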

Unfortunately you're not out of the woods because as far as I can tell many versions of the Linux kernel absolutely refuse to let you unmount a stale NFS mountpoint. The actual umount() system calls will fail with ESTALE even when you can persuade or patch umount.nfs to make them. As far as I know the only way to recover from this is to somehow make the NFS mount non-stale; at this point a patched umount.nfs can make a umount() system call that will succeed. Otherwise you get to reboot the NFS client.

(I have tested and both the umount() and umount2() system calls will fail.)

The kernel side of this problem has apparently been fixed in 3.12 via this commit (found via here), so on really modern distributions such as Ubuntu 14.04 all you may need to do is build a patched umount.nfs. It is very much an issue on older ones such as Ubuntu 12.04 (and perhaps even CentOS 7, although maybe this fix got backported). In the meantime try not to let your NFS mounts become stale, or at least don't let the client kernels notice that they are stale.

(If an NFS mount is stale on the server but nothing on the client has noticed yet, you can still successfully unmount it without problems. But the first df or whatever that gets an ESTALE back from the server blows everything up.)

For additional information on this see eg Ubuntu bug 974374 or Fedora bug 980088 or this linux-nfs mailing list message and thread.

linux/NFSStaleUnmounting written at 22:26:37


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.