Wandering Thoughts

2014-07-27

Go is still a young language

Once upon a time, young languages showed their youth by having core incapabilities (important features not implemented, important platforms not supported, or the like). This is no longer really the case today; now languages generally show their youth through limitations in their standard library. The reality is that a standard library that deals with the world of the modern Internet is both a lot of work and the expression of a lot of (painful) experience with corner cases, how specifications work out in practice, and so on. This means that such a library takes time, time to write everything and then time to find all of the corner cases. When (and while) the language is young, its standard library will inevitably have omissions, partial implementations, and rough corners.

Go is a young language. Go 1.0 was only released two years ago, which is not really much time as these things go. It's unsurprising that even today portions of the standard library are under active development (I mostly notice the net packages because that's what I primarily use) and keep gaining additional important features in successive Go releases.

Because I've come around to this view, I now mostly don't get irritated when I run across deficiencies in the corners of Go's standard packages. Such deficiencies are the inevitable consequence of using a young language, and while they're obvious to me that's because I'm immersed in the particular area that exposes them. I can't expect the authors of standard packages to know everything or to put their package to the same use that I am. And time will cure most injuries here.

(Sometimes the omissions are deliberate and done for good reason, or so I've read. I'm not going to cite my primary example yet until I've done some more research about its state.)

This does mean that development in Go can sometimes require a certain sort of self-sufficiency and a willingness to either go diving into the source of standard packages or deliberately find packages that duplicate the functionality you need but without the limitations you're running into. Sometimes this may mean duplicating some amount of functionality yourself, even if it seems annoying to have to do it at the time.

(Not mentioning specific issues in, say, the net packages is entirely deliberate. This entry is a general thought, not a gripe session. In fact I've deliberately written this entry as a note to myself instead of writing another irritated grump, because the world does not particularly need another irritated grump about an obscure corner of any standard Go package.)

programming/GoYoungLanguage written at 23:06:19; Add Comment

Save your test scripts and other test materials

Back in 2009 I tested ssh cipher speeds (although it later turned out to be somewhat incomplete). Recently I redid those tests on OmniOS, with some interesting results. I was able to do this (and do it easily) because I originally did something that I don't do often enough: I saved the script I used to run the tests for my original entry. I didn't save full information, though; I didn't save information on exactly how I ran it (and there are several options). I can guess a bit but I can't be completely sure.

I should do this more often. Saving test scripts and test material has two uses. First, you can go back later and repeat the tests in new environments and so on. This is not just an issue of getting comparison data, it's also an issue of getting interesting data. If the tests were interesting enough to run once in one environment they're probably going to be interesting in another environment later. Making it easy or trivial to test the new environment makes it more likely that you will. Would I have bothered to do these SSH speed tests on OmniOS and CentOS 7 if I hadn't had my test script sitting around? Probably not, and that means I'd have missed learning several things.

The second use is that saving all of this test information means that you can go back to your old test results with a lot more understanding of what they mean. It's one thing to know that I got network speeds of X Mbytes/sec between two systems, but there are a lot of potential variables in that simple number. Recording the details will give me (and other people) as many of those variables as possible later on, which means we'll understand a lot more about what the simple raw number means. One obvious aspect of this understanding is being able to fully compare a number today with a number from the past.

(This is an aspect of scripts capturing knowledge, of course. But note that test scripts by themselves don't necessarily tell you all the details unless you put a lot of 'how we ran this' documentation into their comments. This is probably a good idea, since it captures all of this stuff in one place.)
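To make the 'how we ran this' point concrete, here is a small sketch of the kind of thing I mean (this is an illustration invented for this entry, not my actual test script; the details in the comments are made up):

#!/bin/sh
# ssh cipher speed test. How we ran this: loopback against the test machine
# itself ('./sshspeed localhost'), stock sshd configuration, July 2014.
# It pushes 1 GByte of zeros through ssh for each cipher; time each run by
# hand (or with your shell's 'time') to get MBytes/sec figures.
tgt="${1:-localhost}"
for c in aes128-ctr arcfour128 arcfour256; do
    echo "== cipher $c"
    dd if=/dev/zero bs=1024k count=1024 | ssh -c "$c" "$tgt" 'cat >/dev/null'
done

Even if the script itself later turns out to be imperfect, those few comment lines answer most of the 'exactly how did I run this?' questions.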

sysadmin/SaveYourTests written at 00:49:23; Add Comment

2014-07-25

An interesting picky difference between Bourne shells

Today we ran into an interesting bug in one of our internal shell scripts. The script had worked for years on our Solaris 10 machines, but on a new OmniOS fileserver it suddenly reported an error:

script[77]: [: 232G: arithmetic syntax error

Cognoscenti of ksh error messages have probably already recognized this one and can tell me the exact problem. To show it to everyone else, here is line 77:

if [ "$qsize" -eq "none" ]; then
   ....

In a strict POSIX shell, this is an error because test's -eq operator is specifically for comparing numbers, not strings. What we wanted was the = operator.
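So the corrected line 77 is simply the string comparison version (shown here as I'd write the fix, not as an excerpt from the repaired script):

if [ "$qsize" = "none" ]; then
   ....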

What makes this error more interesting is that the script had been running for some time on the OmniOS fileserver without this error. However, until now the $qsize variable had always had the value 'none'. So why hadn't it failed earlier? After all, 'none' (on either side of the expression) is just as much not a number as '232G' is.

The answer is that this is a picky difference between shells in terms of how they actually behave. Bash, for example, always complains about such misuse of -eq; if either side is not a number you get an error saying 'integer expression expected' (as does Dash, with a slightly different error). But on our OmniOS, /bin/sh is actually ksh93 and ksh93 has a slightly different behavior. Here:

$ [ "none" -eq "none" ] && echo yes
yes
$ [ "bogus" -eq "none" ] && echo yes
yes
$ [ "none" -eq 0 ] && echo yes
yes
$ [ "none" -eq "232G" ] && echo yes
/bin/sh: [: 232G: arithmetic syntax error

The OmniOS version of ksh93 clearly has some sort of heuristic about number conversions such that strings with no numbers are silently interpreted as '0'. Only invalid numbers (as opposed to things that aren't numbers at all) produce the 'arithmetic syntax error' message. Bash and dash are both more straightforward about things (as is the FreeBSD /bin/sh, which is derived from ash).

Update: my description isn't actually what ksh93 is doing here; per opk's comment, it's actually interpreting the none and bogus as variable names and giving them a value of 0 when unset.
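(One way to see opk's explanation in action, if I understand it correctly, is to actually give one of those 'variables' a value; I haven't run this on our OmniOS machine, but ksh93 should then do an ordinary numeric comparison:

$ none=5
$ [ "none" -eq 5 ] && echo yes
$ [ "bogus" -eq 0 ] && echo yes

The first test should now print 'yes' because $none is 5, and the second should still print 'yes' because bogus is unset and so evaluates to 0.)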

Interestingly, the old Solaris 10 /bin/sh seems to basically be calling atoi() on the arguments for -eq; the first three examples work the same, the fourth is silently false, and '[ 232 -eq 232G ]' is true. This matches the 'let's just do it' simple philosophy of the original Bourne shell and test program and may be authentic original V7 behavior.

(Technically this is a difference in test behavior, but test is a builtin in basically all Bourne shells these days. Sometimes the standalone test program in /bin or /usr/bin is actually a shell script to invoke the builtin.)

unix/ShTestDifference written at 23:34:18; Add Comment

The OmniOS version of SSH is kind of slow for bulk transfers

If you look at the manpage and so on, it's sort of obvious that the Illumos and thus OmniOS version of SSH is rather behind the times; Sun branched from OpenSSH years ago to add some features they felt were important and it has not really been resynchronized since then. It (and before it the Solaris version) also has transfer speeds that are kind of slow due to the SSH cipher et al overhead. I tested this years ago (I believe close to the beginning of our ZFS fileservers), but today I wound up retesting it to see if anything had changed from the relatively early days of Solaris 10.

My simple tests today were on essentially identical hardware (our new fileserver hardware) running OmniOS r151010j and CentOS 7. Because I was doing loopback tests with the server itself for simplicity, I had to restrict my OmniOS tests to the ciphers that the OmniOS SSH server is configured to accept by default; at the moment that is aes128-ctr, aes192-ctr, aes256-ctr, arcfour128, arcfour256, and arcfour. Out of this list, the AES ciphers run from 42 MBytes/sec down to 32 MBytes/sec while the arcfour ciphers mostly run around 126 MBytes/sec (with hmac-md5) to 130 MBytes/sec (with hmac-sha1).

(OmniOS unfortunately doesn't have any of the umac-* MACs that I found to be significantly faster.)

This is actually an important result because aes128-ctr is the default cipher for clients on OmniOS. In other words, the default SSH setup on OmniOS is about a third of the speed that it could be. This could be very important if you're planning to do bulk data transfers over SSH (perhaps to migrate ZFS filesystems from old fileservers to new ones).
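For instance, if you were migrating a ZFS filesystem this way and both ends accepted it, explicitly asking for one of the arcfour ciphers ought to roughly triple your potential transfer rate over the default (a sketch with made-up pool and host names, not one of our actual transfer commands):

$ zfs send tank/fs@migrate | ssh -c arcfour128 newfileserver 'zfs receive tank/fs'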

The good news is that this is faster than 1G Ethernet; the bad news is that this is not very impressive compared to what Linux can get on the same hardware. We can make two comparisons here to show how slow OmniOS is compared to Linux. First, on Linux the best result on the OmniOS ciphers and MACs is aes128-ctr with hmac-sha1 at 180 MBytes/sec (aes128-ctr with hmac-md5 is around 175 MBytes/sec), and even the arcfour ciphers run about 5 MBytes/sec faster than on OmniOS. If we open this up to the more extensive set of Linux ciphers and MACs, the champion is aes128-ctr with umac-64-etm at around 335 MBytes/sec and all of the aes GCM variants come in with impressive performances of 250 MBytes/sec and up (umac-64-etm improves things a bit here but not as much as it does for aes128-ctr).

(I believe that one reason Linux is much faster on the AES ciphers is that the version of OpenSSH that Linux uses has tuned assembly for AES and possibly uses Intel's AES instructions.)

In summary, through a combination of missing optimizations and missing ciphers and MACs, OmniOS's normal version of OpenSSH is leaving more than half the performance it could be getting on the table.

(The 'good' news for us is that we are doing all transfers from our old fileservers over 1G Ethernet, so OmniOS's ssh speeds are not going to be the limiting factor. The bad news is that our old fileservers have significantly slower CPUs and as a result max out at about 55 MBytes/sec with arcfour (and interestingly, hmac-md5 is better than hmac-sha1 on them).)

PS: If I thought that network performance was more of a limit than disk performance for our ZFS transfers from old fileservers to the new ones, I would investigate shuffling the data across the network without using SSH. I currently haven't seen any sign that this is the case; our 'zfs send | zfs recv' runs have all been slower than this. Still, it's an option that I may experiment with (and who knows, a slow network transfer may have been having knock-on effects).

solaris/OmniOSSshIsSlow written at 01:36:48; Add Comment

2014-07-23

What influences SSH's bulk transfer speeds

A number of years ago I wrote How fast various ssh ciphers are because I was curious about just how fast you could do bulk SSH transfers and how to get them to go fast under various circumstances. Since then I have learned somewhat more about SSH speed and about what controls which ciphers and MACs you have available and can actually use.

To start with, my entry from years ago was naively incomplete because SSH encryption has two components: it has both a cipher and a cryptographic hash used as the MAC (message authentication code). The choice of both of them can matter, especially if you're willing to deliberately weaken the MAC. As an example of how much of an impact this might make, in my testing on a Linux machine I could almost double SSH bandwidth by switching from the default MAC to 'umac-64-etm@openssh.com'.

(At the same time, no other MAC choice made much of a difference within a particular cipher, although hmac-sha1 was sometimes a bit faster than hmac-md5.)

Clients set the cipher list with -c and the MAC with -m, or with the Ciphers and MACs options in your SSH configuration file (either a personal one or a global one). However, what the client wants to use has to be both supported by the server and accepted by it; this is set in the server's Ciphers and MACs configuration options. The manpages for ssh_config and sshd_config on your system will hopefully document both what your system supports at all and what it's set to accept by default. Note that this is not necessarily the same thing; I've seen systems where sshd knows about ciphers that it will not accept by default.
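As a concrete illustration (the host name here is made up, and your ssh version may know a different set of algorithm names), you can ask for a specific cipher and MAC either on the command line or in your ssh configuration file:

$ ssh -c aes128-ctr -m umac-64-etm@openssh.com somehost 'cat somefile' > /dev/null

# or in ~/.ssh/config:
Host somehost
    Ciphers aes128-ctr
    MACs umac-64-etm@openssh.com

Either way the server still has to accept your choices, as covered above.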

(Some modern versions of OpenSSH also report this information through 'ssh -Q <option>'; see the ssh manpage for details. Note that such lists are not necessarily reported in preference order.)

At least some SSH clients will tell you the server's list of acceptable ciphers (and MACs) if you tell the client to use options that the server doesn't support. If you wanted to, I suspect that you could write a program in some language with SSH protocol libraries that dumped all of this information for you for an arbitrary server (without the fuss of having to find a cipher and MAC that your client knew about but your server didn't accept).

Running 'ssh -v' will report the negotiated cipher and MAC that are being used for the connection. Technically there are two sets of them, one for the client to server and one for the server back to the client, but I believe that under all but really exceptional situations you'll use the same cipher and MAC in both directions.

Different Unix OSes may differ significantly in their support for both ciphers and MACs. In particular Solaris effectively forked a relatively old version of OpenSSH and so modern versions of Illumos (and Illumos distributions such as OmniOS) do not offer you anywhere near a modern list of choices here. How recent your distribution is will also matter; our Ubuntu 14.04 machines naturally offer us a lot more choice than our Ubuntu 10.04 ones.

PS: helpfully the latest OpenSSH manpages are online (cf), so the current manpage for ssh_config will tell you the latest set of ciphers and MACs supported by the official OpenSSH and also show the current preference order. To my interest it appears that OpenSSH now defaults to the very fast umac-64-etm MAC.

sysadmin/SshBulkSpeed written at 23:21:12; Add Comment

One of SELinux's important limits

People occasionally push SELinux as the cure for security problems and look down on people who routinely disable it (as we do). I have some previously expressed views on this general attitude, but what I feel like pointing out today is that SELinux's security has some important intrinsic limits. One big one is that SELinux only acts at process boundaries.

By its nature, SELinux exists to stop a process (or a collection of them) from doing 'bad things' to the rest of the system and to the outside environment. But there are any number of dangerous exploits that do not cross a process's boundaries this way; the most infamous recent one is Heartbleed. SELinux can do nothing to stop these exploits because they happen entirely inside the process, in spheres fully outside its domain. SELinux can only act if the exploit seeks to exfiltrate data (or influence the outside world) through some new channel that the process does not normally use, and in many cases the exploit doesn't need to do that (and often doesn't bother).

Or in short, SELinux cannot stop your web server or your web browser from getting compromised, only from doing new stuff afterwards. Sending someone in the outside world all of the secrets that your browser or server already has access to? There's nothing SELinux can do about that (assuming that the attacker is competent). This is a large and damaging territory that SELinux doesn't help with.

(Yes, yes, privilege separation. There are a number of ways in which this is the mathematical security answer instead of the real one, including that most network related programs today are not privilege separated. Chrome exploits also have demonstrated that privilege separation is very hard to make leak-proof.)

linux/SELinuxProgramBoundaries written at 00:24:23; Add Comment

2014-07-21

What I know about the different types of SSH keys (and some opinions)

Modern versions of SSH support up to four different types of SSH keys (both for host keys to identify servers and for personal keys): RSA, DSA, ECDSA, and as of OpenSSH 6.5 we have ED25519 keys as well. Both ECDSA and ED25519 use elliptic curve cryptography, DSA uses finite fields, and RSA is based on integer factorization. EC cryptography is said to have a number of advantages, particularly in that it uses smaller key sizes (and thus needs smaller exchanges on the wire to pass public keys back and forth).

(One discussion of this is this cloudflare blog post.)

RSA and DSA keys are supported by all SSH implementations (well, all SSH v2 implementations which is in practice 'all implementations' these days). ECDSA keys are supported primarily by reasonably recent versions of OpenSSH (from OpenSSH 5.7 onwards); they may not be in other versions, such as the SSH that you find on Solaris and OmniOS or on a Red Hat Enterprise 5 machine. ED25519 is only supported in OpenSSH 6.5 and later, which right now is very recent; of our main machines, only the Ubuntu 14.04 ones have it (especially note that it's not supported by the RHEL 7/CentOS 7 version of OpenSSH).

(I think ED25519 is also supported on Debian testing (but not stable) and on current FreeBSD and OpenBSD versions.)

SSH servers can offer multiple host keys in different key types (this is controlled by what HostKey files you have configured). The order that OpenSSH clients will try host keys in is controlled by two things: the setting of HostKeyAlgorithms (see 'man ssh_config' for the default) and what host keys are already known for the target host. If no host keys are known, I believe that the current order is ECDSA, ED25519, RSA, and then DSA; once there are known keys, they're tried first. What this really means is that for an otherwise unknown host you will be prompted to save the first of these key types that the host has and thereafter the host will be verified against it. If you already know an 'inferior' key (eg an RSA key when the host also advertises an ECDSA key), you will verify the host against the key you know and, as far as I can tell, not even save its 'better' key in .ssh/known_hosts.

(If you have a mixture of SSH client versions, people can wind up with a real mixture of your server key types in their known_hosts files or equivalent. This may mean that you need to preserve and restore multiple types of SSH host keys over server reinstalls, and probably add saving and restoring ED25519 keys when you start adding Ubuntu 14.04 servers to your mix.)
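For what it's worth, if you wanted to nudge your clients' preference order yourself, the relevant knob is the HostKeyAlgorithms setting mentioned above; something like this in a client ssh_config would prefer ED25519 and RSA host keys (assuming your OpenSSH is new enough to know both names):

Host *
    HostKeyAlgorithms ssh-ed25519,ssh-rsa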

In terms of which key type is 'better', some people distrust ECDSA because the elliptic curve parameters are magic numbers from NIST and so could have secret backdoors, as appears to be all but certain for another NIST elliptic curve based cryptography standard (see also and also and more). I reflexively dislike both DSA and ECDSA because DSA implementation mistakes can be explosively fatal, as in 'trivially disclose your private keys'. While ED25519 is also a DSA-style algorithm, it takes specific steps to avoid at least some of the explosive failures of plain DSA and ECDSA, failures that have led to eg the compromise of Sony's Playstation 3 signing keys.

(RFC 6979 discusses how to avoid this particular problem for DSA and ECDSA but it's not clear to me if OpenSSH implements it. I would assume not until explicitly stated otherwise.)

As a result of all of this I believe that the conservative choice is to advertise and use only RSA keys (both host keys and personal keys) with good bit sizes. The slightly daring choice is to use ED25519 when you have it available. I would not use either ECDSA or DSA although I wouldn't go out of my way to disable server ECDSA or DSA host keys except in a very high security environment.
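In concrete terms this means generating keys more or less like this (with 4096 bits as one example of a 'good bit size'; the ED25519 variant needs OpenSSH 6.5 or later):

$ ssh-keygen -t rsa -b 4096
$ ssh-keygen -t ed25519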

(I call ED25519 'slightly daring' only because I don't believe it's undergone as much outside scrutiny as RSA, and I could be wrong about that. See here and here for a discussion of ECC and ED25519 security properties and general security issues. ED25519 is part of Dan Bernstein's work and in general he has a pretty good reputation on these issues. Certainly the OpenSSH people were willing to adopt it.)

PS: If you want to have your eyebrows raised about the choice of elliptic curve parameters, see here.

PPS: I don't know what types of keys non-Unix SSH clients support over and above basic RSA and DSA support. Some casual Internet searches suggest that PuTTY doesn't support ECDSA yet, for example. And even some Unix software may have difficulties; for example, currently GNOME Keyring apparently doesn't support ECDSA keys (via archlinux).

sysadmin/SSHKeyTypes written at 23:12:48; Add Comment

2014-07-20

The CBL has a real false positive problem

As I write this, a number of IP addresses in 128.100.1.0/24 are listed in the CBL, and various of them have been listed for some time. There is a problem with this: these CBL-listed IP addresses don't exist. I don't mean 'they aren't supposed to exist'; I mean 'they could only theoretically exist on a secure subnet in our machine room and even if they did exist our firewall wouldn't allow them to pass traffic'. So these IP addresses don't exist in a very strong sense. Yet the CBL lists them and has for some time.

The first false positive problem the CBL has is that they are listing this traffic at all. We have corresponded with the CBL about this and these listings (along with listings on others of our subnets) all come from traffic observed at a single one of their monitoring points. Unlike what I assumed in the past, these observations are not coming from parsing Received: headers but from real TCP traffic. However they are not connections from our network, and the university is the legitimate owner and router of 128.100/16. A CBL observation point that is using false routing (and is clearly using false routing over a significant period of time) is an actively dangerous thing; as we can see here, false routing can cause the CBL to list anything.

The second false positive problem the CBL has is that, as mentioned, we have corresponded with the CBL over this. In that correspondence the CBL spokesperson agreed that the CBL was incorrect in this listing and would get it fixed. That was a couple of months ago, yet a revolving cast of 128.100.1.0/24 IP addresses still gets listed and relisted in the CBL. As a corollary of this, we can be confident that the CBL listening point(s) involved are still using false routes for some of their traffic. You can apply charitable or less charitable assumptions for this lack of actual action on the CBL's part; at a minimum it is clear that some acknowledged false positive problems go unfixed for whatever reason.

I don't particularly have a better option than the CBL these days. But I no longer trust it anywhere near as much as I used to and I don't particularly like its conduct here.

(And I feel like saying something about it so that other people can know and make their own decisions. And yes, the situation irritates me.)

(As mentioned, we've seen similar issues in the past, cf my original 2012 entry on the issue. This time around we've seen it on significantly more IP addresses, we have extremely strong confidence that it is a false positive problem, and most of all we've corresponded with the CBL people about it.)

spam/CBLFalsePositiveProblemII written at 23:03:23; Add Comment

HTTPS should remain genuinely optional on the web

I recently ran across Mozilla Bug 1041087 (via HN), which has the sort of harmless-sounding title of 'Switch generic icon to negative feedback for non-https sites'. Let me translate this to English: 'try to scare users if they're connecting to a non-https site'. For anyone who finds this attractive, let me say it flat out: this is a stupid idea on today's web.

(For the record, I don't think it's very likely that Mozilla will take this wishlist request seriously. I just think that there are people out there who wish that they would.)

I used to be very down on SSL Certificate Authorities, basically considering the whole thing a racket. It remains a racket but in today's environment of pervasive eavesdropping it is now a useful one; one might as well make the work of those eavesdroppers somewhat harder. I would be very enthusiastic for pervasive encryption if we could deploy that across the web.

Unfortunately we can't, exactly because of the SSL CA racket. Today having a SSL certificate means either scaring users and doing things that are terrible for security overall or being beholden to a SSL CA (and often although not always forking over money for this dubious privilege). Never mind the lack of true security due to the core SSL problem, this is not an attractive solution in general. Forcing mandatory HTTPS today means giving far too much power and influence to SSL CAs, often including the ability to turn off your website at their whim or mistake.

You might say that this proposal doesn't force mandatory HTTPS. That's disingenuous. Scaring users of a major browser when they visit a non-HTTPS site is effectively forcing HTTPS for the same reason that scary warnings about self-signed certificates force the use of official CA certificates. Very few websites can afford to scare users.

The time to force people towards HTTPS is when we've solved all of these problems. In other words, when absolutely any website can make itself a certificate and then securely advertise and use that certificate. We are nowhere near this ideal world in today's SSL CA environment (and we may or may not ever get there).

(By the way, I really do mean any website here, including a hypothetical one run by anonymous people and hosted in some place either that no one likes or that generates a lot of fraud or both. There are a lot of proposals that basically work primarily for people in the West who are willing to be officially identified and can provide money; depart from this and you can find availability going downhill rapidly. Read up on the problems of genuine Nigerian entrepreneurs someday.)

web/HTTPSOptional written at 00:11:42; Add Comment

2014-07-19

Some consequences of widespread use of OCSP for HTTPS

OCSP is an attempt to solve some of the problems of certificate revocation. The simple version of how it works is that when your browser contacts a HTTPS website, it asks the issuer of the site's certificate if the certificate is both known and still valid. One important advantage of OCSP over CRLs is that the CA now has an avenue to 'revoke' certificates that it doesn't know about. If the CA doesn't have a certificate in its database, it can assert 'unknown certificate' in reply to your query and the certificate doesn't work.
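If you want to see roughly what your browser is doing, you can make the same sort of query by hand with openssl's ocsp subcommand (a sketch; the file names and responder URL are made up, and the real URL is normally taken from the certificate itself):

$ openssl ocsp -issuer ca-chain.pem -cert site-cert.pem -url http://ocsp.some-ca.example -text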

The straightforward implication of OCSP is that the CA knows that you're trying to talk to a particular website at a particular time. Often third parties can also know this, because OCSP queries may well be done over HTTP instead of HTTPS. OCSP stapling attempts to work around the privacy implications by having the website include a pre-signed, limited duration current attestation about their certificate from the CA, but it may not be widely supported.

(Website operators have to have software that supports OCSP stapling and specifically configure it. OCSP checking in general simply needs a field set in the certificate, which the CA generally forces on your SSL certificates if it supports OCSP.)
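Relatedly, one way to check whether a particular HTTPS site currently staples OCSP responses is to ask openssl's s_client to request the status and see whether a response comes back (the host name here is just an example):

$ openssl s_client -connect www.example.org:443 -status < /dev/null 2>/dev/null | grep -i ocsp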

The less obvious implication of OCSP is that your CA can now turn off your HTTPS website any time it either wants to, is legally required to, or simply screws up something in its OCSP database. If your browser checks OCSP status and the OCSP server says 'I do not know this certificate', your browser is going to hard-fail the HTTPS connection. In fact it really has to, because this is exactly the response that it would get if the CA had been subverted into issuing an imposter certificate in some way that was off the books.

You may be saying 'a CA would never do this'. I regret to inform my readers that I've already seen this happen. The blunt fact is that keeping high volume services running is not trivial and systems suffer database glitches all the time. It's just that with OCSP someone else's service glitch can take down your website, my website, or in fact a whole lot of websites all at once.

As they say, this is not really a good way to run a railroad.

(See also Adam Langley on why revocation checking doesn't really help with security. This means that OCSP is both dangerous and significantly useless. Oh, and of course it often adds extra latency to your HTTPS connections since it needs to do extra requests to check the OCSP status.)

PS: note that OCSP stapling doesn't protect you from your CA here. It can protect you from temporary short-term glitches that fix themselves automatically (because you can just hold on to a valid OCSP response while the glitch fixes itself), but that's it. If the CA refuses to validate your certificate for long enough (either deliberately or through a long-term problem), your cached OCSP response expires and you're up the creek.

web/OCSPConsequences written at 00:22:06; Add Comment
