Wandering Thoughts archives

2014-07-25

The OmniOS version of SSH is kind of slow for bulk transfers

If you look at the manpage and so on, it's sort of obvious that the Illumos and thus OmniOS version of SSH is rather behind the times; Sun branched from OpenSSH years ago to add some features they felt were important and it has not really been resynchronized since then. It (and before it the Solaris version) also has transfer speeds that are kind of slow due to the SSH cipher et al overhead. I tested this years ago (I believe close to the beginning of our ZFS fileservers), but today I wound up retesting it to see if anything had changed from the relatively early days of Solaris 10.

My simple tests today were on essentially identical hardware (our new fileserver hardware) running OmniOS r151010j and CentOS 7. Because I was doing loopback tests with the server itself for simplicity, I had to restrict my OmniOS tests to the ciphers that the OmniOS SSH server is configured to accept by default; at the moment that is aes128-ctr, aes192-ctr, aes256-ctr, arcfour128, arcfour256, and arcfour. Out of this list, the AES ciphers run from 42 MBytes/sec down to 32 MBytes/sec while the arcfour ciphers mostly run around 126 MBytes/sec (with hmac-md5) to 130 Mbytes/sec (with hmac-sha1).

(OmniOS unfortunately doesn't have any of the umac-* MACs that I found to be significantly faster.)

This is actually an important result because aes128-ctr is the default cipher for clients on OmniOS. In other words, the default SSH setup on OmniOS is about a third of the speed that it could be. This could be very important if you're planning to do bulk data transfers over SSH (perhaps to migrate ZFS filesystems from old fileservers to new ones)

The good news is that this is faster than 1G Ethernet; the bad news is that this is not very impressive compared to what Linux can get on the same hardware. We can make two comparisons here to show how slow OmniOS is compared to Linux. First, on Linux the best result on the OmniOS ciphers and MACs is aes128-ctr with hmac-sha1 at 180 Mbytes/sec (aes128-ctr with hmac-md5 is around 175 MBytes/sec), and even the arcfour ciphers run about 5 Mbytes/sec faster than on OmniOS. If we open this up to the more extensive set of Linux ciphers and MACs, the champion is aes128-ctr with umac-64-etm at around 335 MBytes/sec and all of the aes GCM variants come in with impressive performances of 250 Mbytes/sec and up (umac-64-etm improves things a bit here but not as much as it does for aes128-ctr).

(I believe that one reason Linux is much faster on the AES ciphers is that the version of OpenSSH that Linux uses has tuned assembly for AES and possibly uses Intel's AES instructions.)

In summary, through a combination of missing optimizations and missing ciphers and MACs, OmniOS's normal version of OpenSSH is leaving more than half the performance it could be getting on the table.

(The 'good' news for us is that we are doing all transfers from our old fileservers over 1G Ethernet, so OmniOS's ssh speeds are not going to be the limiting factor. The bad news is that our old fileservers have significantly slower CPUs and as a result max out at about 55 Mbytes/sec with arcfour (and interestingly, hmac-md5 is better than hmac-sha1 on them).)

PS: If I thought that network performance was more of a limit than disk performance for our ZFS transfers from old fileservers to the new ones, I would investigate shuffling the data across the network without using SSH. I currently haven't seen any sign that this is the case; our 'zfs send | zfs recv' runs have all been slower than this. Still, it's an option that I may experiment with (and who knows, a slow network transfer may have been having knock-on effects).

OmniOSSshIsSlow written at 01:36:48; Add Comment

2014-07-02

Why Solaris's SMF is not a good init system

An init system has two jobs: running and supervising services, and managing what it runs and supervises. SMF is a perfectly good init system as far as the former goes, and better than some (eg the traditional System V system). It is the second job where SMF falls down terribly because it's decidedly complex, opaque, fragile, and hard to manipulate. The result is a very divided experience where as long as you don't have to do anything to SMF it's a fine init system but the moment you do everything becomes immensely frustrating.

Here is an illustration of how complex, opaque, and fragile SMF is. The following is a script of commands that must be fed to svccfg as one block of actions in order to make two changes to our OmniOS systems: to start syseventd only after filesystem/local (it normally starts earlier), and to start ssh after filesystem/minimal (ie very early in the boot process, so if things go wrong we have system access).

select svc:/system/filesystem/local
delpg npiv-filesystem
select svc:/milestone/devices
delpg devices
select svc:/milestone/single-user
delpg syseventd_single-user
select svc:/system/sysevent:default
addpg filesystems-dep dependency
setprop filesystems-dep/grouping = astring: "require_all"
setprop filesystems-dep/restart_on = astring: "none"
setprop filesystems-dep/type = astring: "service"
setprop filesystems-dep/entities = fmri: "svc:/system/filesystem/local:default"
select svc:/network/ssh
setprop fs-local/entities = fmri: "svc:/system/filesystem/minimal"
delpg fs-autofs
end

There are two obvious things about this sequence of commands, namely that there are quite a lot of them and they are probably pretty opaque if you're not familiar with SMF. But there are several other things that are worthy of mention. The first is that it is actually fairly difficult to discover and work out what these commands should be and need to be; I had multiple false steps and missteps during the process. Many of the names involved in this process are arbitrary, ie up to the individual services to decide on and as you can see many of the services have chosen different names. These names are of course not documented and thus presumably not officially supported.

(Nor do the OmniOS SMF manpages discuss how your changes interact with, say, applying a package update for the packages you've manipulated.)

The next inobvious thing is that if you get some of these wrong, SMF will thoroughly blow up your system by, for example, detecting a dependency cycle and then refusing to have any part of it instead of trying some sort of fallback cycle-breaking in order to allow your system to boot to some extent. Nor does SMF prevent you from creating a dependency cycle by (for example) refusing to commit a service change that would set up such a cycle; instead it just tells you that you've made one somehow. This is why I call managing SMF a fragile thing.

Oh, and the third inobvious thing is that there are several ways to do what I've done above, all of them probably roughly equivalent. At least one thing I've done is a convenient hack instead of what would be the theoretically 'correct' way; I've done the hack because the theoretically correct way is too much of a pain in the rear. That by itself is a glaring problem indicator, as doing the correct thing should be the easiest approach.

(The hack is that instead of deleting ssh's property group for its dependency on filesystems/local and creating a new property group for a new dependency on filesystem/minimal, I have instead rewritten the specific service that 'fs-local' depends on and thus its name is now kind of a lie. But this change is one line instead of six lines, making it an attractive hack.)

To be manageable, an init system needs to be clear, well documented, and easy to use. You should be able to easily discover what properties a service has, what properties it can have, how these affect its operation, and so on. It should be obvious or at least well documented how to change service start order (because this is a not uncommon need). For dependency-based init systems without a strict ordering, it should be easy to discover what depends on what (including transitively) and either impossible or as harmless as possible to create dependency cycles. It should not require a major obscure bureaucracy to change things, nor hunts through the Internet and Stack Overflow to work out how to do things.

SMF is not a success at any of these, especially being easy to use (about the only thing that is simple in SMF is simply disabling and enabling services). That is why I say that it is not a good init system. If I had to describe it in a nutshell, I would say that SMF is a perfect illustration of what Fred Brooks calls the second system effect. People at Sun clearly wanted to make a better init system that fixed all of the problems people had ever had with System V init, but what they put together is utterly over-engineered and complex and opaque.

(I also have to mention that SMF falls down badly on the small matter of managing serial port logins. Doing this in SMF is so complicated that no one I've talked to can tell me how to successfully enable logins on a particular serial port. Really. This is yet another sign that something is terribly wrong in how SMF is configured and manipulated, even if it's perfectly fine at starting and restarting services once you can configure them.)

SMFNotGoodInitSystem written at 00:45:28; Add Comment

By day for July 2014: 2 25; before July; after July.

Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.