2015-03-30
My preliminary views on mosh
Mosh is sort of a more reliable take on ssh that supports network disconnections, roaming, and other interruptions. I've heard about it for a while and recently Paul Tötterman asked me what I thought about it in a comment on my entry on SSH connection sharing and network stalls. The short version is that so far I haven't been interested in it, for a collection of reasons that I'm going to try to run down in their honest order of importance.
First off, mosh solves a problem that I basically don't have. Mosh sounds great if I were trying to SSH in to our servers from a roaming, periodically suspended laptop, or facing a terribly unreliable network, or just dealing with significant network latency. But I'm not; essentially all of my use of ssh is from constantly connected static machines with fixed IP addresses and good to excellent networking to the targets of my ssh'ing.
Next, using mosh instead of ssh is an extra step. Mosh is not natively installed on essentially anything I use, either clients or especially servers. That means that before I can even think of using mosh, I need to install some software. Having to install software is a pain, especially in more exotic environments and places where I don't have root. If mosh solved a real problem for me it would be worth overcoming this, but since it doesn't, I don't feel very motivated to go to this extra work.
(In the jargon, you'd say that mosh doesn't fix a pain point.)
Then there's the problem that mosh doesn't support critical SSH features that I use routinely. At work I do a lot with X11 forwarding, while at home I rely on ssh agent forwarding to one machine. This narrows mosh's utility significantly in either environment, so I could only use it with selected machines instead of using it relatively pervasively. Narrow usage is another disincentive, as it both lowers even the potential return from using mosh and increases the amount of work involved (since I can't use mosh pervasively but have to switch back and forth somehow). There are some hand-waving coping measures that could reduce the pain here.
Finally, down at the bottom (despite what I wrote in my reply comment) is that I have much less trust in the security of mosh's connection than I do in the security of SSH connections. Mosh may be secure, but as the people behind it admit in their FAQ, it hasn't been subject to the kind of scrutiny that OpenSSH and the SSH v2 protocol have had. SSH has had both longer and almost certainly far more scrutiny, simply because of all the rewards of breaking OpenSSH somewhere.
If I'm being honest, nervousness about mosh's security wouldn't stop me from using it if it solved a problem for me. Since it doesn't, this nervousness is yet another reason to avoid mosh on general principles.
(It may surprise people to hear this but I'm generally quite conservative and lazy in my choice of tools. I tend not to experiment with things very often and it usually (although not always) takes a lot of work to get me to give something a try. Sometimes this is a bad thing because I quietly cling to what turns out to be an inferior alternative just because I'm used to it.)
The 'cattle' model for servers is only a good fit in certain situations
To start with, let me define my terms. When I talk about 'cattle' servers, my primary definition is expendable servers that you don't need to care about when something goes wrong. A server is cattle if you can terminate it and then start a new one and be fine. A server is a pet if you care about that specific server staying alive.
My contention is that to have cattle servers, you either need to have a certain service delivery model or be prepared to spend a lot of money on redundancy and high availability (HA) failover. This follows from the obvious consequence of the cattle model: in order to have a cattle model at all, people can't care what specific server they are currently getting service from. The most extreme example of not having this is when people ssh in to login or compute servers and run random commands on them; in such an environment, people care very much if their specific server goes down all of a sudden.
One way to get this server independence is to have services that can be supplied generically. For example, web pages can be delivered this way (given load balancers and so on), and it's often easy to do so. A lot of work has gone into creating backend architectures that can also be used this way (often under the goal of horizontal scalability), with multiple redundant database servers (for example) and clients that distribute DB lookups around a cluster. Large scale environments are often driven to this approach because they have no choice.
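(As a rough sketch of what that client-side 'any replica will do' read path can look like, here's a minimal Python illustration. The hostnames, the port, and the query_replica() helper are entirely hypothetical stand-ins for a real database driver, not anything we actually run.)

```python
# Sketch: a client that spreads read queries across interchangeable DB replicas.
# Hostnames, port, and the query function are hypothetical stand-ins.
import random
import socket

REPLICAS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def query_replica(host, sql, timeout=2.0):
    # Placeholder for a real driver call; here we only check reachability
    # and pretend we got an answer back.
    with socket.create_connection((host, 5432), timeout=timeout):
        return f"result of {sql!r} from {host}"

def distributed_query(sql, tries=3):
    # Because every replica can answer the query, the client doesn't care
    # which one it hits; a dead replica just means trying another one. This
    # is what makes the servers 'cattle' from the client's point of view.
    for host in random.sample(REPLICAS, k=min(tries, len(REPLICAS))):
        try:
            return query_replica(host, sql)
        except OSError:
            continue  # that replica is down or unreachable; move on
    raise RuntimeError("no replica answered")

if __name__ == "__main__":
    print(distributed_query("SELECT 1"))
```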
The other way to get server independence is to take what would normally be a server-dependent thing, such as NFS fileservice, and apply enough magic (via redundancy, failover, front end load balancer distribution, and so on) to turn it into something that can be supplied generically from multiple machines. In the case of NFS fileservers, instead of having a single NFS server you would create an environment with a SAN, multiple fileservers, virtual IP addresses, and transparent failover (possibly fast enough to count as 'high availability'). Sometimes this can be done genuinely transparently; sometimes this requires clients to be willing to reconnect and resume work when their existing connection is terminated (IMAP clients will generally do this, for example, so you can run them through a load balancer to a cluster of IMAP servers with shared backend storage).
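(And as a sketch of the 'reconnect and resume' flavor of this, here's roughly what such a client looks like from the inside. The front-end address, the fake work items, and the process() helper are all made up for illustration; real IMAP clients do something considerably more involved.)

```python
# Sketch of the 'reconnect and resume' pattern that lets a client treat a
# pool of servers behind one front-end address as interchangeable.
# The address, the protocol, and the work items are all hypothetical.
import socket
import time

FRONTEND = ("imap.example.com", 143)   # virtual IP / load balancer address
WORK_ITEMS = ["fetch 1", "fetch 2", "fetch 3"]

def process(sock, item):
    # Placeholder for real protocol work (e.g. an IMAP FETCH).
    sock.sendall(item.encode() + b"\r\n")
    return sock.recv(4096)

def run():
    done = 0                            # checkpoint: how far we've gotten
    while done < len(WORK_ITEMS):
        try:
            with socket.create_connection(FRONTEND, timeout=5) as sock:
                # We may well be talking to a different backend server than
                # last time; with shared backend storage that doesn't matter.
                while done < len(WORK_ITEMS):
                    process(sock, WORK_ITEMS[done])
                    done += 1
        except OSError:
            # The connection died (backend failed, the load balancer moved
            # us, etc.); wait a moment, reconnect, and resume from the
            # checkpoint instead of giving up.
            time.sleep(1)

if __name__ == "__main__":
    run()
```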
(These categories somewhat overlap, of course. You usually get generic services by doing some amount of magic work to what initially were server-dependent things.)
If you only have to supply generic services or you have the money to turn server-dependent services into generic ones, the cattle model is a good fit. But if you don't, if you have less money and few or no generic services, then the cattle model is never going to fit your operations particularly well. You may well have an automated server setup and management system, but when one fileserver or login server starts being flaky the answer is probably not going to be 'terminate it and start a new instance'. In this case, you're probably going to want to invest much more in diagnostics and so on than someone in the cattle world.
(This 'no generic services' situation is pretty much our situation.)