2015-05-03
Sometimes knowing causes does you no good (and sensible uses of time)
Yesterday, I covered our OmniOS fileserver problem with overload and mentioned that the core problem seems to be (kernel) memory exhaustion. Of course once we'd identified this I immediately started coming up with lots of theories about what might be eating up all the memory (and then not giving it back), along with potential ways to test these theories. This is what sysadmins do when we're confronted with problems, after all; we try to understand them. And it can be peculiarly fun and satisfying to run down the root cause of something.
(For example, one theory is 'NFS TCP socket receive buffers', which would explain why it seems to need a bunch of clients all active.)
Then I asked myself an uncomfortable question: was this going to actually help us? Specifically, was it particularly likely to get us any closer to having OmniOS NFS fileservers that did not lock up under surges of too-high load? The more I thought about that, the more gloomy I felt, because the cold hard answer is that knowing the root cause here is unlikely to do us any good.
Some issues are ultimately due to simple and easily fixed bugs, or turn out to have simple configuration changes that avoid them. It seems unlikely that either is the case here; instead it seems much more likely to be a misdesigned or badly designed part of the Illumos NFS server code. Fixing bad designs is never a simple code change and they can rarely be avoided with configuration changes. Any fix is likely to be slow to appear and require significant work on someone's part.
This leads to the really uncomfortable realization that it is probably not worth spelunking this issue to explore and test any of these theories. Sure, it'd be nice to know the answer, but knowing the answer is not likely to get us much closer to a fix to a long-standing and deep issue. And what we need is that fix, not to know what the cause is, because ultimately we need fileservers that don't lock up every so often if things go a little bit wrong (because things go a little bit wrong on a regular basis).
This doesn't make me happy, because I like diagnosing problems and finding root causes (however much I gripe about it sometimes); it's neat and gives me a feeling of real accomplishment. But my job is not about feelings of accomplishment, it's about giving our users reliable fileservice, and it behooves me to spend my finite time on things that are most likely to result in that. Right now that does not appear to involve diving into OmniOS kernel internals or coming up with clever ways to test theories.
(If we had a lot of money to throw at people, perhaps the solution would be 'root cause the problem then pay Illumos people to do the kernel development needed to fix it'. But we don't have anywhere near that kind of money.)
2015-04-27
The fading out of tcpwrappers and its idea
Once upon a time, tcpwrappers was a big thing in (Unix) host security. Plenty of programs supported the original TCP Wrapper library by Wietse Venema, and people wrote their own takes on the idea. But nowadays, tcpwrappers is clearly on the way out. It doesn't seem to be used very much any more in practice, fewer and fewer programs support it at all, and of the remaining ones that (still) do, some of them are removing support for it. This isn't exclusive to Wietse Venema's original version; the whole idea and approach just doesn't seem to be all that popular any more. So what happened?
I don't know for sure, but I think the simple answer is 'firewalls and operating system level packet filtering'. The core idea of tcpwrappers is application level IP access filtering, and it dates from an era where that was your only real choice. Very few things had support for packet filtering, so you had to do this in the applications (and in general updating applications is easier than updating operating systems). These days we have robust and well developed packet filtering in kernels and in firewalls, which takes care of much of the need for tcpwrappers stuff. In many cases, maintaining packet filtering rules may be easier than maintaining tcpwrappers rules, and kernel packet filtering has the advantage that it's centralized and so universally 'supported' by programs; in fact programs don't have any choice about it.
(Kernel packet filters can't do DNS lookups the way that tcpwrappers can, but using DNS lookups for anything except logging has fallen out of favour these days. Often people don't even want to do it for logging.)
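To make the core idea concrete, here's a minimal sketch of application-level IP access filtering in a small forking TCP server, in Python purely for illustration. This is not libwrap or its rules format; the networks and the port are invented, and real tcpwrappers reads /etc/hosts.allow and /etc/hosts.deny instead of a hardcoded list.

    import ipaddress
    import socketserver

    # Invented allow list, just for illustration.
    ALLOWED_NETS = [ipaddress.ip_network("10.0.0.0/8"),
                    ipaddress.ip_network("192.168.1.0/24")]

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            client = ipaddress.ip_address(self.client_address[0])
            # The application itself decides whether to serve this client,
            # instead of leaving the decision to a kernel packet filter.
            if not any(client in net for net in ALLOWED_NETS):
                self.wfile.write(b"access denied\r\n")
                return
            self.wfile.write(b"hello\r\n")

    if __name__ == "__main__":
        # A fork-per-connection server, the model that tcpwrappers suits best.
        with socketserver.ForkingTCPServer(("", 9999), Handler) as srv:
            srv.serve_forever()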
Having written some code that used libwrap, I think that another issue is that the general sort of API that Venema's tcpwrappers has is one that's fallen out of favour. Even using the library, what you get is basically a single threaded black box. This works sort of okay if you're forking for each new connection, but it doesn't expose a lot of controls or a lot of information and it's going to completely fall down if you want to do more sophisticated things (or control the DNS lookups it does). Basically Venema's tcpwrappers works best for things that you could at least conceive of running out of inetd.
(It's not impossible to create an API that offers more control, but then you wind up with something that is more complex as well. And once you get more complex, what programs want out of connection matching becomes much more program-specific; consider sshd's 'Match' stuff as contrasted with Apache's access controls.)
Another way of putting it is that in the modern world, we've come to see IP-level access control as something that should be handled outside the program entirely or that's deeply integrated with the program (or both). Neither really fits the tcpwrappers model, which is more 'sitting lightly on top of the program'.
(Certainly part of the decline of tcpwrappers is that in many environments we've moved IP access controls completely off end hosts and on to separate firewalls, for better or worse.)
2015-04-23
Upgrading machines versus reinstalling them
Yesterday I mentioned that we would be 'upgrading' the version of OmniOS on our fileservers not by using the OmniOS upgrade process but by reinstalling them. While this was partly forced by an OmniOS problem, it's actually our approach in general. We tend to take this approach for two reasons.
The first reason is that it leads to either simpler install instructions or more identical machines if you have to rebuild one, depending on how you approach rebuilding upgraded machines. If you upgraded a machine from OS version A to OS version B, in theory you should reinstall a replacement by going through the same process instead of directly installing OS version B. If you directly install OS version B, you have a simpler and faster install process but you almost never get an exactly identical machine.
(In fact until you actually do this as a test you can't be sure you even wind up with a fully functional replacement machine. It's always possible that there's something vital in your current build instructions that only gets set up right if you start from OS version A and then upgrade.)
The second reason is that customizations done on OS version A are not always still applicable or necessary on OS version B. Sometimes they've even become counterproductive. If you're upgrading, you have to figure out how to find these issues and then how to fix them up. If you're directly (re)installing OS version B, you get a chance to start from scratch and apply only what you need (in the form you now need it in) on OS version B, and you don't have to deal with juggling all sorts of things during the transition from version A to version B.
(Relatedly, you may have changed your mind or simply learned better since your install of OS version A. Doing a from-scratch reinstall is a great opportunity to update to what you feel is the current best practice for something.)
Mind you, there are machines and situations where in-place upgrades are less disruptive and easier to do than complete reinstalls. One of them is when the machine has complex local state that is hard to fence off or back up and restore; another is if a machine was heavily customized, especially in ad-hoc ways. And in-place upgrades can involve less downtime (especially if you don't have surplus machines to do complex juggling). This is a lot of why I do in-place live upgrades of my workstations.
2015-04-20
I don't think I'm interested in containers
Containers are all the rage in system administration right now, and I can certainly see the appeal. So it feels more than a bit heretical to admit that I'm not interested in them, ultimately because I don't think they're an easy fit for our environment.
What it comes down to is two things. The first is that I think containers really work best in a situation where the 'cattle' model of servers is a good fit. By contrast, our important machines are not cattle. With a few exceptions we have only one of each machine today, so in a container world we would just be turning those singular machines into singular containers. While there are some wins for containers, I'm not convinced they're very big ones, and there are certainly added complexities.
The second is that we are pretty big on using different physical machines to get fault independence. As far as we're concerned it's a feature that if physical machine X dies for whatever reason, we only lose a single service. We co-locate services only infrequently and reluctantly. This obviously eliminates one of the advantages of containers, which is that you can run multiple containers on a single piece of hardware. A world where we run a base OS plus a single container on most servers is kind of a more complicated world than we have now and it's not clear what it gets us.
I can sort of imagine a world where we become a container-based environment (even with our present split of services) and I can see some advantages to it. But it's clear that it would take a lot of work to completely redo everything in our environment as a substrate of base OS servers and then a stratum of ready-to-go containers deployed on top of them, and while we'd get some things out of such a switch I'm not convinced we'd get a lot.
(Such a switch would be more like a green field rebuild from total scratch; we'd probably want to throw away everything that we do now. This is just not feasible for us for various reasons, budget included.)
So the upshot of all of this is that while I think containers are interesting as a technical thing and I vaguely keep track of the whole area, I'm not actually interested in them and I have no plans to explore them, try them out, and so on. I feel oddly embarrassed by this for reasons beyond the comfortable scope of this entry, but there it is whether I like it or not.
(I was much more optimistic a few years ago, but back then I was just theorizing. Ever since then I've failed to find a problem around here where I thought 'yes, containers will make my life simpler here and I should advocate for them'. Even my one temptation due to annoyance was only a brief flirtation before sense set in.)
2015-04-12
One speed limit on your ability to upgrade your systems
One of the responses on Twitter to Ted Unangst's long term support considered harmful was this very interesting tweet:
[...] it's not "pain" - it just doesn't happen. At 2 weeks of planning + testing = 26 systems per year
This was eye-opening in a 'I hadn't thought about it that way before now' way. Like many insights, it's blindingly obvious in retrospect; of course how fast you can actually do an upgrade/update cycle determines how many of them you can do in a year (given various assumptions about manpower, parallelism, testing, and so on). And of course this limit applies across all of your systems. It's not just that you can only upgrade a given system so many times a year; it's that you get only so many upgrades in a year, period, across all of your systems.
(What the limit is depends very much on what systems you're trying to upgrade, since the planning, setup, and testing process will take different amounts of time for different systems.)
To upgrade systems more frequently, you have two options. First, you can reduce the time an upgrade cycle takes by speeding up or doing less planning, building, testing, and/or the actual deployment. Second, you can reduce the number of upgrades you need to do by creating more uniform systems, so you amortize the time a cycle takes across more systems. If you have six special snowflakes running completely different OSes and upgrading each OS takes a month, you get twelve snowflake upgrades in a year (assuming you do nothing else). But if all six run the same OS in the same setup, you now get to upgrade all six of them more or less once a month (let's optimistically assume that deployment is a snap).
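To put rough numbers on this, here's a back-of-the-envelope sketch; the cycle lengths are the tweet's two weeks and my hypothetical one-month snowflake cycle, not measurements from anywhere real.

    WEEKS_PER_YEAR = 52

    def upgrade_budget(weeks_per_cycle, distinct_builds, machines_per_build=1):
        # You only get so many planning+testing cycles in a year, shared
        # across every distinct build you maintain.  Machines that share a
        # build ride along in the same cycle (assuming deployment is quick).
        cycles_per_year = WEEKS_PER_YEAR / weeks_per_cycle
        upgrades_per_build = cycles_per_year / distinct_builds
        machine_upgrades = cycles_per_year * machines_per_build
        return upgrades_per_build, machine_upgrades

    # The tweet's numbers: two-week cycles, one system per cycle.
    print(upgrade_budget(2, 1))           # (26.0, 26.0)
    # Six snowflake OSes at a month (52/12 weeks) per cycle:
    print(upgrade_budget(52 / 12, 6))     # each upgraded 2x/year, 12 upgrades total
    # Six identical machines sharing one build:
    print(upgrade_budget(52 / 12, 1, 6))  # the build upgraded 12x/year, 72 machine-upgrades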
I see this as an interesting driver of uniformity (and at all levels, not just at the system level). Depending on how much pre-production testing you need and use, it's also an obvious driver of faster, better, and often more automated tests.
(Looking back I can certainly see cases where this 'we can only work so fast' stuff has been a limiting factor in our own work.)
2015-03-30
My preliminary views on mosh
Mosh is sort of a more reliable take on ssh that supports network disconnections, roaming, and other interruptions. I've heard about it for a while and recently Paul Tötterman asked me what I thought about it in a comment on my entry on SSH connection sharing and network stalls. The short version is that so far I haven't been interested in it for a collection of reasons, which I'm going to try to run down in the honest order.
First off, mosh solves a problem that I basically don't have. Mosh sounds great if I were trying to SSH in to our servers from a roaming, periodically suspended laptop, or facing a terribly unreliable network, or just dealing with significant network latency. But I'm not; essentially all of my use of ssh is from constantly connected static machines with fixed IP addresses and good to excellent networking to the targets of my ssh'ing.
Next, using mosh instead of ssh is an extra step. Mosh is not natively installed on essentially anything I use, either clients or especially servers. That means that before I can even think of using mosh, I need to install some software. Having to install software is a pain, especially for more exotic environments and places where I don't have root. If mosh solved a real problem for me it would be worth overcoming this, but since it doesn't, I don't feel very motivated to go to this extra work.
(In the jargon, you'd say that mosh doesn't fix a pain point.)
Then there's the problem that mosh doesn't support critical SSH features that I use routinely. At work I do a lot with X11 forwarding while at home I rely on ssh agent forwarding to one machine. This narrows mosh's utility significantly in either environment, so I could only use it with selected machines instead of using it relatively pervasively. Narrow usage is a further disincentive, as it both lowers even the potential return from using mosh and increases the amount of work involved (since I can't use mosh pervasively but have to switch back and forth somehow). There are some hand-waving coping measures that could reduce the pain here.
Finally, down at the bottom (despite what I wrote in my reply comment) is that I have much less trust in the security of mosh's connection than I do in the security of SSH connections. Mosh may be secure but as the people behind it admit in their FAQ, it hasn't been subject to the kind of scrutiny that OpenSSH and the SSH v2 protocol have had. SSH has had longer scrutiny and almost certainly far more scrutiny, just because of all of the rewards of breaking OpenSSH somewhere.
If I'm being honest, nervousness about mosh's security wouldn't stop me from using it if it solved a problem for me. Since it doesn't, this nervousness is yet another reason to avoid mosh on general principles.
(It may surprise people to hear this but I'm generally quite conservative and lazy in my choice of tools. I tend not to experiment with things very often and it usually (although not always) takes a lot of work to get me to give something a try. Sometimes this is a bad thing because I quietly cling to what turns out to be an inferior alternative just because I'm used to it.)
The 'cattle' model for servers is only a good fit in certain situations
To start with, let me define my terms. When I talk about 'cattle' servers, my primary definition is expendable servers that you don't need to care about when something goes wrong. A server is cattle if you can terminate it and then start a new one and be fine. A server is a pet if you care about that specific server staying alive.
My contention is that to have cattle servers, you either need to have a certain service delivery model or be prepared to spend a lot of money on redundancy and high availability (HA) failover. This follows from the obvious consequence of the cattle model: in order to have a cattle model at all, people can't care what specific server they are currently getting service from. The most extreme example of not having this is when people ssh in to login or compute servers and run random commands on them; in such an environment, people care very much if their specific server goes down all of a sudden.
One way to get this server independence is to have services that can be supplied generically. For example, web pages can be delivered this way (given load balancers and so on), and it's often easy to do so. A lot of work has gone into creating backend architectures that can also be used this way (often under the goal of horizontal scalability), with multiple redundant database servers (for example) and clients that distribute DB lookups around a cluster. Large scale environments are often driven to this approach because they have no choice.
The other way to get server independence is to take what would normally be a server-dependent thing, such as NFS fileservice, and apply enough magic (via redundancy, failover, front end load balancer distribution, and so on) to turn it into something that can be supplied generically from multiple machines. In the case of NFS fileservers, instead of having a single NFS server you would create an environment with a SAN, multiple fileservers, virtual IP addresses, and transparent failover (possibly fast enough to count as 'high availability'). Sometimes this can be done genuinely transparently; sometimes this requires clients to be willing to reconnect and resume work when their existing connection is terminated (IMAP clients will generally do this, for example, so you can run them through a load balancer to a cluster of IMAP servers with shared backend storage).
(These categories somewhat overlap, of course. You usually get generic services by doing some amount of magic work to what initially were server-dependent things.)
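As an illustration of what 'willing to reconnect and resume work' means on the client side, here is a minimal sketch using Python's imaplib. The hostname and the retry policy are invented, and real IMAP clients handle this internally in their own ways; this is just the shape of the idea.

    import imaplib
    import time

    def mailbox_count(host, user, password, attempts=3):
        # If the connection dies (for instance because failover or a load
        # balancer moved us to another backend), reconnect and redo the work.
        for attempt in range(attempts):
            try:
                conn = imaplib.IMAP4_SSL(host)
                conn.login(user, password)
                status, data = conn.select("INBOX", readonly=True)
                conn.logout()
                return int(data[0])
            except (imaplib.IMAP4.error, OSError):
                if attempt == attempts - 1:
                    raise
                time.sleep(2 ** attempt)  # back off before reconnecting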
If you only have to supply generic services or you have the money to turn server-dependent services into generic ones, the cattle model is a good fit. But if you don't, if you have less money and few or no generic services, then the cattle model is never going to fit your operations particularly well. You may well have an automated server setup and management system, but when one fileserver or login server starts being flaky the answer is probably not going to be 'terminate it and start a new instance'. In this case, you're probably going to want to invest much more in diagnostics and so on than someone in the cattle world.
(This 'no generic services' situation is pretty much our situation.)
2015-03-29
SSH connection sharing and erratic networks
In the past I've written about SSH connection sharing and how I use it at work. At home, though, while I've experimented with it I've wound up abandoning SSH connection sharing completely because it turns out that SSH connection sharing has a big drawback in the sort of networking environment I have at home.
Put simply, with connection sharing, if one SSH session to a host stalls or dies, they all do; all of your separate-looking sessions are running over one underlying SSH transport stream and one underlying TCP connection. In a reliable networking environment, this is no problem. In a networking environment where you sometimes experience network stalls or problems, this means that the moment you have one, all of your connections stall and you can't make new ones. You get to wait out TCP retransmission timeout delays and so on, for everything. And if there's an interruption that's long enough to cause an active session to close itself, you lose everything, including the inactive sessions that would have otherwise survived.
It turns out that this is really annoying and frustrating when (or if) it happens. On the university network at work, it basically never comes up (especially since most of the machines I do this to are on the same subnet as my office workstation), but from home the network stalls, dropped ACKs, and outright brief connection losses happened too often for me to keep my sanity in the face of this. The minor savings in latency for new connections weren't worth the heartburn when something went wrong.
(The most irritating things were and are generally retransmit timeouts. When the network comes back you can easily get into a situation where your new SSH session is perfectly active but the old session that had some output and thus a retransmission timeout during the network interruption is just sitting there silently waiting for the TCP timeout to expire and a retransmit to kick off. This can happen on either end, although often it will happen on your end because you kept typing even as the network path was failing.)
2015-03-16
Solving our authenticated SMTP problem by rethinking it
Part of our mail system is a mail submission machine. Perhaps unlike many places, this machine has never done authenticated SMTP and as a result has never accepted connections from the outside world; to use it, you have to be 'inside' our network, either directly or by using our VPN (and at that point it just accepts your email). Recently this has been more and more of a pain point for our users as it becomes more and more common for devices to move between inside and outside (for example, smartphones).
Unfortunately, one reason we haven't supported authenticated SMTP before now is that it's non-trivial to add to our mail submission machine. There are two tricky aspects. The first is that as far as we can see, any easy method to add authentication support to our Exim configuration requires that our mail submission machine be rebuilt to carry our full /etc/passwd (and /etc/shadow). The second is that the mail submission machine still has to support unauthenticated SMTP from internal machines; among other things, all of our servers use it as their smarthost. This requires a somewhat messy and complex Exim configuration, and being absolutely sure that we're reliably telling apart internal machines from external machines and not accidentally allowing external machines to use us without SMTP authentication (because that would make one of our most crucial mail machines into an open relay and get it blacklisted).
(Right now the mail submission machine has a strong defense in that our external firewall simply doesn't allow outside people to connect to it. It has its own access guard just in case, but its accuracy is less important. In the new world we'd have to open up access on the firewall and then count on its Exim configuration to do all the work.)
Exim can use a local Dovecot instance for authentication, but that doesn't help the mail submission machine directly; to run a local Dovecot that did useful authentication, we'd still need a full local /etc/passwd et al. But then we had a brainwave: we already have a Dovecot-based IMAP server.
Rather than try to modify the mail submission machine's Exim configuration to add authenticated SMTP for some connections, we can turn the problem around and do it on the IMAP server instead. The IMAP server already has Dovecot and our full /etc/passwd; all it needs is to have Exim added with a configuration that only does authenticated SMTP. Sure, we wind up with two mail submission machines, but this way we don't have to mix the two somewhat different mail submission roles and we get a much simpler change to our existing machines. People also get a somewhat simpler IMAP client configuration (and one that's probably more normal), since now their (outgoing) mail server will be the same as their IMAP server.
(The actual Exim configuration on our IMAP server can be just a slight variation on the existing mail submission Exim configuration. Insisting on SMTP authentication all the time is an easy change.)
As a side benefit, testing and migration is going to be pretty easy. Nothing is trying to talk SMTP to the IMAP server today, so we can transparently add Exim there then have people try out using it as their (outgoing) mail server. If something goes wrong, the regular mail submission machine is completely unaltered and people can just switch back.
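For what it's worth, that kind of trial is simple enough to script. Here's a minimal sketch of an authenticated submission test using Python's smtplib; the hostname, addresses, credentials, and the assumption that we'd listen on the usual submission port (587) are all placeholders, not our real setup.

    import smtplib
    from email.message import EmailMessage

    HOST = "imap.example.edu"   # placeholder for the IMAP server's name

    msg = EmailMessage()
    msg["From"] = "someuser@example.edu"
    msg["To"] = "someuser@example.edu"
    msg["Subject"] = "authenticated submission test"
    msg.set_content("Test message through the new authenticated SMTP service.")

    with smtplib.SMTP(HOST, 587) as conn:
        conn.starttls()                      # encrypt before sending credentials
        conn.login("someuser", "password")   # must succeed; no unauthenticated relay
        conn.send_message(msg)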
2015-03-14
Using an automounter doesn't always help with bad NFS servers
Suppose, not entirely hypothetically, that you have an NFS fileserver that is locking up every so often, that you statically mount its filesystems on your IMAP server, your mail server, your general login server, your web server, and so on, and that as a result when it locks up, many of your servers wind up grinding to a halt. Clearly the workaround is to switch from static NFS mounts to an automounter, right?
Not so fast. Switching from static NFS mounts to using an automounter probably won't help you here. The problem is that an automounter only helps with inactive, rarely used filesystems, because these are the only sort of filesystems that it doesn't have mounted. If you have machines that are naturally using these filesystems all the time, as people read their email and serve up their home pages and so on, there's no practical difference between an automounter setup and static NFS mounts. The filesystems are mounted and active all the time and the moment the NFS fileserver goes away you're going to start having problems, as process after process tries to read from them and stalls out.
(In this situation all that switching to an automounter will do is add more moving parts to your system.)
An automounter is a decent solution to unreliable but infrequently used NFS servers, which is the situation it was designed for. It's unfortunately not a particularly effective way to deal with frequently used fileservers that become unreliable, but then nothing is; if you need filesystems and they're not responding, you have problems no matter what.
(You only have two real solutions: make your fileservers reliable or make as many machines as possible not need filesystems from the unreliable fileservers. The latter may require extreme measures like multiple IMAP servers, each one of which only talks to one fileserver.)