Wandering Thoughts archives

2014-01-28

Building software packages yourself is generally a waste (why package selection matters)

I recently spent a day working on building a version of Amanda for our new OmniOS fileservers-to-be. The story of why it took a day is beyond the scope of this entry, because the point of this entry is that the time was a deadweight loss to my employer. It was, in a sense, unproductive.

Building software packages is a solved problem. I can't bring anything new or novel or particularly valuable to it and as a result I add no value by doing it myself. Building software myself is pure overhead. I could have been spending my limited time doing something new and novel for my workplace, something that cannot be done by anyone else, but instead I was forced to do something that anyone else could do and probably any number of other people either had done already or are going to do in the future. I and my employer tolerate this loss of my time because the alternative is worse (ie, no Amanda and thus no backups for our new fileservers). But make no mistake, it's still a waste.

This is why sysadmins care about the available package selection for operating systems that we want to use. The more packages available, the less time and effort we must spend duplicating other people's time and effort as we do the low-value, low-productivity work of building our own copies of packages. In a real sense, maintaining packages locally is wasted work.

(The other thing about building packages is that it's fundamentally uninteresting. It is a boring drag to spend all day iterating configure settings, verifying that what you've assembled can be rebuilt on a clean system, and so on.)

PS: This would be different if it were trivial to build packages well, but it's not, for various reasons I'm declaring beyond the scope of this entry.

Sidebar: why I'm not building OmniOS Amanda packages for people

Because it takes much more work to go from a compile that works locally in our specific situation to an OmniOS package that I'm relatively confident will work for anyone who installs it (especially since I don't know OmniOS packaging). Publishing things also means making a commitment to support them if problems come up. We might choose not to patch Amanda if an issue arises (in fact we're very likely to never change our build unless a serious security issue comes up), but this is not something that other people will really appreciate.

This is unfortunate in a game-theory way. Collectively the entire OmniOS world would be better off if someone spent the time to do this sort of thing for various packages; sysadmins would in aggregate spend less time overall. But for me individually, building a real package for Amanda instead of just doing 'configure' and 'make install' would be a losing proposition.

I don't have an answer to this.

BuildingPackagesWaste written at 00:23:05

2014-01-24

Things I want to remember during a security incident

We're having a little security incident right now. I thought about writing an entry about some details and specific thoughts, but then realized I had some more important things to write about, things that I need to keep in mind to stay grounded during our handling of the incident.

The first thing I want to remember is that hindsight is too easy. We live in a noisy world in the present, but when we look back at the past it's easy to pick out the thread of signal that we now know is there. Then you can sit there beating yourself up with the thought that you should have seen X or realized Y at the time. Our past selves were not incompetent idiots, no matter what it may look like now, and we made rational decisions at the time. Playing what-if games and blaming ourselves is both wrong and a bad idea.

(And by 'we' here I mean in large part 'me', since part of me is sitting here looking at things and going 'why didn't I ...'. Yes, yes, hindsight bias and outcome bias. It helps intellectually.)

The second thing I want to remember is that not all possible responses to our incident are worth doing. We could spend a steadily increasing amount of time analyzing what happened, hardening our systems, increasing our monitoring, adding this and that, and so on; all of them would increase the odds of stopping further incidents. But any and all of them will take time, time that will have to come from other work. At some point the right answer is 'more work to stop another incident is less important than what we were doing before'. However bad it may sound and feel, we'll need to simply live with the possibility of another incident happening (or there being undetected aspects to this one) and to move on.

(And then if (when) there is another incident, we shouldn't beat ourselves up about this choice, even though we won't be able to say 'we did the best we could to prevent it'.)

I've been reading various John Allspaw writings about all of this for some time, and they have done a lot to change and shape my views. But it's one thing to read this stuff and nod along intellectually and another thing entirely to try to live through it and put it into practice despite all of the inconvenient squishy human emotions running around.

(And I should read some of his stuff again, eg bits from this, so that I can do some sort of proper, useful postmortem writeup. It's probably past time that we did a real postmortem, although of course that takes time too.)

SecurityIncidentGrounding written at 01:17:11

2014-01-15

Real support periods versus nominal official ones

Suppose that you have a vendor with a support period of five years for their OS. This is great, since you can install a machine with that OS and get five years of support for it, right? Well, no, of course not. You only get five years of support if you install your machine right away when the OS comes out. If you install a machine two years after that, you only get three years of support for it; if you install a machine four years after the initial release, you get one year of support.

(We're assuming that the vendor doesn't extend the support period at some point due to popular demand.)
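
To make the arithmetic concrete, here's a minimal sketch in Python (the five-year support period is the example figure from above; the function and its name are just mine for illustration):

    # Years of support left for a machine installed some years after release.
    SUPPORT_YEARS = 5

    def remaining_support(years_after_release):
        return max(SUPPORT_YEARS - years_after_release, 0)

    for offset in (0, 2, 4):
        print(offset, "years after release:", remaining_support(offset), "years of support")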

In a heterogeneous environment where you're deciding on a project-by-project basis what software to use, you just have to consider support requirements when you're planning each project; each one can wind up using something different that fits its needs. Generally this is going to make any particular OS or software version less and less attractive for new projects as it gets closer and closer to its EOL, because more and more projects will have support requirements that run beyond the EOL.

(On the other hand you may see surprising usage even quite late, as new projects with short support requirements pick the old version for various other reasons, eg the people involved have a lot of familiarity with its quirks.)

If you want to run a mostly homogeneous environment making heavy use of this software, another important factor comes in: you now care a lot about how often the vendor releases new versions with new support periods. We can see this by taking the extreme case. Suppose that this OS vendor releases only once every six years; with a five-year support period, that leaves a one-year gap where you can't install or run a machine with support. You're unlikely to use this OS at all, because you'd have to abandon it during that one-year gap.

In a homogeneous environment where you're strongly tied to a given bit of software on basically all of your machines, the worst-case minimum support period you may actually get works out to be around the official support period minus the vendor's release interval. If the vendor releases new versions every four years, the rough worst case is that you have to set up a new machine just before their new version comes out, so it gets about a year's support. The practical result is often that if the vendor only has what you consider a short overlap, you'll start dragging your heels on new machines as the time for a new version gets closer and closer; you really want to delay long enough to use the new version and get its much longer support period.
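
A similarly hedged sketch of this worst-case arithmetic (again using this entry's example numbers; the function is purely illustrative):

    # Worst case: the machine gets installed just before the next release,
    # so it loses roughly one release interval of the official support period.
    def worst_case_support(support_years, release_interval):
        return max(support_years - release_interval, 0)

    print(worst_case_support(5, 2))  # releases every two years: about 3 years
    print(worst_case_support(5, 4))  # releases every four years: about 1 year
    print(worst_case_support(5, 6))  # releases every six years: 0, the support gap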

(In practice you're unlikely to immediately start using the vendor's new version the day it gets released, so if you're unlucky you may have to install with the old version even for a while after the new one is released.)

This leaves sysadmins and other people really wanting a good healthy overlap of support periods after new versions are released. For a not entirely hypothetical example, five years of support coupled with a new version every two years gives us about three years of overlap in the worst case. That's a pretty respectable number that leaves me with few qualms about installing the 'old' version pretty much any time up to the point where we have the new version ready for production.

(There is also the related issue that real world support periods are shorter than they look.)

RealSupportPeriods written at 01:18:28

2014-01-12

Why I don't want fully automated installs

One of the modern system administration practices is fully automated OS installs; practically everyone supports it and some OSes even consider it their primary deployment method. I am a lonely holdout, at least for moderate sized environments such as ours where machine (re)installs are not an everyday activity. I have two reasons for this.

The first reason is a pragmatic one. As far as I can tell, it's still generally a fair amount of work to put together a reliable, fully functional auto-install environment that supports a fairly wide variety of machines and software setups. We may be an extreme case; we have at least five different ways to set up Ubuntu 12.04 servers (not counting hardware variations, including whether the system disk is mirrored) and we install machines across multiple networks. With our relatively low rate of (re)installs I can't convince myself that the time savings from having fully automated installs will actually pay off here.

(Note that we already have partially automated installs and even that takes a fair amount of work to assemble for a given LTS release.)

But the bigger reason is a philosophical one. I don't want fully automated installs because I think that they're too dangerous. I consider it a feature that you have to add boot media to a machine and then interact with it several times before the installer will destroy all data on the machine (or at least a disk or two of it) and set it up on our network with some IP address and some software configuration. The idea of 'if this machine PXE boots it will get (re)installed' is, for me, a nightmare. I very much want the chance to double-check what the install process is doing and to interact with it.

So what I really want is not fully automated installs but more smarts in partially automated installs. For example, it would be nice if we could have some high-level options that we could accept or reject, like adding some local standard partitioning schemes. Some of our machines will deviate from them (and I certainly want to see the disks that the install is about to scribble over), but a lot of the time I'd take the 'standard CSLab partitions' option.

(Our current Ubuntu install process basically asks us only for the machine's IP address information and its partitioning, and it's hard to get away from asking for the IP address. No, we don't want to run DHCP on our server networks.)

PS: I want to say that I also want better ways to build, maintain, and update partially automated installs, but I haven't looked at the state of the art in that for a while.

(To be clear: what I'm talking about here is OS installs, where you go from empty disks on a new server to an installed and perhaps somewhat customized OS on the network. There are lots of fairly good solutions for post-install customization and localization.)

AutoinstallsWhyNot written at 03:05:17

2014-01-09

Using different sshd options for different origin hosts

Suppose, hypothetically, that you have a need both to expose some hosts to incoming SSH traffic from the Internet and to allow root access (either direct or through automated means like authorized key permissions) to them over SSH. However, you certainly don't need both at once; you'll never be doing root access from the Internet. Wouldn't it be nice if you could have some sshd settings that varied based on where the connection was coming from?

Well, you can. Modern versions of OpenSSH support a Match directive in /etc/ssh/sshd_config and this can be used to allow or disallow a whole set of things based on the connection origin. In the case I gave above you could do this with:

# Root logins (keys only, no passwords) are allowed just from these networks.
PermitRootLogin no
Match Address 127.0.0.0/8,10.0.0.0/16
    PermitRootLogin without-password

(The IP address ranges here are an example.)

Match allows you to match connections based on various characteristics of both the connection (eg origin IP address or hostname) and the local target (eg the target local user and group). It can be used for a lot more than denying root access, of course; you can turn off password authentication, disallow access to specific local users, only allow access to specific local users, and so on. There are a lot of tricks that you can do here, so many that I'm going to leave most of them to your imagination (and a full reading of the sshd_config manpage).
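
As an illustration of a couple of these tricks (the address range and the account name here are invented examples, and you should check the sshd_config manpage for your OpenSSH version):

    # Require keys by default; this example account is never allowed in.
    PasswordAuthentication no
    DenyUsers scratchuser
    Match Address 10.0.0.0/16
        # Password authentication is acceptable from this internal network.
        PasswordAuthentication yes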

Of course there are limits on how much additional security this will get you against a determined attacker who is specifically targeting your users. At least around here, if we made it very hard to get in by SSH from the outside Internet a targeted attacker would just switch to compromising the user's VPN access and going from there with 'inside' access. But at least we can configure sshd to absolutely rule out certain sorts of brute force attacks against selected accounts.

SshdSelectiveOptions written at 01:28:01

