Wandering Thoughts archives

2010-11-28

Why I am harsh on Solaris Live Upgrade and similar tools

In the previous entry I noted that one reason I was basically uninterested in Solaris Live Upgrade is that it had hung up when I tested it several years ago (and quite a few patch levels back). This may strike people as a rather harsh reaction to a bug, to which I am going to say: absolutely, but it's the same reaction I have to bugs in any similar tool, regardless of who it's from or what it runs on.

In order to make Solaris Live Upgrade worth using instead of dangerous, it needs to do a great many things right, things that are both complex and down at the heart of the system. LU must not modify my live boot environment (only the selected alternate), it must reliably boot the boot environment I want it to, it must correctly handle falling back to another boot environment, booting an alternate environment must leave my main one completely untouched, and so on. And it must get these things right all the time and even in obscure cases, because something we're doing may turn out to be one of those obscure cases; a tool like LU cannot afford to be a 90% tool or even a 95% tool. If LU screws up any of this, I have serious problems; at the worst, I have data loss and major system downtime. Pretty much if LU gets anything wrong, I am better off not using it at all.

There is only so much of this that I can explicitly test, which means that I have to actively trust LU and the people who wrote it to get all of these things right. What happens next is simple: bugs destroy my trust. A bug is a place where LU and its programmers have not gotten it right. Sure, I might be able to work around the bug and get LU going anyways, but if there is a bug in something that I have tested, how can I have any confidence that there aren't other bugs in things that I either haven't tested yet or can't even test at all?

I can't. And without trust in the system, I can't use it at all, not unless I desperately need it and I'm willing to take a significant risk because I have no feasible alternative.

So yes, absolutely I am harsh. For good reason.

(Solaris Live Upgrade isn't the only thing that I have tried, hit a bug in, and abandoned. For example, I would like to be able to trust Linux LVM's pvmove, but I had it lock up on me once close to half a decade ago and I haven't touched it since. Maybe it's better now; I don't care. It's not worth the risk of actual data loss.)

HarshOnSystemTools written at 02:04:08

2010-11-14

Four reasons to have a firewall

Recently I ran across someone asking the question 'why have a firewall?' As it turned out, he had several sorts of host-based firewall protection, but in thinking about the question I came up with four broad reasons that firewalls can be a good idea:

  • because your services and servers suck. You're forced to run things that were written by addled monkeys, in environments that either require random services of unknown and dubious security impact or just start them up every so often whenever they feel like it. Or perhaps you are stuck with known-vulnerable machines that you cannot upgrade for various reasons.

    (This is perhaps the leading reason to use firewalls in front of end user machines.)

  • because it simplifies and speeds up your internal architecture. Yes, you could put SSL and passwords and whatever on your internal memcached instance and your backend database servers and so on, or run them over a disconnected internal network. But it's simpler to just not let people talk to them, and it may give you faster performance.

  • because it reduces the amount of code that handles untrusted network input, what security people call the 'attack surface' (the code that aggressors could attack). Sure, your database server has its own access control system, but that's a lot of code that gets run on untrusted input and historically some of it has had bugs. Just not letting people talk to it at all reduces your risk, possibly substantially.

  • because it guards against mistakes and accidents in service and host configuration. Without a firewall you are one errantly started daemon, one omitted access control restriction, or one not yet fully installed and patched host away from a security vulnerability.

    (I once put a new webserver on the network and had hits from automated vulnerability scans within sixty seconds of port 80 starting to respond. This is apparently slow as these things go.)
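The reasons above can be made concrete with a small ruleset. Here is a minimal sketch of the 'just not letting people talk to them' approach in nftables syntax; the addresses, ports, and tier layout are all hypothetical, invented for illustration:

```
# Hypothetical nftables sketch for a backend host: only the web tier
# may reach the database and memcached, and packets from anyone else
# are dropped before the services' own access control code ever sees
# them.  All addresses and ports here are made-up examples.
table inet backend_filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # allow replies to connections this host initiated
        ct state established,related accept
        iif "lo" accept

        # web tier (10.0.1.0/24, hypothetical) may talk to
        # PostgreSQL (5432) and memcached (11211)
        ip saddr 10.0.1.0/24 tcp dport { 5432, 11211 } accept

        # a single admin host may ssh in
        ip saddr 10.0.0.10 tcp dport 22 accept
    }
}
```

Note that the `policy drop` default is what guards against mistakes and accidents: a daemon that gets errantly started on this host is unreachable from the network unless someone deliberately adds a rule for it.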

Whether to use host-based firewalls or an external firewall is an implementation decision, but I tend to think that an external firewall is more reliable and simpler to configure and keep straight, especially if you have a non-trivial internal architecture (what is where, who can talk to it, and so on). Of course it is also a single point of failure, as the no-firewall people keep reminding us, so the right thing to do is to have both well protected hosts and an external firewall.

WhyFirewall written at 22:13:35

2010-11-04

What we (currently) use virtualization for

I've pulled this out of Random's comment on yesterday's entry:

[...] but I stop short of doing everything on virtual machines, which I think you do more of.

Although I may have accidentally left people with a different impression, right now we're actually not using virtualization for very much. We have one virtualization host machine, and on it we currently run only three virtual machines: two Windows images and a small Linux machine that forwards some low-priority email.

The Windows machines are Terminal Services servers; they exist because we have a departmental mandate to provide general access to Windows and the Office suite, so that (for example) non-Windows people have some way of reading PowerPoint presentations. Virtualizing our Windows servers has huge management advantages, chief among them that we will never have to worry about hardware upgrades (including forced ones via hardware failure). We also get some ability to reliably roll the state of the system back if something goes catastrophically wrong, which is reassuring.

(We have two Windows machines because we need to provide access to both Office 2003 and Office 2007.)

The small Linux machine is virtualized because we couldn't be bothered to find and deploy some physical hardware for it (not 'wasting' hardware on such a small machine was less of a consideration than you might think).

I would kind of like to use virtualization for more than just this, but in our environment it's hard to find small unimportant services to virtualize. Our machines tend to be either big, important, or both. Big matters because virtualization, especially cheap virtualization, currently costs you performance and capability. Importance matters because when a service is important, you wind up wanting fault isolation for it, fault isolation that our virtualization environment does not give us.

(Thus, for example, we run two local caching DNS servers for our local users, each on dedicated hardware, despite modern DNS servers not exactly requiring big machines.)

OurVirtualizationUse written at 00:12:11

2010-11-03

One reason that I call us a midsized environment

I mentioned earlier that I have a number of reasons to call us a midsized environment. One of them is how we need to run our environment: we sit between two extremes of how to deal with systems.

On the one hand, we are sufficiently large that you don't want to do things by hand and it makes sense to automate at least some things. We've long since grown past the point where all of our machines could be set up separately or run individually (and not just because NFS requires synchronized UIDs), which is the practice that you often see in small environments.

On the other hand, we are not so large that we have to automate things or die. Once you reach a certain size, it is basically impossible to run your environment in any way that requires routine hands-on attention to individual machines; you must automate every such thing or you'll be unable to keep your environment up (or at least unable to improve it at all). This is the size where people get fanatical about automated deployments, automated management with Puppet or Cfengine, automated monitoring with your choice of tool, and so on.

We sit in the middle of all of this; we have some automation but also some things that we do by hand, partly because we like it that way and partly because building and maintaining an automated system would be too much of a hassle. Hence, midsized.

(I maintain that sitting in the middle like this is sensible in at least some cases, and there are good reasons not to immediately jump up to the Jumpstart/Kickstart, Puppet, etc etc world. Part of it is the costs of automation, but the full discussion deserves another entry.)

(See also this older entry on levels of automation.)

WhyWeAreMidsized written at 01:16:17
