2008-06-30
The many problems with bad security patches
One might perhaps accuse me of getting overly worked up about bad security patches. Is it really such a big deal if a security patch has a flaw?
My answer is yes, because there are a number of bad consequences when security patches are untrustworthy:
- it discourages people from installing them. As we've seen repeatedly, having more insecure systems around endangers everyone, whether they are on the Internet or behind your firewall.
- a broken but 'secure' machine is not really an improvement over a functional but insecure machine. In both cases the overall system is not functional, assuming that you consider security as part of the overall system functionality.
(Of course the devil is in the details, specifically what broke and what the security issue was, and also how important security is; in some environments being completely turned off is preferred to being insecure. I am assuming here that the breakage is in something relatively important.)
- you can't use security patches to solve the security issue right now, because you have to put patches through testing in order to see if they broke anything this time and if so, what. At best you can use the release of a security patch as a signpost that your system really is vulnerable to some general issue, and that you need to get working on some sort of a fix.
(Yes, yes, test everything. Wouldn't it be nice if you didn't have to? And in theory that is the promise of security patches; the only change they are supposed to introduce is a security fix, and thus they should be safe to apply under almost all circumstances.)
- they increase the overhead of security in general, in both people's time and in hardware needs. All else being equal, this overhead has to come from somewhere: useful work not getting done and machines not being used for useful things.
- if we sysadmins believe vendors and rush to install what turn out to be bad patches, we lose credibility and thus our overall ability to influence people. This is bad because there are security things that people should listen to you about; you really don't want to be the sysadmin who cried wolf.
Collectively, this set of consequences is pretty bad news. Hence my strong opinions on the issue.
2008-06-27
Fault hierarchies and problem reports
Here is something that I have come to feel strongly about: things that report problems (as opposed to just log them) should have some idea of root causes and a fault hierarchy. Then when you report things you should report the root cause you've found and only mention the consequences as a side note, instead of screaming about every consequence.
(As a not entirely hypothetical example, it does no good to spam me with notices about lots of ZFS pools being unavailable when the real problem is that the system can't find any network interfaces at boot time so it can't make any iSCSI connections so there are no pool devices.)
Yes, this is difficult and challenging. But it is the job you took on when you decided to write something that actively shoved reports of problems at people. If you cannot do a good job of it, you need to stick to just logging things; this is one of the areas where a tool that does only a half-hearted job can be worse than no tool, because it is trivial to generate an avalanche of surface errors from a single important root cause.
(By 'reporting' I mean aggressively forcing things in front of people through a variety of methods, from dumping messages on the system console to sending email. In short, anything that could interrupt people. Yes, dumping messages on the console is interrupting people; consider what happens to the poor sysadmin who is trying to get the system going again when you dump ten screens of error messages on his session.)
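A minimal sketch of the idea in Python (the component names and the dependency graph are invented for illustration, loosely following the ZFS-over-iSCSI example above): given the set of failed components and what depends on what, report only the failures whose own dependencies are still healthy, and treat everything else as a consequence.

```python
# Hypothetical fault hierarchy: each component maps to what it depends on.
# These names are made up for illustration.
DEPENDS_ON = {
    "network": [],
    "iscsi": ["network"],
    "zfs-pool-a": ["iscsi"],
    "zfs-pool-b": ["iscsi"],
}

def root_causes(failed):
    """Return only the failures with no failed dependency of their own;
    these are the likely root causes. The rest are consequences and
    should be mentioned only as side notes, not alerted on."""
    failed = set(failed)
    return sorted(
        c for c in failed
        if not any(dep in failed for dep in DEPENDS_ON.get(c, []))
    )

failures = ["zfs-pool-a", "zfs-pool-b", "iscsi", "network"]
print(root_causes(failures))  # only 'network' is a root cause here
```

A real monitoring system needs a deeper graph (and ways to handle cycles and unknown components), but even this much is enough to collapse an avalanche of surface errors into one report.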
2008-06-26
Virtual desktops versus multiple monitors
A commentator here asked:
With respect to tabs vs. windows (vs. virtual desktops vs. extra monitors vs. separate computers), I always wonder what combination of the above will lead to the greatest productivity.
Having used both, my personal opinion is that virtual desktops are inferior to multiple monitors for the same reason that tabs are inferior to windows, namely that you can't see two virtual desktops at once. This makes virtual desktops good for grouping and a good place to shove excess windows but not a good solution when you really need to see that many things at once.
(Although I'd used virtual desktops for years, even a casual experiment with dual monitors was such a powerful experience that I was hooked on the spot. But I continue to use virtual desktops even with multiple monitors; you can never have enough space, and as mentioned I'm very big on using spatial organization for things, which virtual desktops are good for. Plus, for a sysadmin the ability to instantly get an uncluttered workspace to deal with some important interruption is very useful.)
I'd think that a single computer with multiple monitors is better than several computers, each with a single monitor, for much the same reason; using separate computers with separate displays constrains how you organize and re-organize information. The exception is if you really need multiple computers for some reason (a classic reason being testing on one machine and debugging from the second), in which case you have no choice.
Sidebar: the poor man's KVM
If you have a primary machine plus one or two secondary machines that you only use occasionally and modern LCDs with both VGA and DVI inputs, you can construct what I call a 'poor man's KVM':
- give each machine its own keyboard and mouse, ideally small ones for everything except your primary machine
- hook your primary machine up to one set of inputs (DVI or VGA) on the LCD panels
- hook the secondary machine(s) up to the other LCD panel inputs
When you want to use a secondary machine, pull out its keyboard and mouse and switch the appropriate LCD monitor's input to it. (If you don't need to watch anything on the primary machine, you may want to power off the other monitor to avoid absent-minded mistakes.)
It's probably easier to put the primary machine on the DVI inputs and the secondary machine or machines on VGA (partly because more things still support VGA than support DVI).
2008-06-22
The implicit versus the explicit
In a lot of computer things (such as programming languages, or system environments) we have the choice between being concise and leaving things implicit, and being verbose but making things explicit. Here is the thing about this choice:
Computers are fine with implicit things because they never forget; they effectively have perfect memories for this sort of thing. But humans have limited memories (especially short term memories) that have to be periodically reinforced, so we necessarily forget some of the implicit stuff every so often. Which periodically causes problems.
(The problems of modal interfaces are another example of this, since the computer has a perfect 'memory' of what mode you are in but people do not.)
Obviously, the more implicit things you design into a system, the more dangerous this effect is (or at least the more likely it is to come up). At the same time, we can't make everything always be explicit; there's always some implicit stuff somewhere (even just in common user interfaces) and conciseness has a value of its own.
Like many of these issues, it is easy for the designers of a system to not really be conscious of how much implicitness is present; their memory for its implicit details is in great shape and is constantly being reinforced because they work deeply with the system all the time. Similarly, frequent users are also reinforcing their memory all the time.
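One concrete programming-language case of implicit behavior that the computer remembers perfectly and humans periodically forget is Python's implicit concatenation of adjacent string literals (this example is mine, not from the entry above):

```python
# The interpreter never forgets that adjacent string literals merge;
# the human editing this list easily does.
names = [
    "alpha",
    "beta"      # missing comma: "beta" and "gamma" silently become one string
    "gamma",
    "delta",
]
print(len(names))  # 3, not the 4 a reader might expect
print(names[1])    # 'betagamma'
```

The explicit alternative (writing `"beta" + "gamma"` when you actually want concatenation) is more verbose, but it leaves nothing for a human's limited memory to lose.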
2008-06-21
A bug reporting paradox: don't put in too much detail
Here's a paradox about bug reporting: while details are good, too much detail is bad (and by this I don't just mean too much verbosity).
Specifically, you should not go into too much detail about what is going on unless you are certain that you know what you're talking about and you're sure you're correct about it. By what is going on I don't mean symptoms (those are relatively easy, although tedious, to get right) but what happens when you do things like go digging in the source to narrow down where exactly the problem is.
The problem is that when you sound knowledgeable and sensible, it is very tempting for people to believe you; after all, it saves them time and effort and so on. But if you turn out to be in over your head and wrong, the result is that everyone wastes time (and people get irritated at you, which defeats one of the purposes of bug reports).
This doesn't mean that you shouldn't go digging at all, because going digging and being right about it is often the fast and best way to get an issue fixed.
(All of this goes double for patches, because it is easy to get them wrong in subtle ways if you don't understand the code. Even testing it yourself isn't really enough.)
2008-06-13
The cost of virtualization
Virtualization is in the air around here, partly as a way of being environmentally friendly; fewer physical machines means less power and heating and so on. Now, I'd like to be environmentally friendly, but one of the issues with virtualization is that getting into it for anything important has significant startup costs.
One of the reasons we run different services on different machines to start with is for fault isolation, so we don't have all of our eggs in one basket. Virtualization reverses that; if you lose your host machine, you will lose any number of virtual machines. Thus if you are virtualizing important machines you need to be able to recover from this, which means at least two fairly beefy host machines and something approximating a SAN.
(This assumes that you aren't using virtualization just as a way of dealing with stupid software that demands to take over a whole machine.)
This issue makes it hard to just dabble your toes in virtualization. It's a serious investment and thus a serious commitment to get involved, which makes it a hard sell unless you are convinced that it is the way to go.
(This is especially so if you don't already have a SAN environment and a need for some big servers, so that if virtualization doesn't work out in practice you can't just use the hardware for something else.)
Note that in many cases you don't need expensive vendor software for live migration and failover and so on. Certainly in our environment it would be good enough just to bring the storage for the virtual machines up on another host machine and boot them back up, exactly as if we'd had a localized power failure with real hardware.
(Don't laugh. How many machines would you lose right now if one of your rack PDUs failed? If you can accept that, you can accept a virtualized version of it, especially as you can probably recover faster in the virtualized case. You might even be able to automate the recovery.)
2008-06-04
Some corollaries to the charging problem
Here are a couple of corollaries and effects of the charging problem that are worth mentioning (or at least that have occurred to me).
The obvious corollary is how much more sensible the problem makes the 'give away the service to build interest' approach often used by Internet startups. If people can't do anything with you before they spend money, your window for getting them interested enough to do so is very limited (probably one web page, at best). Free access gives you a much more extended chance to catch their attention and make yourself attractive enough that they'll give you money.
Next, one effect that's struck me is how the charging problem changes what you need to succeed. Getting great value is essentially an internal technical problem, where you just need to squeeze your costs and your profit margins and so on, and you can work on it in isolation. But making people interested in you is an external social problem (and a hard one), where you must be willing to immediately change your plans in reaction to what happens around you (and thus must be willing to rapidly throw away things that you thought were great ideas and may be quite invested in).
Or to put it another way, you can't just make something that's good and cheap, you have to make something that people care about. Technical people are good at the first, because it is mostly amenable to their skills, but traditionally not all that good at the second.
(And this is one reason why the world is full of technical people plaintively going 'but I have a better mousetrap, and it's cheap too!')