Wandering Thoughts archives

2007-10-30

The problem with big systems

In system design, we're periodically confronted with a choice between building our environment out of a few big systems or a swarm of small ones. For example, consider storage; if you want 48 TB of storage, you could buy two Sun Thumpers or six or seven smaller 15-disk units. Big systems are often attractive; their very size gives them economies of scale (and often means they cost less overall), and it's easier to deal with only two boxes instead of six or seven.

(Here I am talking about big systems that are still cost competitive with a swarm of small systems, where you are not paying a markup for their sheer size.)

The problem with a big system, like a Sun Thumper, is that it is very expensive to expand. Not so much because the big system has a high markup for the pieces (although some certainly do), but because the units themselves are so big; if you buy your storage in units of 'one Thumper', adding disk space is relatively expensive since you have to buy another Thumper. Swarms of small systems are much cheaper to add to, because the individual units are cheaper.
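The economics here come down to the granularity of expansion. A back-of-the-envelope sketch (all prices and capacities are made-up illustrative numbers, not real Thumper or small-unit pricing):

```python
import math

def cost_to_grow(needed_tb, unit_tb, unit_price):
    """Cost of adding at least needed_tb when you must buy whole units."""
    units = math.ceil(needed_tb / unit_tb)
    return units * unit_price

# Suppose we need 4 TB more storage.  Even if the big unit is cheaper
# per TB, its minimum step is the whole unit:
big   = cost_to_grow(4, unit_tb=24, unit_price=50_000)  # one big box
small = cost_to_grow(4, unit_tb=4,  unit_price=12_000)  # one small box

print(big, small)   # 50000 12000
```

The big unit's lower cost per TB only pays off when you actually need most of a unit's worth of new space at once.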

How much you need this expansion depends, of course, and may tie into other system design issues. If you do not have to design for future expansion, you might as well just buy whatever is the cheapest way to get what you need right now.

Sidebar: on Thumpers

I'm aware that I'm exaggerating about Sun Thumpers. Part of the cleverness of ZFS is that it gives Thumpers a relatively cheap incremental expansion path; you can buy a mostly bare chassis, add disks a few at a time, and use easy raidz or raidz2 pool growth. If we didn't need at least three they'd be sort of tempting.
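A sketch of what that incremental path looks like with ZFS's own commands (the disk device names are hypothetical, and this is written against the ZFS of the era, so zpool syntax details may vary by version):

```shell
# Start with a partly-populated chassis: one raidz2 vdev of six disks.
zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0

# Later, after buying another batch of disks, grow the pool by adding
# a second raidz2 vdev; existing data is untouched and the new space
# is available immediately.
zpool add tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

zpool list tank   # shows the expanded capacity
```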

BigSystemDrawback written at 22:55:00

2007-10-25

Long term storage management in the field

I've recently been thinking about what features we need for painless long term storage management, and in the process I've been thinking about what we actually do here with our SAN-based NFS fileserver storage over time. Somewhat condensed, we seem to:

  • add new storage units.
  • replace existing old storage units with new ones, possibly with some consolidation because disk and thus unit capacities keep growing over time (so it's not a one-to-one replacement of units).

    (We have relatively undemanding random IO data rates, so we don't feel that we need to keep up the number of spindles.)

  • have a group buy a storage unit and:
    • some of it is used for new storage
    • some of it is used to add redundancy to the group's existing storage, so that things they feel are especially important stay available as long as either their unit or our unit(s) remain alive and intact.

We keep equipment for a fairly long time; our oldest SAN RAID unit is SCSI-based, for example. (And not fast modern SCSI either; it dates from the days when SCSI was your only viable choice in this space.)

We make no attempt to keep the number of front-end fileservers the same as the number of SAN RAID controllers; we have had more, fewer, and the same as circumstances change. The net effect is that natural evolution causes every fileserver to have disk space on more than one controller and every controller to be used by more than one fileserver.

(Avoiding this would be nice but it's hard. Adding and consolidating storage without necessarily changing the number of fileservers makes this happen naturally over time, so we would have to contrive some painless and user-transparent way of moving filesystems between fileservers. Our fileservers are virtual ones, which makes it somewhat easier, but I don't think anyone's systems currently support this.)

UsingLongtermStorage written at 22:26:35

2007-10-15

The arrogance of trying to design for long term storage management

Many systems do not seem to be really designed for a long term storage management environment. Instead they seem to opt for a kind of planned obsolescence approach where they assume that you will buy them, run them more or less into the ground without really changing or upgrading anything, and then replace them wholesale in a big, painful, user-visible bang.

From the perspective of a long term storage management environment this is a crazy thing to do; with no growth and thus no future, these systems are basically closed boxes. If you outgrow them, you're in trouble.

But from another perspective, the long term storage view is itself crazy: when you adopt it, you are betting that you can pick out what will be a good environment in five years or more, one that it will still make sense to expand and add on to. Given the rate of change in computing, this is a pretty breathtaking bet, one that has historically gone wrong more often than it has gone right.

In a way, the 'run it into the ground' approach is much more humble. It doesn't try to do anything more than pick what's best for right now, and just assumes that a few years from now the tradeoffs will be so different that there's no use trying to predict the winners in advance.

(If you need expansion in a year, or in two, you just buy whatever is the best at that point and hook it into your environment. In the mean time you aren't paying extra for bets that probably won't actually be right.)

LongtermStorageArrogance written at 22:42:07

2007-10-14

Why I think identity blurs into authority

In theory we can separate the ideas of identity and authorization, and it is common to present complex computer systems this way. In practice I think that many people blur the two together, and that attempting to forcefully separate them only leads to confused users and frustrated security people.

I believe that one reason for this is that we rarely think of people alone in the real world; instead we think of them with attached associations. It is not 'Chris Siebenmann, who is authorized to', it is 'Chris Siebenmann who works for the University of Toronto and is thus authorized to'. In turn I think this is because we understand that we need to specify a context for the identity in order for it to name a specific person. If you just say 'John Smith', the question is which John Smith you're talking about, and the answer is established by the context; that context may be implicit, but it's there.

Only on the Internet can we pretend to have identities divorced from context. And it is a pretense, because the context here is that of the identification system itself. (Or to put it in pretentious computer science terms, an identifier only has meaning within a particular namespace.)
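To make the namespace point concrete, here is a minimal sketch (the organizations and identifiers are invented):

```python
# Two identity namespaces: the same bare identifier names a
# different person depending on which namespace you resolve it in.
utoronto     = {"jsmith": "John Smith, University of Toronto"}
example_corp = {"jsmith": "John Smith, Example Corp"}

# 'jsmith' alone is ambiguous; only (namespace, identifier) pairs
# actually pick out a specific person.
print(utoronto["jsmith"] == example_corp["jsmith"])   # False
```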

Once you think of people with associations, those associations create natural ideas of authorization. In fact we should expect them to, because it is less work for people; they get to pigeonhole people into roles based on their identity associations and then just extend whatever privileges the role is entitled to.

(Or in other words, 'Chris Siebenmann works here, of course he's allowed into the building'. And when security systems depart from this they are perceived as getting in the way and get bypassed.)

IdentityAuthorityBlur written at 23:07:54

2007-10-04

The corollary to who actually benefits from bug reports

The corollary of who actually benefits from bug reports and yesterday's principle is that the more work you make people go through to report bugs, the fewer bug reports you get, and almost certainly the fewer good, detailed bug reports you get (because those are a lot of work).

Here, 'work' includes all of the various bits of overhead that you make people go through to file bug reports, including creating accounts. Also, every question that you ask in the process of submitting the bug report is more work, because it is more for the user to think about, especially when they may have no real idea what the answer should be.

The more work you make people do to submit bug reports, the more your bug reports will come mostly from three sorts of people: newbies, the rare selfless volunteer, and people who are nursing their pet cause. Apart from the selfless volunteers, none of these people are likely to give you very good bug reports; the newbies usually don't know enough, and the people with pet causes are obsessively focused.

One immediate conclusion is that Bugzilla is a horrible bug reporting system, no matter how popular it is. Not only do you have to register, but the typical Bugzilla configuration asks you a huge pile of questions, many of which require specialized knowledge to answer correctly.

BugBenefitCorollary written at 22:39:15

2007-10-03

A basic principle of system design

I've mentioned this in passing before, but I should be explicit at least once. Here is a very basic principle of designing systems that real people will use:

The people that benefit should be doing the work.

Okay, there is one exception: the people getting the benefit can pay for the work instead of doing it directly.

You design systems that violate this principle at your peril, because generally it doesn't work in the long run. You'd think that this principle would be obvious, except that it is routinely violated.

I think that there tend to be three forms that these violations take: underestimating how much work is really involved, overestimating the benefit that people get from it, and deliberately deciding that it doesn't matter. I can't say anything about the last one, but the first two are often caused by not taking a step back and trying to look at your system with the eyes of an outsider, not an enthusiastic developer. (Unfortunately, this is very easy to do.)

Every time you design a system (whether it is software or procedures), you should step back and ask yourself who benefits and who does how much work, and how rewarding it really is. Be painfully honest, because it is much better than spending all the effort only to have your system quietly fail.

BenefitPrinciple written at 22:27:52



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.