Having one is often much easier than having more than one

June 30, 2022

Recently, I was sort of asked why we didn't just have multiple Prometheus servers (and were instead using mirrored 20TB HDs). There are specific answers for the case of Prometheus, but in general one is the easiest number of things to have. The moment you have more than one of something, you get a bunch of extra problems. At a minimum, you have to worry about consistency between your N things, coordination between your N things, and choosing which of your N things you're going to interact with. These are all hard problems, and you don't have them if you only have one thing.

Choosing between your N things is often considered hard enough that systems go well out of their way to turn N things back into one as far as you're concerned. For example, HTTP load balancers turn N web servers back into one from an outside perspective, and RAID mirrors turn N disks back into one as far as everything above them is concerned.

In both cases (and many others) the illusion that you're dealing with only a single thing can falter. An underlying web server or physical disk that has different answers than all of the others will cause you heartburn and break the illusion of a single thing (well, a single non-flaky thing). Still, these illusions tend to work well enough that we readily use them and usually prefer them to the alternatives.

(One reason they work so well is that vast amounts of engineering resources have been devoted to making them that way.)
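
As a rough illustration of what such an illusion has to do, here is a toy sketch in Python. Everything in it is hypothetical (the MirroredStore class, its method names, the use of in-memory dicts as stand-ins for disks or servers); it is not how any real load balancer or RAID implementation works. It only shows the three problems from above in miniature: writes must reach every replica, reads must pick one, and a replica with a different answer breaks the illusion.

```python
from collections import Counter


class MirroredStore:
    """Present N in-memory replicas as one key-value store (toy model only)."""

    def __init__(self, replica_count=2):
        # Each replica is a plain dict standing in for a disk or a server.
        self.replicas = [dict() for _ in range(replica_count)]
        self._next = 0

    def write(self, key, value):
        # Consistency: every replica has to receive every write.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Choosing: round-robin over the replicas, hidden from the caller.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)

    def divergent(self, key):
        # Indexes of replicas whose answer differs from the most common one.
        # Assumes values are hashable; ties go to the first answer seen.
        answers = [replica.get(key) for replica in self.replicas]
        majority, _ = Counter(answers).most_common(1)[0]
        return [i for i, answer in enumerate(answers) if answer != majority]
```

If one replica is changed behind the store's back (for example store.replicas[1]["a"] = "bad" on a three-replica store), read("a") starts returning different answers depending on which replica its round-robin lands on, which is exactly the heartburn described above. With only one thing, none of this code needs to exist.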

As a system design issue, the real question is not why you only have one of something, it's whether you absolutely have to have more than one. Even if having more than one works just as well as having only one, it'll almost certainly be more complicated and have more peculiar failure modes. "Do the simplest thing that works" is a good idea in general, and much of the time this means 'one' is the number you want.

A corollary to this is that any 'more than one' system that wants to be used should pay a lot of attention to making itself work reliably and smoothly. We pretty much universally use software mirrored disks on our Linux systems, and certainly one big reason for that is that the Linux software RAID people have spent a lot of effort creating a system that just works, minimizing the disadvantages of having two (or more) while maximizing the advantages.

(This observation is in no way new. I just feel like writing it down today.)

Written on 30 June 2022.