2009-05-31
A thought on giving custom redundant storage systems some history
Suppose that you're building some custom storage backend that is simply too big to be backed up, so you have only redundancy; this is probably common if you're building a cloud-style environment or are otherwise dealing with a huge volume of data. This leaves you with the redundancy history problem, where you're protected against hardware failures but any mistakes are 'instantly' replicated to the redundant copies.
Suppose that you want to do better than this; you somehow want to give your redundant storage system some history without going all the way to backups.
The approach that occurs to me is to make your storage system be based around a 'copy on write' model for updates; instead of updating in place, you write new versions and change references (which seems like it would be handy for a distributed system anyways). Then instead of immediately removing unreferenced objects, you try to let them sit around for a certain amount of time (hours, days, or weeks, depending on how much extra storage you can have and what your update volume is).
What this gives you is time. If you make a mistake, you have time to panic, go digging in your datastore, and pull out the now unreferenced objects that correspond to how things used to be. Building tools to help with this ahead of time is probably recommended.
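To make this concrete, here's a minimal sketch of the idea in Python; the in-memory dictionaries, the week-long grace period, and the function names are all my own illustration of the scheme, not anything from a real storage backend.

    import time

    GRACE_PERIOD = 7 * 24 * 3600   # hypothetical: keep old versions around for a week

    store = {}      # object id -> {'name', 'data', 'unref_at' (None while referenced)}
    refs = {}       # logical name -> id of the current, live object
    _next_id = 0

    def write(name, data):
        """Copy-on-write update: write a new object, then flip the reference."""
        global _next_id
        _next_id += 1
        store[_next_id] = {'name': name, 'data': data, 'unref_at': None}
        old = refs.get(name)
        refs[name] = _next_id
        if old is not None:
            # The old version is merely marked unreferenced; deletion is deferred.
            store[old]['unref_at'] = time.time()

    def sweep():
        """Reclaim space, but only from objects unreferenced longer than the grace period."""
        cutoff = time.time() - GRACE_PERIOD
        for oid in [o for o, obj in store.items()
                    if obj['unref_at'] is not None and obj['unref_at'] < cutoff]:
            del store[oid]

    def recover(name):
        """After a mistake: the still-present old versions of 'name', newest first."""
        return sorted((obj for obj in store.values()
                       if obj['name'] == name and obj['unref_at'] is not None),
                      key=lambda obj: obj['unref_at'], reverse=True)

A real system would persist and replicate all of this, of course; the point is just that write() never overwrites anything and sweep() is the only place data actually disappears, so recover() has a window in which to work.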
I think that this has two advantages over an actual snapshot feature. First, it has lower overhead in exchange for worse tools to access the 'snapshot' (which is a good tradeoff if you expect to make mistakes only very, very rarely). Second, you aren't restricted to looking only at the points in time where you happened to make snapshots, as you effectively make continuous snapshots.
2009-05-10
Another advantage of disk-based backup systems
One of the slightly subtle advantages of disk-based backup systems over tape-based backup systems is that capacity expansion is much easier; all you have to do is start using bigger disks. Since SATA is SATA, you don't need to replace your enclosure, and it is relatively easy to have multiple generations of disks with different capacities cycling through your system.
Contrast this with tape backups. To upgrade to higher capacity tapes you can't just buy the tapes; you generally have to buy an entire new tape drive that can write the new high-capacity format, which costs much more money and often makes for a more complex environment (at least in the old days, new tape drives were often not truly happy writing to the old, lower-capacity tapes you already had, and sometimes couldn't do it at all).
Fundamentally, what's going on here is that tape has made a tradeoff; it has put the work and the smarts in the tape drive instead of the media, so that the media is cheap and simple while the tape drive is expensive and complex. Modern hard drives have gone the other way; the drives themselves are ferociously complex (and are only cheap because they're made in such bulk), while the interface is relatively simple and general.
(There are subtle advantages to the tape tradeoff; for example, the simplicity means that there is less to get broken in tape media. And the tradeoff is a great deal if you have a lot of tapes compared to how many tape drives you have.)
2009-05-08
The problem with tapes (for backup)
While it's true that high capacity backup tapes are expensive enough to make you blink, the media cost isn't really the problem with tape backups. (Although I haven't checked the numbers lately, I'm prepared to believe the claims that tape still has the lowest media cost per gigabyte.)
The real problem with tape backup systems is how much it costs to increase what I'll call your backup capacity: how much you can back up how fast. The basic way to increase backup capacity is to add another tape drive, but modern tape drives cost thousands of dollars each, and a tape library will cost substantially more. For many places, the media costs will pale next to this.
(And tape libraries don't help as much as you'd like these days, because you run into fundamental bandwidth limits regardless of how many tapes you have available to write to. For example, a tape drive that writes continuously at 100 Mbytes/sec can only back up about 2.75 terabytes in eight hours.)
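For the record, the arithmetic behind that number looks like this (a quick sketch, using binary terabytes; the 100 Mbytes/sec figure is just the example rate above, not any particular drive's spec):

    # back-of-the-envelope bandwidth limit for a single tape drive
    rate_mb_per_sec = 100                 # example streaming rate from above
    window = 8 * 3600                     # an eight-hour backup window, in seconds
    total_mb = rate_mb_per_sec * window   # 2,880,000 MBytes
    total_tb = total_mb / (1024 * 1024)   # ~2.75 TBytes
    print(round(total_tb, 2))             # -> 2.75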
If you need to keep a huge amount of backups (or archives), the media cost may still dominate the cost of expansion. And if you already operate at a large enough scale to have the expensive infrastructure, the incremental cost of expansion may drop; when you already have the huge tape silos and the staging systems to write to tape 24 hours a day, you just need to stuff another tape drive into a drive bay. But for everyone else, it's hard to avoid the conclusion that tape is dying or already dead as a viable, affordable backup method.
Which brings up the fundamental problem that I see with tape backups: they've become a high end, low volume market for people who have a lot of money. They're inevitably going to fall more and more behind the explosive growth in affordable disk space that's driven by the consumer market; as (comparatively) low volume people, the tape vendors simply don't have the kind of money to invest in R&D that the consumer hard drive makers do.
(This is a variant of the 'march of the cheap' problem.)
2009-05-02
Why version control systems should support 'rewriting history'
There's a hot debate about whether VCSes should allow developers to 'rewrite history' (in the git sense). The usual way to frame the options can be wrapped up as a question: should the VCS allow developers to rewrite history, or forbid it? Put this way, a fair number of people will incline towards 'no'.
But this is a misleading and wrong question, because it quietly assumes that the VCS can actually stop people from rewriting history. This is incorrect; no VCS that has branches and lets developers apply patches can stop them from 'rewriting history' by hand if they want to badly enough. (This especially applies to DVCSes, where you can clone repositories outright and then throw away clones you don't want.)
So the actual choice that we have is this: should the VCS provide convenient support for letting developers rewrite history, or should it refuse to help and force them to do it all by hand?
As it happens, we already know that developers are going to do this one way or another; they do it now to such an extent that there are various programs to help them with it (and I'm not talking about DVCSes that already support this). Since it's going to happen anyways, having the VCS support it has three advantages:
- you can give developers as much help as possible with what is a tediously annoying and potentially tricky operation.
- you can capture information about what's actually going on.
- you may have a chance to gently steer developers away from bad ideas, provided that you can give them a good alternative.
By contrast, the VCS washing its hands of the whole deal means that it is abandoning the developers and giving up any chance it might have had to mitigate damage or do any good. (But you get some mathematical purity out of it; my views on that are on record.)