The problem with tapes (for backup)

May 8, 2009

While it's true that high capacity backup tapes are expensive enough to make you blink, the media cost isn't really the problem with tape backups. (Although I haven't checked the numbers lately, I'm prepared to believe the claims that tape still has the lowest media cost per gigabyte.)

The real problem with tape backup systems is how much it costs to increase what I'll call your backup capacity: how much you can back up how fast. The basic way to increase backup capacity is to add another tape drive, but modern tape drives cost thousands of dollars each, and a tape library will cost substantially more. For many places, the media costs will pale next to this.

(And tape libraries don't help as much as you'd like these days, because you run into fundamental bandwidth limits regardless of how many tapes you have available to write to. For example, a tape drive that writes continuously at 100 Mbytes/sec can only back up about 2.75 terabytes in eight hours.)
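The arithmetic behind that bandwidth limit is simple enough to sketch (an illustrative calculation; the 100 Mbytes/sec rate and eight-hour window are the figures from the text, and the function name is mine):

```python
def max_backup_tb(rate_mb_per_s: float, window_hours: float) -> float:
    """Maximum data (in binary terabytes) one drive writing continuously
    at rate_mb_per_s can back up inside the given window."""
    megabytes = rate_mb_per_s * window_hours * 3600  # seconds in the window
    return megabytes / (1024 ** 2)  # MB -> TB (binary units)

print(round(max_backup_tb(100, 8), 2))  # about 2.75 TB, matching the figure above
```

No amount of extra tape media changes this number; only a faster drive or a second drive does.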

If you need to keep a huge amount of backups (or archives), the media cost may still dominate over the cost of expansion. If you already operate at a large enough scale to have the expensive infrastructure, the incremental costs of expansion may drop (if you already have the huge tape silos and the staging systems to write to tape 24 hours a day, so you just need to stuff another tape drive into a drive bay). But for everyone else, it's hard to avoid the conclusion that tape is dying or already dead as a viable, affordable backup method.

Which brings up the fundamental problem that I see with tape backups: they've become a high end, low volume market for people who have a lot of money. They're inevitably going to fall more and more behind the explosive growth in affordable disk space that's driven by the consumer market; as (comparatively) low volume people, the tape vendors simply don't have the kind of money to invest in R&D that the consumer hard drive makers do.

(This is a variant of the march of the cheap problem.)

Comments on this page:

From Matt Simmons at 2009-05-08 08:37:24:

You're right on. Storage and bandwidth of any kind are inexorably tied together. Data does not seem to want to be many places at once, and getting it there is not going to cease to be a problem for us anytime soon.

I'm afraid that the only good solution is parallelization of the 'bandwidth', or tape drives, in this case. In any event, once you grow to a certain size, it's the only way you can continue to do backups. At 4 times the size you mentioned, you're out of free hours in the day. If you've got more data (or if you like to switch tapes), you've only got one option.
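The commenter's "out of free hours" point can be checked with the same figures as the article (a sketch with an assumed function name; 100 Mbytes/sec and the roughly 2.75 TB eight-hour figure come from the entry above):

```python
def single_drive_hours(data_tb: float, rate_mb_per_s: float) -> float:
    """Hours one drive needs to write data_tb (binary terabytes)
    at a continuous rate of rate_mb_per_s."""
    megabytes = data_tb * (1024 ** 2)  # TB -> MB (binary units)
    return megabytes / (rate_mb_per_s * 3600)

# Four times the article's ~2.75 TB figure, on a single 100 MB/s drive:
print(round(single_drive_hours(4 * 2.75, 100), 1))  # about 32 hours, more than a day
```

At that size a single drive no longer fits in any daily window, so parallel drives stop being an optimization and become the only option.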

Tape density, drive density, and bandwidth are all pulling in different directions, but the real winner is always volume of data. The first three answer to that one, leaving us to figure out new and creative ways to make sure that there's never data loss without recovery. It's enough to keep you up at night sometimes.

Matt Simmons
