Things that could happen to your archives

September 9, 2011

In the spirit of my old entry on things that could happen to your backups and to reinforce yesterday's entry on not trying to archive things, here's an incomplete list of things that have been known to go wrong with archives. If you're thinking of doing archives, you should be thinking about how you're going to avoid these.

  • you aren't archiving everything you need to archive.
  • the archive program doesn't work right; it writes a corrupt or incomplete archive, fails to notice or complain enough about read errors, or its archive doesn't capture a consistent and usable state of whatever you want to archive.

    With archives you should definitely be doing a full read of the archive and verifying it against the data on disk before you remove anything from disk.

    (In general archives are subject to many of the woes of backups. Take them as read.)

  • the archive media degrades over time.

    This is what most everyone talks about, and for good reason; if your data isn't there any more, nothing else matters. But it's only the tip of the iceberg for what you need and what can go wrong.

  • one or more pieces of archive media were physically damaged or destroyed due to a mishap, accident, water leak, fire, etc.

    If you care about real archives, you need more than one copy of any piece of data (and they should not be in the same place). Accidents and mishaps happen, especially to things sitting in the corner.

  • you've lost track of one or more pieces of archive media; they're stored somewhere, but you don't know specifically where any more.
  • in general you've lost track of what media you have and/or what data you've archived.
  • you've lost track of what is on each piece of archive media, so while you know you have an archival copy of <X> you don't know which one of fifty tapes it's on (and no one is going to go search through all fifty tapes unless it is really, really important).

  • you don't have anything that can read the media any more.
  • the media reading hardware that you carefully saved has quietly stopped working sometime during the years that it was in storage.
  • you can't connect the media reading hardware to any of your current systems; it requires an obsolete interface that is no longer supported.
  • you have an interface card for the obsolete interface you need, but it uses a bus type that is no longer supported on your machines.

    (I have some PCI SCSI cards. The odds that I will be able to put them in machines drop by the day.)

  • you have all of the hardware you need and you even saved cables too, but the OS driver for the hardware was removed several years ago after it became unmaintained because no kernel hacker had a copy of the hardware to test with any more.

  • all of your hardware works for the first N tapes (or disks, or whatever), then something breaks due to the amount of wear you're putting on old hardware. Since it's all obsolete hardware, there are no longer any spare parts, maintenance and cleaning kits, or the expertise to use any of these even if you had them.

  • you didn't write down what format the archives are in because it was obvious at the time.
  • you don't have any software that can read the archive format.
  • the details of the archive format either were never documented or were only documented in ancient documentation that you got rid of years ago. You earn bonus irony points if you carefully included the documentation in your archives.

  • the software you have that can read the archive format doesn't run on any of your current machines.
  • the old OS you need to run the software to read the archive format doesn't work on any of your current machines.
  • you have source code for software to read the archive format, but it doesn't compile on the current version of the OS because the compiler has gotten stricter, the library interfaces have changed, and the OS has moved from 32-bit to 64-bit.

  • your commercial archiving system requires a license key, but the company that made it is out of business now and certainly not issuing any new ones. Your old license key expired five years ago.

    (Yes, there are people who do long term archiving with commercial software.)

  • you have forgotten all of the details about how to work with the media, the archive format, and any surviving software. In theory you could, with sufficient effort, re-master all of the pieces, reverse engineer the format, and extract the data. In practice you don't have the time to do all of this (because it is not a high enough priority), and so the archives are unreadable and will never be extracted.

    It's common to discover this shortly before your last media reader is decommissioned, because this is when everyone decides that you should move the data from the old media (and format) on to some new media. This is often the first time anyone has thought about the archives for years.

    (Even if you can remember all of this, it not infrequently turns out that you simply don't have enough time to cycle all of your old media through to read all of the data off of it.)
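
To make the 'full read and verify before you remove anything' advice from the list above a bit more concrete, here is a minimal sketch in Python. It assumes the archive is a tar file whose member names are relative to the directory being archived, and it only looks at regular files (symlinks, hard links, permissions, and so on are ignored). The function names `sha256_of` and `verify_archive` are made up for illustration; this is one possible approach, not a complete verification tool.

```python
import hashlib
import os
import tarfile


def sha256_of(fileobj, chunk_size=1 << 20):
    """Hash an open binary file object in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive_path, source_root):
    """Compare regular files under source_root against a tar archive.

    Assumption: member names in the archive are relative to source_root.
    Returns a list of problem strings; an empty list means every regular
    file on disk is in the archive with identical contents.
    """
    archived = {}
    # Hashing each member forces a full read of its data from the archive.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                name = os.path.normpath(member.name)
                archived[name] = sha256_of(tar.extractfile(member))

    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Only regular files are compared; symlinks and the like are skipped.
            if os.path.islink(full_path) or not os.path.isfile(full_path):
                continue
            rel = os.path.normpath(os.path.relpath(full_path, source_root))
            if rel not in archived:
                problems.append(f"not in archive: {rel}")
                continue
            with open(full_path, "rb") as f:
                if sha256_of(f) != archived[rel]:
                    problems.append(f"contents differ: {rel}")
    return problems
```

The idea is that you only delete the on-disk data if `verify_archive()` returns an empty list, and ideally only after running it against each separate copy of the archive. Note that this says nothing about whether the data was in a consistent state when it was archived; it only checks that what is in the archive matches what is on disk right now.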

There are probably many more, but I have less painful experience with archives than I do with backups.

(Although we had an interesting time when the last 9-track reel-to-reel tape drive was being taken out of service. I don't think we got all of the old historical 9-track tapes that we wanted copied.)
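
Several of the items in the list (losing track of where media is, not knowing which of fifty tapes holds <X>, and not writing down the archive format) come down to missing records. As a rough sketch of what keeping such records could look like, here is a tiny catalog kept as plain JSON text, which is at least a format that is likely to stay readable. Everything here is hypothetical: the function names, the field names, and the example labels and locations are all made up for illustration.

```python
import json


def add_media(catalog, label, location, archive_format, contents):
    """Record one piece of media: where it is, its format, and what it holds."""
    catalog[label] = {
        "location": location,
        "format": archive_format,
        "contents": sorted(contents),
    }


def find_media(catalog, item):
    """Return the labels of all media recorded as holding the given item."""
    return sorted(
        label for label, entry in catalog.items() if item in entry["contents"]
    )


def dump_catalog(catalog):
    """Serialize the catalog as plain, human-readable JSON text."""
    return json.dumps(catalog, indent=2, sort_keys=True)


def load_catalog(text):
    """Parse a catalog previously produced by dump_catalog()."""
    return json.loads(text)


if __name__ == "__main__":
    # Hypothetical example data; labels, locations and formats are invented.
    catalog = {}
    add_media(catalog, "tape-07", "machine room cabinet B",
              "POSIX tar, gzip compressed", ["projects/2009", "mail/2008"])
    add_media(catalog, "tape-12", "offsite storage",
              "POSIX tar, gzip compressed", ["projects/2009"])
    print(find_media(catalog, "projects/2009"))
    print(dump_catalog(catalog))
```

Of course the catalog is itself something you can lose track of, so it needs to live somewhere people will actually look (and probably in more than one place, including a printed copy stored with the media).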

Written on 09 September 2011.