Fun with upgrading our backup server

November 3, 2005

We've been having odd problems with backups for more than a week now. Problems with backups are often frustratingly hard to debug, since generally I only get one run a day and it really should work.

Our Amanda backup server still ran Red Hat 7.3 for various reasons (including lack of disk space, which is a long story). I had been planning to upgrade it to Fedora Core 4 just before the backup problems hit, and the problems pushed that back. Then we worked out that they had started right after we rebooted the server to put in the first of two larger system disks, so suddenly it was upgrade time again.

Apparently the server didn't like this idea. The problems we encountered included:

  • a small mistake in swapping in a larger system disk.
  • the Fedora Core 4 install kernel panicked on startup when I tried a network-based upgrade.
  • the CD-ROM drive wasn't working, so I couldn't try a CD-ROM-based upgrade. We wound up swapping the CD-ROM drive, at which point the network upgrade kernel liked the system again. (I love hardware. I really do.)

Once this was sorted out, the RH 7.3 to FC 4 upgrade worked fine, which is pretty impressive if you think about it; we skipped five intermediate releases.

Now comes making sure that Amanda and the tape robot (and tape drive) are still talking to each other. This has been enlivened by the failure mode of our Amanda installation: if it doesn't have enough permissions to talk to the tape drive, I don't get diagnostics; instead it just sits forever whenever it needs to swap which tape is loaded in the tape robot.
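One way to avoid discovering a permissions problem by watching nothing happen is to check the devices before the backup run starts. The following is only a rough sketch in Python, not part of Amanda; the function name and the device paths in the usage note are hypothetical stand-ins for whatever the tape drive and changer are actually called.

```python
import os


def device_problems(path):
    """Return a list of human-readable problems with using PATH.

    An empty list means the current user can open PATH for both
    reading and writing, as far as os.access() can tell.
    """
    if not os.path.exists(path):
        return [f"{path}: does not exist"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: not readable by the current user")
    if not os.access(path, os.W_OK):
        problems.append(f"{path}: not writable by the current user")
    return problems
```

Run as the backup user, something like `device_problems("/dev/nst0") + device_problems("/dev/sg0")` would give a list to print and bail out on before any tape gets moved (both paths are assumptions about a typical Linux setup, not our actual configuration).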

(Specifically, a shell script winds up waiting for mt to report that the tape drive is online after having finished loading the new DLT tape. I have no idea where the error messages from mt wind up, but it's certainly not the screen.)
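For illustration, here is roughly what a wait that cannot hang forever might look like. This is a sketch in Python rather than the shell script Amanda actually uses, and it makes assumptions: that the drive is /dev/nst0, that `mt -f DEVICE status` prints the word ONLINE once the tape is loaded (as the Linux mt-st version does), and that ten minutes is long enough for a load. The function names are made up.

```python
import subprocess
import time


def mt_status(device="/dev/nst0"):
    """Run `mt -f DEVICE status`; return (exit code, stdout, stderr)."""
    try:
        proc = subprocess.run(
            ["mt", "-f", device, "status"],
            capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # mt is missing, not executable, or hung; report that as an error.
        return 1, "", str(exc)
    return proc.returncode, proc.stdout, proc.stderr


def wait_for_online(status=mt_status, timeout=600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the drive reports ONLINE, or raise TimeoutError.

    The exception message carries the last thing mt said, so an error
    such as "Permission denied" ends up somewhere a person will see it.
    """
    deadline = clock() + timeout
    while True:
        code, out, err = status()
        if code == 0 and "ONLINE" in out:
            return
        last = err.strip() or out.strip() or "no output from mt"
        if clock() >= deadline:
            raise TimeoutError(
                f"tape drive not online after {timeout:.0f}s; "
                f"last mt output: {last}"
            )
        sleep(interval)
```

The two differences from an unbounded loop are the deadline and the fact that mt's standard error is captured instead of vanishing. The `status`, `sleep` and `clock` parameters are there only so the loop can be exercised without a real tape drive.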

Then we ran out of unused tapes in the tape robot, so I'm still not sure if the whole mess really works. I'll probably only find out tomorrow night (or Friday morning).

(This does neatly illustrate one reason why the backup server stayed with Red Hat 7.3 for so long. And why it is still running Amanda 2.4.2p2, released in 2001.)

Written on 03 November 2005.
Last modified: Thu Nov 3 03:34:51 2005
