Painless long-term storage management without disturbing users

June 25, 2007

One of the things I have become conscious of recently is that managing your storage over the long term without lots of pain and without disturbing your users requires an additional set of features that you may not realize you need. (Partly I came to this realization by blithely proposing a design for our next generation of storage without really thinking about the issues, and then having people gently point out the problems I had missed.)

(By 'painless' I mean that it does not take a lot of sysadmin effort to deal with, so you do not have people babysitting dump and restore runs all weekend. To avoid disturbing users, you can't rearrange your filesystem layout or require unmounting and remounting NFS filesystems, CIFS shares, or whatever; ideally users shouldn't notice that anything has happened.)

There are two major sorts of changes that happen over the long term: you replace old hardware, especially old disks, with newer and bigger stuff, and you add more storage, often in the process of replacing hardware. (Over the long term you are going to replace everything, including the physical servers themselves.)

So you need migration and growth: you need to be able to migrate data from old storage to new storage, and once this is done you need to be able to grow things to use the extra space on the new storage. It is reassuring to also be able to mirror data between your old storage and your new storage and run them both at the same time; this gives you a live test of the new storage before you are completely committed to it.

(In some sorts of migration systems you get mirroring for free, because you migrate data by creating a mirror and then discarding the old side of it.)
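
As a toy illustration of the mirror-then-discard approach, here is a minimal Python sketch. Everything in it is made up for illustration (the `Device` and `MirroredVolume` classes, the block counts, the method names); it does not correspond to any real volume manager's API, and it copies data instantly where a real system would resync in the background.

```python
class Device:
    """A hypothetical backing device holding numbered blocks."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # size in blocks
        self.blocks = {}           # block number -> data


class MirroredVolume:
    """A toy volume that writes every block to all of its mirror sides."""

    def __init__(self, device):
        self.sides = [device]
        self.size = device.capacity   # usable size seen by users

    def write(self, block, data):
        if not 0 <= block < self.size:
            raise IndexError("block %d is outside the volume" % block)
        for dev in self.sides:
            dev.blocks[block] = data

    def read(self, block):
        # Any side would do; all sides hold the same data.
        return self.sides[0].blocks.get(block)

    def attach(self, new_device):
        """Add a new mirror side and copy the existing data onto it."""
        if new_device.capacity < self.size:
            raise ValueError("new device is too small to mirror the volume")
        new_device.blocks = dict(self.sides[0].blocks)
        self.sides.append(new_device)

    def detach(self, device):
        """Drop one side of the mirror; the volume keeps running on the rest."""
        if len(self.sides) == 1:
            raise ValueError("cannot detach the last remaining side")
        self.sides.remove(device)

    def grow(self):
        """Expand the volume to the smallest remaining side's capacity."""
        self.size = min(dev.capacity for dev in self.sides)


# Migrate from a small old device to a bigger new one while "users" keep writing.
old = Device("old", 100)
new = Device("new", 250)
vol = MirroredVolume(old)
vol.write(7, "user data")

vol.attach(new)                            # both sides now live: the test period
vol.write(8, "written during migration")   # goes to old and new alike
vol.detach(old)                            # commit to the new storage
vol.grow()                                 # use the extra space
```

The point of the sketch is the ordering: users keep reading and writing `vol` the whole time, the period between `attach` and `detach` is the live test of the new storage, and growth only happens once the small old side is gone.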

These features have to be provided by your storage system, because you cannot make them painless and transparent without its cooperation. As a consequence, you need to be able to make both the new and the old storage visible to the storage system on a single machine at the same time.

And of course all of this has to be transparent, which means that you have to be able to do it while your systems are live, with users banging on the filesystems and so on.

Written on 25 June 2007.
Last modified: Mon Jun 25 00:40:01 2007