'In place' filesystem defragmentation with Disksuite

September 11, 2006

While the Berkeley FFS and derivatives like Solaris UFS are much, much better at dealing with fragmentation than the original V7 filesystem (which continued on into System V before SysVR4), they can actually get fragmented over time to a degree that matters.

Traditional Unix, Solaris included, really doesn't have any tools for defragmenting filesystems; instead, you get to do it the brute-force way, by copying everything into an empty filesystem. With Solaris Disksuite and mirrored disks, it is possible to do this 'in place', so that you don't have to copy twice or remount the filesystem on any NFS clients.

The procedure is slightly less nerve-wracking if you have a three-way mirror, but should work OK even for a two-way mirror. Here's how it goes:

  1. bring the machine into single-user mode, but do not unmount the filesystem you want to defragment.
  2. ensure that all the submirrors are in sync.
  3. metadetach all but one submirror.
  4. rm everything in the filesystem except the lost+found directory.
  5. from the top directory of the filesystem (ufsrestore restores into the current directory), run: ufsdump 0f - /dev/md/rdsk/<detached submirror> | ufsrestore rf -
  6. check and make sure the data is there. No, really, do it twice.
  7. metattach the detached submirrors.
  8. bring the machine back up multi-user. (Optionally, first wait for the resync to finish.)
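The steps above can be sketched as a shell function that only prints the commands, so they can be reviewed and then typed by hand with the checks in between. This is a minimal sketch, not the author's script: it assumes a POSIX shell, and the mount point (/var/mail), mirror (d10) and detached submirror (d12) are hypothetical. The find idiom for removing everything except lost+found is just one portable way to do step 4, and the extra rm of restoresymtable cleans up the bookkeeping file that ufsrestore r leaves in the current directory.

```sh
#!/bin/sh
# Print (do not run) the in-place defragmentation commands for review.
# Hypothetical example names: mirror d10 mounted on /var/mail, submirror d12
# to be detached. Substitute your own names.
#
# Usage: defrag_plan MOUNTPOINT MIRROR DETACHED_SUBMIRROR...
defrag_plan() {
    mnt=$1
    mirror=$2
    shift 2
    src=$1                      # dump from the first detached submirror

    echo "init s"               # 1. single-user; leave $mnt mounted
    echo "metastat $mirror"     # 2. every submirror should be 'Okay'
    for sub in "$@"; do         # 3. detach all but one submirror
        echo "metadetach $mirror $sub"
    done
    echo "cd $mnt"              # 4. remove everything except lost+found
    # printf, not echo, so the backslash is printed literally everywhere
    printf '%s\n' 'find . ! -name . -prune ! -name lost+found -exec rm -rf {} \;'
    # 5. ufsrestore r writes into the current directory, i.e. the mount point
    echo "ufsdump 0f - /dev/md/rdsk/$src | ufsrestore rf -"
    echo "rm restoresymtable"   # leftover from ufsrestore r; not needed now
    echo "# 6. check the restored data by hand, twice, before going on"
    for sub in "$@"; do         # 7. reattach; this starts a resync
        echo "metattach $mirror $sub"
    done
    echo "init 3"               # 8. back to multi-user
}

defrag_plan /var/mail d10 d12
```

For a three-way mirror, pass both detached submirrors (for example `defrag_plan /var/mail d10 d12 d13`); the dump is taken from the first one.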

You're done. NFS mounts are even still intact, although anyone with an open file or a current directory in a subdirectory of the filesystem will see a few small problems.

(In our case it was a mail spool, so there weren't any subdirectories to worry about.)

There should be a similar procedure with Linux software RAID, although it'll probably be slightly more troublesome since the Linux equivalent of metadetach is somewhat more abrupt. (I'd probably unmount the filesystem before ripping the mirror off.)

PS: I am not sure if I am grumpy that Sun didn't make 'metattach' be called 'metaattach' instead, for complete consistency among the Disksuite command names.


Comments on this page:

By Dan.Astoorian at 2006-09-12 14:01:56:

> The procedure is slightly less nerve-wracking if you have a three way mirror, but should work OK even for a two-way mirror. Here's how it goes:
>
>   1. bring the machine into single-user mode, but do not unmount the filesystem you want to defragment.
>   2. insure that all the submirrors are in sync.
>   3. metadetach all but one submirror.
>   4. rm everything in the filesystem except the lost+found directory.
>   5. ufsdump 0f - /dev/md/rdsk/<detached submirror> | ufsrestore rf -
>
> [...]

Fiddling with detached submirrors like this always made me slightly nervous (perhaps groundlessly). Personally, I've always preferred this approach:

  1. metadetach one submirror
  2. metainit a new mirror using the old submirror's device, and mount and clear the filesystem
  3. ufsdump 0f - /dev/md/rdsk/<original mirror> | ufsrestore rf -
  4. verify that the copy was correct and complete
  5. unshare and unmount the filesystems, and use metarename -x to exchange the mirrors' metadevice numbers
  6. remount and reshare the defragged filesystem (using its original metadevice number)
  7. tear down the old, now-unused mirror, and attach its submirror(s) to the new mirror.
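Dan's steps can be sketched the same way, as a function that only prints commands. Every name here is hypothetical: the original mirror d10 (submirrors d11 and d12) is mounted on /export/home, d20 is the new mirror and /mnt a scratch mount point; `shareall` assumes the share is listed in /etc/dfs/dfstab. Step 2 clears the new mirror by removing everything but lost+found, as Dan describes further down. One deliberate difference: instead of `metarename -x`, the sketch swaps the two mirrors' numbers with three plain `metarename` calls through a spare name (d30, also hypothetical), which sidesteps any question of which volume pairs `-x` will accept on a given release.

```sh
#!/bin/sh
# Print (do not run) the commands for the new-mirror variant.
# All names are hypothetical; edit them to match your configuration.
fs=/export/home   # mount point of the original mirror, listed in /etc/vfstab
old=d10           # original mirror
keep=d11          # submirror that stays in the old mirror until step 7
moved=d12         # submirror detached to seed the new mirror
new=d20           # new mirror built on $moved
spare=d30         # unused name, needed only during the swap
scratch=/mnt      # temporary mount point for the new mirror

swap_plan() {
    echo "metadetach $old $moved"                           # 1.
    echo "metainit $new -m $moved"                          # 2. one-way mirror
    echo "mount /dev/md/dsk/$new $scratch"
    echo "cd $scratch"
    printf '%s\n' 'find . ! -name . -prune ! -name lost+found -exec rm -rf {} \;'
    echo "ufsdump 0f - /dev/md/rdsk/$old | ufsrestore rf -" # 3. read via the mirror
    echo "rm restoresymtable"
    echo "# 4. verify that $scratch matches $fs before going on"
    echo "cd /"                                             # 5. neither volume
    echo "unshare $fs"                                      #    may be in use
    echo "umount $fs"
    echo "umount $scratch"
    echo "metarename $old $spare"                           #    swap the numbers
    echo "metarename $new $old"
    echo "metarename $spare $new"
    echo "mount $fs"                                        # 6. vfstab still names d10
    echo "shareall"
    echo "metaclear $new"                                   # 7. d20 is now the old mirror
    echo "metattach $old $keep"                             #    resync onto d11
}

swap_plan
```

Clearing d20 without `-r` removes only the mirror itself, which leaves d11 free to be attached to the renamed new mirror.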

If the filesystem is large, you can even try to minimize your downtime by doing the level-0 dump while the system is live to copy most of the data, then taking the data set offline and doing an incremental dump to pick up the changes that occurred during the level-0.
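A sketch of that two-pass copy, again only printing commands, assuming the same hypothetical names (source mirror d10 on /export/home, new filesystem mounted on /mnt). The `u` flag on the level-0 dump records it in /etc/dumpdates, which is what the level-1 dump uses to decide what has changed; running `ufsrestore r` a second time in the same directory applies the incremental using the restoresymtable file the first pass left there.

```sh
#!/bin/sh
# Print (do not run) a two-pass copy: level 0 while live, level 1 offline.
# Hypothetical names: source mirror d10 on /export/home, target on /mnt.
two_pass_plan() {
    src=/dev/md/rdsk/d10
    echo "cd /mnt"
    # Pass 1, filesystem still live: 'u' records this dump in /etc/dumpdates.
    echo "ufsdump 0uf - $src | ufsrestore rf -"
    # Take the data offline so nothing changes during the second pass.
    echo "unshare /export/home"
    echo "umount /export/home"
    # Pass 2: only what changed since the level 0; restored in the same
    # directory so ufsrestore can use the restoresymtable from pass 1.
    echo "ufsdump 1f - $src | ufsrestore rf -"
    echo "rm restoresymtable"   # only after the last incremental restore
}

two_pass_plan
```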

The difference between your approach and mine is that my way, you never have to access a submirror directly: all accesses are through a bona fide mirror metadevice. Note that NFS file handles are tied to the metadevice numbers of the mirrors: NFS will not notice that you've exported a different mirror with the same number.

In fact, I've used this method to convert a simple metadevice into a one-way mirror (as a prelude to growing the filesystem onto a larger physical device) without the NFS clients missing a beat:

  1. ifconfig down the server's network interface, so the NFS clients won't get failures for the filesystem while it's unavailable
  2. from the console or via a second network interface, unshare and unmount the filesystem
  3. metarename the device, and metainit a mirror using the original device as a submirror (e.g., metarename d1 d11; metainit d1 -m d11)
  4. remount and reshare the filesystem, and ifconfig the network back up.
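The conversion can be sketched as a printed command list too. The interface name hme0 and the mount point /export/home are hypothetical; the device names follow Dan's example, and `shareall` assumes the share is listed in /etc/dfs/dfstab.

```sh
#!/bin/sh
# Print (do not run) the steps that turn simple metadevice d1 into a
# one-way mirror d1 with submirror d11, without remounting NFS clients.
# Hypothetical names: public interface hme0, filesystem /export/home on d1.
mirror_plan() {
    echo "ifconfig hme0 down"      # 1. clients retry instead of seeing errors
    echo "unshare /export/home"    # 2. from the console or a second interface
    echo "umount /export/home"
    echo "metarename d1 d11"       # 3. free up the name d1 ...
    echo "metainit d1 -m d11"      #    ... and reuse it for the new mirror
    echo "mount /export/home"      # 4. /etc/vfstab still names /dev/md/dsk/d1
    echo "shareall"
    echo "ifconfig hme0 up"
}

mirror_plan
```

Because the mirror reuses the number d1, neither /etc/vfstab nor the clients' file handles need to change.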

--Dan Astoorian

By cks at 2006-09-12 16:56:07:

If you make a new filesystem, do the root inode's generation count and other stuff come out the same? (I suppose it pretty much has to have the same generation count if it's a true generation count, since you can't ever do any of the things to it that change the generation count.)

By Dan.Astoorian at 2006-09-12 17:43:36:

Ah, but I never said anything about making a new filesystem--only creating a new mirror, using the device of the detached submirror. (Where I said "mount and clear the filesystem", you can still do this by rm'ing everything except lost+found.)

However, I should point out that I've never personally used this method to defrag a UFS filesystem. I don't know enough about the internals of how it fragments blocks to be confident that rm'ing all the files is sufficient to defragment it, so I've always preferred the pave-over-the-filesystem-and-remount-the-clients route.

As for inode generation counts, though, fsirand(1M) (called by newfs(1M)) would suggest that the answer to your question is probably "no."

By cks at 2006-09-13 10:38:45:

Any sane FFS-derived filesystem, UFS included, should completely defragment itself when everything is rm'd; the block (and inode) freelists are just bitmaps, so empty is empty.

(Also, there's a certain amount of experimental evidence, in that we tried it and it seems to have worked. Certainly the IO rates on that filesystem are significantly up from the pathetically low numbers they were before we did the defragmentation.)
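The "empty is empty" point can be illustrated with a deliberately crude toy model: a free map as a string with one character per block (1 in use, 0 free). The helpers `longest_free_run` and `free_all` are hypothetical names for this illustration only; real UFS keeps a separate map per cylinder group at fragment granularity, which this ignores. The point it shows is that once the bits are cleared, nothing in the map remembers how scattered the old allocations were.

```sh
#!/bin/sh
# Toy free-block map: one character per block, 1 = in use, 0 = free.

# longest_free_run MAP: length of the longest run of free (0) blocks.
longest_free_run() {
    printf '%s\n' "$1" | awk '{
        best = 0; run = 0
        for (i = 1; i <= length($0); i++) {
            if (substr($0, i, 1) == "0") { run++; if (run > best) best = run }
            else run = 0
        }
        print best
    }'
}

# free_all MAP KEEP: "rm everything" by clearing every bit except the
# first KEEP blocks, which stand in for lost+found.
free_all() {
    printf '%s\n' "$1" | awk -v keep="$2" '{
        out = ""
        for (i = 1; i <= length($0); i++)
            out = out (i <= keep ? substr($0, i, 1) : "0")
        print out
    }'
}

fragmented=1101001101
longest_free_run "$fragmented"       # prints 2
emptied=$(free_all "$fragmented" 1)  # 1000000000
longest_free_run "$emptied"          # prints 9
```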

Written on 11 September 2006.

Last modified: Mon Sep 11 17:26:57 2006