A major caution when using 'rsync -a' to copy or move directory trees
We had a learning experience the other day. Part of it was about the
behavior of du in the face of hardlinks and another part was to do with
odd ZFS space usage behavior, but the largest part, and the ultimate
cause, was that 'rsync -a' doesn't preserve hardlinks. If you copy or
move a directory tree with 'rsync -a' and it contains internal
hardlinks, your new copy will break those hardlinks and copy each
hardlinked name as a separate file. Among other effects, this will
increase the amount of disk space that the new tree uses.
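As a rough sketch of how you might check a tree for this before copying it, the following Python walks a directory and groups regular files that share an inode. The function names (`find_internal_hardlinks`, `extra_bytes_if_links_broken`) are made up for this illustration, and the space estimate is based on apparent file sizes only, so it ignores filesystem compression and block rounding.

```python
import os
import stat
from collections import defaultdict


def find_internal_hardlinks(root):
    """Group paths under root that share an inode (hardlinks within the tree)."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with a link count above 1 can be hardlinked.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes that appear at more than one path inside this tree.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


def extra_bytes_if_links_broken(root):
    """Estimate extra apparent bytes if every hardlink became a separate copy."""
    extra = 0
    for paths in find_internal_hardlinks(root).values():
        size = os.lstat(paths[0]).st_size
        extra += size * (len(paths) - 1)
    return extra
```

If `find_internal_hardlinks()` returns anything for the source tree, a plain 'rsync -a' copy will come out bigger than the original and '-H' is worth adding.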
This limitation is widely known on the Internet and is explicitly
spelled out in the rsync manual page section on the -a option:
This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything (with -H being a notable omission). The only exception to the above equivalence is when --files-from is specified, in which case -r is not implied.
Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.
There's also a discussion of this in the section on the --hard-links
option.
(Another exception is that '-a' doesn't imply '--sparse', which is
what you need to preserve sparse files as sparse.)
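To get a feel for whether sparse files matter in a given tree, one rough heuristic is to compare a file's length against the space actually allocated to it. The sketch below assumes `st_blocks` is counted in 512-byte units (as it is on Linux), and the names `looks_sparse` and `find_sparse_files` are invented for this example. On filesystems with compression, such as ZFS, a compressed file can also show up here even though it has no holes.

```python
import os


def looks_sparse(st, block_size=512):
    """Return True if the allocated blocks cover less than the file's length.

    st is an os.stat_result (or any object with st_size and st_blocks).
    """
    return st.st_blocks * block_size < st.st_size


def find_sparse_files(root):
    """Yield paths of regular files under root that look sparse."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                if looks_sparse(os.lstat(path)):
                    yield path
```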
This is at one level a sensible tradeoff. Hard links are uncommon these days, they aren't supported in all environments, and they require potentially unbounded amounts of memory to process (since you have to keep track of every file you've seen with a hard link, so you can tell if you saw it again). If you search for discussions of rsync and hardlinks on the Internet, you can find people who've had problems with memory usage when dealing with large, heavily hardlinked directory trees.
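To illustrate where the memory goes, here is a minimal sketch of a copier that preserves hardlinks. This is not how rsync itself is implemented; it is just the obvious approach of remembering every multiply-linked inode you have copied, and `copy_tree_preserving_hardlinks` is a made-up name. It only handles directories and regular files.

```python
import os
import shutil
import stat


def copy_tree_preserving_hardlinks(src, dst):
    """Copy regular files and directories from src to dst, recreating hardlinks.

    Returns the number of entries in the inode table, which is the state
    that grows with the number of multiply-linked files in the tree.
    """
    seen = {}  # (st_dev, st_ino) -> first destination path copied for that inode
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.join(target_dir, name)
            st = os.lstat(src_path)
            if not stat.S_ISREG(st.st_mode):
                continue  # this sketch skips symlinks, devices, and so on
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # We copied this inode earlier: link to that copy instead.
                os.link(seen[key], dst_path)
                continue
            shutil.copy2(src_path, dst_path)
            if st.st_nlink > 1:
                # This entry has to be remembered until the whole run ends.
                seen[key] = dst_path
    return len(seen)
```

The `seen` table can never be trimmed while the copy is running, because another name for any of those inodes might turn up in the last directory visited. That is the unbounded part.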
At the same time it's not entirely ideal for system administrators who
by default think of 'rsync -a' as a faithful way to copy, clone, move,
or back up a directory tree. While it is in the manual page, the rsync
manual page is very big and most people don't read it carefully even
once (never mind often enough to remember this if they haven't been
burned by it). And usually it works, because usually you don't have
hard links or it doesn't really matter if they get broken (just like it
usually doesn't matter if sparse files get de-sparsed in an rsync copy).
Since we've had a learning experience about rsync and hardlinks, we're
probably going to remember this for years to come (or at least I hope).
We're certainly updating scripts and canned practices to use '-H' with
'-a', and now that I've looked it up we may well add '-S' to that too.
And I should probably read over the entire rsync manual page to see if
we're missing anything else, even though I expect it to be very boring.
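For scripts, one way to make this hard to forget is to build the rsync command line in a single place. The helper below is only a sketch: `rsync_copy_command` is a hypothetical name, and the trailing slash it adds to the source (so that rsync copies the directory's contents into the destination) is a choice made for this example, not something every script will want.

```python
def rsync_copy_command(src, dst, hardlinks=True, sparse=True):
    """Build an rsync argument list for cloning a directory tree."""
    cmd = ["rsync", "-a"]
    if hardlinks:
        cmd.append("-H")  # --hard-links: preserve hardlinks within the tree
    if sparse:
        cmd.append("-S")  # --sparse: try to keep sparse files sparse
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dst]
    return cmd
```

The resulting list can be handed to `subprocess.run()` directly, which avoids any shell quoting issues with odd directory names.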
(I had a narrow personal escape with this. I almost made a new root
filesystem for my home desktop recently, and if I had, I might well
have copied the old root filesystem to the new one with 'rsync -a'.)