Rsync'ing (only) some of the top level pieces of a directory
Suppose, not hypothetically, that you have a top level directory which contains some number of subdirectories, and you want to selectively create and maintain a copy of only part of this top level directory. However, what you want copied changes over time, and you want unwanted things to disappear on the destination (because otherwise they'll stick around using up space that you need for things you care about). Some of the now-unwanted things will still exist on the source but you don't want them on the copy any more; others will disappear entirely on the source and need to disappear on the destination too.
This sounds like a tricky challenge with rsync but it turns out that there is a relatively straightforward way to do it. Let's say that you want to decide what to copy based (only) on the modification time of the top level subdirectories; you want a copy of all recently modified subdirectories that still exist on the source. Then what you want is this:
    cd /data/prometheus/metrics2
    find * -maxdepth 0 -mtime -365 -print |
      sed 's;^;/;' |
      rsync -a --delete --delete-excluded \
        --include-from - --exclude '/*' \
        . backupserv:/data/prometheus/metrics2/
Here, the 'find' prints everything in the top level directory that's been modified within the last year. The 'sed' takes that list of names and sticks a '/' on the front, turning names like 'wal' into '/wal', because to rsync this definitely anchors them to the root of the directory tree being (recursively) transferred (per rsync's Pattern Matching Rules and Anchoring Include/Exclude Patterns).
The rsync command then says to delete now-gone things in directories we transfer (--delete), delete things that are excluded on the source but present on the destination (--delete-excluded), take the list of what to copy from standard input (--include-from -, ie, our 'sed'), and then exclude everything that isn't specifically included (--exclude '/*').
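To see exactly what the find and sed stage hands to rsync before pointing it at real data, the pipeline can be run on its own in a throwaway directory. This is a minimal sketch: the directory names ('wal', 'old-chunks', 'new-chunks') are hypothetical, and it assumes a find that supports -maxdepth (GNU and BSD find both do) plus POSIX 'touch -t' to backdate one entry.

```shell
#!/bin/sh
# Sketch: print the include list that the find | sed stage would feed to rsync.
# All directory names here are hypothetical stand-ins for real top-level entries.
set -eu

scratch=$(mktemp -d)
mkdir "$scratch/wal" "$scratch/old-chunks" "$scratch/new-chunks"
# Backdate one directory to 2000-01-01 00:00 so that '-mtime -365' skips it.
touch -t 200001010000 "$scratch/old-chunks"

cd "$scratch"
# Same find and sed as in the real command; sort only makes the output stable.
include_list=$(find * -maxdepth 0 -mtime -365 -print | sed 's;^;/;' | sort)
printf '%s\n' "$include_list"

# Clean up the scratch tree.
cd /
rm -rf "$scratch"
```

This should print '/new-chunks' and '/wal', each anchored with a leading '/', while the backdated 'old-chunks' is left out and so falls through to the '/*' exclude.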
(All of this is easier than I expected when I wrote my recent entry on discovering this problem; I thought I might have to either construct elaborate command line arguments or write some temporary files. That --include-from will read from standard input is very helpful here.)
If you don't think to check the rsync manual page, especially its section on Filter Rules, you can have a little rsync accident because you absently think that rsync is 'last match wins' instead of 'first match wins' and put the --exclude before the --include-from. This causes everything to be excluded, since '/*' now matches every top-level name before any of the includes gets a chance, and rsync will dutifully delete the entire multi-terabyte copy you made in your earlier testing, because that's what you told it to do when you used --delete-excluded.
(In general I should have carefully read all of the rsync manual page's various sections on pattern matching and filtering. It probably would have saved me time, and it would definitely have left me better informed about how rsync actually behaves.)