Wandering Thoughts archives


Rsync'ing (only) some of the top level pieces of a directory

Suppose, not hypothetically, that you have a top level directory which contains some number of subdirectories, and you want to create and maintain a copy of only part of it. However, what you want to copy changes over time, and you want unwanted things to disappear from the destination (because otherwise they'll stick around, using up space that you need for things you care about). Some of the now-unwanted things will still exist on the source but you no longer want them in the copy; others will disappear entirely from the source and need to disappear from the destination too.

This sounds like a tricky challenge for rsync, but it turns out that there is a relatively straightforward way to do it. Let's say that you want to decide what to copy based only on the modification time of the top level subdirectories: you want a copy of every recently modified subdirectory that still exists on the source. Then what you want is this:

cd /data/prometheus/metrics2
find * -maxdepth 0 -mtime -365 -print |
 sed 's;^;/;' |
  rsync -a --delete --delete-excluded \
        --include-from - --exclude '/*' \
        . backupserv:/data/prometheus/metrics2/

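Here, the 'find' prints everything in the top level directory that's been modified within the last year; '-maxdepth 0' makes it look only at the names that '*' expands to, without descending into them. The 'sed' takes that list of names and sticks a '/' on the front, turning names like 'wal' into '/wal', because a leading '/' definitely anchors a pattern to the root of the directory tree being (recursively) transferred (per rsync's Pattern Matching Rules and Anchoring Include/Exclude Patterns). Finally, the rsync command says to delete now-gone things in directories we transfer (--delete), delete things that are excluded on the source but present on the destination (--delete-excluded), read what to include from standard input (--include-from -, i.e., the output of our 'sed'), and then exclude everything at the top level that isn't specifically included (--exclude '/*'). Because a '*' in an rsync pattern doesn't match a '/', '/*' only matches top level names, so everything inside an included subdirectory is still copied.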
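If you want to see exactly which patterns rsync will be fed before trusting it with --delete-excluded, you can run the first two stages of the pipeline on their own. The following is a minimal sketch that wraps them in a shell function; the name recent_top_level_patterns and its DIR and DAYS arguments are hypothetical, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
# Hypothetical helper (not from the original entry): print the anchored
# include patterns that the 'find | sed' stages would hand to rsync.
# Usage: recent_top_level_patterns DIR DAYS
recent_top_level_patterns() {
    (
        cd "$1" || exit 1
        # '-maxdepth 0' makes find test only the names '*' expands to,
        # without descending into them; '-mtime -N' keeps entries
        # modified less than N days ago. The sed then prefixes each
        # name with '/' to anchor it at the root of the transfer.
        find * -maxdepth 0 -mtime -"$2" -print |
            sed 's;^;/;'
    )
}
```

For example, 'recent_top_level_patterns /data/prometheus/metrics2 365' should print a line such as '/wal' for each top level entry modified within the last year. Like the original 'find *', it never lists names that start with '.', so such top level entries are never included.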

(All of this is easier than I expected when I wrote my recent entry on discovering this problem; I thought I might have to either construct elaborate command line arguments or write some temporary files. The fact that --include-from will read from standard input when given '-' is very helpful here.)
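For comparison, here is roughly what the temporary-file version that '--include-from -' avoids might look like. This is a hedged sketch, not the author's method: the function name sync_recent_via_tempfile and its SRC, DEST and DAYS arguments are hypothetical, it assumes a shell with 'local' (bash, dash and similar), and DEST should be an absolute path or a remote 'host:path' because the function changes into SRC first.

```shell
# Hypothetical sketch of the temporary-file alternative to '--include-from -'.
# Usage: sync_recent_via_tempfile SRC DEST DAYS
# DEST must be absolute (or host:path) since we cd into SRC before rsync runs.
sync_recent_via_tempfile() {
    local src=$1 dest=$2 days=$3 patterns status
    patterns=$(mktemp) || return 1
    (
        cd "$src" || exit 1
        # Same selection as the original pipeline, written to a file.
        find * -maxdepth 0 -mtime -"$days" -print | sed 's;^;/;' >"$patterns"
        # Hypothetical safety guard: with no include patterns at all,
        # --exclude '/*' plus --delete-excluded would empty the destination.
        [ -s "$patterns" ] || { echo "no include patterns, not syncing" >&2; exit 1; }
        # Order matters: rsync filter rules are first match wins, so the
        # includes must come before the catch-all exclude.
        rsync -a --delete --delete-excluded \
            --include-from "$patterns" --exclude '/*' \
            . "$dest"
    )
    status=$?
    rm -f "$patterns"
    return "$status"
}
```

The empty-file check is a hypothetical addition; the original pipeline has no such guard, and whether an empty selection should wipe the copy is a policy choice.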

If you don't think to check the rsync manual page, especially its section on Filter Rules, you can have a little rsync accident: you absently think that rsync filtering is 'last match wins' instead of 'first match wins', so you put the --exclude before the --include-from. This causes everything to be excluded, and rsync will dutifully delete the entire multi-terabyte copy you made in your earlier testing, because that's what you told it to do when you used --delete-excluded.
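One way to guard against this kind of accident is to run the command with --dry-run and --itemize-changes first and look at what rsync says it would delete; in itemized output, deletions are reported on lines starting with '*deleting'. The sketch below counts those lines for either ordering of the filter rules, so you can see the difference on a scratch copy. The function name planned_deletions and its arguments are hypothetical, and it assumes rsync 3.x itemized output and a shell with 'local'.

```shell
# Hypothetical helper: count the deletions a dry run of the rsync command
# would perform, with the filter rules in either order. Nothing is changed.
# Usage: planned_deletions SRC ORDER PATTERN_FILE DEST
#   ORDER is 'include-first' (correct) or 'exclude-first' (the mistake).
planned_deletions() {
    local src=$1 order=$2 patterns=$3 dest=$4
    (
        cd "$src" || exit 1
        if [ "$order" = include-first ]; then
            rsync -a --dry-run --itemize-changes --delete --delete-excluded \
                --include-from "$patterns" --exclude '/*' . "$dest"
        else
            rsync -a --dry-run --itemize-changes --delete --delete-excluded \
                --exclude '/*' --include-from "$patterns" . "$dest"
        fi
    ) | grep -c '^\*deleting' || true
}
```

Against a destination that's already in sync, the include-first ordering should report zero planned deletions, while the exclude-first ordering reports deletions for everything in the copy.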

(In general I should have carefully read all of the rsync manual page's various sections on pattern matching and filtering. It probably would have saved me time, and it would definitely have left me better informed about how rsync actually behaves.)

sysadmin/RsyncRecentDirectoryContents written at 23:08:38
