Using rsync to pull a directory tree to client machines
Suppose that you have a decent-sized directory tree that you want some number of clients to mirror
from a master server (with the clients pulling updates instead of the
master pushing them), perhaps because you've just noticed undesired
NFS dependencies. Things in the directory
tree are potentially sensitive (so you want access control), it's updated
at random, and it's not in a giant VCS tree or something; this is your
typical medium-sized ball of local stuff. The straightforward brute
force approach is to use rsync with SSH; give the clients special SSH
identities, put them in the server's authorized_keys, and have them
run 'rsync -a --delete' (or some close variant) to pull the directory
tree over. However, this has the problem that normal rsync is symmetric;
if you allow a client to pull from you, you also allow a client to push
to you (assuming that the server side login has write access to the
directory tree, and yes let's make that assumption for now).
(You also have to set the SSH access up so that the clients can't run arbitrary commands on the server.)
Rsync's solution to this is its daemon mode, which can be restricted to operate read-only. Normally rsync wants to be run this way as an actual daemon (listening on a port and so on), but that requires us to use rsync's own weaker and harder-to-manage authentication, access control, and other things. I would rather keep running rsync over plain SSH, in daemon mode, and take advantage of all of SSH's existing, proven features for authentication and access control.
(The rsync manpage suggests hacks like binding the rsync daemon to listen only on localhost on the server and then using SSH port forwarding to give clients access to it. But those are hacks and require making various assumptions.)
How to do this is not obvious from the documentation, so here is
the setup I have come up with for doing this on both the server and
the clients. First, you need an rsyncd.conf configuration file on
the server. Don't use the normal /etc/rsyncd.conf; it's much more
controllable to use your own in a different place. It should look
something like:
use chroot = no

[somepath]
comment = Replication module
path = /some/path
read only = true
# if necessary:
uid = 0
gid = 0
(The '[somepath]' bit is what rsync calls the module name and can be
anything meaningful for you; you'll need it on the client later. The
comment is optional but potentially useful. You need to explicitly
specify uid and gid if the server login is UID 0 for access to the
directory tree and you need to keep that; otherwise rsync will drop
privileges to a default unprivileged UID (normally 'nobody').)
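If you set this up on several servers, or want the file produced by a provisioning script rather than written by hand, the configuration could be generated by a small shell function. This is a minimal sketch: the function name make_rsyncd_conf and its optional 'root' flag are hypothetical, and the output simply reproduces the example configuration.

```shell
#!/bin/sh
# Hypothetical helper: print a read-only rsyncd.conf for one module.
# Usage: make_rsyncd_conf MODULE TREE [root]
# Pass "root" as the third argument only if the server login is UID 0
# and must keep that UID when reading the tree.
make_rsyncd_conf() {
    module=$1
    tree=$2
    printf 'use chroot = no\n\n'
    printf '[%s]\n' "$module"
    printf '\tcomment = Replication module\n'
    printf '\tpath = %s\n' "$tree"
    printf '\tread only = true\n'
    if [ "${3:-}" = root ]; then
        # Without these, rsync started as root drops to a default UID.
        printf '\tuid = 0\n\tgid = 0\n'
    fi
}
```

For example, 'make_rsyncd_conf somepath /some/path > /your/rsyncd.conf' writes a file equivalent to the one above, minus the uid and gid lines.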
Next, you need a script on the server that will force an incoming SSH
login to run rsync in daemon mode against this configuration file and do
nothing else. We will set this as the command= value in the server
login's authorized_keys to restrict what the incoming SSH connection
from clients can do. This looks like:
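#!/bin/sh
# Ignore whatever command the client asked for and always run rsync
# in daemon mode against our own configuration file.
exec /usr/bin/rsync --server --daemon --config=/your/rsyncd.conf .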
Note that this completely ignores any arguments that the client attempts
to supply. However, this doesn't matter; as far as I can tell, the
command line that the clients send will always be 'rsync --server --daemon .',
regardless of what command line options and paths you use
on the clients. (Certainly this is the only command line that clients
seem to send for requests that you actually want to pay attention to.)
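If you would rather not silently ignore what the client asks for, OpenSSH puts the client's requested command into the SSH_ORIGINAL_COMMAND environment variable when a command= is forced, so a stricter wrapper can refuse everything except the daemon-mode request. This is a sketch, not a tested drop-in: it assumes, as observed above, that legitimate clients always send exactly 'rsync --server --daemon .' (a future rsync that sends something else would be refused), the function names are hypothetical, and logging assumes a logger(1) that supports -t, as the util-linux and BSD versions do.

```shell
#!/bin/sh
# Hypothetical stricter version of /your/rsyncd-shell.
RSYNCD_CONF=/your/rsyncd.conf   # placeholder path, as in the text

# Succeeds only for the one request we expect from daemon-mode clients.
is_daemon_request() {
    [ "$1" = "rsync --server --daemon ." ]
}

rsyncd_shell() {
    if is_daemon_request "${SSH_ORIGINAL_COMMAND-}"; then
        exec /usr/bin/rsync --server --daemon --config="$RSYNCD_CONF" .
    fi
    # Leave a syslog trail of anything unexpected, then refuse it.
    logger -t rsyncd-shell "refused: ${SSH_ORIGINAL_COMMAND-<none>}"
    echo "rsyncd-shell: request refused" >&2
    return 1
}

# sshd always sets SSH_CONNECTION; only act when started by sshd.
if [ -n "${SSH_CONNECTION-}" ]; then
    rsyncd_shell
    exit 1
fi
```

The trade-off is that refusing is stricter than ignoring: you get a record of odd requests, but the wrapper will need updating if rsync's client side ever changes what it sends.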
On the server, the login that you're using for this should have
a .ssh/authorized_keys file with entries for the client SSH
identities. These entries should all force incoming logins to run
the command above and block various other activities (especially
port forwarding, which could otherwise be done without command
execution at all, as Dan Astoorian mentioned in a comment here):
command="/your/rsyncd-shell",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty [...]
A from="..." restriction is optional but worth considering; even a
broad one may limit the fallout from problems such as a leaked client key.
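With more than a handful of clients it may be easier to generate these entries than to edit them by hand. A minimal sketch: authkeys_line is a hypothetical helper, and its optional third argument becomes the from= pattern list.

```shell
#!/bin/sh
# Hypothetical helper: print one authorized_keys entry for a client's
# public key, forcing the wrapper script and disabling forwarding/PTYs.
# Usage: authkeys_line WRAPPER "PUBKEY-LINE" [FROM-PATTERNS]
authkeys_line() {
    wrapper=$1
    pubkey=$2
    opts="command=\"$wrapper\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty"
    if [ -n "${3:-}" ]; then
        # Optional from= restriction on where the key may be used from.
        opts="from=\"$3\",$opts"
    fi
    printf '%s %s\n' "$opts" "$pubkey"
}
```

OpenSSH 7.2 and later also accept a single restrict option that disables port, agent and X11 forwarding and PTY allocation in one go; check your sshd(8) manpage before relying on it.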
Finally, on the client you need to run rsync with all of the necessary arguments. You probably want to put this in a script:
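#!/bin/sh
# Pull the 'somepath' module from the master over SSH, deleting local
# files that no longer exist on the master.
rsync -a --delete --rsh="/usr/bin/ssh -i /client/identity" LOGIN@MASTER-HOST::somepath /some/path/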
Potentially useful additional arguments for rsync are -q and
--timeout=<something>. In a production script you probably also
want an option to mirror the directory tree to somewhere other than
/some/path on the client.
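One way to add the destination option along with -q and --timeout is to wrap the rsync command in a function. This is a hedged sketch: LOGIN@MASTER-HOST, the module name and the identity path are the placeholders used above, pull_tree is a hypothetical name, and the 600-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Placeholders from the text; substitute your own values.
MASTER='LOGIN@MASTER-HOST'
MODULE=somepath
IDENTITY=/client/identity

# pull_tree [DEST]: mirror the module into DEST (default /some/path/).
# -q keeps cron quiet; --timeout stops a hung transfer (600s is arbitrary).
pull_tree() {
    dest=${1:-/some/path/}
    rsync -a --delete -q --timeout=600 \
        --rsh="/usr/bin/ssh -i $IDENTITY" \
        "$MASTER::$MODULE" "$dest"
}
```

The script would then end by calling pull_tree with no argument for the usual /some/path/, or with another directory (for example a scratch area while testing).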
If you run this from cron, remember to add some locking to prevent two copies from running at once. If the directory tree is large and you have enough clients, you may also want to randomize the start times somewhat to keep the load on the master server down.
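For the locking and randomization, one portable approach is to use mkdir as the lock (creating a directory either succeeds or fails atomically) and let awk supply the random delay, since plain /bin/sh has no $RANDOM. A minimal sketch; the function names and the choice of seeding from the hostname, PID and time are assumptions, not part of the setup above.

```shell
#!/bin/sh
# take_lock DIR: mkdir is atomic, so only one run can create DIR.
take_lock() {
    mkdir "$1" 2>/dev/null
}

# release_lock DIR: remove the lock directory when the run is done.
release_lock() {
    rmdir "$1"
}

# random_delay MAX: print a pseudo-random whole number of seconds in [0, MAX).
# awk's srand() with no argument seeds from the time of day, so clients that
# cron starts in the same second would all pick the same delay; mixing in the
# hostname avoids that. (date +%s is not strictly POSIX, but GNU, BSD and
# BusyBox date all support it.)
random_delay() {
    seed=$(printf '%s %s %s' "$(uname -n)" "$$" "$(date +%s)" | cksum | cut -d' ' -f1)
    awk -v max="$1" -v seed="$seed" 'BEGIN { srand(seed); print int(rand() * max) }'
}
```

A cron job would call take_lock on some lock directory and exit quietly if it fails, arrange (for instance with trap ... EXIT) to call release_lock when it finishes, and sleep for random_delay seconds before running the rsync pull. One caveat: if a run is killed hard or the machine crashes, the lock directory is left behind and must be removed by hand; on Linux, flock(1) from util-linux avoids this because the lock goes away with the process.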
(There may be a better way to do this with rsync; if you know of one, let me know in the comments. For various reasons we're probably not interested in doing this with any other tool, partly because we already have rsync and not the other tools. Another tool would have to be very much better than rsync to really be worth switching to.)