Rsync eng: Difference between revisions

= Author =
;Eriks Ocakovskis C11, Estonian IT College, 07-05-2017


= References =

Latest revision as of 09:55, 8 May 2017

Summary

Rsync is a command line tool used to copy files locally and over the network. To quote the official website: "Rsync - a fast, versatile, remote (and local) file-copying tool" [1]. The main draw of Rsync is that it tries to copy only the differences between files rather than entire files, which reduces both network traffic and transfer time. If you have ever tried to make efficient backups over a network, you will appreciate what the authors of Rsync have done. Rsync also incorporates data compression for even greater time and data transfer efficiency.

But wait, there is more - Rsync has a built-in ability to copy links, devices, owners, groups and permissions. It can be tunneled via SSH, and it supports two-way transfer.

In short - you can use Rsync to make backups, mirror file systems or any number of similar operations in a fast and secure way.

How can it transfer only file differences, you ask? It has its own algorithm for that; the How it works? section will hopefully provide some answers.

Key features [1]

  • Support for copying links, devices, owners, groups and permissions
  • Exclude and exclude-from options similar to GNU tar
  • A CVS exclude mode for ignoring the same files that CVS would ignore
  • Does not require root privileges
  • Pipelining of file transfers to minimize latency costs
  • Support for anonymous or authenticated Rsync servers (ideal for mirroring)


How it works?

The way Rsync works is that when an Rsync client is started it will first establish a connection with a server process. This connection may be through pipes or over a network socket.

  • Via remote shell

When Rsync communicates with a remote non-daemon server via a remote shell both the Rsync client and server are communicating via pipes through the remote shell. As far as the Rsync processes are concerned there is no network. In this mode the Rsync options for the server process are passed on the command-line that is used to start the remote shell. [2]

  • Daemon mode

A network socket is used when Rsync is communicating with a daemon. This is the only sort of Rsync communication that could be called network aware. In this mode the Rsync options must be sent over the socket. [2]

  • Local-only

When Rsync is performing a local-only job the client will fork a server process to become both sender and receiver.


At the very start of the connection the client and server agree on a communication protocol version by sending their protocol versions to each other; the lower of the two is used. After the connection has been established, the side that will be sending the files starts generating the file list. While the file list is being generated, each entry is sent to the receiving end in compressed format. When the file list is complete, both sides sort it in alphabetical order.

After both sides have sorted the file list, the receiving side runs the generator process, which compares the file list with its local directory tree. During this comparison each file is checked to see whether it can be skipped; directories, device nodes and symlinks are never skipped, and missing directories are created. When the generator determines that a file is not to be skipped, the version of that file already on the receiving end is treated as a data source for the transfer and is used to avoid transferring data that already exists. To do this, the generator computes several pairs of block checksum and index checksum for the original file (the block size and the number of blocks depend on the size of the file). Each checksum pair is then sent to the data sender (the sending side).

When the sending side receives the checksums of a file it builds a hash table keyed by the index checksums of the original file. The local file is then read and an index checksum is generated for the block beginning at the first byte of the local file. This index checksum is looked up in the hash table. If a match is found, a block checksum is generated for the local file's block starting at the same byte and compared with the block checksum received for that entry. If no match is found, the non-matching byte is appended to the non-matching data and the block starting at the next byte is compared. This is what is referred to as the "rolling checksum" [2].
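The "rolling" property of the index checksum can be illustrated with a short Python sketch. It assumes a weak checksum built from two 16-bit running sums, in the style described in the rsync technical report; the function names, the modulus and the block length are illustrative choices, not Rsync's actual code.

```python
M = 1 << 16  # modulus for both running sums (assumed for illustration)


def weak_checksum(block: bytes) -> tuple:
    """Compute the two running sums (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple:
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


data = b"the quick brown fox jumps over the lazy dog"
block_len = 8
a, b = weak_checksum(data[:block_len])
for start in range(1, len(data) - block_len + 1):
    a, b = roll(a, b, data[start - 1], data[start - 1 + block_len], block_len)
    # The rolled value always equals the checksum recomputed from scratch.
    assert (a, b) == weak_checksum(data[start:start + block_len])
```

The point of the design is that moving the window by one byte costs a constant amount of work, so the sender can test every byte offset of its file without re-reading a whole block each time.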

Both matched and non-matching data are reported to the receiving side. The important part here is that matched data is not resent: only the offset and length of the matching block are sent, which requires very little network utilization, while the actual bytes are sent only for non-matching data. When a match is found, any accumulated non-matching data is sent to the receiver, followed by the offset and length in the original file of the matching block, and the block checksum generator is advanced to the next byte after the matching block. Matching blocks can be identified in this way even if the blocks are reordered or at different offsets [2].

In this way, the sender will give the receiver instructions for how to reconstruct the source file into a new destination file. These instructions detail all the matching data that can be copied from the basis file (if one exists for the transfer), and includes any raw data that was not available locally. At the end of each file's processing a whole-file checksum is sent and the sender proceeds with the next file. Generating the rolling checksums and searching for matches in the checksum set sent by the receiving side require a good deal of CPU power. Of all the rsync processes it is the sender that is the most CPU intensive.
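The matching process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Rsync's implementation: `zlib.adler32` and `hashlib.sha256` stand in for the index and block checksums, the block size is tiny, the weak checksum is recomputed at every offset instead of being rolled, and the instruction format (`"copy"` / `"data"` tuples) is invented for the example.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size, for illustration only


def signatures(basis: bytes, block: int = BLOCK) -> dict:
    """Receiver side: map weak checksum -> [(strong checksum, block index), ...]."""
    table = {}
    # A trailing partial block is ignored in this sketch.
    for index in range(len(basis) // block):
        chunk = basis[index * block:(index + 1) * block]
        entry = (hashlib.sha256(chunk).digest(), index)
        table.setdefault(zlib.adler32(chunk), []).append(entry)
    return table


def delta(new: bytes, table: dict, block: int = BLOCK) -> list:
    """Sender side: emit ("copy", block_index) and ("data", raw_bytes) instructions."""
    out, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        match = None
        if len(chunk) == block:
            # Cheap weak lookup first, strong checksum only on a candidate hit.
            for strong, index in table.get(zlib.adler32(chunk), []):
                if strong == hashlib.sha256(chunk).digest():
                    match = index
                    break
        if match is None:
            literal.append(new[pos])  # non-matching byte, try the next offset
            pos += 1
        else:
            if literal:
                out.append(("data", bytes(literal)))
                literal.clear()
            out.append(("copy", match))
            pos += block  # continue after the matching block
    if literal:
        out.append(("data", bytes(literal)))
    return out


basis = b"aaaabbbbcccc"
new = b"bbbbXaaaacccc"
print(delta(new, signatures(basis)))
# prints [('copy', 1), ('data', b'X'), ('copy', 0), ('copy', 2)]
```

Only the single byte `X` travels as raw data; the three blocks are found even though they were reordered.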

The receiver will read from the sender the data for each file, identified by the file index number. It will open the local file (called the basis) and will create a temporary file.

The receiver will expect to read non-matched data and/or to match records all in sequence for the final file contents. When non-matched data is read it will be written to the temp-file. When a block match record is received the receiver will seek to the block offset in the basis file and copy the block to the temp-file. In this way the temp-file is built from beginning to end.

The file's checksum is generated as the temp-file is built. At the end of the file, this checksum is compared with the file checksum from the sender. If the file checksums do not match the temp-file is deleted. If the file fails once it will be reprocessed in a second phase, and if it fails twice an error is reported.

After the temp-file has been completed, its ownership and permissions and modification time are set. It is then renamed to replace the basis file.
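The receiver steps above (build a temp-file from the basis, verify the whole-file checksum, rename over the basis) can be sketched like this in Python. It is a simplified model: the `("copy", block_index)` / `("data", raw_bytes)` instruction format and the use of SHA-256 as the whole-file checksum are assumptions made for the example, and setting ownership, permissions and modification time is left out.

```python
import hashlib
import os
import tempfile


def rebuild(basis_path, instructions, expected_hexdigest, block=4):
    """Build a temp-file from copy/data instructions, verify it, then replace the basis."""
    directory = os.path.dirname(os.path.abspath(basis_path))
    digest = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp, open(basis_path, "rb") as basis:
            for kind, value in instructions:
                if kind == "copy":
                    # Seek to the matching block in the basis file and copy it.
                    basis.seek(value * block)
                    chunk = basis.read(block)
                else:
                    # Non-matched data arrives as raw bytes.
                    chunk = value
                tmp.write(chunk)
                digest.update(chunk)  # checksum is built as the temp-file grows
        if digest.hexdigest() != expected_hexdigest:
            raise ValueError("whole-file checksum mismatch")
        os.replace(tmp_path, basis_path)  # rename over the basis file
    except BaseException:
        # On any failure the temp-file is deleted and the basis is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temp-file in the same directory and renaming it at the end means a failed or interrupted transfer never leaves a half-written destination file.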

Copying data from the basis file to the temp-file makes the receiver the most disk intensive of all the rsync processes. Small files may still be in disk cache, mitigating this, but for large files the cache may thrash as the generator has moved on to other files and there is further latency caused by the sender. As data is read possibly at random from one file and written to another, if the working set is larger than the disk cache, then what is called a seek storm can occur, further hurting performance [2].

If you are interested in how well the Rsync algorithm performs, I suggest you read Multiround Rsync by John Langford. He compares the Rsync algorithm to his own improved version and in doing so provides a quite detailed analysis of the Rsync algorithm's performance.

Quick start guide

For people who are in a hurry.

Debian based machine and Rsync in local-only mode

Open terminal and paste the following

sudo apt install rsync
rsync -av ~/ /tmp/my_local_backup

Congratulations! You have made a backup of your home directory.

Setup

Setup will not cover Windows machines. If you are interested in setting up Rsync via SSH on Windows, I suggest looking at the itefix solutions (https://itefix.net/free) for installing SSH and Rsync.


As mentioned at the beginning of the How it works? section, there are several ways Rsync can be used: in daemon mode, via remote shell or locally.

Each of these cases requires a slightly different setup process. For that reason I have split setup into 3 subsections, one for each use case.

Local-only

In this case you will need only Rsync itself, and all the transfer parameters will be passed to it as command line options.

First check if you have Rsync already installed by running:

which rsync

If the output is a path to rsync then you are all done and can skip directly to the Usage section.

Otherwise depending on your system run the following commands:

  • Debian based machine
sudo apt install rsync
  • Fedora machine
sudo dnf install rsync
  • OpenSUSE
sudo zypper install rsync

Daemon mode

In this case, the way Rsync will work is that at least one of the machines involved in data transfer needs to be an "rsync server" by running Rsync in a daemon mode (rsync --daemon at the command line) and setting up a short, easy configuration file (/etc/rsyncd.conf) [3].

Any number of machines with Rsync installed may then synchronize to and/or from the machine running the Rsync daemon. [3]

This way of using Rsync is beneficial if you don't want or need the client to specify the file transfer options and path, since you can explicitly specify what can be transferred to or from the remote machine, and how.

First follow the steps listed in the Local-only subsection of Setup on all machines involved in the transfer process.

Once Rsync is installed you need to set it up in daemon mode on at least one of the machines involved in the data transfer. For that we will set up a configuration file on that machine.

Open /etc/rsyncd.conf in your favorite text editor and paste the following:


motd file = /path/to/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
syslog facility = local5
reverse lookup = no

[no_security_module]
  path = /path/to/files ./pub
  comment = Some files to sync (approx 6.1 GB)

[more_security_module]
  path = /path/to/files
  comment = Some files to sync (approx 10.2 GB)
  uid = nobody
  gid = nobody
  read only = no
  list = yes
  auth users = username, anotheruser
  secrets file = /path/to/rsyncd.scrt
  reverse lookup = yes
  hosts allow = 10.0.1.12, 192.168.0.0/16, fe80::/64, 172.16.143.*
  refuse options = c delete
  max connections = 4
For a detailed explanation of the parameters we pasted into /etc/rsyncd.conf, see the rsyncd.conf parameters section.


The parameters shown above can be adjusted to your personal preference. I have shown 2 different modules, each marked by a name in square brackets. The 'no_security_module' module is a very basic one that just specifies a path to be synced and a comment. The 'more_security_module' module is more complex and is aimed at securing the connection if a secure remote shell is not used.
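To make the layout of the file easier to see, here is a small Python sketch that splits rsyncd.conf-style text into global parameters (everything before the first square-bracketed name) and per-module parameters. The function name and the parsing rules are simplifications chosen for illustration; the real daemon's parser handles more cases, such as line continuations.

```python
def parse_rsyncd_conf(text):
    """Split rsyncd.conf-style text into (global_params, modules)."""
    global_params, modules = {}, {}
    current = global_params  # parameters before the first [module] are global
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # A name in square brackets starts a new module.
            current = modules.setdefault(line[1:-1].strip(), {})
            continue
        name, sep, value = line.partition("=")
        if sep:
            current[name.strip()] = value.strip()
    return global_params, modules


sample = """\
log file = /var/log/rsyncd.log
reverse lookup = no

[more_security_module]
  path = /path/to/files
  read only = no
  reverse lookup = yes
"""
global_params, modules = parse_rsyncd_conf(sample)
print(global_params["reverse lookup"])                    # prints no
print(modules["more_security_module"]["reverse lookup"])  # prints yes
```

The sample shows why the same parameter can appear twice in the configuration above: once as a global default and once inside a module.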

If you will be using the more secure module, open /path/to/rsyncd.scrt in your favorite text editor and add user names and passwords for the users you listed in the auth users parameter. The format of the file is as follows:

username:password
bob:bobspass
sally:herpass
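The format above can be read with a few lines of Python. This is only a sketch to make the format concrete: the function name is hypothetical, it splits each line at the first colon, and it skips blank lines and lines beginning with #.

```python
def parse_secrets(text):
    """Return {username: password} from secrets-file text (one name:password per line)."""
    users = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split at the first colon only, so passwords may contain colons.
        name, sep, password = line.partition(":")
        if sep:
            users[name] = password
    return users


print(parse_secrets("bob:bobspass\nsally:herpass\n"))
# prints {'bob': 'bobspass', 'sally': 'herpass'}
```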

Please note the following:

  • Users that you list in /etc/rsyncd.conf and /path/to/rsyncd.scrt are not your system users, they are arbitrary users that will be used only for Rsync process.

  • All the paths listed in /etc/rsyncd.conf need to be accessible by Rsync.

  • The rsyncd.scrt file must be accessible only to the user that runs the Rsync daemon, because it contains user name and password information. I would suggest using the following commands:

    sudo su - rsyncuser

    sudo chmod 600 /path/to/rsyncd.scrt

  • The Rsync daemon usually listens on TCP port 873, so don't forget to white-list it in your firewall.
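If you create the secrets file from a script, the permission requirement from the notes above can be applied and checked at the same time. The following Python sketch assumes a POSIX system; the helper names are hypothetical.

```python
import os
import stat


def write_secrets(path, users):
    """Write name:password lines to path, readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        for name, password in users.items():
            handle.write(f"{name}:{password}\n")
    os.chmod(path, 0o600)  # also tightens a file that already existed


def is_private(path):
    """True when neither group nor others have any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Calling `is_private("/path/to/rsyncd.scrt")` before starting the daemon is a quick way to confirm that the `chmod 600` step was not forgotten.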

After the configuration file has been created you can start Rsync in daemon mode by running rsync --daemon.

Later on, if you want the daemon process to be started automatically, you can either use the inet daemon or place the rsync --daemon command in a shell script and add it to startup. Both methods have their merit; which one you use is up to you.

Via remote shell

One of the most convenient ways to use Rsync is via a remote shell, which lets you bypass setting up Rsync's built-in authentication system. It also provides a security layer, since the Rsync authentication protocol is a 128 bit MD4 based challenge response system [4], which is quite poor. On top of that, using SSH provides data encryption.

That is why I suggest using SSH as your remote shell with Rsync.

Before dealing with SSH, follow the steps listed in the Local-only subsection of Setup on all machines involved in the transfer process. When that is done, follow the steps below.

First you need to make sure you have an SSH client installed.
On a Unix-like machine you could do the following:
which ssh
You also need an SSH server installed and running on the server machine.
You can check for it with the following command:
which sshd
If the output shows paths to ssh and sshd then you can skip this step.
Otherwise use the following commands.
  • Debian based machine
sudo apt install ssh
This will install the latest SSH client and server. You can also specify individual packages, as shown below.
sudo apt install openssh-server openssh-client openssh-blacklist*
  • For Fedora and OpenSUSE machines use dnf and zypper respectively.
At this point you should have both Rsync and SSH on all machines involved in the data transfer.
   I will not go through the full SSH setup; for that please see http://troy.jdmz.net/rsync/index.html.
   Additionally, here are two great articles that explain SSH in more depth:
   SSH for beginners by Etienne Barrier: https://wiki.itcollege.ee/index.php/SSH_for_beginners
   SSH Encryption by Frank Korving: https://wiki.itcollege.ee/index.php/SSH_Encryption
The basics go like this:
  • Start up sshd process on server
sudo /etc/init.d/ssh start
or
sudo service ssh start
  • Generate keys on both machines. Leave the passphrase empty if you would like to use this authentication method for automated scripts.
ssh-keygen -t rsa -b 4096 -a 1000 -f ~/.ssh/key_file
This will generate an RSA private key ~/.ssh/key_file and a public key ~/.ssh/key_file.pub.
  • Copy the public key from the client to the server
ssh-copy-id -i ~/.ssh/key_file user@IP-address
Please remember that the permissions on the ~/.ssh directory and the files in it need to be strict. To make sure of that, run the following commands:
chmod 700 ~/.ssh/
chmod 600 ~/.ssh/*

When Rsync is used via SSH you can still use Rsync in daemon mode to use preconfigured modules; see the Daemon mode subsection of Setup. In that case only the usage syntax changes, as specified in the Usage section.

Synopsis

Local: rsync [OPTION...] SRC... [DEST]

Access via remote shell:
Pull: rsync [OPTION...] [USER@]HOST:SRC... [DEST]
Push: rsync [OPTION...] SRC... [USER@]HOST:DEST

Access via rsync daemon:
Pull: rsync [OPTION...] [USER@]HOST::SRC... [DEST]
      rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]
Push: rsync [OPTION...] SRC... [USER@]HOST::DEST
      rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST

Usages with just one SRC arg and no DEST arg will list the source files instead of copying [1].

Usage

You use Rsync in the same way you use rcp. You must specify a source and a destination, one of which may be remote [1].

All the Rsync command line options used below are explained in the Command options section.

Local-only

Rsync can be used to copy files to a remote destination or locally. In local-only use it behaves like an improved copy command.

Command sample:

rsync -t *.c src/

This would transfer all files matching the pattern *.c from the current directory to the directory src. If any of the files already exist on the remote system then the Rsync remote-update protocol is used to update the file by sending only the differences [1].

Another way is to specify the directory you would like to copy. It can be done in two ways. One is by including trailing slash in the source path, like so:

rsync -av /path/to/source/ /path/to/destination

This will copy all the content of source directory to destination directory.

Second option is to omit the trailing slash, like so:

rsync -av /path/to/source /path/to/destination

This will copy the source directory itself along with all its contents to the destination directory, so in the end the contents of source will be at this path - /path/to/destination/source.
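The effect of the trailing slash can be summed up in a small Python model. It only computes where a file would land and does not call Rsync; the function name is hypothetical and the model covers just the two cases described above.

```python
import posixpath


def landing_path(source, destination, relative_file):
    """Where a file inside `source` ends up under `destination` (simplified model)."""
    if source.endswith("/"):
        # Trailing slash: copy the contents of the directory.
        return posixpath.join(destination, relative_file)
    # No trailing slash: the directory itself is recreated inside the destination.
    return posixpath.join(destination, posixpath.basename(source), relative_file)


print(landing_path("/path/to/source/", "/path/to/destination", "a.txt"))
# prints /path/to/destination/a.txt
print(landing_path("/path/to/source", "/path/to/destination", "a.txt"))
# prints /path/to/destination/source/a.txt
```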

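To copy directories or files whose names include white space, surround the path with ' so that your local shell passes it as a single argument. When the path is on a remote host, newer versions of Rsync also offer the --protect-args (-s) option, which stops the remote shell from splitting the name at the spaces.

rsync '/file with spaces' /path/to/destination

rsync -s 'user@10.0.2.20:/file with spaces' /path/to/destination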

If you would like to transfer several files from the source to the same destination you would do something like this:

rsync -av /file1 /file2 /path/to/destination

Remote source or destination

When remote source or destination is used a way of contacting remote system must be specified. There are two ways:

  • Using a remote-shell program as the transport (such as SSH). When using this method source or destination path contains a single colon (:) separator after a host specification.
  • Contacting an Rsync daemon directly via TCP. This happens when the source or destination path contains a double colon (::) separator after a host specification, OR when an rsync:// URL is specified.

There is, however, an exception to this rule: if the double colon (::) separator syntax is used and a remote shell is specified via the --rsh (-e) option, then a single-use daemon will be spawned on the remote host that will read its configuration file from the specified user's home directory. This, however, is not the best way to secure a daemon transfer. A better way would be to use ssh to tunnel a local port to a remote machine and configure a normal rsync daemon on that remote host to only allow connections from "localhost" [1].
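The separator rule can be written down as a small Python classifier. This is a simplified sketch with a hypothetical function name: it ignores the --rsh (-e) exception just described, does not handle bracketed IPv6 addresses, and only treats a colon as a host separator when it appears before the first slash.

```python
def transport(arg):
    """Classify one source/destination argument (simplified; ignores the -e exception)."""
    if arg.startswith("rsync://"):
        return "daemon"
    host_part = arg.split("/", 1)[0]  # a colon only counts before the first slash
    if "::" in host_part:
        return "daemon"
    if ":" in host_part:
        return "remote shell"
    return "local"


for arg in ["/path/to/source", "foo:src/", "user@10.0.2.20::module_name", "rsync://10.0.2.20/module_name"]:
    print(arg, "->", transport(arg))
```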

For instructions on SSH port forwarding see this.


Local source to remote destination command sample:

rsync -t *.c foo:src/

This is the same as the local-only sample, except that it copies the files to the directory src on the machine foo.

To expand on the previous sample, the exact command depends on the remote host setup.

  • Daemon mode
rsync -tavz *.c username@10.0.2.20::module_name

As you can see, in daemon mode we specify the remote destination IP followed by :: and the module we want to use; as per the installation instructions, the module contains all the configuration options. Note that username is a user specified in /path/to/rsyncd.scrt.

  • Remote shell
rsync -tavz -e 'ssh -i /path/to/private_key' *.c user@10.0.2.20:src/

/path/to/private_key will usually be ~/.ssh/id_rsa if you generated a key pair using ssh-keygen -f ~/.ssh/id_rsa -t rsa -b 4096.

In remote shell mode we specify the remote shell via the -e option, then the SSH user followed by @, the IP of the destination, a : and the destination path.

A slightly more complex sample using SSH, from a remote source to a local destination:

rsync -avz -e 'ssh -i /path/to/private_key' user@10.0.2.20:/file{1,2} :/file3 /path/to/destination

This would transfer file1, file2 and file3 from 10.0.2.20 to /path/to/destination on the local machine. It shows the versatility of Rsync: you can specify several individual files on the remote host (:/file1 :/file2, where an omitted hostname means the same host as the first source), or you can use a shell pattern such as /file{1,2}. Note that all remote sources in one command must come from the same host; to also fetch file4 from 172.16.1.10, run a second rsync command against that host.

You can do the same when transferring in daemon mode, again with all sources on the same host:

rsync -avz user@10.0.2.20::module_name/file{1,2} user@10.0.2.20::module_name/file3 /path/to/destination

Directory copy works the same as in local-only mode; the only difference is that you have to specify the remote host:

rsync -avz -e 'ssh -i /path/to/private_key' user@10.0.2.20:/path/to/source /path/to/destination

or

rsync -avz user@10.0.2.20::module_name /path/to/destination

One important thing to note is that host and module references don't require a trailing slash to copy the contents of the default directory [1].

Special case

If a single source argument is specified without a destination, the files are listed in an output format similar to ls -l [1].
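This listing mode can be tried locally without touching any remote host. The sketch below creates a throwaway directory (the file names are made up) and asks Rsync to list it, assuming rsync is installed.

```shell
#!/bin/sh
# Create a temporary source directory with two sample files.
src="$(mktemp -d)"
printf 'hello\n' > "$src/file1.c"
printf 'world\n' > "$src/file2.c"

# A single source argument and no destination: rsync only lists the files.
listing="$(rsync "$src/")"
printf '%s\n' "$listing"
```

Each output line shows the permissions, size, modification time and name of one entry, and nothing is copied.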

Tips and tricks

Use filters to exclude or include files.
This is useful if, for example, you would like to transfer files matching a specific pattern and exclude some files from that pattern, or exclude all files except the ones you specify in an include rule.
There are several ways of doing it:
  • filter option
rsync -av --filter '- *.tmp' /path/to/source/ /path/to/destination
This would exclude files matching the *.tmp pattern.
  • exclude / include options
rsync -av --exclude '*.tmp' /path/to/source/ /path/to/destination
This does the same as the sample above.
rsync -av --include '*.txt' /path/to/source/ /path/to/destination
This would include files matching the *.txt pattern. It is only useful in combination with an exclude pattern, because Rsync transfers all the files anyway if no exclude option is specified.
  • Using exclude / include file
This is a somewhat more versatile method, because you don't have to clutter your command line with possibly dozens of filters.
To pass files with filters you would do something like this:
rsync -av --exclude-from '/exclude_file' --include-from '/include_file' /path/to/source/ /path/to/destination
And the files themselves would contain newline-separated exclude / include patterns such as:
*.temp   /some/dir   *.txt   */some/other/dir
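Also note that if you use * as an exclude rule, everything will be excluded. So if you want to exclude everything except a specific pattern, first specify an include rule such as */, which tells Rsync to include the directories themselves, then an include rule for the pattern you want, and only then the * exclude rule. Rules are checked in order and the first match wins.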
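The "include only one pattern" case can be tried safely on a throwaway directory. The following is a minimal sketch, assuming rsync is installed; the directory layout and file names are made up for illustration.

```shell
#!/bin/sh
# Build a small temporary source tree and an empty destination.
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$src/sub"
touch "$src/a.txt" "$src/b.tmp" "$src/sub/c.txt" "$src/sub/d.log"

# Rules are checked in order and the first match wins:
#   1. '*/'    keeps directories, so rsync can descend into them
#   2. '*.txt' keeps the files we want
#   3. '*'     excludes everything else
rsync -av --include '*/' --include '*.txt' --exclude '*' "$src/" "$dest/"

# Show which files arrived at the destination.
find "$dest" -type f | sort
```

Only the .txt files should end up in the destination, including the one inside the subdirectory. Without the '*/' rule the final '*' rule would also exclude the subdirectory, so sub/c.txt would never be considered.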
Store a password file on the local machine to avoid typing the password when transferring in daemon mode.
Some modules on the remote daemon may require authentication. If so, you will receive a password prompt when you connect. You can avoid the password prompt by setting the environment variable RSYNC_PASSWORD to the password you want to use or using the --password-file option. This may be useful when scripting rsync.
WARNING: On some systems environment variables are visible to all users. On those systems using --password-file is recommended [1].
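A minimal sketch of the password-file approach follows. The password and the temporary file are placeholders; in practice you would keep the file somewhere like your home directory. The file should contain just the password on its first line and must not be readable by other users, otherwise Rsync refuses to use it.

```shell
#!/bin/sh
# Stand-in for a real password file; the password is a placeholder.
pw_file="$(mktemp)"
printf '%s\n' 'example-password' > "$pw_file"

# Restrict access to the owner only.
chmod 600 "$pw_file"
ls -l "$pw_file"

# The transfer itself needs a running daemon, so it is only printed here.
echo rsync -avz --password-file="$pw_file" '*.c' username@10.0.2.20::module_name
```

In a script the same file can be reused for every daemon transfer, which avoids exposing the password through an environment variable.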
Set up scripts or Makefiles together with cron jobs to automate the process.
First you need to create a script or a Makefile. You can do that, for example, by opening a file in a text editor:
nano ~/backup_files.sh
or
nano ~/Makefile
Depending on which option you chose, you would write something like this into the script file:
#!/bin/bash
rsync -avz -e 'ssh -i /path/to/private_key' user@10.0.2.20:/path/to/source /path/to/destination
And something like this into the Makefile (the command line must be indented with a tab character, not spaces):
backup:
	rsync -avz -e 'ssh -i /path/to/private_key' user@10.0.2.20:/path/to/source /path/to/destination
Then you would edit your Crontab like so:
crontab -e
And there you would write something like this for the Bash script, which runs it every day at 00:05:
5 0 * * *     $HOME/backup_files.sh
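Putting these steps together, here is a hedged sketch that writes the backup script, makes it executable and prints matching crontab lines. It uses a temporary directory instead of $HOME so it can be run without side effects, and the make -C line is one assumed way of calling the Makefile target from cron.

```shell
#!/bin/sh
# Temporary stand-in for $HOME so nothing real is overwritten.
work_dir="$(mktemp -d)"
script="$work_dir/backup_files.sh"

# Write the backup script (paths and host are placeholders).
cat > "$script" <<'EOF'
#!/bin/bash
rsync -avz -e 'ssh -i /path/to/private_key' user@10.0.2.20:/path/to/source /path/to/destination
EOF

# cron can only start the script directly if it is executable.
chmod +x "$script"

# Syntax check only; nothing is transferred.
sh -n "$script"

# Crontab entries: minute 5, hour 0, every day.
echo "5 0 * * *     $script"
echo "5 0 * * *     make -C \$HOME backup"
```

The printed lines are what you would paste into the editor opened by crontab -e, using the first for the script or the second for the Makefile.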

You can find out about Bash, Makefile and Crontab below.

Bash Makefile Crontab

Command options

Rsync options used in this article can be found below.

This section is a filtered copy of the Rsync man page [1].

Rsync accepts both long (double-dash + word) and short (single-dash + letter) options.

-a, --archive
This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything (with -H being a notable omission). The only exception to the above equivalence is when --files-from is specified, in which case -r is not implied. Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.
-v, --verbose
This option increases the amount of information you are given during the transfer. By default, rsync works silently. A single -v will give you information about what files are being transferred and a brief summary at the end. Two -v options will give you information on what files are being skipped and slightly more information at the end. More than two -v options should only be used if you are debugging rsync. In a modern rsync, the -v option is equivalent to the setting of groups of --info and --debug options. You can choose to use these newer options in addition to, or in place of using --verbose, as any fine-grained settings override the implied settings of -v. Both --info and --debug have a way to ask for help that tells you exactly what flags are set for each increase in verbosity.
However, do keep in mind that a daemon's "max verbosity" setting will limit how high of a level the various individual flags can be set on the daemon side. For instance, if the max is 2, then any info and/or debug flag that is set to a higher value than what would be set by -vv will be downgraded to the -vv level in the daemon's logging.
-z, --compress
With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmitted -- something that is useful over a slow connection. Note that this option typically achieves better compression ratios than can be achieved by using a compressing remote shell or a compressing transport because it takes advantage of the implicit information in the matching data blocks that are not explicitly sent over the connection. This matching-data compression comes at a cost of CPU, though, and can be disabled by repeating the -z option, but only if both sides are at least version 3.1.1.
Note that if your version of rsync was compiled with an external zlib (instead of the zlib that comes packaged with rsync) then it will not support the old-style compression, only the new-style (repeated-option) compression. In the future this new-style compression will likely become the default.
The client rsync requests new-style compression on the server via the --new-compress option, so if you see that option rejected it means that the server is not new enough to support -zz. Rsync also accepts the --old-compress option for a future time when new-style compression becomes the default.
See the --skip-compress option for the default list of file suffixes that will not be compressed.
-t, --times
This tells rsync to transfer modification times along with the files and update them on the remote system. Note that if this option is not used, the optimization that excludes files that have not been modified cannot be effective; in other words, a missing -t or -a will cause the next transfer to behave as if it used -I, causing all files to be updated (though rsync's delta-transfer algorithm will make the update fairly efficient if the files haven't actually changed, you're much better off using -t).
-e, --rsh=COMMAND
This option allows you to choose an alternative remote shell program to use for communication between the local and remote copies of rsync. Typically, rsync is configured to use ssh by default, but you may prefer to use rsh on a local network. If this option is used with [user@]host::module/path, then the remote shell COMMAND will be used to run an rsync daemon on the remote host, and all data will be transmitted through that remote shell connection, rather than through a direct socket connection to a running rsync daemon on the remote host. See the section "USING RSYNC-DAEMON FEATURES VIA A REMOTE-SHELL CONNECTION" above.
Command-line arguments are permitted in COMMAND provided that COMMAND is presented to rsync as a single argument. You must use spaces (not tabs or other whitespace) to separate the command and args from each other, and you can use single- and/or double-quotes to preserve spaces in an argument (but not backslashes). Note that doubling a single-quote inside a single-quoted string gives you a single-quote; likewise for double-quotes (though you need to pay attention to which quotes your shell is parsing and which quotes rsync is parsing). Some examples:
-e 'ssh -p 2234'
-e 'ssh -o "ProxyCommand nohup ssh firewall nc -w1 %h %p"'
(Note that ssh users can alternately customize site-specific connect options in their .ssh/config file.)
You can also choose the remote shell program using the RSYNC_RSH environment variable, which accepts the same range of values as -e.
See also the --blocking-io option which is affected by this option.
-f, --filter=RULE
This option allows you to add rules to selectively exclude certain files from the list of files to be transferred. This is most useful in combination with a recursive transfer. You may use as many --filter options on the command line as you like to build up the list of files to exclude. If the filter contains whitespace, be sure to quote it so that the shell gives the rule to rsync as a single argument. The text below also mentions that you can use an underscore to replace the space that separates a rule from its arg.
See the FILTER RULES section for detailed information on this option.
--exclude=PATTERN
This option is a simplified form of the --filter option that defaults to an exclude rule and does not allow the full rule-parsing syntax of normal filter rules. See the FILTER RULES section for detailed information on this option.
--exclude-from=FILE
This option is related to the --exclude option, but it specifies a FILE that contains exclude patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input.
--include=PATTERN
This option is a simplified form of the --filter option that defaults to an include rule and does not allow the full rule-parsing syntax of normal filter rules. See the FILTER RULES section for detailed information on this option.
--include-from=FILE
This option is related to the --include option, but it specifies a FILE that contains include patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input.

All of the options can be found on the Rsync man page.

rsyncd.conf parameters

Rsyncd.conf parameters used in this article can be found below.

This section is a filtered copy of the Rsync daemon man page [4].

The configuration file consists of modules and parameters. A module begins with the name of the module in square brackets and continues until the next module begins. Modules contain parameters of the form "name = value". The first parameters in the file (before a [module] header) are the global parameters.

A useful feature is the ability to reference environment variables in parameter values, like so: uid = %RSYNC_USER_NAME%, where RSYNC_USER_NAME is an environment variable.
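To make the structure concrete, the sketch below writes a small sample configuration to a temporary file. It only uses parameters described in this section; every path, name, address range and limit in it is a hypothetical value, not a recommended setting.

```shell
#!/bin/sh
# Write a sample rsyncd.conf to a temporary file (all values are hypothetical).
conf="$(mktemp)"
cat > "$conf" <<'EOF'
# Global parameters come before the first [module] header.
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsyncd.lock

# One module; it continues until the next [module] header or end of file.
[module_name]
    path = /path/to/module/root
    comment = Example module
    read only = false
    list = yes
    auth users = username
    secrets file = /etc/rsyncd.secrets
    hosts allow = 10.0.2.0/24
    max connections = 4
EOF

# Count the module headers in the file.
grep -c '^\[' "$conf"
```

Because "read only" defaults to true, the module sets it to false so that uploads such as the daemon mode samples in this article would be accepted.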

motd file
This parameter allows you to specify a "message of the day" to display to clients on each connect. This usually contains site information and any legal notices. The default is no motd file. This can be overridden by the --dparam=motdfile=FILE command-line option when starting the daemon.
log file
When the "log file" parameter is set to a non-empty string, the rsync daemon will log messages to the indicated file rather than using syslog. This is particularly useful on systems (such as AIX) where syslog() doesn't work for chrooted programs. The file is opened before chroot() is called, allowing it to be placed outside the transfer. If this value is set on a per-module basis instead of globally, the global log will still contain any authorization failures or config-file error messages. If the daemon fails to open the specified file, it will fall back to using syslog and output an error about the failure. (Note that the failure to open the specified log file used to be a fatal error.)
This setting can be overridden by using the --log-file=FILE or --dparam=logfile=FILE command-line options. The former overrides all the log-file parameters of the daemon and all module settings. The latter sets the daemon's log file and the default for all the modules, which still allows modules to override the default setting.
pid file
This parameter tells the rsync daemon to write its process ID to that file. If the file already exists, the rsync daemon will abort rather than overwrite the file. This can be overridden by the --dparam=pidfile=FILE command-line option when starting the daemon.
lock file
This parameter specifies the file to use to support the "max connections" parameter. The rsync daemon uses record locking on this file to ensure that the max connections limit is not exceeded for the modules sharing the lock file. The default is /var/run/rsyncd.lock.
syslog facility
This parameter allows you to specify the syslog facility name to use when logging messages from the rsync daemon. You may use any standard syslog facility name which is defined on your system. Common names are auth, authpriv, cron, daemon, ftp, kern, lpr, mail, news, security, syslog, user, uucp, local0, local1, local2, local3, local4, local5, local6 and local7. The default is daemon. This setting has no effect if the "log file" setting is a non-empty string (either set in the per-modules settings, or inherited from the global settings).
reverse lookup
Controls whether the daemon performs a reverse lookup on the client's IP address to determine its hostname, which is used for "hosts allow"/"hosts deny" checks and the "%h" log escape. This is enabled by default, but you may wish to disable it to save time if you know the lookup will not return a useful result, in which case the daemon will use the name "UNDETERMINED" instead. If this parameter is enabled globally (even by default), rsync performs the lookup as soon as a client connects, so disabling it for a module will not avoid the lookup. Thus, you probably want to disable it globally and then enable it for modules that need the information.
path
This parameter specifies the directory in the daemon's filesystem to make available in this module. You must specify this parameter for each module in rsyncd.conf. You may base the path's value off of an environment variable by surrounding the variable name with percent signs. You can even reference a variable that is set by rsync when the user connects. For example, this would use the authorizing user's name in the path:
  path = /home/%RSYNC_USER_NAME%
It is fine if the path includes internal spaces -- they will be retained verbatim (which means that you shouldn't try to escape them). If your final directory has a trailing space (and this is somehow not something you wish to fix), append a trailing slash to the path to avoid losing the trailing whitespace.
comment
This parameter specifies a description string that is displayed next to the module name when clients obtain a list of available modules. The default is no comment.
uid
This parameter specifies the user name or user ID that file transfers to and from that module should take place as when the daemon was run as root. In combination with the "gid" parameter this determines what file permissions are available. The default when run by a super-user is to switch to the system's "nobody" user. The default for a non-super-user is to not try to change the user. See also the "gid" parameter. The RSYNC_USER_NAME environment variable may be used to request that rsync run as the authorizing user. For example, if you want a rsync to run as the same user that was received for the rsync authentication, this setup is useful:
uid = %RSYNC_USER_NAME%
gid = *
gid
This parameter specifies one or more group names/IDs that will be used when accessing the module. The first one will be the default group, and any extra ones be set as supplemental groups. You may also specify a "*" as the first gid in the list, which will be replaced by all the normal groups for the transfer's user (see "uid"). The default when run by a super-user is to switch to your OS's "nobody" (or perhaps "nogroup") group with no other supplementary groups. The default for a non-super-user is to not change any group attributes (and indeed, your OS may not allow a non-super-user to try to change their group settings).
read only
This parameter determines whether clients will be able to upload files or not. If "read only" is true then any attempted uploads will fail. If "read only" is false then uploads will be possible if file permissions on the daemon side allow them. The default is for all modules to be read only. Note that "auth users" can override this setting on a per-user basis.
list
This parameter determines whether this module is listed when the client asks for a listing of available modules. In addition, if this is false, the daemon will pretend the module does not exist when a client denied by "hosts allow" or "hosts deny" attempts to access it. Realize that if "reverse lookup" is disabled globally but enabled for the module, the resulting reverse lookup to a potentially client-controlled DNS server may still reveal to the client that it hit an existing module. The default is for modules to be listable.
auth users
This parameter specifies a comma and/or space-separated list of authorization rules. In its simplest form, you list the usernames that will be allowed to connect to this module. The usernames do not need to exist on the local system. The rules may contain shell wildcard characters that will be matched against the username provided by the client for authentication. If "auth users" is set then the client will be challenged to supply a username and password to connect to the module. A challenge response authentication protocol is used for this exchange. The plain text usernames and passwords are stored in the file specified by the "secrets file" parameter. The default is for all users to be able to connect without a password (this is called "anonymous rsync"). In addition to username matching, you can specify groupname matching via a '@' prefix. When using groupname matching, the authenticating username must be a real user on the system, or it will be assumed to be a member of no groups. For example, specifying "@rsync" will match the authenticating user if the named user is a member of the rsync group.
Finally, options may be specified after a colon (:). The options allow you to "deny" a user or a group, set the access to "ro" (read-only), or set the access to "rw" (read/write). Setting an auth-rule-specific ro/rw setting overrides the module's "read only" setting.
Be sure to put the rules in the order you want them to be matched, because the checking stops at the first matching user or group, and that is the only auth that is checked. For example:
auth users = joe:deny @guest:deny admin:rw @rsync:ro susan joe sam
In the above rule, user joe will be denied access no matter what. Any user that is in the group "guest" is also denied access. The user "admin" gets access in read/write mode, but only if the admin user is not in group "guest" (because the admin user-matching rule would never be reached if the user is in group "guest"). Any other user who is in group "rsync" will get read-only access. Finally, users susan, joe, and sam get the ro/rw setting of the module, but only if the user didn't match an earlier group-matching rule.
If you need to specify a user or group name with a space in it, start your list with a comma to indicate that the list should only be split on commas (though leading and trailing whitespace will also be removed, and empty entries are just ignored). For example:
auth users = , joe:deny, @Some Group:deny, admin:rw, @RO Group:ro
See the description of the secrets file for how you can have per-user passwords as well as per-group passwords. It also explains how a user can authenticate using their user password or (when applicable) a group password, depending on what rule is being authenticated.
See also the section entitled "USING RSYNC-DAEMON FEATURES VIA A REMOTE SHELL CONNECTION" in the rsync man page for information on how to handle an rsyncd.conf-level username that differs from the remote-shell-level username when using a remote shell to connect to an rsync daemon.
secrets file
This parameter specifies the name of a file that contains the username:password and/or @groupname:password pairs used for authenticating this module. This file is only consulted if the "auth users" parameter is specified. The file is line-based and contains one name:password pair per line. Any line that has a hash (#) as the very first character on the line is considered a comment and is skipped. The passwords can contain any characters but be warned that many operating systems limit the length of passwords that can be typed at the client end, so you may find that passwords longer than 8 characters don't work. The use of group-specific lines is only relevant when the module is being authorized using a matching "@groupname" rule. When that happens, the user can be authorized via either their "username:password" line or the "@groupname:password" line for the group that triggered the authentication.
It is up to you what kind of password entries you want to include, either users, groups, or both. The use of group rules in "auth users" does not require that you specify a group password if you do not want to use shared passwords.
There is no default for the "secrets file" parameter, you must choose a name (such as /etc/rsyncd.secrets). The file must normally not be readable by "other"; see "strict modes". If the file is not found or is rejected, no logins for a "user auth" module will be possible.
hosts allow
This parameter allows you to specify a list of comma- and/or whitespace-separated patterns that are matched against a connecting client's hostname and IP address. If none of the patterns match, then the connection is rejected. Each pattern can be in one of five forms:
  • a dotted decimal IPv4 address of the form a.b.c.d, or an IPv6 address of the form a:b:c::d:e:f. In this case the incoming machine's IP address must match exactly.
  • an address/mask in the form ipaddr/n where ipaddr is the IP address and n is the number of one bits in the netmask. All IP addresses which match the masked IP address will be allowed in.
  • an address/mask in the form ipaddr/maskaddr where ipaddr is the IP address and maskaddr is the netmask in dotted decimal notation for IPv4, or similar for IPv6, e.g. ffff:ffff:ffff:ffff:: instead of /64. All IP addresses which match the masked IP address will be allowed in.
  • a hostname pattern using wildcards. If the hostname of the connecting IP (as determined by a reverse lookup) matches the wildcarded name (using the same rules as normal unix filename matching), the client is allowed in. This only works if "reverse lookup" is enabled (the default).
  • a hostname. A plain hostname is matched against the reverse DNS of the connecting IP (if "reverse lookup" is enabled), and/or the IP of the given hostname is matched against the connecting IP (if "forward lookup" is enabled, as it is by default). Any match will be allowed in.
Note IPv6 link-local addresses can have a scope in the address specification:
fe80::1%link1
fe80::%link1/64
fe80::%link1/ffff:ffff:ffff:ffff::
You can also combine "hosts allow" with a separate "hosts deny" parameter. If both parameters are specified then the "hosts allow" parameter is checked first and a match results in the client being able to connect. The "hosts deny" parameter is then checked and a match means that the host is rejected. If the host does not match either the "hosts allow" or the "hosts deny" patterns then it is allowed to connect.
The default is no "hosts allow" parameter, which means all hosts can connect.
refuse options
This parameter allows you to specify a space-separated list of rsync command line options that will be refused by your rsync daemon. You may specify the full option name, its one-letter abbreviation, or a wild-card string that matches multiple options. For example, this would refuse --checksum (-c) and all the various delete options:
  refuse options = c delete
The reason the above refuses all delete options is that the options imply --delete, and implied options are refused just like explicit options. As an additional safety feature, the refusal of "delete" also refuses remove-source-files when the daemon is the sender; if you want the latter without the former, instead refuse "delete-" -- that refuses all the delete modes without affecting --remove-source-files.
When an option is refused, the daemon prints an error message and exits. To prevent all compression when serving files, you can use "dont compress = *" (see below) instead of "refuse options = compress" to avoid returning an error to a client that requests compression.
max connections
This parameter allows you to specify the maximum number of simultaneous connections you will allow. Any clients connecting when the maximum has been reached will receive a message telling them to try later. The default is 0, which means no limit. A negative value disables the module. See also the "lock file" parameter.

Author

Eriks Ocakovskis C11, Estonian IT College, 07-05-2017

References

  1. Rsync official man page
  2. How Rsync Works A Practical Overview
  3. Tutorial on using Rsync
  4. Rsync daemon configuration file man page