How to make backups with Linux and Rsync

1. Purpose

As a web host, we needed an automated mechanism for generating snapshots of server filesystems on our Linux-based systems. There are a number of ways to back up Linux systems, including remote backups using tar/ssh/cron and incremental tar backups on a local filesystem. One drawback of using tar to back up an entire filesystem is that some systems cannot create a compressed tarball larger than 2GB.

Rsync offers a reliable mechanism for synchronizing files and directories from one location to another while minimizing data transfer by sending only the differences (deltas). Rsync is included in most Linux distributions and is easy to install. A properly configured rsync backup can protect against hard disk failures and system compromises.

2. What is Rsync?

Rsync is a small Unix utility that synchronizes files and directories from one place to another by copying only the differences (deltas) of files that have changed. Rsync can optionally compress data on the fly during transfer (to save transfer time) and may be used in conjunction with rsh or ssh to perform remote file transfers. Rsync may be used as a backup or mirroring utility.

The advantages of using rsync over other archive and copy utilities such as tar, dump and rcp are that rsync (1) can use ssh as a secure channel to transfer files over the network, (2) can retain the ownership and permissions of the files being transferred, (3) can keep two directory trees synchronized, including removing files from the destination that were deleted from the source since the last replication (with the --delete option), and (4) transfers only the changes made since the last replication, which makes repeat transfers much faster. When rsync connects directly to an rsync daemon instead of going through ssh, it uses TCP port 873.

3. How does Rsync work?

Rsync can be used in either standalone or client/server mode; client/server mode is a little more common.

In standalone mode, you run the rsync command on the command line to copy files and directories. This is useful for replicating files and directories on the same machine, or between two machines over an rsh/ssh channel. With ssh, the transfer uses TCP port 22 instead of TCP port 873 (rsync). To use ssh without supplying a password (as in an automated backup), you need to set up a trusted environment between the two machines by generating a public/private key pair and installing the public key on the remote machine. Setting up the key pair is described in the Setting up trusted ssh environment with public/private key pair article.
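A minimal sketch of the key setup, assuming OpenSSH's ssh-keygen and ssh-copy-id are available. The key file name id_ed25519_backup is a hypothetical choice, and username@remote_host is the same placeholder used elsewhere in this article. The key is created without a passphrase so that cron can use it unattended, which means the private key file itself must be kept private.

```shell
#!/bin/bash
# Sketch: create a dedicated, passphrase-less key pair for unattended rsync-over-ssh backups.
# KEY (hypothetical name) and username@remote_host are placeholders; adjust as needed.
set -euo pipefail
KEY="${KEY:-$HOME/.ssh/id_ed25519_backup}"

mkdir -p "$(dirname "$KEY")"
chmod 700 "$(dirname "$KEY")"        # keep the key directory private to its owner

if [ ! -f "$KEY" ]; then
  # -N "": empty passphrase so cron can use the key; -q: quiet output
  ssh-keygen -t ed25519 -N "" -C "rsync-backup" -f "$KEY" -q
fi

# The public half goes to the remote machine; the private half stays here.
echo "Install the public key on the remote host with:"
echo "  ssh-copy-id -i $KEY.pub username@remote_host"
echo "Then run backups with:"
echo "  rsync -a -e \"ssh -i $KEY\" source username@remote_host:/path/to/destination"
```

ssh-copy-id appends the public key to ~/.ssh/authorized_keys of the remote user, after which the printed rsync command should run without a password prompt.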

In client/server mode, one machine becomes an "rsync server" by running rsync in daemon mode, and one or more client machines can then synchronize files to and from the server. Setting up an rsync server requires customizing the rsync configuration file, /etc/rsyncd.conf (or a similar location). Running rsync in client/server mode does not require an rsh/ssh transport channel; it uses TCP port 873, the port designated for the rsync protocol. Note that, unlike ssh, this channel does not encrypt the files in transit.

4. Running Rsync in standalone mode

If you intend to replicate a filesystem on a local machine, or to use rsh/ssh as the channel for transferring files from one machine to another, you can use rsync in standalone mode.

To copy files from one directory structure to another, simply run the rsync command. The -a (archive) switch retains the owner and permission information of the files being copied. Preserving file ownership requires running the command as root; a regular user can preserve permissions but not ownership.

bash# rsync -a source destination

The command above is similar to "cp -r source destination/", where the destination directory must already exist. Similarly, a filesystem can be replicated from one machine to another by running:

bash# rsync -a -e ssh source username@remote_host:/path/to/destination

Note that rsync cares about a trailing slash on the source argument. If the source has a trailing slash ("/"), the contents of the directory are copied; if it has no trailing slash, the directory itself is copied. A trailing slash on the destination makes no difference, because the destination is always treated as a directory.

For example, "rsync -a a b" copies directory a into b, so the files end up in the b/a/ directory. If, however, "rsync -a a/ b" is used, the files are stored directly in the b/ directory, without the a directory.

5. Running Rsync in client/server mode

To use rsync in client/server mode, we must set up an rsync server. Setting up an rsync server involves two steps: (A) customizing the /etc/rsyncd.conf configuration file, and (B) running the rsync command in daemon mode.

A. Configuring /etc/rsyncd.conf configuration file.

The rsync configuration file looks very similar to the Samba configuration file, since rsync was co-authored by Andrew Tridgell, one of the authors of Samba. A detailed description of rsyncd.conf can be found in the rsyncd.conf(5) manual page. An example rsync configuration file may look something like this:

motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
secrets file = /etc/rsyncd.secrets

[target]
   path = /home
   comment = User home directories
   uid = nobody
   gid = nobody
   auth users = scott, michael
   hosts allow = 192.168.0.0/24
   hosts deny = *
   list = false

Important: It should be noted that Rsync will NOT grant access to a protected share if the secret (password) file noted above (/etc/rsyncd.secrets) is world-readable.

The settings worth noting above include "target", the name used to refer to a particular rsync target (module). Within a target block, a number of configuration options may be defined. The "path" option specifies the directory to be rsync'ed, and "auth users" restricts access to the listed users, who authenticate with the passwords stored in the secrets file. "auth users" need not be system users. The "uid" and "gid" options set the user and group that the daemon runs as when transferring files for this target. "hosts allow" and "hosts deny" restrict which hosts can transfer files to and from the server, and "list = false" hides the target from clients that ask the server for a list of available targets. It is strongly advised to set "hosts allow" and "hosts deny", because without them any host that can reach the server may connect to the target.

We need to create a secrets file, /etc/rsyncd.secrets, with the contents:

scott:helloworld

The secrets file above contains a user, "scott", with the password "helloworld"; every user listed in "auth users" (michael, for example) needs a username:password line of its own. Since the passwords are stored in plain text, the file must be owned by root and readable only by root (permission 400 or 600). Otherwise, rsync will refuse to authenticate users against it.
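A small sketch for creating the secrets file with safe permissions. It writes to a scratch file by default so it can be tried harmlessly; the SECRETS variable is a hypothetical knob, and on a real server you would run it as root with SECRETS=/etc/rsyncd.secrets.

```shell
#!/bin/bash
# Sketch: create an rsync daemon secrets file that only its owner can read.
# SECRETS defaults to a scratch file; set SECRETS=/etc/rsyncd.secrets (as root) for real use.
set -euo pipefail
SECRETS="${SECRETS:-$(mktemp)}"

umask 077                          # anything created from here on gets no group/other access
touch "$SECRETS"
chmod 600 "$SECRETS"               # tighten before writing, in case the file already existed
if [ "$(id -u)" -eq 0 ]; then
  chown root:root "$SECRETS"       # owned by root on a real server
fi

# One "user:password" line per entry in "auth users".
printf '%s\n' 'scott:helloworld' > "$SECRETS"
ls -l "$SECRETS"
```

Tightening the mode before the password is written means the file is never readable by other users, even briefly.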

B. Running rsync daemon

You can launch the rsync daemon in one of two ways: via xinetd or as a standalone daemon. To run it from xinetd, edit the following two files (most distributions already include the rsync entry in /etc/services).

bash# nano /etc/services
...
rsync 873/tcp
...

bash# nano /etc/xinetd.d/rsync
service rsync
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
        log_on_failure  += USERID
}

bash# service xinetd restart

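The example above runs rsync via the xinetd daemon, which starts a new rsync process for each incoming connection. Because rsync rereads /etc/rsyncd.conf on every connection, changes to that file take effect without a restart; after editing /etc/xinetd.d/rsync, restart xinetd as shown.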

Alternatively, you can run rsync in daemon mode directly from the command line; it detaches and keeps running in the background.

bash# rsync --daemon

Once the rsync server is set up, we can run the rsync client from a client machine. To pull a particular target defined in the /etc/rsyncd.conf configuration file, run rsync as follows:

bash# rsync -a scott@rsync_server::target /opt/rsync/backup

Password: ******

Notice that we do NOT specify a source path in the command above; instead, a target name ("target") is given after the :: separator. The rsync configuration file describes the target, including its access control, in detail. When prompted, enter the password defined in the secrets file.

FAQ: How do you bypass the password prompt?

If you wish to automate rsync with cron, you must bypass the password prompt. If you're connecting to an rsync daemon on TCP port 873, you can use the RSYNC_PASSWORD environment variable: write a simple bash script that sets RSYNC_PASSWORD just before invoking the rsync command, as shown below. Because the script contains a clear-text password, protect it with a permission mode of 700 (chmod 700) so that no one except you (and root) can read it.

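#!/bin/bash
# Pull the "target" module from the rsync daemon without an interactive password prompt.
RSYNC=(/usr/bin/rsync -a --delete)     # command and options, kept in an array
export RSYNC_PASSWORD=helloworld        # rsync reads the daemon password from this variable

"${RSYNC[@]}" scott@192.168.0.2::target /path/to/local/filesystem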

If you're using an ssh channel, you'll have to set up a trusted environment with a public/private key pair. To learn how to set up a trusted ssh environment, please see the Setting up trusted ssh environment article.

6. Some useful command-line options

--delete When rsync is used to replicate one filesystem to another, the --delete option deletes files from the destination filesystem that have been deleted from the source filesystem. Without it, which is rsync's default behavior, deleted files continue to reside in the destination filesystem. More rsync examples can be found at http://rsync.samba.org/examples.html.
