Backup using Rsync

Introduction to rsync

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

Rsync's main advantage over other backup/copying methods is its delta-transfer algorithm: only the changed parts of modified files are sent, reducing bandwidth use and the time required to complete the transfer.
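
For a quick local illustration (the paths below are made-up examples), sync a directory twice: the first run copies everything, while the second only reads and transfers the changed parts. Note that for purely local copies rsync defaults to whole-file transfers, so --no-whole-file is needed to force the delta algorithm on:

# Made-up example paths; --stats prints how much data was actually transferred.
rsync -a --stats --no-whole-file /tmp/example-src/ /tmp/example-dst/   # first run: full copy
echo 'one more line' >> /tmp/example-src/big.log                       # change one file slightly
rsync -a --stats --no-whole-file /tmp/example-src/ /tmp/example-dst/   # second run: only the difference moves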

Preparation

For the purpose of this guide, we will assume that you intend to copy the contents of the remote seafile_data folder to a local drive, along with a dump of the databases.

The remote seafile_data directory is located at /srv/seafile/haiwen/seafile-data. The local drive is mounted at /run/media/foo/seafbackup/.

The database dumps will be copied from the server to the local directory /run/media/foo/seafbackup/seafile_databases/ using a tool of your choosing, for example SFTP. The remote seafile_data folder will be copied to /run/media/foo/seafbackup/seafile_data_backup/ using rsync.

Before you begin the next steps, install the rsync package via your distribution's package manager.

Backup steps

1. Stop Seafile and Seahub

As a systemd service:

systemctl stop seahub && systemctl stop seafile

Or using the scripts directly:

./seahub.sh stop && ./seafile.sh stop

2. Backup the databases

SSH into the Seafile/database host and back up the databases. You will be asked for the MySQL root password for each dump:

mysqldump -u root -p --opt ccnet-db > ccnet-db.sql.`date +"%Y-%m-%d-%H-%M-%S"` &&
mysqldump -u root -p --opt seafile-db > seafile-db.sql.`date +"%Y-%m-%d-%H-%M-%S"` &&
mysqldump -u root -p --opt seahub-db > seahub-db.sql.`date +"%Y-%m-%d-%H-%M-%S"`
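
If you plan to script this step later, mysqldump can read its credentials from a configuration file instead of prompting. A minimal sketch, assuming a ~/.my.cnf in the root user's home on the database host (the password value is a placeholder):

# ~/.my.cnf (chmod 600) -- lets mysqldump run without -u root -p.
[mysqldump]
user=root
password=YourPasswordHere

With this in place you can drop -u root -p from the commands above; the automation sketch at the end of this guide relies on this.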

Use SFTP to copy them over to a location of your choosing, for example to /run/media/foo/seafbackup/seafile_databases/ on the local drive.
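
As a concrete sketch, scp (from the same OpenSSH suite as SFTP) can pull the dumps in one command. This assumes the dumps were written to /root on the server, so adjust the remote path to wherever you ran mysqldump:

# Host and remote path are this guide's placeholder values.
scp 'root@example.com:/root/*-db.sql.*' /run/media/foo/seafbackup/seafile_databases/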

3. Seafile GC

In order to avoid backing up deleted libraries or data, run the Seafile garbage collector first:

runuser -l seafile -c 'cd /srv/seafile/haiwen/seafile-server-latest && ./seaf-gc.sh'

4. Start Seafile and Seahub

At this point, you can start Seafile and Seahub again:

systemctl start seafile && systemctl start seahub

5. Rsync data backup

It's time to start the actual data synchronization. Go to the directory you want the data copied to, creating it first if it doesn't exist:

mkdir -p /run/media/foo/seafbackup/seafile_data_backup/
cd /run/media/foo/seafbackup/seafile_data_backup/

The contents of the remote seafile_data folder will be synced here.

If you are using a private key to connect to the server over SSH or if you're using a non-standard port, define the following variables:

ssh_key='/your/private/key/location/id_rsa'
remote_port='xxxx'

Replace the values with your own port and private key location.

Now, replace the user/server domain and issue the rsync command for a dry run (nothing will be copied yet):

rsync -avzP --dry-run --delete --human-readable --stats -e "ssh -p $remote_port -i $ssh_key" root@example.com:/srv/seafile/haiwen/seafile-data/ ./

NOTE: The trailing slash on .../seafile-data/ means that only the directory's contents are copied, not the directory itself.

The arguments are as follows:

 -a, --archive      archive mode (recursive; preserves permissions, ownership, timestamps and symlinks)
 -v, --verbose      increase verbosity
 -z, --compress     compress file data during the transfer
 -P                 same as --partial --progress (keep partially transferred files, show progress)
 -n, --dry-run      perform a trial run without making any changes
 --delete           delete local files that no longer exist on the remote
 --human-readable   output numbers in a human-readable format
 --stats            give some file-transfer stats
 -e                 specify the remote shell to use

The output should look something like this:

Number of files: 89,298 (reg: 80,773, dir: 8,525)
Number of created files: 4,803 (reg: 4,345, dir: 458)
Number of deleted files: 0
Number of regular files transferred: 4,349
Total file size: 86.94G bytes
Total transferred file size: 23.91G bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 1.66M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 26.16K
Total bytes received: 4.57M

sent 26.16K bytes  received 4.57M bytes  612.17K bytes/sec
total size is 86.94G  speedup is 18,934.93 (DRY RUN)

The Total file size: 86.94G bytes and Total transferred file size: 23.91G bytes lines are interesting to look at. The first tells us that the remote directory is 86.94 GB in size, but the second tells us that only 23.91 GB will be copied to our local drive. Herein lies the advantage of rsync: because in our example most of the files were already synchronized beforehand, this new sync will only pull the modified data, not all of it. If you run the sync for the first time, however, you will have to download all the files at their full size.

When you're ready to begin syncing, issue the command without the --dry-run argument:

rsync -avzP --delete --human-readable --stats -e "ssh -p $remote_port -i $ssh_key" root@example.com:/srv/seafile/haiwen/seafile-data/ ./
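
As a quick sanity check (not a substitute for an actual restore test), you can compare the local and remote directory sizes; the ssh variables are the ones defined earlier:

# The two sizes should roughly match once the sync has completed.
du -sh /run/media/foo/seafbackup/seafile_data_backup/
ssh -p "$remote_port" -i "$ssh_key" root@example.com 'du -sh /srv/seafile/haiwen/seafile-data/'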

Conclusion

If everything went according to plan, we have retrieved a copy of the seafile, seahub and ccnet databases and placed them in /run/media/foo/seafbackup/seafile_databases/.

The contents of the seafile_data folder are now synchronized to /run/media/foo/seafbackup/seafile_data_backup/.

As the sync will only pull modified files from the remote host, the more often backups are run, the less data needs to be copied each time. A possible way to automate the backup is to use cron to run a script containing all the above-mentioned commands, as sketched below.
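
A minimal sketch of such a script. It assumes key-based SSH as root@example.com, a ~/.my.cnf on the server (see step 2) so mysqldump runs without prompting, and that the backup drive is mounted; every path and host below is one of this guide's placeholder values:

#!/usr/bin/env bash
# backup-seafile.sh -- hypothetical wrapper around the steps in this guide.
set -euo pipefail

ssh_key='/your/private/key/location/id_rsa'
remote_port='22'
remote='root@example.com'
backup_root='/run/media/foo/seafbackup'
stamp=$(date +"%Y-%m-%d-%H-%M-%S")

# Stop Seafile/Seahub, dump the databases, run GC, then start everything again.
# $stamp expands locally, so all three dumps share one timestamp.
ssh -p "$remote_port" -i "$ssh_key" "$remote" "
  systemctl stop seahub && systemctl stop seafile &&
  mysqldump --opt ccnet-db   > /root/ccnet-db.sql.$stamp &&
  mysqldump --opt seafile-db > /root/seafile-db.sql.$stamp &&
  mysqldump --opt seahub-db  > /root/seahub-db.sql.$stamp &&
  runuser -l seafile -c 'cd /srv/seafile/haiwen/seafile-server-latest && ./seaf-gc.sh' &&
  systemctl start seafile && systemctl start seahub
"

# Fetch the dumps, then sync the data directory.
scp -P "$remote_port" -i "$ssh_key" "$remote:/root/*-db.sql.$stamp" "$backup_root/seafile_databases/"
rsync -azP --delete --human-readable --stats -e "ssh -p $remote_port -i $ssh_key" \
  "$remote:/srv/seafile/haiwen/seafile-data/" "$backup_root/seafile_data_backup/"

A matching crontab entry could then run it nightly, for example at 03:00:

0 3 * * * /usr/local/bin/backup-seafile.sh >> /var/log/seafile-backup.log 2>&1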

Backing up to a remote host with an unprivileged user and automatic snapshots significantly increases the security of the backups. An attacker cannot delete the backup without also intruding into the backup host, crypto trojans cannot encrypt it, some history can be kept so you are prepared for issues that are discovered late, and, depending on the target location, there is good protection against lightning strikes, other bad weather conditions, fire and theft.