ccollect - Installing, Configuring and Using
============================================
Nico Schottelius <nico-ccollect__@__schottelius.org>
0.6, for ccollect 0.6, Initial Version from 2006-01-13
:Author Initials: NS
(pseudo) incremental backup
with different exclude lists
using hardlinks and `rsync`
Introduction
------------
`ccollect` is a backup utility written in the sh-scripting language.
It does not depend on a specific shell; only `/bin/sh` needs to be
Bourne shell compatible (like 'dash', 'ksh', 'zsh', 'bash', ...).
Supported and tested operating systems and architectures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`ccollect` was successfully tested on the following platforms:
- GNU/Linux on amd64/hppa/i386
- NetBSD on amd64/i386/sparc/sparc64
- OpenBSD on amd64
It *should* run on any Unix that supports `rsync` and has a POSIX-compatible
Bourne shell. If your platform is not listed above and you have ccollect running
successfully on it, please drop me a mail.
Why you can only back up from remote hosts, not to them
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While considering the design of ccollect, I thought about enabling
backups to *remote* hosts. Though this sounds like a nice feature
('Back up my notebook to the server now.'), in my opinion it is a
bad idea to back up to a remote host.
Reason
^^^^^^
If you want to back up *TO* a remote host, you have to loosen security on it.
Imagine the following situation: You back up your farm of webservers *TO*
a backup host somewhere else.
Now one of your webservers, which has access to your backup host, gets
compromised.
Your backup server will be compromised, too,
and the attacker will have access to the data of all the other webservers.
Doing it securely
^^^^^^^^^^^^^^^^^
Think of it the other way round: The backup server (now behind a
firewall using NAT and strong firewall rules) connects to the
webservers and pulls the data *from* them. If someone gets access to one
of the webservers, this person will perhaps not even see your machine. If
the attacker does see connections from a host to the compromised
machine, he/she will not be able to log in on the backup machine.
All other backups are still secure.
Incompatibilities
~~~~~~~~~~~~~~~~~
Versions 0.5 and 0.6
^^^^^^^^^^^^^^^^^^^^^
.The format of `rsync_options` changed:
- Before 0.6 it was whitespace delimited
- As of 0.6 it is newline separated (so you can pass whitespace to `rsync`)
You can update your configuration using `tools/config-pre-0.6-to-0.6.sub.sh`.
.The name of the backup directories changed:
- Before 0.6: "date +%Y-%m-%d-%H%M"
- As of 0.6: "date +%Y%m%d-%H%M" (more readable, as the date parts are closer together)
For the second change no update is needed, as XXXX- always sorts before
XXXXX (- comes before any digit), so old directories are still treated as older than new ones.
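To see the ordering yourself, you can sort two directory names in the C locale, where `-` (0x2D) comes before the digits (0x30 to 0x39). This is a minimal sketch; `sorted_names` and the example names are made up for illustration:
--------------------------------------------------------------------------------
# sorted_names (hypothetical helper): print its arguments in the order
# sort(1) uses in the C locale, i.e. plain byte order.
sorted_names() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# A new-style (0.6) and an old-style (before 0.6) name, both made up:
sorted_names daily.20060114-1200.4711 daily.2006-01-13-1200.4242
# prints daily.2006-01-13-1200.4242 first, then daily.20060114-1200.4711
--------------------------------------------------------------------------------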
Versions 0.4 and 0.5
^^^^^^^^^^^^^^^^^^^^^
Not a real incompatibility, but it seems to fit in this section:
.0.5 does *NOT* require
- PaX
- bc
anymore!
Versions < 0.4 and 0.4
^^^^^^^^^^^^^^^^^^^^^^
Since `ccollect` 0.4 there are several incompatibilities with earlier
versions:
.List of incompatibilities
- `pax` (POSIX) is now required, `cp -al` (GNU specific) is no longer used
- "interval" was previously spelled with two 'l' (ell), which is wrong in English
- Changed the name of backup directories: removed the colon from the time part
- ccollect will now exit when preexec returns non-zero
- ccollect now reports when postexec returns non-zero
You can convert your old configuration directory using
`config-pre-0.4-to-0.4.sh`, which can be found in the *tools/*
subdirectory:
--------------------------------------------------------------------------------
[10:05] hydrogenium:ccollect-0.4# ./tools/config-pre-0.4-to-0.4.sh /etc/ccollect
--------------------------------------------------------------------------------
Quick start
-----------
For those who do not want to read the whole long document:
--------------------------------------------------------------------------------
# get latest ccollect tarball from http://unix.schottelius.org/ccollect/
# replace value for CCV with the current version
export CCV=0.6
#
# replace 'wget' with fetch on bsd
#
holen=wget
"$holen" http://unix.schottelius.org/ccollect/ccollect-${CCV}.tar.bz2
# extract the tarball, change to the newly created directory
tar -xvjf ccollect-${CCV}.tar.bz2
cd ccollect-${CCV}
# create mini-configuration
# first create directory structure
mkdir -p miniconfig/defaults/intervals
mkdir miniconfig/sources
# create sample intervals
echo 2 > miniconfig/defaults/intervals/testinterval
echo 3 > miniconfig/defaults/intervals/testinterval2
# create destination directory, where the backups will be kept
mkdir ~/DASI
# create sample source, which will be saved
mkdir miniconfig/sources/testsource
# We will save '/bin' to the directory '~/DASI'
echo '/bin' > miniconfig/sources/testsource/source
ln -s ~/DASI miniconfig/sources/testsource/destination
# We want to see what happens and also a small summary at the end
touch miniconfig/sources/testsource/verbose
touch miniconfig/sources/testsource/summary
echo "do the backup, twice"
CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval testsource
CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval testsource
echo "the third time ccollect begins to remove old backups"
printf 'Hit enter to see it'
read dummy
CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval testsource
echo "Now we add another interval, ccollect should clone from existing ones"
printf 'Hit enter to see it'
read dummy
CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval2 testsource
echo "Let's see how much space we used with two backups and compare it to /bin"
du -s ~/DASI /bin
--------------------------------------------------------------------------------
Cutting and pasting the complete section above to your shell will result in
the download of ccollect, the creation of a sample configuration and the
execution of some backups.
Requirements
------------
Installing ccollect
~~~~~~~~~~~~~~~~~~~
For the installation you need at least
- the latest ccollect package (http://unix.schottelius.org/ccollect/)
- either `cp` and `chmod` or `install`
- for more comfort: `make`
- for rebuilding the generated documentation: additionally `asciidoc`
Using ccollect
~~~~~~~~~~~~~~
.Running ccollect requires the following tools to be installed:
- `date`
- `rsync`
- `ssh` (if you want to use rsync over ssh, which is recommended for security)
Installing
----------
Either type 'make install' or simply copy `ccollect.sh` to a directory in your
$PATH and execute 'chmod *0755* /path/to/ccollect.sh'. If you would
like to use the new management scripts (available since 0.6), copy
the following scripts to a directory in $PATH as well:
- `tools/add_ccollect_source.sh`
- `tools/list_ccollect_intervals.sh`
- `tools/delete_ccollect_source.sh`
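Both ways boil down to copying the scripts and making them executable. A minimal sketch using `install`, assuming you pass the unpacked ccollect directory and a target directory in your $PATH (`install_ccollect` is a hypothetical helper, not part of the package):
--------------------------------------------------------------------------------
# install_ccollect (hypothetical helper): copy ccollect.sh and the 0.6
# management scripts into a directory in $PATH with mode 0755.
install_ccollect() {
    srcdir=$1    # unpacked tarball, e.g. ./ccollect-0.6
    bindir=$2    # target directory, e.g. /usr/local/bin
    mkdir -p "$bindir" || return 1
    for f in ccollect.sh \
             tools/add_ccollect_source.sh \
             tools/list_ccollect_intervals.sh \
             tools/delete_ccollect_source.sh; do
        install -m 0755 "$srcdir/$f" "$bindir/" || return 1
    done
}
--------------------------------------------------------------------------------
For example: `install_ccollect ./ccollect-0.6 /usr/local/bin`.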
Configuring
-----------
Runtime options
~~~~~~~~~~~~~~~
`ccollect` looks for its configuration in '/etc/ccollect' or, if set, in
the directory specified by the variable '$CCOLLECT_CONF':
--------------------------------------------------------------------------------
# sh-compatible (zsh, mksh, ksh, bash, ...)
$ CCOLLECT_CONF=/your/config/dir ccollect.sh ...
# csh
$ ( setenv CCOLLECT_CONF /your/config/dir ; ccollect.sh ... )
--------------------------------------------------------------------------------
When you start `ccollect`, you have to specify the interval to use
(daily, weekly, yearly; you can choose the names yourself, see below) and which sources to back up (or -a to back up all sources).
The interval determines how many backups are kept.
There are also some self-explanatory parameters you can pass to `ccollect`; simply run
`ccollect.sh --help` for more information.
General configuration
~~~~~~~~~~~~~~~~~~~~~
The general configuration can be found in $CCOLLECT_CONF/defaults or
/etc/ccollect/defaults. All options specified there are generally valid for
all source definitions, although the values can be overridden in the source
configuration.
All configuration entries are plain-text files (use UTF-8 for non-ASCII characters).
Interval definition
^^^^^^^^^^^^^^^^^^^^
The interval definition can be found in
'$CCOLLECT_CONF/defaults/intervals/' or '/etc/ccollect/defaults/intervals'.
Each file in this directory specifies an interval. The name of the file is
the same as the name of the interval: `intervals/'<interval name>'`.
The content of this file should be a single line containing a number.
This number defines how many versions of this interval are kept.
Example:
-------------------------------------------------------------------------
[10:23] zaphodbeeblebrox:ccollect-0.2% ls -l conf/defaults/intervals/
total 12
-rw-r--r-- 1 nico users 3 2005-12-08 10:24 daily
-rw-r--r-- 1 nico users 3 2005-12-08 11:36 monthly
-rw-r--r-- 1 nico users 2 2005-12-08 11:36 weekly
[10:23] zaphodbeeblebrox:ccollect-0.2% cat conf/defaults/intervals/*
28
12
4
--------------------------------------------------------------------------------
This means that 28 daily, 12 monthly and 4 weekly backups are kept.
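The example configuration above can be created with a few lines of shell. This sketch writes into $CCOLLECT_CONF, falling back to /etc/ccollect as described under runtime options; `create_example_intervals` is a hypothetical name:
--------------------------------------------------------------------------------
# create_example_intervals (hypothetical helper): write one file per
# interval, each containing the number of backups to keep.
create_example_intervals() {
    conf=${CCOLLECT_CONF:-/etc/ccollect}
    dir=$conf/defaults/intervals
    mkdir -p "$dir" || return 1
    echo 28 > "$dir/daily"
    echo 12 > "$dir/monthly"
    echo 4 > "$dir/weekly"
}
--------------------------------------------------------------------------------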
General pre- and post-execution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you add '$CCOLLECT_CONF/defaults/`pre_exec`' or
'/etc/ccollect/defaults/`pre_exec`' (same with `post_exec`), `ccollect`
will start `pre_exec` before the whole backup process and
`post_exec` after the backups of all sources are done.
The following example describes how to report free disk space in
human readable format before and after the whole backup process:
-------------------------------------------------------------------------
[13:00] hydrogenium:~# mkdir -p /etc/ccollect/defaults/
[13:00] hydrogenium:~# echo '#!/bin/sh' > /etc/ccollect/defaults/pre_exec
[13:01] hydrogenium:~# echo '' >> /etc/ccollect/defaults/pre_exec
[13:01] hydrogenium:~# echo 'df -h' >> /etc/ccollect/defaults/pre_exec
[13:01] hydrogenium:~# chmod 0755 /etc/ccollect/defaults/pre_exec
[13:01] hydrogenium:~# ln -s /etc/ccollect/defaults/pre_exec /etc/ccollect/defaults/post_exec
-------------------------------------------------------------------------
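Since 0.4, `ccollect` exits when pre_exec returns non-zero (see the incompatibilities above), so a `pre_exec` can also act as a guard. A sketch, assuming your backups live below a single directory (the path is only an example); `write_guard_pre_exec` is a hypothetical helper that generates the script:
--------------------------------------------------------------------------------
# write_guard_pre_exec (hypothetical helper): generate an executable
# pre_exec that aborts (exit 1) when the backup directory is missing and
# otherwise reports its free space like the example above.
write_guard_pre_exec() {
    target=$1        # e.g. /etc/ccollect/defaults/pre_exec
    backup_root=$2   # e.g. /mnt/hdbackup (example path)
    cat > "$target" << EOF
#!/bin/sh
if [ ! -d "$backup_root" ]; then
    echo "pre_exec: $backup_root is missing, not starting the backup" >&2
    exit 1
fi
df -h "$backup_root"
EOF
    chmod 0755 "$target"
}
--------------------------------------------------------------------------------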
Source configuration
~~~~~~~~~~~~~~~~~~~~
Each source configuration exists in '$CCOLLECT_CONF/sources/$name' or
'/etc/ccollect/sources/$name'.
The name you choose for the subdirectory describes the source.
Each source contains at least the following files:
- `source` (a text file containing the `rsync` compatible path to back up)
- `destination` (a link to the directory to back up to)
Additionally a source may have the following files:
- `verbose` whether to be verbose (passes -v to `rsync`)
- `very_verbose` be very verbose (`mkdir -v`, `rm -v` and `rsync -vv`)
- `summary` create a transfer summary when `rsync` has finished
- `exclude` exclude list for `rsync`. newline separated list.
- `rsync_options` extra options for `rsync`. newline separated list.
- `pre_exec` program to execute before backing up *this* source
- `post_exec` program to execute after backing up *this* source
- `delete_incomplete` delete incomplete backups
Example:
--------------------------------------------------------------------------------
[10:47] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2
total 12
lrwxrwxrwx 1 nico users 20 2005-11-17 16:44 destination -> /home/nico/backupdir
-rw-r--r-- 1 nico users 62 2005-12-07 17:43 exclude
drwxr-xr-x 2 nico users 4096 2005-12-07 17:38 intervals
-rw-r--r-- 1 nico users 15 2005-11-17 16:44 source
[10:47] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/exclude
openvpn-2.0.1.tar.gz
nicht_reinnehmen
etwas mit leerzeichenli
[10:47] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2/intervals
total 4
-rw-r--r-- 1 nico users 2 2005-12-07 17:38 daily
[10:48] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/intervals/daily
5
[10:48] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/source
/home/nico/vpn
--------------------------------------------------------------------------------
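A source like the one above can also be set up by a small function. This is a sketch, not the bundled `tools/add_ccollect_source.sh` (whose interface is not described here); `new_source` and its arguments are hypothetical:
--------------------------------------------------------------------------------
# new_source (hypothetical helper): create a source definition with the
# two required entries plus the optional verbose and summary flags.
new_source() {
    conf=${CCOLLECT_CONF:-/etc/ccollect}
    srcname=$1    # name of the source, e.g. webserver
    src=$2        # rsync compatible path, e.g. root@webserver:/
    dest=$3       # existing directory that will hold the backups
    dir=$conf/sources/$srcname

    mkdir -p "$dir" || return 1
    printf '%s\n' "$src" > "$dir/source"
    ln -s "$dest" "$dir/destination" || return 1
    touch "$dir/verbose" "$dir/summary"
}
--------------------------------------------------------------------------------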
Detailed description of "source"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`source` describes an `rsync` compatible source (one line only),
for instance 'backup_user@foreign_host:/home/server/video'.
To use the `rsync` protocol without the `ssh` tunnel, use
'USER@HOST::SRC' or 'rsync://USER@HOST/SRC'. For more information have a look at the manpage
of `rsync`(1).
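Which transport is used follows from the syntax of the `source` line, and `ccollect` leaves that decision to `rsync`. The following sketch approximates the rules from rsync(1) (`::` or an rsync:// URL selects the rsync daemon, a colon before the first slash selects a remote shell such as `ssh`); `source_kind` is hypothetical and ignores corner cases:
--------------------------------------------------------------------------------
# source_kind (hypothetical helper): rough classification of a source line.
source_kind() {
    before_colon=${1%%:*}          # text before the first colon
    case $1 in
        rsync://*|*::*) echo daemon ;;
        *:*)
            case $before_colon in
                */*) echo local ;;  # slash before the colon: a local path
                *) echo ssh ;;
            esac ;;
        *) echo local ;;
    esac
}
--------------------------------------------------------------------------------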
Detailed description of "verbose"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`verbose` tells `ccollect` that the log should contain verbose messages.
If this file exists in the source specification, *-v* will be passed to `rsync`.
Example:
--------------------------------------------------------------------------------
[11:35] zaphodbeeblebrox:ccollect-0.2% touch conf/sources/testsource1/verbose
--------------------------------------------------------------------------------
Detailed description of "very_verbose"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`very_verbose` tells `ccollect` that it should log very verbosely.
If this file exists in the source specification *-v* will be passed to
`rsync`, `rm` and `mkdir`.
Example:
--------------------------------------------------------------------------------
[23:67] nohost:~% touch conf/sources/testsource1/very_verbose
--------------------------------------------------------------------------------
Detailed description of "summary"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you create the file `summary` in the source definition,
`ccollect` will present you a nice summary at the end.
-------------------------------------------------------------------------------
backup:~# touch /etc/ccollect/sources/root/summary
backup:~# ccollect.sh werktags root
==> ccollect.sh: Beginning backup using interval werktags <==
[root] Beginning to backup this source ...
[root] Currently 3 backup(s) exist, total keeping 50 backup(s).
[root] Beginning to backup, this may take some time...
[root] Hard linking...
[root] Transferring files...
[root]
[root] Number of files: 84183
[root] Number of files transferred: 32
[root] Total file size: 26234080536 bytes
[root] Total transferred file size: 9988252 bytes
[root] Literal data: 9988252 bytes
[root] Matched data: 0 bytes
[root] File list size: 3016771
[root] File list generation time: 1.786 seconds
[root] File list transfer time: 0.000 seconds
[root] Total bytes sent: 13009119
[root] Total bytes received: 2152
[root]
[root] sent 13009119 bytes received 2152 bytes 2891393.56 bytes/sec
[root] total size is 26234080536 speedup is 2016.26
[root] Successfully finished backup.
==> Finished ccollect.sh <==
-------------------------------------------------------------------------------
You could also combine it with `verbose` or `very_verbose`, but these
already print some statistics (though not all / the same as presented by
`summary`).
Detailed description of "exclude"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`exclude` specifies a list of paths to exclude. The entries are separated by a newline (\n).
Example:
--------------------------------------------------------------------------------
[11:35] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/exclude
openvpn-2.0.1.tar.gz
nicht_reinnehmen
etwas mit leerzeichenli
something with spaces is not a problem
--------------------------------------------------------------------------------
Detailed description of "destination"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`destination` must be a link to the destination directory.
Example:
--------------------------------------------------------------------------------
[11:36] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2/destination
lrwxrwxrwx 1 nico users 20 2005-11-17 16:44 conf/sources/testsource2/destination -> /home/nico/backupdir
--------------------------------------------------------------------------------
To tell the truth, this is not fully correct: `ccollect` will also back up
your data if `destination` is a directory. But do you really want to have
your backups in /etc?
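To see which of the two variants a source uses, a small check can help. A sketch; `check_destination` is hypothetical and expects the source configuration directory as its argument:
--------------------------------------------------------------------------------
# check_destination (hypothetical helper): report whether "destination"
# is a symbolic link to a directory, a plain directory, or unusable.
check_destination() {
    d=$1/destination
    if [ -L "$d" ] && [ -d "$d" ]; then
        echo "link to a directory"
    elif [ -d "$d" ]; then
        echo "plain directory"
    else
        echo "$d: missing or dangling link" >&2
        return 1
    fi
}
--------------------------------------------------------------------------------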
Detailed description of "intervals/"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you create the subdirectory `intervals/` in your source configuration
directory, you can specify individual intervals for this specific source.
Each file in this directory describes an interval.
Example:
--------------------------------------------------------------------------------
[11:37] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2/intervals/
total 8
-rw-r--r-- 1 nico users 2 2005-12-07 17:38 daily
-rw-r--r-- 1 nico users 3 2005-12-14 11:33 yearly
[11:37] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/intervals/*
5
20
--------------------------------------------------------------------------------
Detailed description of "rsync_options"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you create the file `rsync_options` in your source configuration,
all the parameters in this file (one per line) will be passed to `rsync`. This
way you can pass additional options to `rsync`. For instance you can tell `rsync`
to show progress ("--progress"), or which password file ("--password-file")
to use for automatic backups over the rsync protocol.
Example:
--------------------------------------------------------------------------------
[23:42] hydrogenium:ccollect-0.2% cat conf/sources/test_rsync/rsync_options
--password-file=/home/user/backup/protected_password_file
--------------------------------------------------------------------------------
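Since 0.6 the file is newline separated, so an option may contain spaces. The sketch below shows one way a shell script can turn such a file into separate arguments without splitting at spaces; it is an illustration under that assumption, not `ccollect`'s own code, and `rsync_args_from` is a hypothetical name:
--------------------------------------------------------------------------------
# rsync_args_from (hypothetical helper): print each line of a newline
# separated options file as one bracketed argument. A real script would
# call rsync "$@" ... instead of printf.
rsync_args_from() {
    file=$1
    set --                                  # start with an empty list
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -n "$line" ]; then
            set -- "$@" "$line"             # one line, one argument
        fi
    done < "$file"
    [ "$#" -gt 0 ] && printf '[%s]\n' "$@"
}
--------------------------------------------------------------------------------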
Detailled description of "pre_exec" and "post_exec"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you create `pre_exec` and / or `post_exec` in your source
configuration, `ccollect` will execute this command before and
respectively after doing the backup for *this specific* source.
If you want to have pre-/post-exec before and after *all*
backups, see above for general configuration.
Example:
--------------------------------------------------------------------------------
[13:09] hydrogenium:ccollect-0.3% cat conf/sources/with_exec/pre_exec
#!/bin/sh
# Show what's free before
df -h
[13:09] hydrogenium:ccollect-0.3% cat conf/sources/with_exec/post_exec
#!/bin/sh
# Show what's free after
df -h
--------------------------------------------------------------------------------
Detailed description of "delete_incomplete"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you create the file `delete_incomplete` in a source specification directory,
`ccollect` will look for incomplete backups (when the whole `ccollect` process
was interrupted) and remove them. Without this file `ccollect` will only warn
the user.
Hints
-----
Using a different ssh port
~~~~~~~~~~~~~~~~~~~~~~~~~~
The easiest way is to use your ~/.ssh/config file:
--------------------------------------------------------------------------------
host mx2.schottelius.org
Port 2342
--------------------------------------------------------------------------------
If you use that port only for backups and normally want to use another port,
you can add 'HostName' and 'HostKeyAlias' (the latter if you also have different
keys on the different ports):
--------------------------------------------------------------------------------
Host hydrogenium
Hostname bruehe.schottelius.org
Port 666
HostKeyAlias hydrogenium
Host bruehe
Hostname bruehe.schottelius.org
Port 22
HostKeyAlias bruehe.schottelius.org
--------------------------------------------------------------------------------
Using source names or interval in pre_/post_exec scripts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The pre-/post_exec scripts can access some internal variables from `ccollect`:
- INTERVAL: the interval specified on the command line
- no_sources: the number of sources
- source_$NUM: the name of each source
- name: the name of the source currently being backed up (not available to the
general pre_exec script)
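A pre_/post_exec script could use them like this. This is a sketch under the assumption that the variables are visible to the script (for example because `ccollect` exports them), which this document does not state; the fallback text shows up if they are not:
--------------------------------------------------------------------------------
#!/bin/sh
# Hypothetical post_exec: report which source and interval just finished.
# Assumption: ccollect makes "name" and "INTERVAL" visible to this script.
report() {
    echo "finished source '${name:-unknown}' in interval '${INTERVAL:-unknown}'"
}

report
--------------------------------------------------------------------------------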
Using rsync protocol without ssh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When you have a computer with little computing power, it may be useful to use
rsync without ssh, directly using the rsync protocol
(specify 'user@host::share' in `source`). You may wish to use
`rsync_options` to specify a password file to use for automatic backup.
Example:
--------------------------------------------------------------------------------
backup:~# cat /etc/ccollect/sources/sample.backup.host.org/source
backup@webserver::backup-share
backup:~# cat /etc/ccollect/sources/sample.backup.host.org/rsync_options
--password-file=/etc/ccollect/sources/sample.backup.host.org/rsync_password
backup:~# cat /etc/ccollect/sources/sample.backup.host.org/rsync_password
this_is_the_rsync_password
--------------------------------------------------------------------------------
This hint was reported by Daniel Aubry.
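`rsync` refuses to use a password file that other users can access, so it is worth creating the file with restrictive permissions from the start. A sketch; `write_password_file` is a hypothetical helper and the password is a placeholder:
--------------------------------------------------------------------------------
# write_password_file (hypothetical helper): store the rsync password in a
# file that only its owner can read or write (mode 0600).
write_password_file() {
    file=$1
    password=$2
    ( umask 077 && printf '%s\n' "$password" > "$file" ) || return 1
    chmod 0600 "$file"    # also fixes the mode if the file already existed
}
--------------------------------------------------------------------------------
For example: `write_password_file /etc/ccollect/sources/sample.backup.host.org/rsync_password this_is_the_rsync_password`.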
Not excluding top-level directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When you exclude "/proc" or "/mnt" from your backup, these directories are missing
from the backup, which can cause trouble when you restore it. When you use "/proc/\*" or "/mnt/\*"
instead, `ccollect` will back up the empty directories.
[NOTE]
===========================================
When those directories contain hidden files
(those beginning with a dot (*.*)),
they will still be transferred!
===========================================
This hint was reported by Marcus Wagner.
Re-using already created rsync-backups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you used `rsync` directly before switching to `ccollect`, you can
use this old backup as the initial backup for `ccollect`: you
simply move it into a subdirectory named "'interval'.0".
Example:
-------------------------------------------------------------------------------
backup:/home/backup/web1# ls
bin dev etc initrd lost+found mnt root srv usr vmlinuz
boot doc home lib media opt sbin tmp var vmlinuz.old
backup:/home/backup/web1# mkdir daily.0
# ignore error about copying to itself
backup:/home/backup/web1# mv * daily.0 2>/dev/null
backup:/home/backup/web1# ls
daily.0
-------------------------------------------------------------------------------
Now you can use /home/backup/web1 as the `destination` for the backup.
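Note that `mv *` does not match hidden entries (names starting with a dot), so those would stay outside "daily.0". A sketch that moves them as well; `move_into_initial` is a hypothetical helper taking the backup directory and the interval name:
--------------------------------------------------------------------------------
# move_into_initial (hypothetical helper): move everything in DIR,
# including hidden entries, into DIR/INTERVAL.0.
move_into_initial() {
    dir=$1
    target=$dir/$2.0
    mkdir "$target" || return 1
    # The three patterns match normal names, ".x" style and "..x" style names.
    for f in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        [ -e "$f" ] || [ -L "$f" ] || continue   # pattern matched nothing
        [ "$f" = "$target" ] && continue         # do not move into itself
        mv "$f" "$target/" || return 1
    done
}
--------------------------------------------------------------------------------
For example: `move_into_initial /home/backup/web1 daily`.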
[NOTE]
===============================================================================
Do *not* name the first backup something like "daily.initial", but use
"*0*" (or some number that is lower than the current year)
as the extension. `ccollect` uses `sort` to find the latest backup, and `ccollect`
itself names backups 'interval.YEARMONTHDAY-HOURMINUTE.PID'. These names
*always* sort before "daily.initial", as digits come before letters in the list
produced by `sort`. So, if you have a directory named "daily.initial",
`ccollect` will always treat it as the latest backup and diff against it, transferring
again all files that changed since then and deleting the files that were removed in the meantime.
This means you simply waste resources, but your backups will still be complete.
===============================================================================
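The effect described in the note can be reproduced with `sort` alone. A sketch, assuming (as the note says) that the backup `sort` lists last is taken as the latest one; `latest_backup` and the names are made up:
--------------------------------------------------------------------------------
# latest_backup (hypothetical helper): print the name that sort(1) lists
# last in the C locale.
latest_backup() {
    printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

latest_backup daily.0 daily.20060113-0100.1234 daily.20060114-0100.5678
# prints daily.20060114-0100.5678
latest_backup daily.initial daily.20060113-0100.1234 daily.20060114-0100.5678
# prints daily.initial, because letters sort after digits
--------------------------------------------------------------------------------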
Using pre_/post_exec
~~~~~~~~~~~~~~~~~~~~
Your pre_/post_exec does not need to be a script; it can also be
a link to
- an existing program
- an already written script
The only requirement is that it is executable.
Using source specific interval definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When you are backing up multiple hosts via cron each night, it may be
a problem that host "big_server" may only have 4 daily backups, because
otherwise its backup device will be full. But for all other hosts
you want to keep 20 daily backups. In this case you would create
`/etc/ccollect/defaults/intervals/daily` containing "20" and
`/etc/ccollect/sources/big_server/intervals/daily` containing "4".
Source specific intervals always override the default values.
If you have to specify it individually for every host, because
of different requirements, you can even omit creating
`/etc/ccollect/defaults/intervals/daily`.
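The lookup order can be sketched as a shell function: the source specific file is tried first, then the default one. This illustrates the rule above and is not taken from `ccollect` itself; `keep_count` is a hypothetical name:
--------------------------------------------------------------------------------
# keep_count (hypothetical helper): print how many backups of INTERVAL
# are kept for SOURCE, preferring the source specific definition.
keep_count() {
    conf=${CCOLLECT_CONF:-/etc/ccollect}
    src=$1
    interval=$2
    for f in "$conf/sources/$src/intervals/$interval" \
             "$conf/defaults/intervals/$interval"; do
        if [ -f "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    echo "no definition for interval $interval" >&2
    return 1
}
--------------------------------------------------------------------------------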
F.A.Q.
------
What happens if one backup is broken or empty?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let us assume that one backup failed (connection broke or the source
hard disk had some failures). Therefore we've got one incomplete backup in our history.
`ccollect` will transfer the missing files the next time you use it.
This leads to
- more transferred files
- much greater disk space usage, as no hardlinks can be used
If the whole `ccollect` process was interrupted, `ccollect` (since 0.6) can
detect that and remove the incomplete backups, so you can clone from a complete
backup instead.
When backing up from localhost the destination is also included. Is this a bug?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No. `ccollect` passes your source definition directly to `rsync`. It
does not try to analyze it, so it does not know whether a source
comes from the local hard disk or from a remote server. And it does not want
to. When you back up from the local hard disk (which is perhaps not
even a good idea when thinking of security), add the path of the `destination`
to the `exclude` file of that source. (Daniel Aubry reported this problem)
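A sketch of that fix, assuming the source is the local root directory ("/"), so that the absolute path of the destination also works as an `rsync` exclude pattern; `exclude_destination` is a hypothetical helper taking the source configuration directory:
--------------------------------------------------------------------------------
# exclude_destination (hypothetical helper): append the physical path the
# "destination" link points to, to the exclude list of the same source.
exclude_destination() {
    srcdir=$1    # e.g. /etc/ccollect/sources/local-root
    dest=$(cd "$srcdir/destination" && pwd -P) || return 1
    printf '%s\n' "$dest" >> "$srcdir/exclude"
}
--------------------------------------------------------------------------------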
Why does ccollect say "Permission denied" with my pre-/postexec script?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most common error is that you have not given your script the correct
permissions. Try `chmod 0755 /etc/ccollect/sources/'yoursource'/*_exec`.
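To find all affected scripts at once, a loop over the general and the per-source scripts is enough. A sketch; `find_unexecutable` is hypothetical and uses $CCOLLECT_CONF or /etc/ccollect as described above:
--------------------------------------------------------------------------------
# find_unexecutable (hypothetical helper): print every pre_exec/post_exec
# that exists but lacks the execute permission.
find_unexecutable() {
    conf=${CCOLLECT_CONF:-/etc/ccollect}
    for f in "$conf"/defaults/*_exec "$conf"/sources/*/*_exec; do
        [ -e "$f" ] || continue    # pattern matched nothing
        [ -x "$f" ] || echo "$f"
    done
}
--------------------------------------------------------------------------------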
Why does the backup job fail when part of the source is a link?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a part of the path you specified in the source is a
symbolic link (hard links to directories are not possible),
the backup *must* fail.
First of all, let us have a look at what it looks like:
-------------------------------------------------------------------------------
==> ccollect 0.4: Beginning backup using interval taeglich <==
[testsource] Sa Apr 29 00:01:55 CEST 2006 Beginning to backup
[testsource] Currently 0 backup(s) exist(s), total keeping 10 backup(s).
[testsource] Beginning to backup, this may take some time...
[testsource] Creating /etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874 ...
[testsource] Sa Apr 29 00:01:55 CEST 2006 Transferring files...
[testsource] rsync: recv_generator: mkdir "/etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874/home/user/nico/projekte/ccollect" failed: No such file or directory (2)
[testsource] rsync: stat "/etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874/home/user/nico/projekte/ccollect" failed: No such file or directory (2)
[...]
-------------------------------------------------------------------------------
So what is the problem? It is very obvious when you look deeper into it:
-------------------------------------------------------------------------------
% cat /etc/ccollect/sources/testsource/source
/home/user/nico/projekte/ccollect/ccollect-0.4
% ls -l /home/user/nico/projekte
lrwxrwxrwx 1 nico nico 29 2005-12-02 23:28 /home/user/nico/projekte -> oeffentlich/computer/projekte
% ls -l /etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874/home/user/nico
lrwxrwxrwx 1 nico nico 29 2006-04-29 00:01 projekte -> oeffentlich/computer/projekte
-------------------------------------------------------------------------------
As the listing shows, `rsync` recreated the symbolic link in the destination and
then tried to create directories below it.
This link points to something that does not exist in the backup (a dead link), and it is
impossible to create subdirectories under a dead link.
In conclusion, you cannot use paths with a linked part.
However, you can back up directories containing symbolic links
(in this case you could back up /home/user/nico, which contains both
the link projekte and its target oeffentlich/computer/projekte).
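Whether a local source path contains such a linked part can be checked component by component. A sketch for local, absolute paths only; `linked_parts` is a hypothetical helper:
--------------------------------------------------------------------------------
# linked_parts (hypothetical helper): print every prefix of an absolute
# path that is a symbolic link, e.g. /home/user/nico/projekte above.
linked_parts() {
    rest=${1#/}     # drop the leading slash
    path=
    while [ -n "$rest" ]; do
        part=${rest%%/*}               # next component
        path=$path/$part
        if [ -L "$path" ]; then
            echo "$path"
        fi
        case $rest in
            */*) rest=${rest#*/} ;;    # more components follow
            *) rest= ;;
        esac
    done
}
--------------------------------------------------------------------------------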
How can I prevent missing the right time to enter my password?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As `ccollect` first deletes the old backups, it may take some time
until `rsync` requests the password for the `ssh` session from you.
The easiest way not to miss that point is running `ccollect` in `screen`,
which has the ability to monitor the output for activity. So as soon as
your screen beeps, after `ccollect` began to remove the last directory,
you can enter your password (have a look at screen(1), especially "C-a M"
and "C-a _", for more information).
Examples
--------
A backup host configuration from scratch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------------------------------------
srwali01:~# mkdir /etc/ccollect
srwali01:~# mkdir -p /etc/ccollect/defaults/intervals/
srwali01:~# echo 28 > /etc/ccollect/defaults/intervals/taeglich
srwali01:~# echo 52 > /etc/ccollect/defaults/intervals/woechentlich
srwali01:~# cd /etc/ccollect/
srwali01:/etc/ccollect# mkdir sources
srwali01:/etc/ccollect# cd sources/
srwali01:/etc/ccollect/sources# ls
srwali01:/etc/ccollect/sources# mkdir local-root
srwali01:/etc/ccollect/sources# cd local-root/
srwali01:/etc/ccollect/sources/local-root# echo / > source
srwali01:/etc/ccollect/sources/local-root# cat > exclude << EOF
> /proc
> /sys
> /mnt
> EOF
srwali01:/etc/ccollect/sources/local-root# ln -s /mnt/hdbackup/local-root destination
srwali01:/etc/ccollect/sources/local-root# mkdir /mnt/hdbackup/local-root
srwali01:/etc/ccollect/sources/local-root# ccollect.sh taeglich local-root
/o> ccollect.sh: Beginning backup using interval taeglich
/=> Beginning to backup "local-root" ...
|-> 0 backup(s) already exist, keeping 28 backup(s).
--------------------------------------------------------------------------------
After that, I added some more sources:
--------------------------------------------------------------------------------
srwali01:~# cd /etc/ccollect/sources
srwali01:/etc/ccollect/sources# mkdir windos-wl6
srwali01:/etc/ccollect/sources# cd windos-wl6/
srwali01:/etc/ccollect/sources/windos-wl6# echo /mnt/win/SYS/WL6 > source
srwali01:/etc/ccollect/sources/windos-wl6# ln -s /mnt/hdbackup/wl6 destination
srwali01:/etc/ccollect/sources/windos-wl6# mkdir /mnt/hdbackup/wl6
srwali01:/etc/ccollect/sources/windos-wl6# cd ..
srwali01:/etc/ccollect/sources# mkdir windos-daten
srwali01:/etc/ccollect/sources# cd windos-daten/
srwali01:/etc/ccollect/sources/windos-daten# echo /mnt/win/Daten > source
srwali01:/etc/ccollect/sources/windos-daten# ln -s /mnt/hdbackup/windos-daten destination
srwali01:/etc/ccollect/sources/windos-daten# mkdir /mnt/hdbackup/windos-daten
# Now add some remote source
srwali01:/etc/ccollect/sources/windos-daten# cd ..
srwali01:/etc/ccollect/sources# mkdir srwali03
srwali01:/etc/ccollect/sources# cd srwali03/
srwali01:/etc/ccollect/sources/srwali03# cat > exclude << EOF
> /proc
> /sys
> /mnt
> /home
> EOF
srwali01:/etc/ccollect/sources/srwali03# echo 'root@10.103.2.3:/' > source
srwali01:/etc/ccollect/sources/srwali03# ln -s /mnt/hdbackup/srwali03 destination
srwali01:/etc/ccollect/sources/srwali03# mkdir /mnt/hdbackup/srwali03
--------------------------------------------------------------------------------
Using hard-links requires less disk space
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------------------------------------------------------------
# du (coreutils) 5.2.1
[10:53] srsyg01:sources% du -sh ~/backupdir
4.6M /home/nico/backupdir
[10:53] srsyg01:sources% du -sh ~/backupdir/*
4.1M /home/nico/backupdir/daily.2005-12-08-10:52.28456
4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28484
4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28507
4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28531
4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28554
4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28577
srwali01:/etc/ccollect/sources# du -sh /mnt/hdbackup/wl6/
186M /mnt/hdbackup/wl6/
srwali01:/etc/ccollect/sources# du -sh /mnt/hdbackup/wl6/*
147M /mnt/hdbackup/wl6/taeglich.2005-12-08-14:42.312
147M /mnt/hdbackup/wl6/taeglich.2005-12-08-14:45.588
-------------------------------------------------------------------------
The backup of our main fileserver:
-------------------------------------------------------------------------
backup:~# df -h /home/backup/srsyg01/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/backup--01-srsyg01
591G 451G 111G 81% /home/backup/srsyg01
backup:~# du -sh /home/backup/srsyg01/*
432G /home/backup/srsyg01/daily.2006-01-24-01:00.15990
432G /home/backup/srsyg01/daily.2006-01-26-01:00.30152
434G /home/backup/srsyg01/daily.2006-01-27-01:00.4596
435G /home/backup/srsyg01/daily.2006-01-28-01:00.11998
437G /home/backup/srsyg01/daily.2006-01-29-01:00.19115
437G /home/backup/srsyg01/daily.2006-01-30-01:00.26405
438G /home/backup/srsyg01/daily.2006-01-31-01:00.1148
439G /home/backup/srsyg01/daily.2006-02-01-01:00.8321
439G /home/backup/srsyg01/daily.2006-02-02-01:00.15383
439G /home/backup/srsyg01/daily.2006-02-03-01:00.22567
16K /home/backup/srsyg01/lost+found
backup:~# du --version | head -n1
du (coreutils) 5.2.1
-------------------------------------------------------------------------
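With du 5.2.1, every backup above is reported at its full size (432G to 439G), yet the filesystem holds only 451G in total. This works because unchanged files are not copied again but hard-linked. ccollect calls `rsync` with `--link-dest`, as the process list further below shows. The following sketch reproduces the effect with plain `ln` in a scratch directory. The name `hardlink_demo` is hypothetical and the 1 MiB test file is arbitrary.

```sh
#!/bin/sh
# Hypothetical demo (not part of ccollect): why hard-linked backups are cheap.
# usage: hardlink_demo DIR   (DIR should not exist yet)
hardlink_demo() {
    dir=$1
    mkdir -p "$dir/daily.1" "$dir/daily.2" || return 1
    # first "backup": 1 MiB of random data, so compression cannot shrink it
    dd if=/dev/urandom of="$dir/daily.1/data" bs=1024 count=1024 2>/dev/null ||
        return 1
    # second "backup": the unchanged file is hard-linked instead of copied,
    # which is what rsync --link-dest does for unchanged files
    ln "$dir/daily.1/data" "$dir/daily.2/data" || return 1
    # both names show the same inode number: the data is stored only once
    ls -i "$dir/daily.1/data" "$dir/daily.2/data"
    # total usage stays close to 1 MiB, not 2 MiB
    du -sk "$dir"
}
```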
Newer versions of du also detect hard links shared between the directories given
on the command line, so the sizes of the backups can be compared directly with du:
-------------------------------------------------------------------------
[8:16] eiche:~# du --version | head -n 1
du (GNU coreutils) 5.93
[8:17] eiche:schwarzesloch# du -slh hydrogenium/*
19G hydrogenium/durcheinander.0
18G hydrogenium/durcheinander.2006-01-17-00:27.13820
19G hydrogenium/durcheinander.2006-01-25-23:18.31328
19G hydrogenium/durcheinander.2006-01-26-00:11.3332
[8:22] eiche:schwarzesloch# du -sh hydrogenium/*
19G hydrogenium/durcheinander.0
12G hydrogenium/durcheinander.2006-01-17-00:27.13820
1.5G hydrogenium/durcheinander.2006-01-25-23:18.31328
200M hydrogenium/durcheinander.2006-01-26-00:11.3332
-------------------------------------------------------------------------
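In the first report (with `-l`, count links), hard-linked files are counted in
every backup, so each one shows its full size. In the second report (without
`-l`), each hard-linked file is counted only once, in the first directory where
du finds it, so every later backup shows only the space it adds: its new or
changed files plus its own directory entries.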
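The two ways of running du can be combined to put a number on the savings. Measuring each backup in its own du run counts every hard-linked file again, while a single run over all backups counts it once. The following is a minimal sketch. It assumes du counts a hard-linked file only once across all operands of one invocation, as the newer GNU du above does (POSIX leaves this implementation-defined). The name `backup_savings` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: how much space do hard links save across backups?
# Assumes du counts a hard-linked file once across all of its operands.
# usage: backup_savings DIR...
backup_savings() {
    # one du run per directory: every backup is measured at its full size
    full=0
    for d in "$@"; do
        kb=$(du -sk "$d" | awk '{print $1}')
        full=$((full + kb))
    done
    # one du run over all directories: shared inodes are counted once
    real=$(du -sk "$@" | awk '{s += $1} END {print s}')
    echo "without hard links: ${full} KiB"
    echo "with hard links:    ${real} KiB"
    echo "saved:              $((full - real)) KiB"
}
```

For the fileserver above this would be called as `backup_savings /home/backup/srsyg01/daily.*`.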
A collection of backups on the backup server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All the data of my important hosts is backed up to eiche, into
/mnt/schwarzesloch/backup:
-------------------------------------------------------------------------
[9:24] eiche:backup# ls *
creme:
woechentlich.2006-01-26-22:22.4153 woechentlich.2006-02-12-11:48.2461
woechentlich.2006-01-26-22:23.4180 woechentlich.2006-02-18-23:00.7898
woechentlich.2006-02-05-02:43.14281 woechentlich.2006-02-25-23:00.13480
woechentlich.2006-02-06-00:24.15509 woechentlich.2006-03-04-23:00.25439
hydrogenium:
durcheinander.2006-01-27-11:16.6391 durcheinander.2006-02-13-01:07.2895
durcheinander.2006-01-30-19:29.9505 durcheinander.2006-02-17-08:20.6707
durcheinander.2006-01-30-22:27.9623 durcheinander.2006-02-24-16:24.12461
durcheinander.2006-02-03-09:52.12885 durcheinander.2006-03-03-19:17.18075
durcheinander.2006-02-05-23:00.15068 durcheinander.2006-03-17-22:41.5007
scice:
woechentlich.2006-02-04-10:32.13766 woechentlich.2006-02-16-23:00.6185
woechentlich.2006-02-05-23:02.15093 woechentlich.2006-02-23-23:00.11783
woechentlich.2006-02-06-08:22.15994 woechentlich.2006-03-02-23:00.17346
woechentlich.2006-02-06-19:40.16321 woechentlich.2006-03-09-23:00.29317
woechentlich.2006-02-12-11:51.2514 woechentlich.2006-03-16-23:00.4218
-------------------------------------------------------------------------
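Because each host has its own directory and each backup carries a timestamp, the newest backup of every host can be found without parsing dates. The shell sorts glob matches, and within one interval name the `YYYY-MM-DD-HH:MM` stamp sorts chronologically. The following is a minimal sketch; `latest_backups` is a hypothetical name. It assumes each host directory uses a single interval name, as in the listing above. With two interval names in one directory, the matches would be sorted by interval name first.

```sh
#!/bin/sh
# Hypothetical helper: print the newest backup of every host below DIR.
# Assumes one interval name per host directory.
# usage: latest_backups DIR
latest_backups() {
    for host in "$1"/*/; do
        host=${host%/}
        latest=
        # glob results are sorted, and the timestamp sorts chronologically,
        # so the last matching directory is the newest backup
        for b in "$host"/*.[0-9][0-9][0-9][0-9]-[0-9][0-9]-*; do
            [ -d "$b" ] && latest=$b
        done
        if [ -n "$latest" ]; then
            printf '%s: %s\n' "${host##*/}" "${latest##*/}"
        fi
    done
}
```

On eiche this would be called as `latest_backups /mnt/schwarzesloch/backup`.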
This incremental backup and the archive are then copied to an external USB hard
disk (attention: you *should* really use `rsync -H` to back up the backup;
otherwise the hard links are lost and every backup is copied in full):
-------------------------------------------------------------------------
[9:23] eiche:backup# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 14G 8.2G 4.9G 63% /
/dev/root 14G 8.2G 4.9G 63% /
/dev/root 14G 8.2G 4.9G 63% /dev/.static/dev
tmpfs 10M 444K 9.6M 5% /dev
/dev/hdh 29G 3.7M 29G 1% /mnt/datenklo
tmpfs 110M 4.0K 110M 1% /dev/shm
/dev/mapper/nirvana 112G 90G 23G 81% /mnt/datennirvana
/dev/mapper/schwarzes-loch
230G 144G 86G 63% /mnt/schwarzesloch
/dev/mapper/archiv 38G 20G 19G 52% /mnt/archiv
/dev/mapper/usb-backup
280G 36M 280G 1% /mnt/usb/backup
[9:24] eiche:backup# cat ~/bin/sync-to-usb
DDIR=/mnt/usb/backup
rsync -av -H --delete /mnt/schwarzesloch/ "$DDIR/schwarzesloch/"
rsync -av -H --delete /mnt/archiv/ "$DDIR/archiv/"
-------------------------------------------------------------------------
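The `sync-to-usb` script above assumes that the USB disk is mounted. If it is not, `rsync` writes into the empty mount point on the root filesystem instead, which here is a 14G filesystem that could never hold the 144G backup. The following is a slightly more defensive sketch, not the script used on eiche. It assumes that a marker file with the hypothetical name `.usb-backup-disk` has been created once in the root of the mounted USB disk.

```sh
#!/bin/sh
# Sketch of a more defensive sync-to-usb.
# Assumption: the mounted USB disk contains a marker file named
# .usb-backup-disk (hypothetical name, created once by hand).
DDIR=${DDIR:-/mnt/usb/backup}

sync_to_usb() {
    # without the marker, DDIR is probably just the empty mount point
    if [ ! -f "$DDIR/.usb-backup-disk" ]; then
        echo "sync_to_usb: $DDIR does not look like the USB backup disk" >&2
        return 1
    fi
    # -H preserves hard links, so the backups are not copied in full
    rsync -av -H --delete /mnt/schwarzesloch/ "$DDIR/schwarzesloch/" &&
        rsync -av -H --delete /mnt/archiv/ "$DDIR/archiv/"
}
```

To use it, create the marker once with `touch /mnt/usb/backup/.usb-backup-disk` while the disk is mounted, then call `sync_to_usb`.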
Processes running when doing ccollect -p
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Truncated output from `ps axuwwwf`:
-------------------------------------------------------------------------
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily -p ddba034 ddba045 ddba046 ddba047 ddba049 ddna010 ddna011
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba034
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba034
R+ 11:40 23:40 | | | | | \_ rsync -a --delete --numeric-ids --relative --delete-excluded --link-dest=/home/server/backup/ddba034
S+ 11:40 0:00 | | | | | \_ ssh -l root ddba034.netstream.ch rsync --server --sender -vlogDtprR --numeric-ids . /
S+ 11:41 0:11 | | | | | \_ rsync -a --delete --numeric-ids --relative --delete-excluded --link-dest=/home/server/backup/ddb
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba034
S+ 11:40 0:00 | | | | \_ sed s:^:\[ddba034\] :
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba045
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba045
R+ 11:40 0:02 | | | | | \_ rm -rf /etc/ccollect/sources/ddba045/destination/daily.2006-10-19-1807.6934
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba045
S+ 11:40 0:00 | | | | \_ sed s:^:\[ddba045\] :
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba046
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba046
R+ 11:40 0:02 | | | | | \_ rm -rf /etc/ccollect/sources/ddba046/destination/daily.2006-10-19-1810.7072
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba046
S+ 11:40 0:00 | | | | \_ sed s:^:\[ddba046\] :
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba047
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba047
R+ 11:40 0:03 | | | | | \_ rm -rf /etc/ccollect/sources/ddba047/destination/daily.2006-10-19-1816.7268
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba047
S+ 11:40 0:00 | | | | \_ sed s:^:\[ddba047\] :
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba049
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba049
D+ 11:40 0:03 | | | | | \_ rm -rf /etc/ccollect/sources/ddba049/destination/daily.2006-10-19-1821.7504
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba049
S+ 11:40 0:00 | | | | \_ sed s:^:\[ddba049\] :
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna010
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna010
R+ 11:40 0:03 | | | | | \_ rm -rf /etc/ccollect/sources/ddna010/destination/daily.2006-10-19-1805.6849
S+ 11:40 0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna010
S+ 11:40 0:00 | | | | \_ sed s:^:\[ddna010\] :
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna011
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna011
R+ 12:08 0:00 | | | | \_ rm -rf /etc/ccollect/sources/ddna011/destination/daily.2006-10-20-1502.7824
S+ 11:40 0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna011
S+ 11:40 0:00 | | | \_ sed s:^:\[ddna011\] :
-------------------------------------------------------------------------
As you can see, six processes are deleting old backups, while one backup
(ddba034) is already copying data.
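The process tree shows how a parallel run is structured. ccollect starts one child per source, and each child's output is piped through `sed` so that every line is prefixed with the source name. The following is a minimal sketch of that pattern, not ccollect's own code. `backup_one` and `run_parallel` are hypothetical names, and `backup_one` only prints messages where the real work (deleting old backups, running `rsync`) would happen.

```sh
#!/bin/sh
# Sketch of the pattern visible in the process tree; not ccollect's code.
# backup_one is a hypothetical stand-in for the per-source work.
backup_one() {
    echo "starting backup of $1"
    # ... deleting old backups and running rsync would happen here ...
    echo "finished backup of $1"
}

# usage: run_parallel SOURCE...
run_parallel() {
    for src in "$@"; do
        # the whole pipeline runs in the background; sed labels every line
        backup_one "$src" 2>&1 | sed "s:^:[$src] :" &
    done
    wait    # block until every background pipeline has finished
}
```

Source names containing `:`, `&` or `\` would need escaping before being used in the `sed` expression.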