Revision History | ||
---|---|---|
Revision 0.7.1 | for ccollect 0.7.1, Initial Version from 2006-01-13 | NS |
(pseudo) incremental backup
with different exclude lists
using hardlinks and rsync
ccollect is a backup utility written in the sh scripting language.
It does not depend on a specific shell; /bin/sh only needs to be
Bourne shell compatible (like dash, ksh, zsh, bash, …).
ccollect
was successfully tested on the following platforms:
It should run on any Unix that supports rsync and has a POSIX-compatible
Bourne shell. If your platform is not listed above and you have it running
successfully, please drop me a mail.
While considering the design of ccollect, I thought about enabling backups to remote hosts. Though this sounds like a nice feature ("Back up my notebook to the server now."), in my opinion it is a bad idea to back up to a remote host.
But as more and more people requested this feature, it was implemented, so you have the choice whether you want to use it or not.
If you want to back up TO a remote host, you have to loosen security on it.
Imagine the following situation: You back up your farm of webservers TO a backup host somewhere else. Now one of your webservers, which has access to your backup host, gets compromised.
Your backup server will be compromised, too.
And the attacker will have access to all data on the other webservers.
Think of it the other way round: The backup server (now behind a firewall, not accessible from outside) connects to the webservers and pulls the data from them. If someone gets access to one of the webservers, this person will perhaps not even see your backup machine. Even if the attacker sees connections from the backup host to the compromised machine, she will not be able to log in on the backup machine. All other backups are still secure.
The format of destination changed:
You can update your configuration using tools/config-pre-0.7-to-0.7.sh.
Added remote_host
The format of rsync_options
changed:
rsync
)
You can update your configuration using tools/config-pre-0.6-to-0.6.sh.
The name of the backup directories changed:
For the second change no update is needed, as XXXX- always sorts before XXXXX (- comes before any digit).
Not a real incompatibility, but it seems to fit in this section:
0.5 does NOT require
anymore!
Since ccollect 0.4 there are several incompatibilities with earlier versions:
List of incompatibilities
pax (POSIX) is now required, cp -al (GNU specific) is removed
You can convert your old configuration directory using
config-pre-0.4-to-0.4.sh, which can be found in the tools/ subdirectory:
    [10:05] hydrogenium:ccollect-0.4# ./tools/config-pre-0.4-to-0.4.sh /etc/ccollect
For those who do not want to read the whole long document:
    # get latest ccollect tarball from http://www.nico.schottelius.org/software/ccollect/
    # replace value for CCV with the current version
    export CCV=0.7.1

    #
    # replace 'wget' with 'fetch' on bsd
    #
    holen=wget
    "$holen" http://www.nico.schottelius.org/software/ccollect/ccollect-${CCV}.tar.bz2

    # extract the tarball, change to the newly created directory
    tar -xvjf ccollect-${CCV}.tar.bz2
    cd ccollect-${CCV}

    # create mini-configuration
    # first create directory structure
    mkdir -p miniconfig/defaults/intervals
    mkdir miniconfig/sources

    # create sample intervals
    echo 2 > miniconfig/defaults/intervals/testinterval
    echo 3 > miniconfig/defaults/intervals/testinterval2

    # create destination directory, where the backups will be kept
    mkdir ~/DASI

    # create sample source, which will be saved
    mkdir miniconfig/sources/testsource

    # We will save '/bin' to the directory '~/DASI'
    echo '/bin' > miniconfig/sources/testsource/source

    # configure ccollect to use ~/DASI as destination
    echo ~/DASI > miniconfig/sources/testsource/destination

    # We want to see what happens and also a small summary at the end
    touch miniconfig/sources/testsource/verbose
    touch miniconfig/sources/testsource/summary

    echo "do the backup, twice"
    CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval testsource
    CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval testsource

    echo "the third time ccollect begins to remove old backups"
    # printf and "read answer" are used instead of "echo -n" and a bare
    # "read", which not every Bourne compatible shell accepts
    printf '%s' "Hit enter to see it"
    read answer
    CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval testsource

    echo "Now we add another interval, ccollect should clone from existent ones"
    printf '%s' "Hit enter to see it"
    read answer
    CCOLLECT_CONF=./miniconfig ./ccollect.sh testinterval2 testsource

    echo "Let's see how much space we used with two backups and compare it to /bin"
    du -s ~/DASI /bin

    # report success
    echo "Please report success using ./tools/report_success.sh"
Cutting and pasting the complete section above to your shell will result in the download of ccollect, the creation of a sample configuration and the execution of some backups.
For the installation you need at least
cp
and chmod
or install
make
asciidoc
Either type make install or simply copy it to a directory in your $PATH and execute chmod 0755 /path/to/ccollect.sh. If you would like to use the new management scripts (available since 0.6), copy the following scripts to a directory in $PATH:
tools/ccollect_add_source.sh
tools/ccollect_analyse_logs.sh
tools/ccollect_delete_source.sh
tools/ccollect_list_intervals.sh
tools/ccollect_logwrapper.sh
After having installed and used ccollect, report success using ./tools/report_success.sh.
For configuration aid have a look at the above mentioned tools, which can assist
you quite well. When you are successfully using ccollect, report success using
tools/report_success.sh.
ccollect looks for its configuration in /etc/ccollect or, if set, in
the directory specified by the variable $CCOLLECT_CONF:
    # sh-compatible (dash, zsh, mksh, ksh, bash, ...)
    $ CCOLLECT_CONF=/your/config/dir ccollect.sh ...

    # csh
    $ ( setenv CCOLLECT_CONF /your/config/dir ; ccollect.sh ... )
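The lookup order described above can be sketched with the shell's default-value expansion. This is only an illustration of the rule, not code taken from ccollect; the function name `ccollect_conf_dir` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the configuration directory that would be used.
# Falls back to /etc/ccollect when CCOLLECT_CONF is unset or empty.
ccollect_conf_dir() {
    printf '%s\n' "${CCOLLECT_CONF:-/etc/ccollect}"
}

ccollect_conf_dir
```

Calling it as `CCOLLECT_CONF=/your/config/dir ccollect_conf_dir` prints the directory given in the variable instead of the default.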
When you start ccollect, you have to specify for which interval
to back up (daily, weekly, yearly; you can specify the names yourself, see below)
and which sources to back up (or -a to back up all sources).
The interval specifies how many backups are kept.
There are also some self-explanatory parameters you can pass to ccollect;
simply use ccollect.sh --help for info.
The general configuration can be found in $CCOLLECT_CONF/defaults or /etc/ccollect/defaults. All options specified there are generally valid for all source definitions, although the values can be overwritten in the source configuration.
All configuration entries are plain-text files (use UTF-8 for non-ascii characters).
The interval definition can be found in
$CCOLLECT_CONF/defaults/intervals/ or /etc/ccollect/defaults/intervals.
Each file in this directory specifies an interval. The name of the file is
the same as the name of the interval: intervals/'<interval name>'.
The content of this file should be a single line containing a number. This number defines how many versions of this interval are kept.
Example:
    [10:23] zaphodbeeblebrox:ccollect-0.2% ls -l conf/defaults/intervals/
    total 12
    -rw-r--r-- 1 nico users 3 2005-12-08 10:24 daily
    -rw-r--r-- 1 nico users 3 2005-12-08 11:36 monthly
    -rw-r--r-- 1 nico users 2 2005-12-08 11:36 weekly
    [10:23] zaphodbeeblebrox:ccollect-0.2% cat conf/defaults/intervals/*
    28
    12
    4
This means to keep 28 daily backups, 12 monthly backups and 4 weekly.
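How the number in an interval file relates to deletions can be sketched as follows. The sketch assumes the behaviour seen in the quick start (with an interval of 2, the third run starts removing old backups), i.e. that room for the new backup is made before it is created. `backups_to_remove` is a hypothetical name, not a function of ccollect.

```shell
#!/bin/sh
# Hypothetical helper: given the number of existing backups and the number
# to keep (the content of the interval file), print how many old backups
# would have to be removed so that the new backup fits into the limit.
backups_to_remove() {
    existing=$1
    keep=$2
    if [ "$existing" -ge "$keep" ]; then
        echo $((existing - keep + 1))
    else
        echo 0
    fi
}

backups_to_remove 1 2   # second run with an interval of 2
backups_to_remove 2 2   # third run with an interval of 2
```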
If you add $CCOLLECT_CONF/defaults/pre_exec or
/etc/ccollect/defaults/pre_exec (same with post_exec), ccollect
will start pre_exec before the whole backup process and
post_exec after the backup of all sources is done.
The following example describes how to report free disk space in human readable format before and after the whole backup process:
    [13:00] hydrogenium:~# mkdir -p /etc/ccollect/defaults/
    [13:00] hydrogenium:~# echo '#!/bin/sh' > /etc/ccollect/defaults/pre_exec
    [13:01] hydrogenium:~# echo '' >> /etc/ccollect/defaults/pre_exec
    [13:01] hydrogenium:~# echo 'df -h' >> /etc/ccollect/defaults/pre_exec
    [13:01] hydrogenium:~# chmod 0755 /etc/ccollect/defaults/pre_exec
    [13:01] hydrogenium:~# ln -s /etc/ccollect/defaults/pre_exec /etc/ccollect/defaults/post_exec
Each source configuration exists in $CCOLLECT_CONF/sources/$name or /etc/ccollect/sources/$name.
The name you choose for the subdirectory describes the source.
Each source contains at least the following files:
source (a text file containing the rsync compatible path to back up)
destination (a text file containing the directory we should back up to)
Additionally a source may have the following files:
verbose: whether to be verbose (passes -v to rsync)
very_verbose: be very verbose (mkdir -v, rm -v and rsync -vv)
summary: create a transfer summary when rsync finished
exclude: exclude list for rsync, newline separated
rsync_options: extra options for rsync, newline separated
pre_exec: program to execute before backing up this source
post_exec: program to execute after backing up this source
delete_incomplete: delete incomplete backups
remote_host: host to back up to
Example:
    [10:47] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2
    total 12
    lrwxrwxrwx 1 nico users   20 2005-11-17 16:44 destination -> /home/nico/backupdir
    -rw-r--r-- 1 nico users   62 2005-12-07 17:43 exclude
    drwxr-xr-x 2 nico users 4096 2005-12-07 17:38 intervals
    -rw-r--r-- 1 nico users   15 2005-11-17 16:44 source
    [10:47] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/exclude
    openvpn-2.0.1.tar.gz
    nicht_reinnehmen
    etwas mit leerzeichenli
    [10:47] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2/intervals
    total 4
    -rw-r--r-- 1 nico users 2 2005-12-07 17:38 daily
    [10:48] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/intervals/daily
    5
    [10:48] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/source
    /home/nico/vpn
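A source definition with the two required files and a few optional ones could be created like this. The sketch writes into a temporary directory instead of /etc/ccollect so it can be tried safely; the source name `example-source` is made up, and `mktemp -d` is assumed to be available (it is common, but not required by POSIX).

```shell
#!/bin/sh
# Build a throw-away configuration tree; nothing under /etc is touched.
conf=$(mktemp -d)
src="$conf/sources/example-source"   # "example-source" is a made-up name
mkdir -p "$src"

# Required files
echo '/home/nico/vpn'       > "$src/source"
echo '/home/nico/backupdir' > "$src/destination"

# Optional files: flags are empty files, lists are newline separated
touch "$src/verbose" "$src/summary"
printf '%s\n' 'openvpn-2.0.1.tar.gz' 'something with spaces' > "$src/exclude"

ls "$src"
```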
source describes an rsync compatible source (one line only).
For instance backup_user@foreign_host:/home/server/video.
To use the rsync protocol without the ssh tunnel, use
USER@HOST::SRC or rsync://USER@HOST/SRC. For more information have a look
at the manpage of rsync(1).
destination must be a text file containing the destination directory.
destination USED to be a link to the destination directory in
earlier versions, so do not be confused if you see such examples.
Example:
    [11:36] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/destination
    /home/nico/backupdir
remote_host must be a text file containing the destination host.
If this file exists, you are backing up your data TO this host
and not to your local host.
Warning: You need to have ssh access to the remote host. rsync and
ccollect will connect to that host via ssh. ccollect needs shell
access, because it has to find out how many backups exist on the remote
host and must be able to delete them.
Example:
    [10:17] denkbrett:ccollect-0.7.0% cat conf/sources/remote1/remote_host
    home.schottelius.org
It may contain all the ssh-specific values like myuser@yourhost.ch.
verbose tells ccollect that the log should contain verbose messages.
If this file exists in the source specification, -v will be passed to rsync.
Example:
    [11:35] zaphodbeeblebrox:ccollect-0.2% touch conf/sources/testsource1/verbose
very_verbose tells ccollect that it should log very verbosely.
If this file exists in the source specification, -v will be passed to
rsync, rm and mkdir.
Example:
    [23:67] nohost:~% touch conf/sources/testsource1/very_verbose
If you create the file summary in the source definition,
ccollect will present you a nice summary at the end.
    backup:~# touch /etc/ccollect/sources/root/summary
    backup:~# ccollect.sh werktags root
    ==> ccollect.sh: Beginning backup using interval werktags <==
    [root] Beginning to backup this source ...
    [root] Currently 3 backup(s) exist, total keeping 50 backup(s).
    [root] Beginning to backup, this may take some time...
    [root] Hard linking...
    [root] Transferring files...
    [root]
    [root] Number of files: 84183
    [root] Number of files transferred: 32
    [root] Total file size: 26234080536 bytes
    [root] Total transferred file size: 9988252 bytes
    [root] Literal data: 9988252 bytes
    [root] Matched data: 0 bytes
    [root] File list size: 3016771
    [root] File list generation time: 1.786 seconds
    [root] File list transfer time: 0.000 seconds
    [root] Total bytes sent: 13009119
    [root] Total bytes received: 2152
    [root]
    [root] sent 13009119 bytes received 2152 bytes 2891393.56 bytes/sec
    [root] total size is 26234080536 speedup is 2016.26
    [root] Successfully finished backup.
    ==> Finished ccollect.sh <==
You could also combine it with verbose or very_verbose, but these
already print some statistics (though not all / the same as presented by
summary).
exclude specifies a list of paths to exclude. The entries are separated by a newline (\n).
Example:
    [11:35] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/exclude
    openvpn-2.0.1.tar.gz
    nicht_reinnehmen
    etwas mit leerzeichenli
    something with spaces is not a problem
When you create the subdirectory intervals/ in your source configuration
directory, you can specify individual intervals for this specific source.
Each file in this directory describes an interval.
Example:
    [11:37] zaphodbeeblebrox:ccollect-0.2% ls -l conf/sources/testsource2/intervals/
    total 8
    -rw-r--r-- 1 nico users 2 2005-12-07 17:38 daily
    -rw-r--r-- 1 nico users 3 2005-12-14 11:33 yearly
    [11:37] zaphodbeeblebrox:ccollect-0.2% cat conf/sources/testsource2/intervals/*
    5
    20
When you create the file rsync_options in your source configuration,
all the parameters in this file will be passed to rsync. This
way you can pass additional options to rsync. For instance you can tell rsync
to show progress ("--progress"), or which password file ("--password-file")
to use for automatic backup over the rsync protocol.
Example:
    [23:42] hydrogenium:ccollect-0.2% cat conf/sources/test_rsync/rsync_options
    --password-file=/home/user/backup/protected_password_file
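Since rsync_options is a newline separated list, several options go on separate lines. A minimal sketch that writes the two options mentioned above into a temporary directory (standing in for the source directory) and reads them back one per line:

```shell
#!/bin/sh
# Write two rsync options, one per line, into a temporary directory.
src=$(mktemp -d)    # stands in for $CCOLLECT_CONF/sources/<name>
printf '%s\n' \
    '--progress' \
    '--password-file=/home/user/backup/protected_password_file' \
    > "$src/rsync_options"

# Read the file back line by line, the way a newline separated list is meant.
while IFS= read -r opt; do
    printf 'option: %s\n' "$opt"
done < "$src/rsync_options"
```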
When you create pre_exec and / or post_exec in your source
configuration, ccollect will execute this command before and
respectively after doing the backup for this specific source.
If you want to have pre-/post-exec before and after all
backups, see above for general configuration.
Example:
    [13:09] hydrogenium:ccollect-0.3% cat conf/sources/with_exec/pre_exec
    #!/bin/sh

    # Show whats free before
    df -h
    [13:09] hydrogenium:ccollect-0.3% cat conf/sources/with_exec/post_exec
    #!/bin/sh

    # Show whats free after
    df -h
Since ccollect-0.6.1 you can use the ccollect-logwrapper.sh(1) for logging. You call it the same way you call ccollect.sh and it will create a logfile containing the output of ccollect.sh. For more information look at the manpage ccollect-logwrapper. The following is an example running ccollect-logwrapper.sh:
    u0219 ~ # ~chdscni9/ccollect-logwrapper.sh daily u0160.nshq.ch.netstream.com
    ccollect-logwrapper.sh (11722): Starting with arguments: daily u0160.nshq.ch.netstream.com
    ccollect-logwrapper.sh (11722): Finished.
The easiest way is to use your ~/.ssh/config file:
    host mx2.schottelius.org
        Port 2342
If you use that port only for backups and normally want to use another port, you can add HostName and "HostKeyAlias" (if you also have different keys on the different ports):
    Host hhydrogenium
        Hostname bruehe.schottelius.org
        Port 666
        HostKeyAlias hydrogenium
    Host bruehe
        Hostname bruehe.schottelius.org
        Port 22
        HostKeyAlias bruehe.schottelius.org
The pre-/post_exec scripts can access some internal variables from ccollect
:
When you have a computer with little computing power, it may be useful to use
rsync without ssh, directly using the rsync protocol
(specify user@host::share in source). You may wish to use
rsync_options to specify a password file to use for automatic backup.
Example:
    backup:~# cat /etc/ccollect/sources/sample.backup.host.org/source
    backup@webserver::backup-share
    backup:~# cat /etc/ccollect/sources/sample.backup.host.org/rsync_options
    --password-file=/etc/ccollect/sources/sample.backup.host.org/rsync_password
    backup:~# cat /etc/ccollect/sources/sample.backup.host.org/rsync_password
    this_is_the_rsync_password
This hint was reported by Daniel Aubry.
When you exclude "/proc" or "/mnt" from your backup, you may run into
trouble when you restore your backup. When you use "/proc/*" or "/mnt/*"
instead, ccollect will back up the empty directories.
When those directories contain hidden files (those beginning with a dot (.)), they will still be transferred!
This hint was reported by Marcus Wagner.
If you used rsync directly before you use ccollect, you can
use this old backup as the initial backup for ccollect: You
simply move it into a directory below the destination directory
and name it "interval.0".
Example:
    backup:/home/backup/web1# ls
    bin   dev  etc   initrd  lost+found  mnt  root  srv  usr  vmlinuz
    boot  doc  home  lib     media       opt  sbin  tmp  var  vmlinuz.old
    backup:/home/backup/web1# mkdir daily.0
    # ignore error about copying to itself
    backup:/home/backup/web1# mv * daily.0 2>/dev/null
    backup:/home/backup/web1# ls
    daily.0
Now you can use /home/backup/web1 as the destination for the backup.
It does not matter anymore how you name your directory, as ccollect uses
the -c option of ls to find out which directory to clone from.
Older versions (pre 0.6, iirc) had a problem if you named the first backup
something like "daily.initial". You needed to use "0" (or some
number lower than the current year) as the extension, because ccollect
used sort to find the latest backup. ccollect itself uses
interval.YEARMONTHDAY-HOURMINUTE.PID. This notation always came before
"daily.initial", as numbers are earlier in the list
which is produced by sort. So, if you had a directory named "daily.initial",
ccollect always diffed against this backup, and transferred and deleted
files which were deleted in previous backups. This means you simply
wasted resources, but your backup was complete anyway.
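The ordering problem of those older versions can be reproduced with sort alone. The sketch below is not the code ccollect used; `latest_by_sort` is a hypothetical helper and the directory names are invented following the interval.YEARMONTHDAY-HOURMINUTE.PID scheme.

```shell
#!/bin/sh
# Pick the "latest" name the way plain byte-wise sorting would.
# LC_ALL=C forces byte order so the result does not depend on the locale.
latest_by_sort() {
    printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

# A non-numeric extension sorts after every timestamped name ...
latest_by_sort daily.initial daily.20060129-0100.19115 daily.20060130-0100.26405
# ... while ".0" sorts before them, so the newest real backup is chosen.
latest_by_sort daily.0 daily.20060129-0100.19115 daily.20060130-0100.26405
```

The first call prints daily.initial, the second daily.20060130-0100.26405.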
Your pre_/post_exec script does not need to be a script, you can also use a link to
The only requirement is that it is executable.
When you are backing up multiple hosts via cron each night, it may be
a problem that host "big_server" may only have 4 daily backups, because
otherwise its backup device will be full. But for all other hosts
you want to keep 20 daily backups. In this case you would create
/etc/ccollect/defaults/intervals/daily containing "20" and
/etc/ccollect/sources/big_server/intervals/daily containing "4".
Source specific intervals always override the default values.
If you have to specify it individually for every host, because
of different requirements, you can even omit creating
/etc/ccollect/defaults/intervals/daily.
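The precedence rule (source specific before default) can be sketched as a small lookup. This is an illustration under the assumption that the first existing file wins; `interval_count` is a hypothetical function, not part of ccollect, and the configuration lives in a temporary directory.

```shell
#!/bin/sh
# Hypothetical lookup: prefer the source specific interval file and fall
# back to the default one. Prints the number of backups to keep.
interval_count() {
    ic_conf=$1
    ic_source=$2
    ic_interval=$3
    if [ -f "$ic_conf/sources/$ic_source/intervals/$ic_interval" ]; then
        cat "$ic_conf/sources/$ic_source/intervals/$ic_interval"
    elif [ -f "$ic_conf/defaults/intervals/$ic_interval" ]; then
        cat "$ic_conf/defaults/intervals/$ic_interval"
    else
        echo "no interval '$ic_interval' defined for '$ic_source'" >&2
        return 1
    fi
}

# Throw-away configuration mirroring the example in the text
conf=$(mktemp -d)
mkdir -p "$conf/defaults/intervals" \
         "$conf/sources/big_server/intervals" \
         "$conf/sources/other_host"
echo 20 > "$conf/defaults/intervals/daily"
echo 4  > "$conf/sources/big_server/intervals/daily"

interval_count "$conf" big_server daily
interval_count "$conf" other_host daily
```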
If you want to see what changed between two backups, you can use
rsync directly:
    [12:00] u0255:ddba034.netstream.ch# rsync -n -a --delete --stats --progress daily.20080324-0313.17841/ daily.20080325-0313.31148/
This results in a listing of changes. Because we pass -n to rsync, no transfer is made (i.e. report-only mode).
This hint was reported by Daniel Aubry.
If you want to test whether the host you try to backup is reachable, you can use the following script as source specific pre-exec:
    #!/bin/sh
    #
    ping -c1 -q `cat "/etc/ccollect/sources/$name/source" | cut -d"@" -f2 | cut -d":" -f1`
This prevents the deletion of old backups, if the host is not reachable.
This hint was reported by Daniel Aubry.
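The host extraction done with cut in the script above can also be written with shell parameter expansion only. A minimal sketch, assuming a source of the form [user@]host:/path; `host_of_source` is a made-up helper name, and local paths without a host are not handled.

```shell
#!/bin/sh
# Hypothetical helper: extract the host part from an rsync source of the
# form [user@]host:/path using only shell parameter expansion.
host_of_source() {
    h=${1#*@}      # drop everything up to and including the first "@"
    h=${h%%:*}     # drop everything from the first ":" on
    printf '%s\n' "$h"
}

host_of_source 'backup_user@foreign_host:/home/server/video'
host_of_source 'root@10.103.2.3:/'
```

In a pre_exec script the result could then be handed to ping, as in the example above.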
Let us assume that one backup failed (connection broke or the source hard disk had some failures). Therefore we’ve got one incomplete backup in our history.
ccollect
will transfer the missing files the next time you use it.
This leads to
If the whole ccollect process was interrupted, ccollect (since 0.6) can
detect that and remove the incomplete backups, so you can clone from a complete
backup instead.
No. ccollect passes your source definition directly to rsync. It
does not try to analyze it. So it actually does not know whether a source
comes from a local hard disk or from a remote server. And it does not want
to. When you back up from the local hard disk (which is perhaps not
even a good idea when thinking of security), add the destination
to source/exclude. (Daniel Aubry reported this problem)
The most common error is that you have not given your script the correct
permissions. Try chmod 0755 /etc/ccollect/sources/'yoursource'/*_exec.
When a part of the path you specified in the source is a symbolic link (hard links are not possible for directories), the backup must fail.
First of all, let us have a look at what it looks like:
    ==> ccollect 0.4: Beginning backup using interval taeglich <==
    [testsource] Sa Apr 29 00:01:55 CEST 2006 Beginning to backup
    [testsource] Currently 0 backup(s) exist(s), total keeping 10 backup(s).
    [testsource] Beginning to backup, this may take some time...
    [testsource] Creating /etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874 ...
    [testsource] Sa Apr 29 00:01:55 CEST 2006 Transferring files...
    [testsource] rsync: recv_generator: mkdir "/etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874/home/user/nico/projekte/ccollect" failed: No such file or directory (2)
    [testsource] rsync: stat "/etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874/home/user/nico/projekte/ccollect" failed: No such file or directory (2)
    [...]
So what is the problem? It is very obvious when you look deeper into it:
    % cat /etc/ccollect/sources/testsource/source
    /home/user/nico/projekte/ccollect/ccollect-0.4
    % ls -l /home/user/nico/projekte
    lrwxrwxrwx 1 nico nico 29 2005-12-02 23:28 /home/user/nico/projekte -> oeffentlich/computer/projekte
    % ls -l /etc/ccollect/sources/testsource/destination/taeglich.2006-04-29-0001.3874/home/user/nico
    lrwxrwxrwx 1 nico nico 29 2006-04-29 00:01 projekte -> oeffentlich/computer/projekte
rsync creates the directory structure before it creates the symbolic link.
This link now points to something not reachable (dead link). It is
impossible to create subdirectories under the broken link.
In conclusion you cannot use paths with a linked part.
However, you can backup directories containing symbolic links (in this case you could backup /home/user/nico, which contains /home/user/nico/projekte and oeffentlich/computer/projekte).
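Whether a source path contains a linked part can be checked before configuring it. The following is a minimal sketch, not something ccollect does itself; `first_symlink_in_path` is a hypothetical helper for absolute local paths, tried here on a temporary tree that resembles the example above.

```shell
#!/bin/sh
# Hypothetical check: print the first component of an absolute path that is
# a symbolic link and return 0; return 1 when no component is a link.
first_symlink_in_path() {
    rest=${1#/}
    cur=
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        cur="$cur/$part"
        if [ -L "$cur" ]; then
            printf '%s\n' "$cur"
            return 0
        fi
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
    done
    return 1
}

# Throw-away tree: "projekte" is a link, as in the example above.
# pwd -P resolves links in the temporary directory's own path.
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/oeffentlich/computer/projekte/ccollect"
ln -s oeffentlich/computer/projekte "$base/projekte"

first_symlink_in_path "$base/projekte/ccollect"
```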
As ccollect first deletes the old backups, it may take some time
until rsync requests the password for the ssh session from you.
The easiest way not to miss that point is running ccollect in screen,
which has the ability to monitor the output for activity. So as soon as
your screen beeps, after ccollect began to remove the last directory,
you can enter your password (have a look at screen(1), especially "C-a M"
and "C-a _", for more information).
    srwali01:~# mkdir /etc/ccollect
    srwali01:~# mkdir -p /etc/ccollect/defaults/intervals/
    srwali01:~# echo 28 > /etc/ccollect/defaults/intervals/taeglich
    srwali01:~# echo 52 > /etc/ccollect/defaults/intervals/woechentlich
    srwali01:~# cd /etc/ccollect/
    srwali01:/etc/ccollect# mkdir sources
    srwali01:/etc/ccollect# cd sources/
    srwali01:/etc/ccollect/sources# ls
    srwali01:/etc/ccollect/sources# mkdir local-root
    srwali01:/etc/ccollect/sources# cd local-root/
    srwali01:/etc/ccollect/sources/local-root# echo / > source
    srwali01:/etc/ccollect/sources/local-root# cat > exclude << EOF
    > /proc
    > /sys
    > /mnt
    > EOF
    srwali01:/etc/ccollect/sources/local-root# echo /mnt/hdbackup/local-root > destination
    srwali01:/etc/ccollect/sources/local-root# mkdir /mnt/hdbackup/local-root
    srwali01:/etc/ccollect/sources/local-root# ccollect.sh taeglich local-root
    /o> ccollect.sh: Beginning backup using interval taeglich
    /=> Beginning to backup "local-root" ...
    |-> 0 backup(s) already exist, keeping 28 backup(s).
After that, I added some more sources:
    srwali01:~# cd /etc/ccollect/sources
    srwali01:/etc/ccollect/sources# mkdir windos-wl6
    srwali01:/etc/ccollect/sources# cd windos-wl6/
    srwali01:/etc/ccollect/sources/windos-wl6# echo /mnt/win/SYS/WL6 > source
    srwali01:/etc/ccollect/sources/windos-wl6# ln -s /mnt/hdbackup/wl6 destination
    srwali01:/etc/ccollect/sources/windos-wl6# mkdir /mnt/hdbackup/wl6
    srwali01:/etc/ccollect/sources/windos-wl6# cd ..
    srwali01:/etc/ccollect/sources# mkdir windos-daten
    srwali01:/etc/ccollect/sources/windos-daten# echo /mnt/win/Daten > source
    srwali01:/etc/ccollect/sources/windos-daten# ln -s /mnt/hdbackup/windos-daten destination
    srwali01:/etc/ccollect/sources/windos-daten# mkdir /mnt/hdbackup/windos-daten

    # Now add some remote source
    srwali01:/etc/ccollect/sources/windos-daten# cd ..
    srwali01:/etc/ccollect/sources# mkdir srwali03
    srwali01:/etc/ccollect/sources# cd srwali03/
    srwali01:/etc/ccollect/sources/srwali03# cat > exclude << EOF
    > /proc
    > /sys
    > /mnt
    > /home
    > EOF
    srwali01:/etc/ccollect/sources/srwali03# echo 'root@10.103.2.3:/' > source
    srwali01:/etc/ccollect/sources/srwali03# echo /mnt/hdbackup/srwali03 > destination
    srwali01:/etc/ccollect/sources/srwali03# mkdir /mnt/hdbackup/srwali03

    # du (coreutils) 5.2.1
    [10:53] srsyg01:sources% du -sh ~/backupdir
    4.6M /home/nico/backupdir
    [10:53] srsyg01:sources% du -sh ~/backupdir/*
    4.1M /home/nico/backupdir/daily.2005-12-08-10:52.28456
    4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28484
    4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28507
    4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28531
    4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28554
    4.1M /home/nico/backupdir/daily.2005-12-08-10:53.28577

    srwali01:/etc/ccollect/sources# du -sh /mnt/hdbackup/wl6/
    186M /mnt/hdbackup/wl6/
    srwali01:/etc/ccollect/sources# du -sh /mnt/hdbackup/wl6/*
    147M /mnt/hdbackup/wl6/taeglich.2005-12-08-14:42.312
    147M /mnt/hdbackup/wl6/taeglich.2005-12-08-14:45.588
The backup of our main fileserver:
    backup:~# df -h /home/backup/srsyg01/
    Filesystem                      Size  Used Avail Use% Mounted on
    /dev/mapper/backup--01-srsyg01  591G  451G  111G  81% /home/backup/srsyg01
    backup:~# du -sh /home/backup/srsyg01/*
    432G /home/backup/srsyg01/daily.2006-01-24-01:00.15990
    432G /home/backup/srsyg01/daily.2006-01-26-01:00.30152
    434G /home/backup/srsyg01/daily.2006-01-27-01:00.4596
    435G /home/backup/srsyg01/daily.2006-01-28-01:00.11998
    437G /home/backup/srsyg01/daily.2006-01-29-01:00.19115
    437G /home/backup/srsyg01/daily.2006-01-30-01:00.26405
    438G /home/backup/srsyg01/daily.2006-01-31-01:00.1148
    439G /home/backup/srsyg01/daily.2006-02-01-01:00.8321
    439G /home/backup/srsyg01/daily.2006-02-02-01:00.15383
    439G /home/backup/srsyg01/daily.2006-02-03-01:00.22567
    16K  /home/backup/srsyg01/lost+found
    backup:~# du --version | head -n1
    du (coreutils) 5.2.1
Newer versions of du also detect the hardlinks, so we can even compare the sizes directly with du:
    [8:16] eiche:~# du --version | head -n 1
    du (GNU coreutils) 5.93
    [8:17] eiche:schwarzesloch# du -slh hydrogenium/*
    19G  hydrogenium/durcheinander.0
    18G  hydrogenium/durcheinander.2006-01-17-00:27.13820
    19G  hydrogenium/durcheinander.2006-01-25-23:18.31328
    19G  hydrogenium/durcheinander.2006-01-26-00:11.3332
    [8:22] eiche:schwarzesloch# du -sh hydrogenium/*
    19G  hydrogenium/durcheinander.0
    12G  hydrogenium/durcheinander.2006-01-17-00:27.13820
    1.5G hydrogenium/durcheinander.2006-01-25-23:18.31328
    200M hydrogenium/durcheinander.2006-01-26-00:11.3332
In the second report (without -l) the sizes include the space the inodes of the hardlinks allocate.
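Why unchanged files cost almost no extra space can be seen from the inode numbers: both names of a hard linked file point to the same inode. A minimal sketch in a temporary directory; the directory names and the helper `inode_of` are invented for this example.

```shell
#!/bin/sh
# Two "backups" of the same unchanged file, the second one hard linked.
dest=$(mktemp -d)
mkdir "$dest/daily.1" "$dest/daily.2"
echo 'unchanged data' > "$dest/daily.1/file"
ln "$dest/daily.1/file" "$dest/daily.2/file"

# Print the inode number of a file (first field of "ls -i").
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

inode_of "$dest/daily.1/file"
inode_of "$dest/daily.2/file"   # same number: the data is stored only once
```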
All the data of my important hosts is backed up to eiche into /mnt/schwarzesloch/backup:
    [9:24] eiche:backup# ls *
    creme:
    woechentlich.2006-01-26-22:22.4153   woechentlich.2006-02-12-11:48.2461
    woechentlich.2006-01-26-22:23.4180   woechentlich.2006-02-18-23:00.7898
    woechentlich.2006-02-05-02:43.14281  woechentlich.2006-02-25-23:00.13480
    woechentlich.2006-02-06-00:24.15509  woechentlich.2006-03-04-23:00.25439

    hydrogenium:
    durcheinander.2006-01-27-11:16.6391   durcheinander.2006-02-13-01:07.2895
    durcheinander.2006-01-30-19:29.9505   durcheinander.2006-02-17-08:20.6707
    durcheinander.2006-01-30-22:27.9623   durcheinander.2006-02-24-16:24.12461
    durcheinander.2006-02-03-09:52.12885  durcheinander.2006-03-03-19:17.18075
    durcheinander.2006-02-05-23:00.15068  durcheinander.2006-03-17-22:41.5007

    scice:
    woechentlich.2006-02-04-10:32.13766  woechentlich.2006-02-16-23:00.6185
    woechentlich.2006-02-05-23:02.15093  woechentlich.2006-02-23-23:00.11783
    woechentlich.2006-02-06-08:22.15994  woechentlich.2006-03-02-23:00.17346
    woechentlich.2006-02-06-19:40.16321  woechentlich.2006-03-09-23:00.29317
    woechentlich.2006-02-12-11:51.2514   woechentlich.2006-03-16-23:00.4218
And this incremental backup and the archive are copied to an external usb harddisk (attention: you should really use -H to backup the backup):
    [9:23] eiche:backup# df -h
    Filesystem                  Size  Used Avail Use% Mounted on
    rootfs                       14G  8.2G  4.9G  63% /
    /dev/root                    14G  8.2G  4.9G  63% /
    /dev/root                    14G  8.2G  4.9G  63% /dev/.static/dev
    tmpfs                        10M  444K  9.6M   5% /dev
    /dev/hdh                     29G  3.7M   29G   1% /mnt/datenklo
    tmpfs                       110M  4.0K  110M   1% /dev/shm
    /dev/mapper/nirvana         112G   90G   23G  81% /mnt/datennirvana
    /dev/mapper/schwarzes-loch  230G  144G   86G  63% /mnt/schwarzesloch
    /dev/mapper/archiv           38G   20G   19G  52% /mnt/archiv
    /dev/mapper/usb-backup      280G   36M  280G   1% /mnt/usb/backup
    [9:24] eiche:backup# cat ~/bin/sync-to-usb
    DDIR=/mnt/usb/backup
    rsync -av -H --delete /mnt/schwarzesloch/ "$DDIR/schwarzesloch/"
    rsync -av -H --delete /mnt/archiv/ "$DDIR/archiv/"
Truncated output from ps axuwwwf:
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily -p ddba034 ddba045 ddba046 ddba047 ddba049 ddna010 ddna011
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba034
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba034
    R+ 11:40 23:40 | | | | | \_ rsync -a --delete --numeric-ids --relative --delete-excluded --link-dest=/home/server/backup/ddba034
    S+ 11:40  0:00 | | | | | \_ ssh -l root ddba034.netstream.ch rsync --server --sender -vlogDtprR --numeric-ids . /
    S+ 11:41  0:11 | | | | | \_ rsync -a --delete --numeric-ids --relative --delete-excluded --link-dest=/home/server/backup/ddb
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba034
    S+ 11:40  0:00 | | | | \_ sed s:^:\[ddba034\] :
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba045
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba045
    R+ 11:40  0:02 | | | | | \_ rm -rf /etc/ccollect/sources/ddba045/destination/daily.2006-10-19-1807.6934
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba045
    S+ 11:40  0:00 | | | | \_ sed s:^:\[ddba045\] :
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba046
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba046
    R+ 11:40  0:02 | | | | | \_ rm -rf /etc/ccollect/sources/ddba046/destination/daily.2006-10-19-1810.7072
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba046
    S+ 11:40  0:00 | | | | \_ sed s:^:\[ddba046\] :
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba047
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba047
    R+ 11:40  0:03 | | | | | \_ rm -rf /etc/ccollect/sources/ddba047/destination/daily.2006-10-19-1816.7268
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba047
    S+ 11:40  0:00 | | | | \_ sed s:^:\[ddba047\] :
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba049
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba049
    D+ 11:40  0:03 | | | | | \_ rm -rf /etc/ccollect/sources/ddba049/destination/daily.2006-10-19-1821.7504
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddba049
    S+ 11:40  0:00 | | | | \_ sed s:^:\[ddba049\] :
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna010
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna010
    R+ 11:40  0:03 | | | | | \_ rm -rf /etc/ccollect/sources/ddna010/destination/daily.2006-10-19-1805.6849
    S+ 11:40  0:00 | | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna010
    S+ 11:40  0:00 | | | | \_ sed s:^:\[ddna010\] :
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna011
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna011
    R+ 12:08  0:00 | | | | \_ rm -rf /etc/ccollect/sources/ddna011/destination/daily.2006-10-20-1502.7824
    S+ 11:40  0:00 | | | \_ /bin/sh /usr/local/bin/ccollect.sh daily ddna011
    S+ 11:40  0:00 | | | \_ sed s:^:\[ddna011\] :
As you can see, six processes are deleting old backups, while one backup (ddba034) is already copying data.