Dear Nico Schottelius,

I have started using ccollect and I very much like its design: it is elegant and effective. In the process of getting ccollect set up and running, I made five changes, including one major new feature, that I hope you will find useful.

First, I added the following before any old backup gets deleted:

> # Verify source is up and accepting connections before deleting any old backups
> rsync "$source" >/dev/null || _exit_err "Source ${source} is not readable. Skipping."

I think this quick test is much better than, say, pinging the source in a pre-exec script: it verifies not only that the source is up and connected to the net, but also (1) that ssh is up and accepting our key (if we are using ssh), and (2) that the source directory is mounted (if it needs to be mounted) and readable.

Second, I found ccollect's use of ctime problematic. After copying an old backup over to my ccollect destination, I adjusted mtime and atime where needed using touch, e.g.:

touch -d"28 Apr 2009 3:00" destination/daily.01

However, as far as I know, there is no way to correct a bad ctime. I ran into this issue repeatedly while adjusting my backup configuration. (For example, "cp -a" preserves mtime but not ctime. Even worse, "cp -al old new" also changes the ctime of old.) Another potential problem with ctime is that it is file-system dependent: I have read that Windows sets ctime to creation time, not last-change time.

It is simple, however, to give a new backup the correct mtime. After the rsync step, I added:

553a616,617
> # Correct the modification time:
> pcmd touch "${destination_dir}"

Even if ccollect continues to use ctime for sorting, I see no reason not to give the backup directory the correct mtime. To allow the rest of the code to use either ctime or mtime, I added the definitions:

44a45,47
>
> #TSORT="tc" ; NEWER="cnewer"
> TSORT="t" ; NEWER="newer"

(It would be better if this choice were user-configurable, because those with existing backup directories should continue to use ctime until the mtimes of their directories are correct. The correction would happen passively over time, as new backups are created with the above touch command and the old ones are deleted.)

With these definitions, the proper link-dest directory can be found with this minor change (and comment update):

516,519c579,582
< # Use ls -1c instead of -1t, because last modification maybe the same on all
< # and metadate update (-c) is updated by rsync locally.
< #
< last_dir="$(pcmd ls -tcp1 "${ddir}" | grep '/$' | head -n 1)" || \
---
> # Depending on your file system, you may want to sort on:
> # 1. mtime (modification time) with TSORT=t, or
> # 2. ctime (last change time, usually) with TSORT=tc
> last_dir="$(pcmd ls -${TSORT}p1 "${ddir}" | grep '/$' | head -n 1)" || \

Third, after I copied my old backups over to my ccollect destination directory, I found that ccollect would delete a recent backup, not an old one! My problem was that, unknown to me, the algorithm used to find the oldest backup (for deletion) was inconsistent with the one used to find the newest (for link-dest). I suggest that these two be made consistent.
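To see concretely how the two orderings can disagree, here is a minimal sketch run in an empty scratch directory (the names and dates are hypothetical; GNU touch and sort are assumed). With daily.NN names like the one above, the lowest-numbered directory is the most recent backup:

$ mkdir daily.01 daily.02 daily.03
$ touch -d"28 Apr 2009 3:00" daily.01    # most recent backup
$ touch -d"21 Apr 2009 3:00" daily.02
$ touch -d"14 Apr 2009 3:00" daily.03    # oldest backup
$ ls -p1 | grep '/$' | sort -n | head -n 1    # deletion pick, name order
daily.01/
$ ls -tp1r | grep '/$' | head -n 1            # deletion pick, oldest mtime first
daily.03/

The name sort nominates daily.01/, the most recent backup, for deletion, while sorting on time (as the patch below does) nominates the genuinely oldest one.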
Because time-sorting seemed more consistent with the ccollect documentation, I suggest:

492,493c555,556
< pcmd ls -p1 "$ddir" | grep "^${INTERVAL}\..*/\$" | \
<    sort -n | head -n "${remove}" > "${TMP}" || \
---
> pcmd ls -${TSORT}p1r "$ddir" | grep "^${INTERVAL}\..*/\$" | \
>    head -n "${remove}" > "${TMP}" || \

Fourth, in my experience, rsync error code 12 means complete failure, usually because the source refuses the ssh connection. So, I left the marker in place in that case:

558,559c622,625
< pcmd rm "${destination_dir}.${c_marker}" || \
<    _exit_err "Removing ${destination_dir}/${c_marker} failed."
---
> if [ "$ret" -ne 12 ] ; then
>    pcmd rm "${destination_dir}.${c_marker}" || \
>       _exit_err "Removing ${destination_dir}/${c_marker} failed."
> fi

(A better solution might allow a user-configurable list of error codes that are treated as complete failures.)

Fifth, because I was frustrated by the problems of having a cron job decide which interval to back up, I added a major new feature: the modified ccollect can now automatically select the interval to use for a backup.

Cron-job-controlled backups work well if all machines are up and running all the time and nothing ever goes wrong. I have, however, some machines that are occasionally turned off, or that are mobile and only sometimes connected to the local net. For these machines, using cron jobs to select intervals can be a disaster.

There are several ways one could automatically choose an appropriate interval. The method I show below has the advantage that it works with existing ccollect configuration files. The only requirement is that interval names be chosen to sort nicely (under ls). For example, I currently use:

$ ls -1 intervals
a_daily
b_weekly
c_monthly
d_quarterly
e_yearly
$ cat intervals/*
6
3
2
3
30

A simpler example would be:

$ ls -1 intervals
int1
int2
int3
$ cat intervals/*
2
3
4

The algorithm works as follows: if no backup exists for the least frequent interval (int3 in the simpler example), use that interval. Otherwise, use the most frequent interval (int1) unless there are "$(cat intervals/int1)" int1 backups more recent than any int2 or int3 backup, in which case select int2, unless there are "$(cat intervals/int2)" int2 backups more recent than any int3 backup, in which case choose int3. (A worked trace is sketched below.)

This algorithm cycles through all the backups well for my always-connected machines, as well as for my usually connected and rarely connected machines. (For a rarely connected machine, interval names like "b_weekly" lose their English meaning, but the algorithm still does a reasonable job of rotating through the intervals.)

In addition to being more robust, automatic interval selection greatly simplifies the crontab: only one line is needed. I use:

30 3 * * * ccollect.sh AUTO host1 host2 host3 | tee -a /var/log/ccollect-full.log | ccollect_analyse_logs.sh iwe

Some users might prefer a calendar-driven algorithm, such as: do a yearly backup the first time a machine is connected during a new year; do a monthly backup the first time a machine is connected during a month; etc. This, however, would require a change to the ccollect configuration files, so I didn't pursue the idea any further.
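To make the selection concrete, here is a hypothetical trace of the simpler example (the backup names and dates are made up for illustration; ls -tp1 lists the newest first):

$ ls -tp1 destination/
int1.20090510-0300/
int1.20090509-0300/
int2.20090508-0300/
int1.20090507-0300/
int3.20090501-0300/

An int3 backup exists, so we do not immediately select int3. Considering int1: two int1 backups (20090510 and 20090509) are more recent than every int2 and int3 backup, which reaches the limit of 2 stored in intervals/int1, so int1 is passed over. Considering int2: only one int2 backup is more recent than the newest int3 backup, which is below the limit of 3 stored in intervals/int2, so int2 is selected for the next backup.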
The code checks whether the user specified the interval as AUTO. If so, the auto_interval function is called to select the interval:

347a417,420
> if [ ${INTERVAL} = "AUTO" ] ; then
>    auto_interval
>    _techo "Selected interval: '$INTERVAL'"
> fi

The code for auto_interval is as follows (note that it allows "more recent" to be defined by either ctime or mtime, as per the TSORT variable):

125a129,182
> # Select interval if AUTO
> #
> # For this to work nicely, you have to choose interval names that sort nicely
> # such as int1, int2, int3 or a_daily, b_weekly, c_monthly, etc.
> #
> auto_interval()
> {
>    if [ -d "${backup}/intervals" -a -n "$(ls "${backup}/intervals" 2>/dev/null)" ] ; then
>       intervals_dir="${backup}/intervals"
>    elif [ -d "${CDEFAULTS}/intervals" -a -n "$(ls "${CDEFAULTS}/intervals" 2>/dev/null)" ] ; then
>       intervals_dir="${CDEFAULTS}/intervals"
>    else
>       _exit_err "No intervals are defined. Skipping."
>    fi
>    echo intervals_dir=${intervals_dir}
>
>    trial_interval="$(ls -1r "${intervals_dir}/" | head -n 1)" || \
>       _exit_err "Failed to list contents of ${intervals_dir}/."
>    _techo "Considering interval ${trial_interval}"
>    most_recent="$(pcmd ls -${TSORT}p1 "${ddir}" | grep "^${trial_interval}.*/$" | head -n 1)" || \
>       _exit_err "Failed to list contents of ${ddir}/."
>    _techo " Most recent ${trial_interval}: '${most_recent}'"
>    if [ -n "${most_recent}" ]; then
>       no_intervals="$(ls -1 "${intervals_dir}/" | wc -l)"
>       n=1
>       while [ "${n}" -le "${no_intervals}" ]; do
>          trial_interval="$(ls -p1 "${intervals_dir}/" | tail -n+${n} | head -n 1)"
>          _techo "Considering interval '${trial_interval}'"
>          c_interval="$(cat "${intervals_dir}/${trial_interval}" 2>/dev/null)"
>          m=$((${n}+1))
>          set -- "${ddir}" -maxdepth 1
>          while [ "${m}" -le "${no_intervals}" ]; do
>             interval_m="$(ls -1 "${intervals_dir}/" | tail -n+${m} | head -n 1)"
>             most_recent="$(pcmd ls -${TSORT}p1 "${ddir}" | grep "^${interval_m}\..*/$" | head -n 1)"
>             _techo " Most recent ${interval_m}: '${most_recent}'"
>             if [ -n "${most_recent}" ] ; then
>                set -- "$@" -$NEWER "${ddir}/${most_recent}"
>             fi
>             m=$((${m}+1))
>          done
>          count=$(pcmd find "$@" -iname "${trial_interval}*" | wc -l)
>          _techo " Found $count more recent backups of ${trial_interval} (limit: ${c_interval})"
>          if [ "$count" -lt "${c_interval}" ] ; then
>             break
>          fi
>          n=$((${n}+1))
>       done
>    fi
>    export INTERVAL="${trial_interval}"
>    D_FILE_INTERVAL="${intervals_dir}/${INTERVAL}"
>    D_INTERVAL=$(cat "${D_FILE_INTERVAL}" 2>/dev/null)
> }
>
> #

While I consider the auto_interval code to be developmental, I have been using it for my nightly backups and it works for me.

One last change: for auto_interval to work, it needs "ddir" to be defined first. Consequently, I had to move the following code so that it runs before auto_interval is called:

369,380c442,443
<
< #
< # Destination is a path
< #
< if [ ! -f "${c_dest}" ]; then
<    _exit_err "Destination ${c_dest} is not a file. Skipping."
< else
<    ddir=$(cat "${c_dest}"); ret="$?"
<    if [ "${ret}" -ne 0 ]; then
<       _exit_err "Destination ${c_dest} is not readable. Skipping."
<    fi
< fi

345a403,414
> # Destination is a path
> #
> if [ ! -f "${c_dest}" ]; then
>    _exit_err "Destination ${c_dest} is not a file. Skipping."
> else
>    ddir=$(cat "${c_dest}"); ret="$?"
>    if [ "${ret}" -ne 0 ]; then
>       _exit_err "Destination ${c_dest} is not readable. Skipping."
>    fi
> fi
>
> #

I have some other ideas, but this is all I have implemented at the moment. Files are attached. Thanks again for developing ccollect and let me know what you think.

Regards,
John

--
John L. Lawless, Ph.D.
Redwood Scientific, Inc.
1005 Terra Nova Blvd
Pacifica, CA 94044-4300
1-650-738-8083