Backup and Restore on NetBSD

Overview

Putting together the bits and pieces of a backup and restore concept, while not being rocket science, always seems to be a little bit ungrateful. Most Admin Handbooks handle this topic only within few pages. After replacing my old Mac Mini's OS by NetBSD, I tried to implement an automated backup, allowing me to handle it similarly to the time machine backups I've been using before. Suggestions on how to improve are always welcome.

Some thoughts about Strategy

The first thing you probably see when reading about these topics is the advice, don't have a backup strategy but a recovery strategy. That is, make sure your backups are actually in a usable shape and be sure you know how to apply them in an emergency. Depending on how much you value your data, you might want to store the backup media in a physically remote place. At least, you should not store it on the same disks to be backed up, but on detachable media or on a remote computer. Also it should be set read-only after the backup is finished, so it cannot accidently be damaged when accessing it.

The next question is how much time and space you want to dedicate to your backups. When doing a full backup each time, recovery is easy: just apply the latest backup. On the flip side, each backup might take a long time and much storage space. So the other extreme might be to only start with one full backup, afterwards always backing up only the increments to the previous backup. Then, of course, the restore is expensive as you need to apply each single backup from first to last in right order to do a full restore. Tools like rsync(1) mitigate by merging each increment into the previous backup, managing a copy of the backed up file system. But this collides with the requirement of not modifying previous backups.

As a compromise, the manpage of the dump(8) tool suggests to do the next increment only every nth time (for example every second time)—that is, to generate the diff to the same preceeding backup for the following two consecutive backups. Besides that, it suggests generating weekly backups incrementing on the original full backup. Finally, it suggests to build a new full backup every four weeks, this way maintaining a three-level strategy of stacked increments. Then, in the worst case, like restoring the backup of a cycle's last day, you need to apply the initial full backup, the last weekly increment and the daily increments of the third, fifth and seventh day, so you need to apply at most five backups to do a full restore.

dump(8) allows to define this using backup levels. Level 0 always is a full backup. Each higher level generates the diff to the last lower level's backup contents. So applying the dump levels 0 3 2 5 4 7 6 for the first and 1 3 2 5 4 7 6 for the three next weeks follows the backup plan sketched out above. Of course, you may always fine-tune this to your needs.

Another plan would be to only backup personal data like your user directories. Then the restore plan would include a fresh OS setup, installing of all software needed and then fetching only the user directories from backup. This doesn't guarantee you get to the same state as before, as you probably haven't tagged the exact versions of all software installed before.

While there are many third party solutions out there, my plan is to use the on-board capabilities for backup. This way, the restore tools are in reach without additional installation steps. For instance, the mini root ram disk of NetBSD's installation kernel at least contains the restore(8) tool, mentioned below, on board.

As I want to be able to go back also after experimental software updates, my plan has been to setup a full backup using dump(8) and restore(8), using the strategy suggested above. After setting up this plan and seeing my incremental backups are much smaller than the full one (and even the later weekly increments), I decided to modify the plan sketched out above and also do the first monthly backup as level 1, this way doing full backups only on demand (e.g. after a system upgrade). On the other side, when there are large diffs every day, it may be more practical to just do a weekly full backup and daily incremental backups diffing to the previous day. For example, in times when compiling larger parts of pkgsrc, this seems to make sense.

Accessing a remote backup device

When you don't have a backup tape device, you probably instead should have an external backup medium ready. In the easiest case, that device may be attached directly to your computer, so you can just adress it's device entry.

When it is attached to another computer, there are several options. The first one would be to use the remote option of dump(8), which indirectly acesses the remote computer using ssh(1) and rmt(8), so both must be installed and accessable there. Then you can set environment variable RCMD_CMD to ssh and address your device by option -f user@host:file.

If, for example, rmt(8) is not available, your next option suggested by many tutorials would be to pipe to dd(1) using ssh(1).

dump <options> | ssh -l <user> <host> dd of=/dev/<dump-device>

The pipe for the way back to restore then would be like this:

ssh -l <user> <host> dd if=/dev/<dump-device> | restore -f -

where /dev/dump-device might also be a path to a plain file. Unfortunately, doing an interactive restore via this sort of piped ssh seems to be not such a good idea, especially if the backup file is large. Nevertheless, this might be an option for doing non-interactive restores.

But if you can ssh(1) into a remote box, the easiest way to get it within reach would be to just mount it using mount_psshfs(8).

In my case, the backup device is an Apple Time Capsule, being also a NetBSD-operated device. My first plan, using the remote backup facility of dump(8), didn't work out because of the missing rmt(8) command on the time capsule. Perhaps some day I'll try to find a statically linked rmt(8) binary for NetBSD-6.0/evbarm (or cross-build it myself). For now, I'm resolving to using mount_afp, provided by pkgsrc, and mounting the time capsule filesystem to access it in a less sophisticated way.

BTW, when doing so, I had to manually create a link to /dev/fuse0 (ln -s /dev/putter /dev/fuse0) to make afpfsd work. Until now, the automatic mounting of the afp device doesn't seem to work reliably, which kind of counteracts my approach a little bit. I had at least one case where the afpfsd crashed while dumping.

The second (and more severe) problem with this approach is not being able to restore from scratch in case of a complete failure. As mentioned, I'd like to be able to restore from the NetBSD installation mini root filesystem, which doesn't contain mount_afp. Network tools available there include rcmd (allowing simple, unsecured remote access via restore, rexec and rmt), ftp or mount_nfs. For all of them, the server-side components are missing on the time capsule. So, in case of a complete restore, my choice will probably be to mount_afp the backup device onto another system, re-export it from there via nfs and this way, finally make it reachable for the NetBSD installation mini root.

Snapshots

One downside of using dump(8) is that it cannot reliably take backups from live file systems. That used to imply the need to go down to single user and umount the files systems for each backup. Fortunately, NetBSD has a nice support for file system snapshots courtesy of fssconfig(8), easing the backup process very much.

As root, for example, use

fssconfig -cv fss0 / /root/snapshot

to snapshot the file system and make the snapshot reachable through the /dev/fss0 device. The file /root/snapshot is used internally to manage the snapshot while the filesystems stays live. You can then mount the device and see the unchanged directory, even if you change the live filesystem.

fssconfig -l shows the snapshot devices currently in use. With fssconfig -u you can remove a snapshot after dumping it. Afterwards, the snapshot file can also be removed.

dump(8) logs the time, device and level of each dump into /etc/dumpdates. Normally, the file system devices are used here. But when using fss snapshots, as the fss device name is written into dumpdates instead, you should always consistently use the same different fss device numbers when dumping different file systems. For example use fss0 for root, fss1 for /usr when they are on different mount points, etc.

As I don't want the directory entry for the snapshot to be included into the dump, I put it into /tmp, which resides on a tmpfs in my system, so it is guaranteed to not be included into the file system dumped. When doing this, an image is generated used as backing store while the snapshot persists. As this may be too large for the /tmp file system, you can specify a block size and backing store size in the fssconfig(8) call. This way, I'm giving a smaller size and then mount the fss device read-only so that the backing store doesn't overflow.

Restoring

All this work is done to be able to walk the opposite way and restore a damaged system in case of an emergency. So lets now have a look on restore(8). It can do full or partial restores and also has an interactive mode.

restore -t -f dump_file

This doesn't modify anything, but just outputs the contents of the backup. This is not only the file and directory names, but also the dump date, level and in case of an incremental backup, the previous level.

When doing a full restore into a fresh file system, prepare it using newfs(8) before. Afterwards, mount(8) and cd into the new file system, as the restored files go into the current directory.

restore -rf dump_file

This rebuilds the file system. When a set of incremental dumps is to be applied, restore(8) needs to pass information between the different runs. So it creates a restoresymtable file in the root directory storing infos about it's progress. Consequently, this file should be left until the complete restore is finished.

restore -if dump_file

This allows you to interactively look into a dump and select single files or directories to be restored. ? shows the commands available here. Often, when you just want to get back some older versions of a file, this is the most useful tool. However, when implementing partial incremental backups as shown above, you only have backed up versions of the last seven days and of the initial dump. So if you need more, respect that when defining your strategy.

restore -xf dump_file

This extracts single files or directories instead of doing a full restore, so it also creates no restoresymtable.

And finally,

restore -ruf dump_file

does a full restore, but can be used on a populated file system. It unlinks and therefore replaces files by the versions from the backup. So it can be used to try and repair a file system. By the way, when applying an incremental backup after a full restore, the files to be replaced by the increment are automatically unlinked before, so this also works as expected without any need to specify the -u argument.

When restoring a backup done with the strategy sketched out above, start with the (latest) level 0 dump, then work through all newer dumps leaving out each one where a newer dump with lower level exists. The dates and other infos about each dump file can be extracted from output of restore -t, or interactively by using the what command in restore -i. For example, when dumps were generated with order 0 3 2 4, you'll find that for level 3 dump a newer one with lower level exists (number 2), so 3 is left out. The only one with lower level than 2 is the older 0, so you choose 2. 4 has also only lower ones with older dates, so 4 is also choosen, giving the restore order 0 2 4.

Some more notes

You can exclude files or directories from the backup by setting the nodump flag. ls -o shows the current flags. Set nonodump to remove a flag.

chflags nodump file-or-dir
ls -o
chflags nonodump file-or-dir

By default, the nodump flags are honored for incremental backups starting with level 1, but you can change this with the dump -h option. I'm setting this to 0 to always have the flags honored.

dump 0a -h 0 -f /tmp/backup.1 /home

For example, I'm using this to exclude /usr/pkgsrc from the backup.

Otherwise, you can also specify a list of paths, when only a subset of a file system should be backed up. When doing this, dump(8) is always doing a full level 0 backup of the given directories.

When a long dump is running, you can send a SIGSTATUS to the dump process to make it report it's progress. For example, when the status control character is mapped to CTRL-T via stty(1), a dump process running in the foreground reports the progress when pressing that (restore also).

If you are manually doing backups, besides looking at /etc/dumpdates you can use dump -w to show the file systems currently to be dumped. Otherwise, you can always use dump -W to show the last dump times and levels of all dumped file systems. dump(8) is also integrated into the housekeeping concepts of NetBSD insofar, as this output is included into the daily(5) maintainance tool.

The dump frequency in days, used to determine which file systems need to be dumped next, can be defined in /etc/fstab's fifth entry. But when using snapshots, the devices actually dumped are not listed in fstab, so this mechanism isn't working. Automating the backup in a crontab and defining dump entries with adjacent frequencies can mitigate this.

An example session

Here is an example of mounting/unmounting an afp backup device, handling a file system snapshot and doing a full dump.

mount_afp afp://:passwd@host/path /mnt/backup
fssconfig -cv fss0 / /snapshot
mount /dev/fss0 /mnt/dev
dump -0ua -h 0 -f /mnt/backup/dumpfile.0 /mnt/dev
umount /mnt/dev
fssconfig -u fss0
rm /snapshot 
afp_client unmount /mnt/backup

To do a restore, you would use the same sequence, replacing the dump command perhaps with an interactive restore:

restore -if /mnt/backup/var

Putting the pieces together

Most of this is put together into a bash script, backup.sh (see below at the end). When sourced, it provides some commands to support handling snapshots, mounting of the backup device, making an incremental backup following a configured strategy and accessing/restoring from the backup device. For example, after modifying the conf file to your needs, a manual initial level 0 dump can be done like this:

. /root/bin/backup.sh && backup - 0

An interactive restore session of the last level 5 dump is done by this:

. /root/bin/backup.sh && restoredump - -i 5

The script includes an example on how to automate daily backups by calling it via crontab, saving the output to a log and mailing it to root.

At the end…

After a few days of automatic backups, this setup seems to work quite reliably. The files are rotated and replaced in the expected order, looking at the contents with interactive restore and doing a test recovery, everything looks good. Having set up this kind of backups gives some confidence—now lets make sure continuously this actually is justified..

Other, more sophisticated means of data security include usage of zfs or raids, which one day may be topic of further explorations..

As a side note, while experimenting with dump(8) and restore(8), I stumbled upon the last dump made on my NeXTStep System some decades ago. And, believe it or not, the restore(8) command on 2020's NetBSD is still able to read that old dump format. So when I'll find some more time, I hope to restore it into a virtualized NeXTStep reincarnation. That would be a recovery strategy having been worked out really well!

Feel free to leave a comment on Reddit

Appendix: the backup script

Take caution as this is not yet well enough tested—just use it as a simple example. For example, make sure that two dumps of different file systems don't run at the same time. Otherwise, the first one finishing will unmount the backup device, making the second one fail.

#!/usr/pkg/bin/bash

# copy and adapt the config vars into ~/etc/backup.conf
# put auth info like DUMPDEVPWD into ~/.backup.conf
# and set it chmod 400 and chflags nodump
# install as root crontab like this:
# # daily backups
# 0       1       *       *       *       /usr/pkg/bin/bash -c '. /root/bin/backup.sh && backup /root/etc/backup-var.conf' 2>&1 | tee /var/log/backup-var.out | sendmail -t
# 30      1       *       *       *       /usr/pkg/bin/bash -c '. /root/bin/backup.sh && backup' 2>&1 | tee /var/log/backup.out | sendmail -t

# uncomment to test backup config
#TEST=echo

# define this when dump device must be mounted
DUMPDEV=afp://:${DUMPDEVPWD}@timecapsule/ 
DUMPMNT=/mnt/bkup

# unset if you dont want a snapshot
FSS=fss0
SRCDEV=/
SNAPSHOT=/tmp/snapshot

# mountpoint of fs to backup
SRCMNT=/mnt/dev

# backup device or file
BACKUP=${DUMPMNT}/client/dump
#BACKUP=

# backup levels for each day of month
LEVELS=(- 0 3 2 5 4 7 6 1 3 2 5 4 7 6 1 3 2 5 4 7 6 1 3 2 5 4 7 6 1 3 2)

dumpcmd() {
  ${TEST} dump $*
}

restorecmd() {
  ${TEST} restore $*
}

export PATH=$PATH:/usr/pkg/bin

# absolute paths to use in root crontab
test -f /root/.backup.conf && . /root/.backup.conf
test -f /root/etc/backup.conf && . /root/etc/backup.conf

backup_dev() {
  if [ "${DUMPMNT}-" != "-" ]; then
    case "$1" in
    mount) ${TEST} mount_afp ${DUMPDEV} ${DUMPMNT}
           ;;
    unmount) ${TEST} afp_client unmount ${DUMPMNT}
             ;;
    esac
  fi
}

# when the time capsule needs to spin up, mount seems to fail
# so a second try is done
mount_backup() {
  backup_dev mount
  if [ $? -eq 2 ]; then
    backup_dev mount
  fi
}

snapshot() {
  if [ "${FSS}-" != "-" ]; then
    case $1 in
    new) ${TEST} fssconfig -c ${FSS} ${SRCDEV} ${SNAPSHOT} 512 10485760
         ${TEST} mount -r /dev/${FSS} ${SRCMNT}
         ;;
    rm) ${TEST} umount ${SRCMNT}
        ${TEST} fssconfig -u ${FSS}
        ${TEST} rm -f ${SNAPSHOT}
        ;;
    esac
  fi
}

find_level() {
  # find level for today
  LEV=${LEVELS[`date '+%e'`]}
  if [ "${1}-" != "-" ]; then
    LEV=${1}
  fi
  if [ "${BACKUP}-" != "-" ]; then
    BACKUPFILE=${BACKUP}.${LEV}
    BOUT=
  else
    BACKUPFILE=
    BOUT=-
  fi
}

# makedump lev
makedump() {
  find_level ${1}

  # save prev lev 0 dump as prevmonth
  if [ ${LEV} -eq 0 ]; then
    test -f ${BACKUPFILE}.prevmonth && rm ${BACKUPFILE}.prevmonth
    test "${BACKUPFILE}-" != "-" && test -f ${BACKUPFILE} && mv ${BACKUPFILE} ${BACKUPFILE}.prevmonth
  fi

  # save prev lev 1 dump as prevweek
  if [ ${LEV} -eq 1 ]; then
    test -f ${BACKUPFILE}.prevweek && rm ${BACKUPFILE}.prevweek
    test "${BACKUPFILE}-" != "-" && test -f ${BACKUPFILE} && mv ${BACKUPFILE} ${BACKUPFILE}.prevweek
  fi

  # all other levs: rm instead of overwriting
  if [ ${LEV} -gt 1 ]; then
    test "${BACKUPFILE}-" != "-" && test -f ${BACKUPFILE} && rm ${BACKUPFILE}
  fi

  # and dump
  dumpcmd ${LEV}ua -h 0 -f ${BACKUPFILE}${BOUT} ${SRCMNT} 
}

mailheader() {
  echo "To: root"
  printf "Subject: %s backup dump output for %s\n\n" `hostname` "`date`"
}

# backup [conf [lev]]
backup() {
  mailheader
  # check for custom conf
  test $# -gt 0 && test -f ${1} && . ${1}
  # check for lev
  test $# -gt 1 && LEV=${2} || LEV=
  find_level $LEV

  snapshot new
  # backup_dev mount
  mount_backup
  makedump $LEV
  backup_dev unmount
  snapshot rm
}

# restoredump [conf [args [lev]]]
restoredump() {
  # check for custom conf
  test $# -gt 0 && test -f ${1} && . ${1}

  # check for args
  test $# -gt 1 && ARGS=${2}

  # check for lev
  test $# -gt 2 && LEV=${3} || LEV=
  find_level $LEV

  # backup_dev mount
  mount_backup
  restorecmd ${ARGS} -f ${BACKUPFILE}${BOUT}
  backup_dev unmount
}

..and an example of a backup.conf showing how to dump/restore using ssh pipes:

# comment to really do backups
TEST=echo

# define this when dump device must be mounted
DUMPDEV=afp://:${DUMPDEVPWD}@timecapsule/
DUMPMNT=/mnt/bkup

# unset if you dont want a snapshot
FSS=fss0
SRCDEV=/
SNAPSHOT=/tmp/snapshot

# mountpoint of fs to backup
SRCMNT=/mnt/dev

# backup device or file
BACKUP=${DUMPMNT}/client/dump
#BACKUP=

# backup levels for each day of month
LEVELS=(- 0 3 2 5 4 7 6 1 3 2 5 4 7 6 1 3 2 5 4 7 6 1 3 2 5 4 7 6 1 3 2)

# uncomment this to define custom commands
#dumpcmd() {
#  dump $* | ssh timecapsule dd of=/Volumes/dk2/ShareRoot/client/dump.${LEV}
#}

#restorecmd() {
#  ssh timecapsule dd if=/Volumes/dk2/ShareRoot/client/dump.${LEV} | restore $*
#}