Backup
From Gentoo Linux Wiki
Backup Types
There are three basic types of backups: full, differential and incremental. A full (also called complete or whole) backup makes a copy of all the files in the set to be backed up, such as your home directory, your web site, or a complete filesystem. Incremental and differential backups are used to reduce the time each backup takes and the disk space consumed by repeated backups.
What is an incremental Backup?
An incremental backup is a partial backup: only the files that have changed since the last backup (full or incremental) are copied. Consequently, a series of incremental backups must be preceded by a full backup. All backups are kept in case the data needs to be restored (restore: the reverse of a backup). To restore to a certain point in time, you need the last valid full backup and every incremental backup that followed it, up to the point in time to which the system is to be restored. This model offers a high level of confidence that something can be restored, and it works with removable media such as tapes and optical disks. The drawbacks are having to manage a long series of incrementals and the high storage requirements of keeping all of them.
What is a differential Backup?
A differential backup is also a partial backup. It differs from an incremental one in that it copies the files that have changed since the last full backup. This means the list of files grows over time, and each differential backup is larger than the previous one. Its advantage is that a restore involves recovering only the last full backup and then overlaying it with the last differential backup. The disadvantage is that with each day that passes since the last full backup, more data needs to be backed up, especially if most of the data has changed.
- Source: Wikipedia
For most home (non-professional) users, differential backups are the best alternative. Start with a full backup, make differential backups in between, and repeat the full backup weekly, monthly, or as often as required. A good time for a new full backup is when the differential backup becomes too large. The disadvantage is that each new differential backup overwrites the previous one, so only the latest one is kept. The advantage is that this saves time and space.
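The difference between incremental and differential backups comes down to which earlier backup serves as the reference point. Below is a minimal sketch in shell. It assumes that each backup run touches a marker file when it finishes; the marker files and the changed_since helper are hypothetical and not part of any backup tool. The demo uses GNU touch -d to set fixed timestamps.

```shell
# Hypothetical helper: list the files under directory $1 that were
# modified after marker file $2 (paths printed relative to $1).
#   incremental:  pass the marker of the most recent backup of any kind
#   differential: pass the marker of the most recent *full* backup
# $2 must be an absolute path, because we cd into $1 first.
changed_since() {
    (cd "$1" && find . -type f -newer "$2") | sort
}

# Demo with fixed timestamps (GNU touch -d syntax).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
touch -d '2024-01-01 00:00:00' "$work/data/a"      # unchanged since day 1
touch -d '2024-01-01 01:00:00' "$work/full.stamp"  # full backup on day 1
touch -d '2024-01-02 00:00:00' "$work/data/b"      # changed on day 2
touch -d '2024-01-02 01:00:00' "$work/last.stamp"  # incremental on day 2
touch -d '2024-01-03 00:00:00' "$work/data/c"      # changed on day 3

echo "incremental on day 3:";  changed_since "$work/data" "$work/last.stamp"
echo "differential on day 3:"; changed_since "$work/data" "$work/full.stamp"
```

On day 3 the incremental run picks up only ./c, while the differential run picks up both ./b and ./c. The differential list keeps growing until the next full backup resets the reference point.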
Important Considerations
When you're making your backup, you don't want the filesystem changing while you're copying it, because files modified during the copy can end up inconsistent in the backup. You should only run a backup on a partition that is either unmounted or mounted read-only. This means you may have to do one of the following:
- Reboot from a live CD and run dd on your unmounted partitions
- If you use LVM2, back up from an LVM snapshot of the volume
- Go into single-user mode and remount the partition read-only
A short example:
# Go into single-user mode
init 1
# Remount /home read-only
mount -o remount,ro /home
# ... run the backup, then make /home writable again
mount -o remount,rw /home
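Before starting the copy, it is worth confirming that the remount actually took effect. The sketch below reads the mount table; on Linux this is /proc/mounts, whose fourth field holds the comma-separated mount options. The function name is_mounted_ro is made up for this example. It does not handle mount points containing spaces, which /proc/mounts encodes as \040.

```shell
# Hypothetical helper: succeed if mount point $1 is currently mounted
# read-only according to mount table $2 (default: /proc/mounts).
# If the same mount point is listed more than once, the last entry
# wins, since that is the mount visible on top.
is_mounted_ro() {
    awk -v mp="$1" '
        $2 == mp {
            seen = 1
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro")
                    ro = 1
        }
        END { exit !(seen && ro) }
    ' "${2:-/proc/mounts}"
}
```

For example, a backup script could start with: is_mounted_ro /home || { echo "/home is not read-only, aborting" >&2; exit 1; }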
Large backup solutions
Backup Server software designed to back up multiple computers over a network.
Lightweight solutions
Software basically designed for single computers.
Local backups
Software that can be used without a network.
Keep
Keep is a graphical frontend to rdiff-backup (see below) designed for KDE.
Flexbackup
Flexbackup is a Perl tool for backing up important directories. As the name implies, flexbackup is very flexible.
# emerge -av flexbackup
This is what the important part of my /etc/flexbackup.conf file looks like:
| File: /etc/flexbackup.conf |
$set{'research'} = "/home/david/research /var/cvsroot/python /var/cvsroot/latex";
$set{'mail'} = "/home/david/.thunderbird";
$set{'etc'} = "/etc /home/david /var/www/davidgrant.ca/htdocs";
$prune{'/home/david'} = ".jpi_cache konserve-backup .cxoffice .wine .mozilla .kde3.1 .thunderbird";
$prune{'/home/david/.thunderbird'} = "Junk News";
$compress = 'gzip'; # one of false/gzip/bzip2/lzop/zip/compress/hardware
$compr_level = '6';
$device = '/mnt/sata/backup';
|
The "research", "mail", and "etc" names are just convenient labels you use on the command line to tell flexbackup what to back up. Each directory within a set still gets backed up to its own tarball, however. For example, "research" produces "home-david-research.0.tar.gz", "var-cvsroot-python.0.tar.gz", and "var-cvsroot-latex.0.tar.gz".
$prune is a useful feature that lets you exclude certain subdirectories (named relative to the directory given as the key) from the tarballs.
I set $compress to gzip but you can use bzip2 if you want smaller tarballs. bzip2 is a bit slower to pack and unpack, however. $compr_level=6 also makes things a bit quicker, yet still makes the tarballs quite compact.
I use the following crontab to create my backups:
0 3 1-7 * * flexbackup -set all -full -w 7
0 3 * * 6 flexbackup -set all -differential
0 3 * * 1-5 flexbackup -set all -incremental
This performs a full backup on the first Sunday of every month (days 1-7, with -w 7 limiting the run to Sunday). It also runs a differential backup every Saturday and an incremental backup every weekday (Monday to Friday). All backups are performed at 3 a.m.
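The 1-7 day-of-month range is combined with a separate weekday check because of a cron quirk. When both the day-of-month and day-of-week fields are restricted, cron runs the job if either field matches. So "0 3 1-7 * 0" would fire on days 1 to 7 and also on every Sunday. If you want the same "first Sunday of the month" test in a script of your own, for example a wrapper around another backup tool, here is a sketch using GNU date. The function name is hypothetical.

```shell
# Hypothetical helper: succeed if the given date (default: today) is the
# first Sunday of its month, i.e. a Sunday that falls on day 1-7.
# Requires GNU date for the -d option.
is_first_sunday() {
    day=$(date -d "${1:-today}" +%d)   # day of month, 01-31
    dow=$(date -d "${1:-today}" +%u)   # day of week, 1=Monday ... 7=Sunday
    [ "$dow" -eq 7 ] && [ "$day" -le 7 ]
}
```

A daily cron job could then run: is_first_sunday && flexbackup -set all -full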
# $MYSET holds the name of the set to back up, e.g. all
if flexbackup -set "$MYSET" -level full ; then
    for i in $(/usr/bin/find /path/to/backups -name \*.gz -mtime +0 -exec basename '{}' \;)
    do
        flexbackup -rmfile "$i"
    done
fi
In the preceding example, backup files in the /path/to/backups directory (matched by \*.gz, assuming you tell flexbackup to gzip the backup archives) are deleted if they were last modified more than 24 hours ago (the -mtime +0 test). This happens only if the full backup of the set succeeds.
Using ZFS and rsync
You can create a ZFS partition on your hard disk or a ZFS loop device, or put ZFS on a removable device (e.g. an SD card or USB device). You can then rsync your data to the ZFS filesystem and take a snapshot of it periodically. A complete description can be found in HOWTO Backup using ZFS.
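The rsync-then-snapshot cycle described above can be scripted roughly as follows. This sketch assumes a ZFS dataset named tank/backup mounted at /tank/backup; both names are hypothetical, so use your own pool and dataset names. The run helper only prints the commands when DRY_RUN is set, so you can check what would happen before letting the script touch anything.

```shell
# Print commands instead of executing them when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: mirror directory $1 into the dataset's mount
# point $3, then take a (read-only) ZFS snapshot of dataset $2 named
# after the current date and time.
zfs_rsync_backup() {
    src=$1 dataset=$2 mountpoint=$3
    # Trailing slashes: copy the *contents* of $src into $mountpoint.
    # --delete keeps the mirror exact; older states live in snapshots.
    run rsync -a --delete "$src/" "$mountpoint/" || return 1
    run zfs snapshot "$dataset@$(date +%Y-%m-%d_%H%M%S)"
}
```

Try it with DRY_RUN=1 zfs_rsync_backup /home tank/backup /tank/backup. Once the printed commands look right, drop DRY_RUN and call it from root's crontab.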
Partitions
dd
To back up the master boot record (MBR), the trick is to copy it into a file:
dd if=/dev/hda of=/root/hda.mbr bs=512 count=1
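This assumes that /dev/hda is the boot drive; replace it with whatever drive you boot from. Now that the MBR is a file, rsync will copy it along with everything else. You can put it back with:
dd if=/root/hda.mbr of=/dev/hda bs=512 count=1
Note that this also overwrites the partition table, which occupies bytes 446-509 of the MBR. To restore only the boot code, write just the first 446 bytes:
dd if=/root/hda.mbr of=/dev/hda bs=446 count=1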
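You can run a quick sanity check on the saved file. A valid MBR ends with the two-byte boot signature 0x55 0xAA at offsets 510 and 511. Here is a sketch of such a check; mbr_has_signature is a made-up name.

```shell
# Hypothetical helper: succeed if file $1 (e.g. /root/hda.mbr) carries
# the MBR boot signature 0x55 0xAA in bytes 510-511 of its first sector.
mbr_has_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example: mbr_has_signature /root/hda.mbr && echo "MBR copy looks sane"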
Remote backups
Filesystem
rdiff-backup
rdiff-backup is a simple-to-use yet powerful backup utility. It can build a mirror of a local source directory on a remote machine. The program automatically keeps an archive of the differences from previous mirrored versions, so you can restore deleted files as well as older versions of existing files. All the underlying network traffic is handled by ssh. See the rdiff-backup main page for a detailed description.
rdiff-backup is in the Gentoo Portage tree; to install it, simply run:
# emerge rdiff-backup
To back up local files to a remote machine, rdiff-backup must be installed both locally and on the remote machine, and the two versions must be compatible (e.g., 0.13.x is incompatible with 1.0.x). In principle, no root privileges are necessary, since the program runs with ordinary user privileges both locally and remotely.
Assuming rdiff-backup is properly installed on the remote machine remotehost.remotedomain, using it can be as simple as:
rdiff-backup ~/mydir remoteuser@remotehost.remotedomain::mydir-backup
This creates a new subdirectory named mydir-backup in the home directory of remoteuser on remotehost.remotedomain. If this directory already exists on the remote machine, it is updated to reflect the current contents of mydir, and the differences from the previous version are also stored. Examples of how to manage the remote mirror, and in particular how to recover past versions of the whole directory or part of it, can be found here.
Depending on the nature of the ssh user authentication scheme in use on the remote machine, the previous command can stop and ask the user for a password or a passphrase. The interactive behavior of the program can of course be avoided if an authentication agent (ssh-agent) is properly configured on the local machine. However, if one wants to implement the backup activity in an automatic, unattended way, for instance as a cron job, it is quite possible that such an agent is not available to the program when it is launched.
Let us see how this problem can be easily solved.
Unattended backup with rdiff-backup
First of all you need a passphrase-less key pair to be used specifically with rdiff-backup. You can create it with:
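ssh-keygen -t dsa
When prompted, choose .ssh/backup_dsa as the file in which to save the new key, and press Enter when asked for a passphrase. Recent OpenSSH releases disable or no longer support DSA keys. If yours does, use ssh-keygen -t ed25519 instead. Everything below works the same way, except that the public key will start with ssh-ed25519 rather than ssh-dss.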
Now copy the newly generated public key to the remote host:
scp .ssh/backup_dsa.pub remoteuser@remotehost.remotedomain:
Then log in remotely and add the new key to the list of authorized keys.
ssh remoteuser@remotehost.remotedomain
cat backup_dsa.pub >> .ssh/authorized_keys
rm backup_dsa.pub
The last command is not strictly necessary, but it is always safer to remove unneeded files. Next, tell the remote machine that the new key may only be used for a specific task. Edit .ssh/authorized_keys with your preferred editor and add a command specification in front of the newly added key (it will be the last line), so that it looks like:
| File: .ssh/authorized_keys on remote machine |
command="/usr/bin/rdiff-backup --server",no-pty,no-port-forwarding ssh-dss <...public key characters follow...> |
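Instead of editing the file by hand, you can generate the restricted entry from the public key file. Here is a minimal sketch to run on the remote host in place of the cat ... >> .ssh/authorized_keys step above. restrict_key is a hypothetical helper, and the options are exactly those shown in the file above.

```shell
# Hypothetical helper: print the public key from file $1 prefixed with
# the forced command and restrictions used for rdiff-backup above.
restrict_key() {
    printf 'command="/usr/bin/rdiff-backup --server",no-pty,no-port-forwarding %s\n' "$(cat "$1")"
}
```

For example: restrict_key backup_dsa.pub >> ~/.ssh/authorized_keys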
Now it's time to leave the remote host and configure the local machine. You have to make sure that rdiff-backup uses the newly generated passphrase-less key pair when it starts. You can do that with ssh's powerful host aliasing feature. Edit .ssh/config and add:
| File: .ssh/config on local machine |
Host remote-backup
    Hostname remotehost.remotedomain
    IdentityFile ~/.ssh/backup_dsa
    IdentitiesOnly yes |
This way, when you use remote-backup as the host name in an ssh or scp command, the host actually contacted is remotehost.remotedomain. The key stored in ~/.ssh/backup_dsa is used instead of your default key (if you have one).
This configuration makes it straightforward to run rdiff-backup as a cron job. To back up ~/mydir automatically every day at 1:01 a.m., add the following line to your crontab:
| File: cron table on local machine |
01 1 * * * rdiff-backup /home/localuser/mydir remoteuser@remote-backup::mydir-backup |
Notice that the alias remote-backup is used instead of the real host name remotehost.remotedomain.
rsnapshot
# emerge rsnapshot
Then copy the example configuration file:
# cp /etc/rsnapshot.conf.default /etc/rsnapshot.conf
Config File Syntax
The config file is fairly straightforward but here are some basic rules/common problems that people run into.
- Lines that start with a # are comments (and are therefore ignored)
- Directory paths need a trailing slash.
- When defining an rsnapshot parameter in the config file, you must separate the parts of the parameter with tabs, not spaces. The program was designed this way to make it easier to handle file and directory paths that contain spaces.
setting times in the config file
The interval lines in rsnapshot.conf do not schedule any cron job; they only specify how many backups with that name are kept. Newer rsnapshot releases call this parameter retain, and interval is still accepted.
You have to set up the cron jobs yourself.
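Because only a fixed number of snapshots is kept per interval, each run has to discard the oldest one. The sketch below is a simplified model of that rotation, to show what a line like "interval hourly 6" means in practice. It is only an illustration of the retention behaviour, not rsnapshot's actual code. rsnapshot also fills the new .0 snapshot with a hard-linked copy plus rsync, which is omitted here. The rotate function and its arguments are hypothetical.

```shell
# Hypothetical model of rsnapshot-style rotation in directory $1 for
# snapshot name $2 (e.g. hourly), keeping at most $3 snapshots.
rotate() {
    dir=$1 name=$2 keep=$3
    # drop the snapshot that would fall outside the retention count
    rm -rf "$dir/$name.$((keep - 1))"
    # shift the remaining snapshots up by one: N-2 -> N-1, ..., 0 -> 1
    i=$((keep - 2))
    while [ "$i" -ge 0 ]; do
        if [ -d "$dir/$name.$i" ]; then
            mv "$dir/$name.$i" "$dir/$name.$((i + 1))"
        fi
        i=$((i - 1))
    done
    # create the new, most recent snapshot
    mkdir "$dir/$name.0"
}
```

If you call rotate eight times with keep set to 6, only hourly.0 to hourly.5 remain. hourly.0 always holds the newest run and hourly.5 the oldest one still retained.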
"interval monthly 3" means that you'll keep three monthly snapshots. "interval hourly 6" means that six hourly snapshots will be retained.
rsnapshot.conf has nothing to do with the frequency of the snapshots; that's determined by cron.
"0 */6 * * * /usr/local/bin/rsnapshot hourly" means that hourly snapshots will be performed every six hours, at 0000, 0600, 1200 and 1800.
backup paths
rsnapshot's default rsync_long_args setting (--delete --numeric-ids --relative --delete-excluded) includes rsync's --relative option. With --relative, the full source path is recreated underneath the destination given on the backup line.
If you don't want this, set your own rsync_long_args without --relative in the configuration file, e.g.:
rsync_long_args	--delete --numeric-ids --delete-excluded
excluding files
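To exclude, e.g., all files and directories whose names start with a dot, use:
exclude	.*
To match only directories, append a slash: .*/. These are rsync patterns, not regular expressions, so something like ^\. will not work. See the INCLUDE/EXCLUDE PATTERN RULES section of the rsync man page for details.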
spaces in file names
If you are backing up a Windows directory, there will be directory and file names containing spaces, which can cause problems. The current best solution is to replace each space with ?, the single-character wildcard:
exclude /mnt/winXP/John/My?Documents/folders/holiday?pictures/
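Converting long paths by hand gets tedious, so a tiny helper can print the pattern for you. This is a sketch and the function name is hypothetical. Keep in mind that ? matches any single character in that position, not only a space.

```shell
# Hypothetical helper: turn a path containing spaces into an rsnapshot
# exclude pattern by replacing every space with the ? wildcard.
space_to_wildcard() {
    printf '%s\n' "$1" | tr ' ' '?'
}
```

For example, space_to_wildcard "/mnt/winXP/John/My Documents/folders/holiday pictures/" prints the exclude path shown above.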
Trying to back up to a VFAT (Windows) partition generates a lot of "can't chown" errors, because VFAT cannot store Unix file ownership.
ssh backups
- the public/private key authentication details go in the ssh_args line:
ssh_args -p 22 -i /root/.ssh/backupserver_dsa
or add the relevant settings to /etc/ssh/ssh_config (or to root's ~/.ssh/config)
- for multiple backups, you can set ssh_args on a per-backup basis, like this:
backup root@example.com:/home/ example.com/ ssh_args=-i example.key
or add to ssh_args on a per-backup basis, like this:
backup root@example.com:/home/ example.com/ +ssh_args=-i example.key
You may want to run rsnapshot from cron. The crontab line
00 00 * * * /usr/bin/rsnapshot daily
runs it every day at midnight.
cron automatically emails you anything that rsnapshot prints to STDOUT or STDERR. By default this mail goes to the crontab's owner (e.g. root@server), where you might never read it. You can change where cron sends mail by setting the MAILTO variable in the crontab file, like so:
MAILTO=myaddress@example.com
00 00 * * * /usr/bin/rsnapshot daily
[other cron jobs go here]
tips and tricks
- to show all the new files in daily.0
You can use the -links test of the find utility, like so: find daily.0 -type f -links 1
- to back up several servers that run all day as well as a few desktop workstations that are only on during the day
use separate cron jobs that call rsnapshot with different config files (option -c).
- if you get lchown errors, you need to install the Lchown Perl module
- to check whether a remote server is accessible before backing up
use a shell script with rsync. Given only a source, rsync just lists the remote files, and it exits with a non-zero status if the host cannot be reached:
if ! rsync root@1.2.3.4:/my/path/ > /dev/null 2>&1; then
    exit 1
fi
or put the check in the cron job itself, so that cron mails you the output and you can see when the ping fails. -c 1 sends a single ping; without it, ping never exits:
ping -c 1 laptop && rsnapshot -c /etc/rsnapshot-laptop.conf daily
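The find -links 1 tip works because rsnapshot stores unchanged files as hard links shared between snapshots. Such files have a link count of 2 or more, while a file that is new or changed in daily.0 exists only there. Here is a small self-contained demonstration with two fake snapshots; the directory names only mimic rsnapshot's layout.

```shell
# Build two fake snapshots: unchanged.txt is hard-linked between them
# (as rsnapshot does for unchanged files); new.txt exists only in daily.0.
snap=$(mktemp -d)
mkdir "$snap/daily.1" "$snap/daily.0"
echo old > "$snap/daily.1/unchanged.txt"
ln "$snap/daily.1/unchanged.txt" "$snap/daily.0/unchanged.txt"
echo new > "$snap/daily.0/new.txt"

# List files whose link count is 1, i.e. not shared with older snapshots.
(cd "$snap" && find daily.0 -type f -links 1)   # prints: daily.0/new.txt
```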
Bacula Network Backup System
Bacula is a set of computer programs that permit you (or the system administrator) to manage backup, recovery, and verification of computer data across a network of computers of different kinds.
In technical terms, it is a network-based backup program.
Bacula is relatively easy to use and efficient, while offering many advanced storage management features that make it easy to find and recover lost or damaged files.
# emerge -av bacula
Incremental Backups Using Rsync
There is an excellent tutorial regarding incremental backups available here.
Using incremental backups, you can create multiple snapshots of your data while conserving disk space and saving processing time. It's a very fast and cheap way of creating snapshot-style backups of your data.
The script is fairly easy to use, even for a non-expert. I just had to make a few simple changes to the backup script on the web site to get it working on my system.
Programs That Use Rsync To Create Incremental Backups
If you need more functionality than Mike Rubel's script can give you, or if you're uncomfortable editing a bash script, then you may be interested in one of these programs:
A manual example
- You want to mirror your disk or back up a partition.
- We assume you have created a partition on another drive and mounted it (in this example at /mnt/usbharddrivemain).
You can:
Duplicate a partition (good for a once off):
# rsync --progress --stats -avxz --exclude "/mnt/usbharddrivemain/" --exclude "/mnt/usbharddriveboot/" --exclude "/usr/portage/" --exclude "/proc/" --exclude "/root/.ccache/" --exclude "/var/log/" --exclude "/sys" --exclude "/dev" --exclude "tmp/" /* /mnt/usbharddrivemain
Duplicate a partition and delete outdated files (which have since been deleted on the primary partition):
# rsync --progress --stats --delete -avxzl --exclude "/mnt/usbharddrivemain/" --exclude "/mnt/usbharddriveboot/" --exclude "/usr/portage/" --exclude "/proc/" --exclude "/root/.ccache/" --exclude "/var/log/" --exclude "/sys" --exclude "/dev" --exclude "tmp/" /* /mnt/usbharddrivemain
For my boot partition I use:
# rsync --progress --stats -avxzl /boot /mnt/usbharddriveboot
To also delete files that have since been removed from /boot, I use:
# rsync --progress -avxzl --stats --delete /boot /mnt/usbharddriveboot
To restore, either boot from the secondary disk or use a Gentoo Live CD. Repeat the above commands (adjusted for the new mount locations) with source and destination swapped. For example, /mnt/usbharddrivemain becomes the source and /mnt/driveToRestoreTo the destination, not the other way around.
ssh + tar scripts
With this method, the backup travels compressed across the network through ssh. You can choose to back up any part (or all) of a filesystem. It is also useful if you have run out of space on the machine that needs backing up. The method doesn't require an intermediate tar file to be stored on that machine's hard drive.
Assume two machines - server.homelinux.com and desktop.homelinux.com. Let's say server.homelinux.com has run out of disk space, you need to back it up (or even just get some files) to desktop.homelinux.com. Easy!
Connect to server.homelinux.com (either ssh in, or use a terminal):
$ ssh server.homelinux.com
Now, from the server, tar and copy the desired directory:
$ tar -zcf - /var/backup | ssh desktop.homelinux.com "( cat > backup.gz )"
This will copy the contents of /var/backup on server.homelinux.com to ~/backup.gz on desktop.homelinux.com.
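Nothing is written on the server, so it is worth checking the result on the desktop side. One way, sketched below, is to test the gzip stream and then list the tar contents. The check_tarball name is hypothetical. Both checks only prove that the archive is readable, not that it contains everything you wanted.

```shell
# Hypothetical helper: succeed if $1 is an intact gzip-compressed tar
# archive. gzip -t verifies the compressed stream and its CRC; listing
# with tar -t verifies that the tar structure can be read to the end.
check_tarball() {
    gzip -t "$1" && tar -tzf "$1" > /dev/null
}
```

On desktop.homelinux.com you would then run: check_tarball ~/backup.gz && echo OK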
Take note of file size limits! You may run into problems at 2 GB on some old kernels or *nix variants. tar used to have a limit of about 8 GB, assuming the underlying kernel and filesystem allowed it. FAT32, for instance, only allows files up to 4 GB.
For an extremely lean backup system that relies only on the most basic tools (Bash, tar, xargs, etc.), take a look at this script. It is about 30 lines of Bash, and could be even shorter had I not wanted to make it user-friendly.
You maintain a collection of backup lists that are effectively arguments to tar, and the script keeps the n most recent backups of that data on a hard disk. From there, you can do whatever you want with the tarballs, like scp them to other boxes; redundancy is nice in the backup business. People in the thread have offered extensions such as ISO images, splitting, etc.
This was designed to be a starting component of a larger backup system. I haven't looked into incremental backups with it, but they are possible using tar, given some time and Bash programming skills.
These examples are taken from this thread on the gentoo-user mailing list, which makes a good read!
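The linked script is not reproduced here, but its core idea can be sketched: a list file whose entries are handed to tar, plus keeping only the n most recent archives. This is an independent illustration, not that script. The function name, the backup-STAMP.tar.gz naming and the argument order are all hypothetical. File names are assumed not to contain spaces, and tar -T (--files-from) must be supported (GNU tar does).

```shell
# Hypothetical sketch: archive every path listed (one per line) in file
# $1 into directory $2, then delete all but the $3 newest archives.
# An optional 4th argument overrides the timestamp used in the name.
backup_from_list() {
    list=$1 dest=$2 keep=$3
    stamp=${4:-$(date +%Y%m%d%H%M%S)}
    mkdir -p "$dest" || return 1
    # -T makes tar read the paths to archive from $list
    tar -czf "$dest/backup-$stamp.tar.gz" -T "$list" || return 1
    # The fixed-width timestamp makes names sort chronologically;
    # remove from the oldest end while more than $keep remain.
    set -- $(ls "$dest"/backup-*.tar.gz | sort)
    while [ "$#" -gt "$keep" ]; do
        rm -f "$1"
        shift
    done
}
```

Calling backup_from_list mylist /var/backups 5 from cron keeps the five newest tarballs. You can then scp them to other boxes, as suggested above.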
Partitions
netcat + dd
A very simple but useful way to do a remote backup is using dd and netcat.
| Note: Be aware that this is a rather liberal use of the word "remote". This method is neither efficient nor secure, and it is only suitable if both machines are connected to the same internal, private network. |
Netcat is available in two flavors:
# emerge gnu-netcat
or
# emerge netcat
For a complete image of your /dev/hda1 partition start netcat in listening mode on the remote machine:
- nc -l -p 10000 > image.gz
On your machine run dd to read the partition, gzip to compress the content and netcat to transfer it over to the other machine:
- dd if=/dev/hda1 | gzip | nc -w 5 remote_ip 10000
See How to clone a Linux box using netcat for additional information.
Although this document mentions both netcat and gnu-netcat, make sure you use the same one on both machines: netcat on one and gnu-netcat on the other will hang. I had this problem with gnu-netcat on Linux and netcat on Cygwin, and resolved it by using netcat on the Linux side as well.
One tip to reduce the size of the dd image is to mount /dev/hda1 read-write and zero out its unused blocks. The gzip compression then works much better. This way I could reduce an 18 GB full KDE amd64 install (with a 32-bit chroot) to a 2.7 GB image.
One way to accomplish that is:
mount /dev/hda1 /mnt/gentoo
# free space in MiB (-P keeps df's output on one line per filesystem)
mblocks=`df -P -B 1M /mnt/gentoo | awk 'NR == 2 {print $4}'`
# leave 256MB alone
mblocks=$((mblocks-256))
cd /mnt/gentoo
# create highly compressible file which spans the most of the free space in partition
time dd if=/dev/zero bs=1048576 count=$mblocks of=file-with-zeroes
# delete it
/bin/rm -f file-with-zeroes
umount /mnt/gentoo
Then proceed with the netcat method above. The compressed dd image is now much smaller than the partition. The beauty of this method (netcat/dd) is that it can all be nicely automated.
-devsk
Backup Wrapper Scripts
Sometimes you want to back up things that aren't 'ready' to be backed up. Examples are /boot on systems where it isn't mounted automatically, or a running MySQL daemon. It would be nice to mount /boot, or put MySQL into read-only mode, for the duration of the backup. That is where wrapper scripts come into play. The script below uses flexbackup as an example; modify it as needed.
| File: /etc/scripts/flexwrapper.sh on local machine |
#!/bin/sh
# mount /boot for backup
mount /boot
# MySQL read-only
# NOTICE: the following method is an EXAMPLE!
# you might not want your password in plain text in a wrapper script.
mysql -u root --password='yourpass' -e "SET GLOBAL read_only=1;"
# run flexbackup, passing on the script's arguments unchanged
flexbackup "$@"
# give the system 10 seconds to clean up
sleep 10
# MySQL read/write
mysql -u root --password='yourpass' -e "SET GLOBAL read_only=0;"
# umount /boot
umount /boot |
The basic idea is to do the preparation work in your wrapper, run the actual backup, and then return your system to its normal operating state.
The above mentioned crontab for flexbackup would then become:
0 3 1-7 * * /etc/scripts/flexwrapper.sh -set all -full -w 7
0 3 * * 6 /etc/scripts/flexwrapper.sh -set all -differential
0 3 * * 1-5 /etc/scripts/flexwrapper.sh -set all -incremental
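The wrapper above always exits with the status of its last command (umount), so cron cannot tell when the backup itself failed. It also carries on even if the preparation step failed. The sketch below shows a more defensive pattern. It skips the backup when preparation fails, always runs the cleanup afterwards, and returns the backup's own exit status. run_wrapped is a hypothetical name, and the prepare and cleanup steps are passed as shell command strings.

```shell
# Hypothetical generic wrapper: run shell command string $1 as the
# preparation step, then the backup command given by the remaining
# arguments (after $2), then always run cleanup command string $2.
# Returns 1 without running the backup if preparation fails; otherwise
# returns the backup command's exit status.
run_wrapped() {
    prepare=$1 cleanup=$2
    shift 2
    if ! sh -c "$prepare"; then
        echo "preparation failed, backup skipped" >&2
        return 1
    fi
    "$@"
    status=$?
    sh -c "$cleanup"
    return "$status"
}
```

For the /boot case this could be used as: run_wrapped 'mount /boot' 'umount /boot' flexbackup -set all -incremental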
Using Tapes as a Backup Media
Using tapes is still the de facto standard in most corporate environments. Don't forget to check the condition of the tapes you use (e.g. most DAT tapes can be reused up to 99 times). DO clean your drives regularly with the appropriate cleaning cartridges: just insert the cleaning cartridge and wait until it is ejected.
The minimum set of software that is needed:
# emerge -auv mt-st tar
Put appropriate tape in the drive and check its status:
# mt -f /dev/st0 status
Save some files (e.g. the /home directory):
# tar -cvp -f /dev/st0 /home
Check if everything is OK:
# tar -tv -f /dev/st0
Eject the tape:
# mt -f /dev/st0 eject
To later restore from the tape, load it, check its status and then:
List the contents:
# tar -tv -f /dev/st0
Restore (NB: First change to the appropriate directory)
# tar -xvp -f /dev/st0
Using tapes is not the fastest process, but it helps organize your backups and you can still sip your coffee while the tape is running :-)
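tar -tv only proves that the archive can be read back. GNU tar can also compare an archive with the files on disk using -d (long forms --diff and --compare). It reports files that differ or are missing and then exits with a non-zero status. The demo below runs on an ordinary archive file so you can try it safely; the directory names are made up.

```shell
# Demo of GNU tar's compare mode on an archive file instead of a tape.
demo=$(mktemp -d)
mkdir "$demo/home" && echo data > "$demo/home/notes.txt"
tar -cp -f "$demo/archive.tar" -C "$demo" home

# Exit status 0: archive and disk agree.
tar -d -f "$demo/archive.tar" -C "$demo" && echo "archive matches disk"

# Change the file; -d now reports the difference and exits non-zero.
echo changed > "$demo/home/notes.txt"
tar -d -f "$demo/archive.tar" -C "$demo" || echo "differences found"
```

tar stores the names without the leading slash. So for the /home backup above, the equivalent check is run from /: cd / && tar -d -f /dev/st0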
NB: Before a full backup or restore, remember to mount partitions like /boot that are not mounted automatically! Create a list of files to exclude from the backup:
# cd /root
# echo -n "" > tar.exclude
# echo "/dev/*" >> tar.exclude
# echo "/proc/*" >> tar.exclude
# echo "/mnt/*/*" >> tar.exclude
# echo "/sys/*" >> tar.exclude
# echo "/var/tmp/ccache/*" >> tar.exclude
To make a full backup use:
# tar -cvp -f /dev/st0 --wildcards --exclude-from /root/tar.exclude /
To restore later:
# tar -xvp -f /dev/st0 -C /
TIP: To use the tape drive on HP DL380 servers see /usr/src/linux/Documentation/cciss.txt
# echo "engage scsi" > /proc/driver/cciss/cciss0
