Backing Up Xen Guests via LVM

Okay, since I'm severely delinquent on my second, yes, that's right, second entry, I figured I would go ahead and put something useful out here.  I intended the second entry to be a comparison of Linux and Windows, but I'm still working on that one.

Anyway, what follows is a script, called guest_lvm_mounter.sh, that I use to regularly perform an online backup of a Linux Xen guest using the Logical Volume Manager (LVM) and third-party backup software.  The backup software calls the script with specific options for when a scheduled backup job starts and when it stops.  I have tried to provide enough comments in the script below to allow you to follow it pretty easily. 

Keep in mind that if your Linux system runs any sort of database, regardless of its claims of maintaining crash consistency, or any kind of application for which moving data from memory to disk is a critical but latency-riddled process, you should back up that database or application data by some other method in addition to this process.  The reason is that this script relies on LVM snapshots, and an LVM snapshot presents your file system for backup much as it would look if you had just yanked out the power cord (i.e. "crash consistent").

When calling the script, I assume several things.  First, I assume a single Logical Volume (LV) on the host for each guest provides the single, entire physical disk of each guest I want to back up.  Second, I assume the guest uses a single LV as the guest's block device underlying a monolithic / partition with no other partitions except for /boot.  Third, I assume the / and /boot partitions are both formatted with a file system easily recognized by my system's "mount" command because I pass no special switches to it.  In my case, this is always the ext3 file system.  Keep in mind that, though the script may work for more complex setups, I have not tested it with any.

Beyond my assumptions, I made sure to name the Volume Groups (VGs) and LVs within the guest something different than the VGs used on the host.  Otherwise, the kernel or LVM, I can't remember which, complains about conflicting VGs or LVs and won't map them.  I got around this by installing the host with default VG and LV names, and I specified different names for my guests.  After having done so, it occurred to me that it would have probably been better to specify different names for both on the host instead, as changing such things for each guest complicates automated deployments.  However, if the guests each had the default VG and LV names, the names would have conflicted at script execution.
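
The name clash described above can be checked for ahead of time.  What follows is a minimal sketch, not part of guest_lvm_mounter.sh: find_vg_conflicts is a hypothetical helper that takes two newline-separated lists of VG names and prints any name found in both.  The name VolGroup00 in the comment is only an example.

```shell
#!/bin/bash
# Sketch: print every VG name that appears in both lists, one name per line.
# find_vg_conflicts is a hypothetical helper, not part of guest_lvm_mounter.sh.
find_vg_conflicts() {
    local host_vgs=$1 guest_vgs=$2
    # comm -12 prints only the lines common to both sorted inputs.
    comm -12 <(sort -u <<<"$host_vgs") <(sort -u <<<"$guest_vgs")
}

# Example with made-up names; on a real host the first list could come from
# something like: vgs --noheadings -o vg_name | tr -d ' '
# find_vg_conflicts $'VolGroup00\nvg_data' $'VolGroup00\nvg_guest1'
```

An empty result means the host should be able to activate the guest VGs next to its own without a naming conflict.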

The script sources some configuration files to set the variables it needs.  It then uses those variables to snapshot the intended volumes, load the guests' partitions into the host kernel's partition table, make the host aware of the guest LVs, and mount the volumes inside the snapshots.  It includes three main functions: one for the initial mount, one for the final umount, and one that respawns the script in a "nohup'd" process.  (A fourth, dump_vars, just prints variables when debugging is on.)  I'll explain the nohup thing at the end.
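
Before the full script, the order of operations for one guest can be sketched as a dry run.  plan_mount is a hypothetical helper that only prints the commands and runs none of them; every value is passed in as an argument, and the ones shown are placeholders.

```shell
#!/bin/bash
# Dry-run sketch: print, in order, the commands the mount step runs for one
# guest.  Nothing is executed.  plan_mount is a hypothetical helper.
plan_mount() {
    local guest=$1 host_vg=$2 guest_vg=$3 guest_lv=$4 mount_dir=$5 snap_size_gb=$6
    local snap="snap-$guest"
    echo "lvcreate -L ${snap_size_gb}G -s -n $snap $host_vg/$guest"   # snapshot the guest's LV
    echo "kpartx -av $host_vg/$snap"                                  # map the snapshot's partitions
    echo "vgscan"                                                     # discover the guest's VG
    echo "vgchange -ay $guest_vg"                                     # activate it
    echo "mount $guest_lv $mount_dir/$guest"                          # mount the guest's /
    echo "mount $host_vg/$snap $mount_dir/$guest/boot -o loop,offset=$((512 * 63))"
}

plan_mount xen_guest1 /dev/your_host_volgroup /dev/your_guest_vg \
    /dev/your_guest_vg/your_guest_lv /path/to/mounts 50
```

The umount step undoes these in reverse: unmount /boot and /, deactivate the guest VG, remove the partition mappings, and remove the snapshot.  The full script, with the real commands and logging, follows.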

#!/bin/bash
# Author: Nathan Lannine
# Last Update: 2009-09-25
# Usage: ./guest_lvm_mounter.sh {mount|umount|fork {mount|umount}}
# This is a specific-purpose shell script to allow mounting of virtualized
# guest logical volumes for backup from within a snapshot logical volume.
# You will likely need to modify this script to use in any other environment
# than mine.

# Set the path and name of the program that sends logs to syslog, and
# set the name of the process that sends the logs to the logging program.
LOGGER=/bin/logger
LOG_PROG=guest_lvm_mounter

# Initialize the return value to success (0).
RETVAL=0

# Check if this is a debug run.  Typically, debugging is off (DEBUG=0).  To
# turn it on, run this script by prepending DEBUG=1 to your command line.
if [ -z "$DEBUG" ]; then
        DEBUG=0
fi

# Check if this script is set to reboot the box if the script exits uncleanly.
# An unclean exit would indicate the snapshot volumes could not be unmounted
# and/or removed, which means no fresh backups can be made until they are
# cleaned up.  Be very careful with this option.  If you inadvertently initiate
# the reboot sequence, although it is configurable, this script is configured
# initially to wait 10 minutes to reboot.  On RedHat-like Linuxes, that means you have
# 5 minutes to log in and abort the shutdown.  After 5 minutes, no new logins
# are allowed.  That blackout window may be different on other systems.  This
# option is set on the command line similarly to DEBUG above.
if [ -z $SHUTDOWN ]; then
        SHUTDOWN=0
fi

# Check if this script is set to restart the backup services.  This should
# not be set by the backup software, or unpredictable things might happen.
# This option is set on the command line similarly to DEBUG above.
if [ -z $SVC_SHUTDOWN ]; then
        SVC_SHUTDOWN=0
fi

# Check if debugging is set to 0.  If so, all messages will go to syslog.
# Otherwise, all messages will go to the terminal.  The assumption is that
# one should only use debugging when running this command interactively.
# This behavior could easily be extended such that output could be "tee"d
# to the terminal and to syslog so debugging could be set during normal,
# non-interactive use.
if [ $DEBUG -eq 0 ]; then
        [[ -t 1 ]] && echo "Writing to syslog."
        LOG_FACILITY="user.notice"
        exec > >("$LOGGER" -p "$LOG_FACILITY" -i -t "$LOG_PROG")
        exec 2> >("$LOGGER" -p "$LOG_FACILITY" -i -t "$LOG_PROG")
        exec < /dev/null 2<&1
fi

# Set config directory.  The host.conf file contains variables that should
# be set to usable values for your environment.  The guests.conf file
# contains the names of the guests you want to back up.  Some basic and
# significant assumptions are made based on the guests' names, but I don't
# feel like documenting those assumptions here.  Such assumptions are better
# documented in a README.  I guess I have to write one of those now.
CONFIG_BASE=/usr/local/etc/xen_guests_backup
HOST_CONFIG=$CONFIG_BASE/host.conf
GUESTS_CONFIG=$CONFIG_BASE/guests.conf

# Check for existence of the host configuration.  If it exists, source it for
# variables.  If it doesn't exist, then exit, because I've got no other
# defaults.
if [ -f $HOST_CONFIG ]; then
        . $HOST_CONFIG
else
        echo "Host config file not found!"
        RETVAL=78
        exit $RETVAL
fi

# Check for existence of the guests' configuration.  If it exists, feed each
# line into the GUESTS array and set NUM_GUESTS to the number of items in the
# array.  If it doesn't exist, then exit, because this script has no reason to
# run.
if [ -f $GUESTS_CONFIG ]; then
        GUESTS=( `$CAT $GUESTS_CONFIG | $TR '\n' ' '` )
        NUM_GUESTS=${#GUESTS[@]}
        RETVAL=$?
else
        echo "Guests config file not found!"
        RETVAL=78
        exit $RETVAL
fi

# Check for existence of configured guests.  If there are none, then this script
# has no reason to run.
if [ $NUM_GUESTS -eq 0 ]; then
        echo "You have no guests configured!"
        RETVAL=78
        exit $RETVAL
fi

# Spout off how we were called.  Maybe this should be moved to debugging.
echo "Called as $0 $1 $2"

# The mount function.
mount() {
        # Loop through the main mount functionality for each guest in the guests array.
        for ((i=0;i<$NUM_GUESTS;i++)); do

                # Output a header line of ">".
                for ((j=0;j<$($TPUT);j++)); do
                        echo -n ">"
                        sleep .0001
                done

                # Check for the existence of a file of variables for the current guest, and
                # pull in those variables.
                if [ -f $CONFIG_BASE/${GUESTS[${i}]}.source ]; then
                        . $CONFIG_BASE/${GUESTS[${i}]}.source

                        # Go to dump_vars(), where I check for the value of DEBUG to determine whether
                        # I need to dump the current variables.
                        dump_vars

                        # Create some output for logging.
                        echo -e "\n"
                        echo $"Creating snapshot logical volume \"$SNAP_NAME\" to mount to the host for backup."

                        # Create the snapshot.
                        $LVCREATE -L ${SNAP_SIZE}G -s -n $SNAP_NAME $IMAGE

                        # Add the partitions in the snapshot to the kernel's partition table.
                        $KPARTX -av $HOST_VG/$SNAP_NAME

                        # Scan for newly available volume groups.
                        $VGSCAN

                        # Activate the guest volume group that is newly available.
                        $VGCHANGE -ay $GUEST_VG

                        # Mount the configured volume from the guest group to the configured host mount
                        # point.
                        $MOUNT $GUEST_LVOLUME $MOUNT_DIR/$G_HOSTNAME

                        # Mount the guest's non-LVM boot partition from inside the snapshot.  This requires
                        # a loop device with an offset into the snapshot because the first 63 sectors
                        # (512 bytes each) hold the MBR and padding ahead of the first partition; that
                        # layout is fixed for my guests.  Depending on the guest config, this should be
                        # an ext3 partition.
                        $MOUNT $HOST_VG/$SNAP_NAME $MOUNT_DIR/$G_HOSTNAME/boot -o loop,offset=$((512*63))

                        # Snag the return value.
                        RETVAL=$?
                else

                        # If no .source file exists for the current guest, then log it and/or tell the user.
                        echo -e "\n"
                        echo -e "${GUESTS[${i}]} is not configured for backup.\n"
                fi
        done

        # Return from the mount subroutine with the return value.
        return $RETVAL;
}

# The umount function.
umount() {
        # If SVC_SHUTDOWN is set to 1, then wait a bit (default config of 300 seconds), stop the backup services,
        # and wait a bit again before continuing.  This is to allow expiration of hooks into the previously
        # mounted guest file systems.  If something is still using them, then the umount function will fail.
        if [ $SVC_SHUTDOWN = 1 ]; then
                $SLEEP_CMD
                $BKUP_CMD stop
                $SLEEP_CMD
        fi

        # Loop through the main umount functionality for each guest in the guests array.
        for ((i=0;i<$NUM_GUESTS;i++)); do

                # Output a header line of ">".
                for ((j=0;j<$($TPUT);j++)); do
                        echo -n ">"
                        sleep .0001
                done

                # Check for the existence of a file of variables for the current guest, and
                # pull in those variables.
                if [ -f $CONFIG_BASE/${GUESTS[${i}]}.source ]; then
                        . $CONFIG_BASE/${GUESTS[${i}]}.source

                        # Go to dump_vars(), where I check for the value of DEBUG to determine whether
                        # I need to dump the current variables.
                        dump_vars

                        # Create some output for logging.
                        echo -e "\n"
                        echo $"Unmounting the snapshot logical volume \"$SNAP_NAME\" for removal after backup."

                        # Umount the guest's non-LVM boot partition from inside the snapshot.  This requires
                        # the "-d" flag because it was previously mounted as a loop device.
                        $UMOUNT -d $MOUNT_DIR/$G_HOSTNAME/boot

                        # Umount the configured volume of the guest group from the configured host mount
                        # point.
                        $UMOUNT $MOUNT_DIR/$G_HOSTNAME

                        # Deactivate the configured guest VG.
                        $VGCHANGE -an $GUEST_VG

                        # Remove the partitions in the snapshot from the kernel's partition table.
                        $KPARTX -dv $HOST_VG/$SNAP_NAME

                        # Remove the snapshot.
                        $LVREMOVE -f $HOST_VG/$SNAP_NAME

                        # Snag the return value.
                        RETVAL=$?
                else

                        # If no .source file exists for the current guest, then log it and/or tell the user.
                        echo -e "\n"
                        echo -e "${GUESTS[${i}]} is not configured for backup.\n"
                fi
        done

        # If SVC_SHUTDOWN is set to 1, then restart the backup services.
        if [ $SVC_SHUTDOWN = 1 ]; then
                $BKUP_CMD start
        fi

        # Return from the umount subroutine with the return value.
        return $RETVAL;
}

# The fork function starts this script detached and in the background, ready to restart the
# backup services and restart the server if needed to deal with potential failures in the
# umount function's clean up.
fork() {
        SVC_SHUTDOWN=1 SHUTDOWN=0 $NOHUP $0 $1 &
}

# This should be pretty obvious.  If DEBUG equals 1, then dump some variables.
dump_vars() {
        if [ $DEBUG -eq 1 ]; then
                echo "Now dumping variables"
                sleep 2
                echo -e "\n"
                echo "-----------------------------------------"
                echo "MOUNT_DIR = $MOUNT_DIR"
                echo "G_HOSTNAME        = $G_HOSTNAME"
                echo "GUEST_VG  = $GUEST_VG"
                echo "HOST_VG           = $HOST_VG"
                echo "SNAP_NAME = $SNAP_NAME"
                echo "GUESTS            = ${GUESTS[@]}"
                echo "NUM_GUESTS        = $NUM_GUESTS"
                echo "-----------------------------------------"
                echo -e "\n"
        fi
}

# See how we were called and head to the correct function or tell people how to use this script.
case "$1" in
        mount)
                mount
                ;;
        umount)
                umount
                ;;
        fork)
                fork $2
                ;;
        *)
                echo $"Usage: $0 {mount|umount|fork {mount|umount}}"
                RETVAL=64
esac

# If the return value from the mount or umount functions is not 0, then
# they did not exit cleanly, so log it.  Additionally, if SHUTDOWN is set
# to 1, then initiate the configured shutdown command, which is configured
# by default to reboot the server after waiting 10 minutes.  See the
# explanation of the SHUTDOWN variable at the top of this file.
if [ $RETVAL -ne 0 ]; then
        echo "$0 $1 exited uncleanly"
        if [ $SHUTDOWN -eq 1 ]; then

                # Log the shutdown sequence for monitoring.
                echo ">>>>>>>>>>>>>>>>> INITIATING SHUTDOWN SEQUENCE... <<<<<<<<<<<<<<<<<<<<"

                # Detach and background the shutdown sequence to allow this invocation of the script to exit.
                $NOHUP $SHUTDOWN_CMD &
        fi
fi

# Log the exit with the return value.
echo "Exiting: $RETVAL"

# Exit and return the return value.
exit $RETVAL
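
The hard-coded offset in the mount function (512*63) is the byte position of the first partition when that partition starts at sector 63 and sectors are 512 bytes.  Here is a minimal sketch of the same arithmetic for other layouts; partition_offset is a hypothetical helper, and the start sector is assumed to have been read from the guest's partition table by some other means (for example, fdisk listing the snapshot in sector units).

```shell
#!/bin/bash
# Sketch: byte offset of a partition inside a disk image or snapshot.
# partition_offset is a hypothetical helper; 512-byte sectors are assumed
# unless a second argument says otherwise.
partition_offset() {
    local start_sector=$1 sector_size=${2:-512}
    echo $(( start_sector * sector_size ))
}

partition_offset 63     # prints 32256, the value the script hard-codes
partition_offset 2048   # prints 1048576, for a partition aligned at 1 MiB
```

If a guest's /boot does not start at sector 63, the loop mount lands in the wrong place and mount will most likely fail to find a file system there.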

The script refers to /usr/local/etc/xen_guests_backup/host.conf.  Below, I have provided an obfuscated rendition of the host.conf file I use.  The host.conf configuration file tells guest_lvm_mounter.sh where to find system utilities that it uses and what commands provide some of its functionality.

#!/bin/bash
#
# Configuration directives for the host where you will back up
# the guests.


#
# Define the locations of executables required for operation.
#
LVCREATE=/usr/sbin/lvcreate
LVREMOVE=/usr/sbin/lvremove
KPARTX=/sbin/kpartx
VGSCAN=/sbin/vgscan
VGCHANGE=/sbin/vgchange
MOUNT=/bin/mount
UMOUNT=/bin/umount
TR=/usr/bin/tr
TPUT="/usr/bin/tput cols"
CAT=/bin/cat
BKUP_CMD="/sbin/service your_backup_service"
NOHUP=/usr/bin/nohup
SHUTDOWN_CMD="/sbin/shutdown -r +10"
SLEEP_CMD="/bin/sleep 300"

#
# Define volume specific information below.
#
HOST_VG=/dev/your_host_volgroup
MOUNT_DIR=/path/to/where/your/guests/lvs/get/mounted
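
Because host.conf hard-codes paths that differ between distributions, it may be worth checking them before the first real run.  The following is a minimal sketch; check_tools is a hypothetical helper that is not part of guest_lvm_mounter.sh, and it takes variable names rather than values.

```shell
#!/bin/bash
# Sketch: report which of the named variables do not point at an executable.
# check_tools is a hypothetical helper; pass it variable NAMES, not values.
check_tools() {
    local name value cmd missing=0
    for name in "$@"; do
        value=${!name:-}      # bash indirect expansion: the value of the variable named $name
        cmd=${value%% *}      # keep the first word, so "/usr/bin/tput cols" is checked as /usr/bin/tput
        if [ -z "$cmd" ] || [ ! -x "$cmd" ]; then
            echo "$name is unset or not executable: ${cmd:-<empty>}"
            missing=1
        fi
    done
    return $missing
}

# Possible use after sourcing host.conf:
# check_tools LVCREATE LVREMOVE KPARTX VGSCAN VGCHANGE MOUNT UMOUNT TR TPUT CAT NOHUP
```

The function returns non-zero if anything is missing, so a caller could exit early instead of failing halfway through a mount.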

The next configuration file, /usr/local/etc/xen_guests_backup/guests.conf, is simply a list of the guests you would like to back up from the current host.  Each line contains the name of exactly one guest, and the script assumes that name matches the name of the LV on the host that provides the guest's underlying storage.  So, if you have a line in guests.conf that reads "xen_guest1", then one of the LVs providing guest storage on your host should be named xen_guest1.  For example:

xen_guest1
xen_guest2
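
The script reads this file with cat and tr.  A slightly more forgiving reader might look like the sketch below; load_guests is a hypothetical helper, and skipping "#" comment lines is an addition of the sketch that the original script does not support.

```shell
#!/bin/bash
# Sketch: load guest names from a file into the GUESTS array, one per line,
# skipping blank lines and lines that start with "#".
# load_guests is a hypothetical helper, not part of guest_lvm_mounter.sh.
load_guests() {
    local file=$1 line
    GUESTS=()
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        GUESTS+=("$line")
    done < "$file"
    NUM_GUESTS=${#GUESTS[@]}
}

# Possible use:
# load_guests /usr/local/etc/xen_guests_backup/guests.conf
```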

Additionally, for each guest named in guests.conf, guest_lvm_mounter.sh looks for a correspondingly named configuration file.  So, given a guests.conf file like the one above, you would have two additional configuration files, one per guest: /usr/local/etc/xen_guests_backup/xen_guest1.source and /usr/local/etc/xen_guests_backup/xen_guest2.source.  I don't know why I chose ".source" as the file extension, except that each file contains variables that guest_lvm_mounter.sh sources.  The file for xen_guest1 looks like this.

#!/bin/bash
#
# File to source variables per guest.  You should have one of these for each
# guest you wish to back up.  The main point is to configure appropriate values
# for the guests' logical volumes.

G_HOSTNAME=xen_guest1
GUEST_VG=/dev/your_guest_vg_${G_HOSTNAME}
GUEST_LVOLUME=${GUEST_VG}/your_guest_lv
IMAGE=${HOST_VG}/${G_HOSTNAME}
SNAP_NAME=snap-${G_HOSTNAME}
# Snapshot size in gigabytes.  The script appends the "G" suffix itself.
SNAP_SIZE=50
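
Note that IMAGE depends on HOST_VG, so host.conf has to be sourced before any .source file.  The following self-contained sketch shows how the values expand; the temporary file stands in for xen_guest1.source, and all paths are the placeholders from above.

```shell
#!/bin/bash
# Sketch: show how a per-guest .source file expands once HOST_VG is set.
# The temporary file stands in for xen_guest1.source; paths are placeholders.
HOST_VG=/dev/your_host_volgroup          # normally set by host.conf

guest_source=$(mktemp)
cat > "$guest_source" <<'EOF'
G_HOSTNAME=xen_guest1
GUEST_VG=/dev/your_guest_vg_${G_HOSTNAME}
GUEST_LVOLUME=${GUEST_VG}/your_guest_lv
IMAGE=${HOST_VG}/${G_HOSTNAME}
SNAP_NAME=snap-${G_HOSTNAME}
EOF

. "$guest_source"
rm -f "$guest_source"

echo "IMAGE=$IMAGE"
echo "GUEST_LVOLUME=$GUEST_LVOLUME"
echo "SNAP_NAME=$SNAP_NAME"
```

If HOST_VG were empty at this point, IMAGE would expand to /xen_guest1 and the snapshot step would be pointed at a path that does not exist.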

The last step is to actually run the script.  In my environment, my backup software invokes the script as

/usr/bin/sudo /usr/local/sbin/guest_lvm_mounter.sh mount

I have the software run the script via sudo (i.e. as root) to allow it to execute everything it needs.  To that end, the user that the backup service runs as is listed in the sudoers file with specific permission to run this script.  (I haven't looked into the security around the utilities the script calls, so I cannot speak much to any vulnerabilities this script might open up.)  Once the backup job completes, it invokes the script again, this time as

/usr/bin/sudo /usr/local/sbin/guest_lvm_mounter.sh fork umount

The "fork" switch tells the script to start another, detached instance of itself in the background and then exit.  The fork function switches on the SVC_SHUTDOWN variable, which causes a time-delayed shutdown of the backup service before the umount, vgchange, and kpartx commands run, and a restart of the service afterward.  I chose to do this because I was unable to deactivate the guest VGs or remove them from the kernel mapping table while the backup service was running.  You see, while the service is running, it keeps open file handles into the activated VGs, which prevents their removal.  The only option I could make work was to kill the backup service, but I thought having the backup service kill itself just before the end of a backup job could cause me problems.

So, I implemented the fork function to shut down the service after a configurable delay, which, in my case, is 300 seconds (five minutes).  Doing so allows the backup software to get a successful return code and end the job before the service shuts down and releases its open file handles into the VGs.  I should mention that I implemented an additional 300-second delay after the service shuts down and before the actual umount routine runs, as I was not sure how long it would take the kernel to expire the open file handles after the service stopped.  I am sure that information is documented somewhere, but I figured five minutes was (and has been) adequate, so I never looked it up.
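
The detach-and-return pattern can be sketched on its own.  respawn_detached below is a hypothetical helper that mirrors what the fork function does; the redirections to /dev/null are an addition of the sketch, so that the child holds no ties to the caller's terminal or pipes.

```shell
#!/bin/bash
# Sketch: start a command detached from the caller and return immediately.
# respawn_detached is a hypothetical helper mirroring the script's fork function.
respawn_detached() {
    local script=$1
    shift
    # nohup makes the child ignore SIGHUP, the redirects cut its ties to the
    # caller's terminal or pipes, and the trailing & lets the caller return
    # at once.  SVC_SHUTDOWN=1 is exported to the child only.
    SVC_SHUTDOWN=1 nohup "$script" "$@" </dev/null >/dev/null 2>&1 &
}

# Possible use from inside the script itself:
# respawn_detached "$0" umount
```

The caller gets control back right away, which is what lets the backup job finish with a successful return code while the delayed cleanup runs in the background.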

So, there you go.  Feel free to use it as you like just as I used parts of it from examples others out there freely provided.  I suppose it should be considered licensed under a Creative Commons license, just like the blog.

Also, keep in mind the following.

  1. There are better ways to do this, such as image-based backups.  You could still use LVM, but you would tar-gzip and back up the entire image instead of the files inside the image.  I chose not to do so because the guests I am backing up are very large and contain databases that I previously discovered are somewhat susceptible to corruption. (More about that in another post.)
  2. You could actually perform this same process on a Windows guest sitting inside an LV, but it would be akin to shooting one of your middle toes off.  You know, definitely not the whole foot, not even a big toe, but more significant than the little guy at the end of the row.  I'm guessing about 5% of the time, you would end up with a system that is unusable, either because an application on the system has thrown up its hands in frustration at your stupidity, or because the system has met its maker, you know, Bill.

I also have some questions for anyone who actually comes across this.

  1. Do you have a good example of how to implement posixly correct command line switches for shell scripts?
  2. Do you have any suggestions for improvement?
  3. Are you going to use this, and, if so, how?  Also, are you going to change it, and, if so, how?
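
On the first question, one possible starting point is the shell's built-in getopts, which POSIX specifies.  The sketch below is hypothetical: the switches -d, -s, and -b are invented names standing in for the DEBUG, SHUTDOWN, and SVC_SHUTDOWN variables, and the use of "local" makes it bash rather than strictly POSIX sh.

```shell
#!/bin/bash
# Hypothetical sketch: -d, -s, and -b are invented switches standing in for
# the DEBUG, SHUTDOWN, and SVC_SHUTDOWN environment variables.
parse_args() {
    local opt OPTIND=1
    DEBUG=0 SHUTDOWN=0 SVC_SHUTDOWN=0
    # The leading ":" selects silent error handling, so unknown options are
    # reported by the case statement below instead of by getopts itself.
    while getopts ":dsb" opt; do
        case "$opt" in
            d) DEBUG=1 ;;
            s) SHUTDOWN=1 ;;
            b) SVC_SHUTDOWN=1 ;;
            \?) echo "Unknown option: -$OPTARG" >&2; return 64 ;;
        esac
    done
    shift $((OPTIND - 1))
    ACTION=${1:-}        # first non-option argument, e.g. mount or umount
}

parse_args -d -b umount
echo "DEBUG=$DEBUG SHUTDOWN=$SHUTDOWN SVC_SHUTDOWN=$SVC_SHUTDOWN ACTION=$ACTION"
```

Single-letter switches can be combined (-db), and option parsing stops at the first non-option argument, which is the behavior POSIX describes for utilities.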

Thank you for taking the time to read this, and thank you for any feedback.

Sincerely,

N++
