Cross-Provider Cloud File Backups (Rackspace CloudFiles to Amazon S3)

Having cross-Cloud backups running on a nightly basis is recommended for any business, regardless of size. Although doing backups is fairly trivial, there are a lot of gotchas that come into play when working in the Cloud, especially if you want to move files from one Cloud provider to another. In this post, we'd like to share the processes and code we use when backing clients up from Rackspace CloudFiles to Amazon S3.

Using MD5 hashes (eTags) and file descriptors, we've optimized the operations to perform rsync-style incremental backups of only the objects that have changed. Although initial backups will be slow, the incrementals should be very quick, as the scripts only back up file adds and file deltas.
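
To make the skip-or-put decision concrete, here's a minimal sketch of the comparison logic. This is illustrative only (not the wrapper's actual code): it assumes $container is an already-authenticated CF_Container and $s3 an S3 connection (opened as sketched in the Configuration section below), and that CF_Object's etag has been made public as described under Download Sources.

<?php
// CloudFiles reports each object's MD5 hash as its eTag, and S3 returns the MD5
// of a simple PUT as the object's ETag, so matching hashes mean the copy already
// in the bucket is current and can be skipped.

$cfNames   = $container->list_objects();        // object names in the CloudFiles container
$s3Objects = $s3->getBucket($amznBucketName);   // name => metadata, including MD5 'hash'

foreach ($cfNames as $name) {
    $cfObj = $container->get_object($name);

    if (isset($s3Objects[$name]) && $s3Objects[$name]['hash'] === $cfObj->etag) {
        echo "Skipping '$name' (already in bucket, unchanged)\n";
        continue;
    }

    // New or changed object: pull it down to a temp file, then push it to S3.
    $tmp = tempnam(sys_get_temp_dir(), 'cf2s3');
    $cfObj->save_to_filename($tmp);
    $s3->putObjectFile($tmp, $amznBucketName, $name, S3::ACL_PRIVATE);
    unlink($tmp);

    echo "Putting '$name' up to Amazon bucket '$amznBucketName'\n";
}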

Note: Although we're using the Rackspace CloudFiles API to access CloudFiles, another route is to use CloudFuse (which we plan on writing about in a future blog post). CloudFuse allows you to mount a remote CloudFiles container as though it were local. It's a whole lot faster than using the Rackspace API, but it does have a severe 10,000 file limitation (CloudFiles Limitation) that we find too low for most customers.

Download Sources

For the lazy (and trusting), we've put together an entire tarball of everything you'll need here:
http://quicloud.com/code-samples/rackspace-cf-to-amazon-s3-complete.tgz

That tarball contains only the minimal files you need from The Rackspace Cloud PHP API and the Amazon S3 PHP Class.

If you'd prefer to download and install everything separately, here are the links:
  • quicloud's Rackspace / Amazon Wrapper Code
  • Rackspace's CloudFiles PHP Library
  • Amazon's Standalone S3 REST Library

If you use the Rackspace & S3 sources above, please make sure that:

  1. You unzip those libraries in the proper relative paths for the wrapper ("./includes/rackspace" and "./includes/amazon").
  2. You MUST edit the CloudFiles 'CF_Object' class to expose "etag" as a public (not private) property, as illustrated below.
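
The change in item 2 is a one-line edit inside the library's cloudfiles.php. Roughly (the surrounding class body is abbreviated and may differ slightly between library versions):

// In cloudfiles.php, inside the CF_Object class definition:
class CF_Object
{
    // ... other properties ...
    public $etag;    // was: private $etag; -- the wrapper reads this for MD5 comparisons
    // ...
}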

We installed the scripts at '/etc/cloudfiles-to-s3/', and the wrapper (rackspace-cf-to-amazon-s3-bkup.php) has hardcoded paths to that directory; if you decide to install elsewhere, please update those paths at the top of the wrapper. The quicloud tarballs all have "cloudfiles-to-s3" as the top-level directory, so unzipping them in "/etc" should work just fine.

Configuration

Now that you have all the libraries installed on your server, you simply need to configure the wrapper with your CloudFiles and S3 credentials and container/bucket names.

Populate all of the following variables in the 'rackspace-cf-to-amazon-s3-bkup.php' wrapper file:

$rackUser = 'RACKSPACE_API_USERNAME';
$rackAPIKey = 'RACKSPACE_API_KEY';
$rackContainer = 'RACKSPACE_CONTAINER_NAME';
$amznAPIKey = 'AMAZON_KEY';
$amznAPISecret = 'AMAZON_SECRET_KEY';
$amznBucketName = 'AMAZON_BUCKET_NAME';
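
For reference, here's roughly how those values get used to open the two connections. This is a sketch of the general pattern for these libraries (file paths assume the './includes' layout described above; adjust the file names if your copies differ), not necessarily the wrapper's exact code:

<?php
require_once '/etc/cloudfiles-to-s3/includes/rackspace/cloudfiles.php';
require_once '/etc/cloudfiles-to-s3/includes/amazon/S3.php';

// Authenticate against Rackspace and open the source container.
$auth = new CF_Authentication($rackUser, $rackAPIKey);
$auth->authenticate();
$conn      = new CF_Connection($auth);
$container = $conn->get_container($rackContainer);

// Open the S3 connection to the destination bucket.
$s3 = new S3($amznAPIKey, $amznAPISecret);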

Validate with Test Run

Set those variables, and let 'er rip by running the wrapper:

cd /etc/cloudfiles-to-s3
./rackspace-cf-to-amazon-s3-bkup.php -v

The "-v" (verbose) switch will output extended information, detailing file-by-file handling (whether the file was put to S3 or skipped because it already exists). After you see a few lines of output you can halt the script with a CTRL-C from the terminal.

You should see some output like:

start run at 2010-10-04 08:10:01
Loaded Rackspace Container (5 Objects)
Loaded Amazon Bucket (0 Objects)
Putting '/index.html' up to Amazon bucket 'my_backups'
Putting '/images/logo.gif' up to Amazon bucket 'my_backups'
Putting '/images/myhappypic.gif' up to Amazon bucket 'my_backups'
...

After a trial run in verbose mode, browse your Amazon S3 Bucket and verify that the files got uploaded. You can run and halt a couple of times to watch the verbose output report which files it's loading and which it's skipping (because they're already in your Amazon bucket).

Once you're satisfied that everything is operating correctly, you can run without halting to put everything up. To remove the "Skipping / Putting" messages for individual files, run the script without the "-v" (verbose) switch.

Before allowing it to run fully unattended, though, you may want to calculate your bandwidth costs -- Rackspace will charge you for the outgoing transfer & Amazon will charge you for the incoming. Containers with either large files or lots of Objects will take a looooong time to copy over -- you can back-of-the-napkin estimate your initial upload time by allocating 1 hour per 10,000 Objects (assuming your objects are standard Web content -- HTML files & images); a rough worked example follows below. We highly recommend purchasing and using Bucket Explorer to browse and manage your Amazon S3 files, and to watch that the initial S3 population is going smoothly.
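
As a worked example, here's a tiny back-of-the-napkin calculator. The object count, average size, and per-GB rates below are made-up placeholders -- plug in your own container stats and the providers' current bandwidth pricing:

<?php
// Illustrative estimate only -- every number below is an assumption.
$objectCount  = 50000;   // objects in your CloudFiles container
$avgObjectKB  = 250;     // assumed average object size (standard Web content)
$rackOutPerGB = 0.18;    // placeholder Rackspace outgoing $/GB -- check current pricing
$amznInPerGB  = 0.10;    // placeholder Amazon incoming $/GB -- check current pricing

$totalGB = ($objectCount * $avgObjectKB) / (1024 * 1024);
$hours   = $objectCount / 10000;                  // ~1 hour per 10,000 objects
$dollars = $totalGB * ($rackOutPerGB + $amznInPerGB);

printf("~%.1f GB to move, roughly %d hours, about $%.2f in bandwidth\n",
       $totalGB, $hours, $dollars);
// => ~11.9 GB to move, roughly 5 hours, about $3.34 in bandwidth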

Add to Crontab for nightly incremental backups

Once you're happy with your initial backup, you can add incremental nightly backups to your crontab. Add the code below to your root crontab to run your backups at 2:10AM every night, dumping the "last run" status output to the file "/etc/cloudfiles-to-s3/last-run.log":


10 2 * * * /etc/cloudfiles-to-s3/rackspace-cf-to-amazon-s3-bkup.php > /etc/cloudfiles-to-s3/last-run.log

Validate Restoring Abilities

Initial backups are verified, nightly backups are set up... what could possibly be left? Just the most important and probably most overlooked part -- verifying that you really can restore your data from your backups. Agree with your IT lead that on a (relatively) sane day in your IT department, you'll run a drill: the lead makes local backup copies of a couple of non-critical files on your live server, then deletes them from the live server, mimicking a catastrophic file loss. For bonus points, have them start a stopwatch when the files are removed & see how long it takes your team to get the files back from Amazon. You may even want to build some "S3 to CloudFiles" scripts your team can use to restore programmatically; a minimal sketch of such a restore follows below.
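
If you do build such a restore script, its core can be quite small. Here's a minimal "S3 back to CloudFiles" sketch for a single file, assuming the same two libraries and the same configuration variables as the backup wrapper ($rackUser, $amznAPIKey, etc.); the object name is hypothetical:

<?php
require_once '/etc/cloudfiles-to-s3/includes/rackspace/cloudfiles.php';
require_once '/etc/cloudfiles-to-s3/includes/amazon/S3.php';

// Reuse the credentials and container/bucket names configured in the wrapper.
$auth = new CF_Authentication($rackUser, $rackAPIKey);
$auth->authenticate();
$conn      = new CF_Connection($auth);
$container = $conn->get_container($rackContainer);
$s3        = new S3($amznAPIKey, $amznAPISecret);

// Restore one lost object from the S3 bucket back into CloudFiles.
$name = 'images/logo.gif';                       // hypothetical object name
$tmp  = tempnam(sys_get_temp_dir(), 'restore');
$s3->getObject($amznBucketName, $name, $tmp);    // download from S3 into a temp file

$obj = $container->create_object($name);         // recreate the CloudFiles object
$obj->load_from_filename($tmp);                  // upload the temp file's contents
unlink($tmp);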

Hope you enjoyed this tutorial and that you find the provided code useful. We'd love to hear your feedback on your experience with cross-provider file backups, and any "backup war stories". What have you seen work? Where have you seen the dragons lurk?

Comments

Excellent!

I was tasked with the responsibility of coming up with a backup script from Amazon S3 -> Rackspace Cloud Files. Your scripts saved me a ton of time. Thank you very much.

Glad you found it helpful!

We've also just finished building out the code to do the opposite (back up from AWS to Rackspace) and will be open-sourcing that code as well. Have also found the S3-BASH scripts very useful -- http://code.google.com/p/s3-bash/

Thank you

Hi Rich,
I think it's a great idea to backup cloud files to another provider. By any chance would you happen to have any script that will do the same but from S3 to cloud file?

Robert

Working on this now

Hi Robert,

If you're still looking for a solution, we're actually doing this for a client right now. We should have a blog post up within the next week or so with details. Will post back here when it's ready.

Cheers,
-r

S3 to cloud file

Hi Rich,

Nice script. Just want to ask if you have created a script for transferring buckets from S3 to cloudfile. Thanks.

AWS S3 backup to RackSpace Cloud files

Hi Robert,

I would like to read your blog post. Could you send me a link.

Thank you.

Backing up AWS S3 bucket to RackSpace Cloud Files?

Hi Robert,

do you have more info on this. I am keen on reading your blog post.

Thank you.
