S3: Using Amazon S3 for large file transfers

A few days ago, a friend of mine reached out asking for a good solution for securely transferring a relatively large (~1GB) file to several of her prospective clients. Strangely, even in 2013 the options for transferring such a large file in a reliable manner are pretty limited. I looked into services like YouSendIt, WeTransfer, and SendThisFile, but they all suffer from similar limitations: most of them have a <1GB file size limit, their payment plans are monthly subscriptions instead of pay as you go, and they don’t offer custom domains or access control. Apart from these services, there is also the trusty old school option of using an FTP server, but that raises the issues of having to maintain your own FTP server, making your clients use a non-intuitive FTP client, and still being locked into a monthly fee instead of “pay as you go”.

Stepping back and looking at the issue from a different angle, it became clear that the S3 component of Amazon’s Web Services offering is actually an ideal solution for this problem. S3 is basically a flexible “cloud based” storage service that lets you programmatically upload files, store them indefinitely, and then serve them as you please. Looking at the issues we’re trying to overcome, S3 satisfies all of them out of the box: it has a single file size limit of 5 terabytes, files can be served off a custom domain like archives.setfive.com, billing is pay as you go depending on the resources you use, and S3 supports access control so you have fine grained control over who can download files and for how long.

So how do you actually use S3?

Setting up and using S3

  • The first thing you’ll need is an Amazon account that has S3 enabled. If you already have an Amazon account, just head over to http://aws.amazon.com/s3/ to activate S3 for your account.
  • Next, there are several ways to actually use S3, but the easiest is probably Amazon’s own Web Console. Just head over to https://console.aws.amazon.com/s3/home?region=us-east-1 to load the console.
  • In AWS parlance, you’ll need to create a “bucket”, which is the root organizational structure on S3. You can map a “bucket” to a custom domain name, so think of it like the “drive” that you’re uploading files to. Go ahead and create a bucket!
  • Next, click the name of your bucket and you’ll get “into” the bucket, where you should see a notice telling you the bucket is empty. This is where you can upload and delete files or create additional organizational folders. To upload a file, click the “Actions” menu in the header and select “Upload”. Then, in the popup, select “Add Files” to add some files and “Start Upload” to kick off the upload.
  • When the upload finishes, you’ll see the file you just uploaded in the left panel. Congratulations, you’re using the cloud! If you want to make the file public, just right click on it and click “Make Public”. This will let you access the file without any special URL arguments, like https://s3.amazonaws.com/big-bertha/logo_horizontal.png
  • To get the link for your file, click it to bring up its properties; you’ll see the link in the right panel.
  • To delete a file, just right click on it and select “Delete”.
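
If you’d rather script this than click through the web console, the same workflow works from the command line. Here’s a quick sketch using the s3cmd utility (the same tool used for the backup cleanup below), reusing the big-bertha bucket name from the URL above as a placeholder:

```bash
# Create a bucket (bucket names are globally unique)
s3cmd mb s3://big-bertha

# Upload a file into the bucket
s3cmd put bigfile.zip s3://big-bertha/bigfile.zip

# Make it publicly downloadable, same as "Make Public" in the console
s3cmd setacl --acl-public s3://big-bertha/bigfile.zip

# List the bucket's contents, then delete the file when you're done
s3cmd ls s3://big-bertha/
s3cmd del s3://big-bertha/bigfile.zip
```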

Anyway, that’s a quick rundown of how to use Amazon’s S3 service for file transfers. The pricing is also *very* cheap compared to traditional “large file transfer” services.

Check out some other useful links about S3:

Deleting files older than specified time with s3cmd and bash

Update: Amazon has now made it so you can set expiration times on objects in S3; see more here: https://forums.aws.amazon.com/ann.jspa?annID=1303

Recently I was working on a project where we upload backups to Amazon S3. I wanted to keep the files around for a certain duration and remove any files that were older than a month. We use the s3cmd utility for most of our command line based calls to S3; however, it doesn’t have a built in “delete any file in this bucket that is older than 30 days” function. After googling around a bit, we found some Python-based scripts, but no simple bash script that did what I was looking for. I whipped this one up real quick; it may not be the best looking, but it gets the job done:
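
Roughly, it works like this (a sketch assuming GNU date and s3cmd’s usual “DATE TIME SIZE s3://…” ls output; the bucket name and 30 day window are placeholders):

```bash
#!/bin/bash
# Delete anything in the bucket older than 30 days.
BUCKET="s3://my-backup-bucket"       # placeholder bucket name
CUTOFF=$(date -d "30 days ago" +%s)  # GNU date

s3cmd ls "$BUCKET/" | while read -r fileDate fileTime fileSize fileUri; do
    # Skip "DIR" rows and anything else that doesn't start with a date
    [[ "$fileDate" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue

    fileStamp=$(date -d "$fileDate $fileTime" +%s)
    if [ "$fileStamp" -lt "$CUTOFF" ]; then
        echo "Deleting $fileUri"
        s3cmd del "$fileUri"
    fi
done
```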

Upload directly to S3 with SWFUpload

I was working on an application earlier today that required allowing a user to upload a large file (several hundred MB) which would eventually be stored on Amazon S3. After reviewing the requirements, I realized it made sense to just upload the file directly to S3 instead of having to first stage the file on a server and then use PHP to push the file to S3.

Amazon has a nice walk through of using a plain HTML form to upload a file directly to S3 here.

I had already been using SWFUpload to upload files to the server, so I decided to look into using it to upload directly to S3. After some head banging, I finally got it to work – here’s the quick n’ dirty.

  1. Download SWFUpload 2.5
  2. Get SWFUpload ready to use in your project. Copy the SWF file somewhere accessible and include their swfupload.js JavaScript file. More info here
  3. Set up an S3 bucket. You’ll need to set the policy to allow uploads from your own user (it’s the default).
  4. Place a crossdomain.xml file in the root of your S3 bucket. This file “authorizes” the Flash player to upload files to this host. The content of the file is below.
  5. Initialize the SWFUpload object (example below).
  6. Before beginning the upload, you need to set the appropriate postParams in the SWFUpload object. This is really the “magic” of this process. Example is below.
  7. Start the upload with startUpload()

That’s it! It’s pretty straightforward once you have things going. As an FYI, you can put SWFUpload into “debug” mode by adding debug: true as a property to the initialization object. You can also debug the responses from Amazon by using a packet sniffer like Wireshark.

crossdomain.xml
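
The wide-open version looks roughly like this (a sketch of the standard permissive policy file):

```xml
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM "http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
    <site-control permitted-cross-domain-policies="all"/>
    <allow-access-from domain="*" secure="false"/>
    <allow-http-request-headers-from domain="*" headers="*" secure="false"/>
</cross-domain-policy>
```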

You probably want to make this file a little less permissive. More details here. Also note that there are differences in how various versions of Flash player implement this file.

Initialize SWFUpload
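
Something along these lines (a sketch built on SWFUpload’s documented 2.x settings – the bucket name, element ID, size limit, and handler bodies are all placeholders):

```javascript
var swfu = new SWFUpload({
    flash_url: "/swfupload/swfupload.swf",             // wherever you copied the SWF to
    upload_url: "http://big-bertha.s3.amazonaws.com/", // your bucket's endpoint (placeholder name)
    file_post_name: "file",   // S3 expects the file field to be named "file"
    http_success: [201],      // S3 returns a 201 when success_action_status is set to 201
    file_size_limit: "500 MB",
    file_types: "*.*",
    button_placeholder_id: "uploadButton",
    button_width: 120,
    button_height: 30,
    upload_success_handler: function (file, serverData) {
        // serverData contains S3's XML response
    },
    upload_error_handler: function (file, errorCode, message) {
        alert("Upload failed: " + message);
    },
    debug: false
});
```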

Set SWFUpload postParams
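
The post params are the same fields as Amazon’s plain HTML form example; the policy document and its signature have to come from your server (see the note below). A sketch:

```javascript
// These two values come from your server – placeholders here.
var policyFromServer = "BASE64_POLICY_DOCUMENT";
var signatureFromServer = "HMAC_SHA1_SIGNATURE_OF_POLICY";

swfu.setPostParams({
    "key": "uploads/${filename}",        // S3 substitutes ${filename} with the file's name
    "AWSAccessKeyId": "YOUR_ACCESS_KEY", // the public half of your credentials
    "acl": "private",
    "success_action_status": "201",      // matches http_success in the init above
    "policy": policyFromServer,
    "signature": signatureFromServer
});

swfu.startUpload();
```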

The HMAC signature MUST be calculated on the server because it uses your S3 secret. You MUST keep that value secret in order to maintain the security of your S3 buckets. I’m using Donovan Schönknecht’s S3 PHP library to calculate the HMAC signatures, but you could just as easily do it in straight PHP.
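
For reference, here’s roughly what that server-side signing looks like in straight PHP (a sketch – the bucket, key prefix, and one hour expiration are placeholders):

```php
<?php
// Build the policy document Amazon's walkthrough describes, then sign it with
// your S3 secret. Only the base64 policy and the signature go to the browser.
$awsSecretKey = "YOUR_SECRET_KEY"; // never send this to the browser

$policy = base64_encode(json_encode(array(
    "expiration" => gmdate("Y-m-d\TH:i:s\Z", time() + 3600),
    "conditions" => array(
        array("bucket" => "big-bertha"),          // placeholder bucket
        array("starts-with", '$key', "uploads/"), // must match the "key" post param
        array("acl" => "private"),
        array("success_action_status" => "201"),
    ),
)));

// HMAC-SHA1 of the base64 policy, itself base64 encoded
$signature = base64_encode(hash_hmac("sha1", $policy, $awsSecretKey, true));
```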