
S3Map

Posted by tomwhite on August 13, 2006 at 12:30 PM PDT

In case you haven't heard of it, Amazon S3 is a web service for storing data.
The two great things about it are that it's simple (look at its nice REST API), and it's cheap (with a pay-as-you-go charging model).

This latter point explains the growing number of startups that are using it to launch new business ventures: no data silos to maintain, and pay by the gigabyte.

My favourite innovative service built on Amazon S3 is a wiki that uses AJAX to great effect and stores all of its content on S3. Read all about it.
If you think about it, this is an interactive service that resides entirely on S3, so it will scale and scale: there's no need for an application server. I think content management systems could take this approach too.

It struck me that you could treat S3 as a big hashtable, so I tried writing an implementation of java.util.Map (built on the Amazon S3 Library for REST in Java) that uses S3 as its backing store. Here's some code to put some arbitrary objects into it:

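import java.util.Map;

// bucket is the bucket name; the other two arguments are the S3 account credentials
Map<String, Fruit> map = new S3Map<Fruit>(bucket, awsAccessKeyId, awsSecretAccessKey);
// store two Serializable values under string keys
map.put("breakfast", new Fruit("banana"));
map.put("lunch", new Fruit("apple"));

// iterate over every entry held in the bucket
for (Map.Entry<String, Fruit> entry : map.entrySet()) {
  System.out.println(entry.getKey() + ": " + entry.getValue().getName());
}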

This code prints:

breakfast: banana
lunch: apple

Notice that keys are strings, but the values in the map can be of any type, as long as they are Serializable (as Fruit is). Even this restriction can be dropped by supplying a custom serialization strategy when constructing the map; something like XStream would be a good choice here.
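To make the idea of a pluggable serialization strategy concrete, here is a minimal sketch. The Serializer interface, the JavaSerializer class and the demo class are hypothetical names, not S3Map's actual API, and Fruit is only a guess at the class used above. The default strategy shown is plain Java serialization, which is exactly why values must be Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical strategy interface: turns a value into the bytes of an S3 object and back.
interface Serializer<V> {
    byte[] serialize(V value) throws IOException;
    V deserialize(byte[] bytes) throws IOException;
}

// Default strategy: standard Java serialization, so values must implement Serializable.
class JavaSerializer<V extends Serializable> implements Serializer<V> {
    public byte[] serialize(V value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } // closing the stream flushes everything into the byte array
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public V deserialize(byte[] bytes) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (V) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}

// Assumed shape of the Fruit class from the example: a simple Serializable value object.
class Fruit implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    Fruit(String name) { this.name = name; }

    String getName() { return name; }
}

public class SerializerDemo {
    public static void main(String[] args) throws IOException {
        Serializer<Fruit> serializer = new JavaSerializer<Fruit>();
        byte[] stored = serializer.serialize(new Fruit("banana")); // would become the object body
        Fruit restored = serializer.deserialize(stored);
        System.out.println(restored.getName()); // prints "banana"
    }
}
```

An XStream-based strategy would implement the same interface by producing XML instead of Java serialization bytes, at which point the values would no longer need to implement Serializable.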

Also, the map needs the connection parameters for an S3 account (the access key ID and secret access key) and a bucket name. One bucket corresponds to one map.
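The secret access key itself is never sent over the wire. Under the S3 REST authentication scheme of the time (now known as Signature Version 2, since superseded by Signature Version 4), each request is signed with HMAC-SHA1 over a canonical string built from the request. The sketch below builds such an Authorization header for a GET of one map entry. It assumes the map key is used directly as the object key, which S3Map may not do, and the credentials, bucket name and S3Signer class are all hypothetical; x-amz-* headers are omitted and java.util.Base64 requires Java 8.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class S3Signer {
    // Builds an Authorization header in the legacy S3 REST scheme (Signature Version 2).
    static String authorization(String accessKeyId, String secretKey, String verb,
                                String contentMd5, String contentType, String date,
                                String resource) throws GeneralSecurityException {
        // No x-amz-* headers in this sketch, so the canonicalized amz headers part is empty.
        String stringToSign = verb + "\n" + contentMd5 + "\n" + contentType + "\n"
                + date + "\n" + resource;
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] signature = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
        // Only the access key ID and the signature travel with the request.
        return "AWS " + accessKeyId + ":" + Base64.getEncoder().encodeToString(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Hypothetical credentials and bucket: a GET of the object stored under "lunch".
        String header = authorization("AKIDEXAMPLE", "secret-example", "GET", "", "",
                "Sun, 13 Aug 2006 19:30:00 GMT", "/my-bucket/lunch");
        System.out.println("Authorization: " + header);
    }
}
```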
The map is persistent, so if I run the code again without the put calls, the values stored previously are retrieved and printed out exactly as before:

Map<String, Fruit> map = new S3Map<Fruit>(bucket, awsAccessKeyId, awsSecretAccessKey);

for (Map.Entry<String, Fruit> entry : map.entrySet()) {
  System.out.println(entry.getKey() + ": " + entry.getValue().getName());
}

Writing an implementation of java.util.Map is interesting. It's actually straightforward to do, since you start by subclassing java.util.AbstractMap: a read-only map needs only an entrySet() method, and a modifiable one also needs put() and an entry-set iterator that supports remove().
There's a line in the AbstractMap javadoc which says "Each of these methods may be overridden if the map being implemented admits a more efficient implementation."
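As a rough illustration of how little AbstractMap demands, here is a sketch of a modifiable map that implements only entrySet() and put(). This is not S3Map's code: FakeBucket is a hypothetical in-memory stand-in for an S3 bucket (each of its methods models one S3 request), and values are plain strings rather than serialized objects.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an S3 bucket; each method models one S3 request.
class FakeBucket {
    private final Map<String, String> objects = new TreeMap<>();

    String get(String key) { return objects.get(key); }               // GET object
    void put(String key, String value) { objects.put(key, value); }   // PUT object
    void delete(String key) { objects.remove(key); }                  // DELETE object
    List<String> list() { return new ArrayList<>(objects.keySet()); } // list the bucket's keys
}

// A modifiable map needs only entrySet(), put(), and an iterator that supports remove().
class BucketMap extends AbstractMap<String, String> {
    private final FakeBucket bucket;

    BucketMap(FakeBucket bucket) { this.bucket = bucket; }

    @Override
    public String put(String key, String value) {
        String previous = bucket.get(key); // Map.put returns the old value, costing an extra GET
        bucket.put(key, value);
        return previous;
    }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return new AbstractSet<Map.Entry<String, String>>() {
            @Override
            public int size() {
                return bucket.list().size();
            }

            @Override
            public Iterator<Map.Entry<String, String>> iterator() {
                final Iterator<String> keys = bucket.list().iterator(); // snapshot of the listing
                return new Iterator<Map.Entry<String, String>>() {
                    private String current;

                    @Override
                    public boolean hasNext() {
                        return keys.hasNext();
                    }

                    @Override
                    public Map.Entry<String, String> next() {
                        current = keys.next();
                        return new AbstractMap.SimpleImmutableEntry<>(current, bucket.get(current));
                    }

                    @Override
                    public void remove() { // used by the inherited remove(key) and clear()
                        if (current == null) {
                            throw new IllegalStateException();
                        }
                        bucket.delete(current);
                        current = null;
                    }
                };
            }
        };
    }
}

public class BucketMapDemo {
    public static void main(String[] args) {
        FakeBucket bucket = new FakeBucket();
        Map<String, String> map = new BucketMap(bucket);
        map.put("breakfast", "banana");
        map.put("lunch", "apple");

        // A fresh map over the same bucket sees the stored entries.
        for (Map.Entry<String, String> entry : new BucketMap(bucket).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Everything else, including get(), remove() and clear(), is inherited from AbstractMap and works by iterating over entrySet(). That is correct, but against a real bucket it can mean listing every key and fetching objects one at a time.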

Normally we strive for an implementation that is efficient in terms of speed; in this case, however, I was very aware of the monetary cost of each method. After all, I'm paying $0.20 per GB to get my data in and out of S3, and AbstractMap's default methods, such as get() and containsKey(), work by iterating over entrySet(), so on S3 each call would list the whole bucket, and containsValue() may have to download every object. There's a table in S3Map's javadoc that sets out the number of S3 operations performed by each method, so you can get a feel for their cost. As it turned out, I ended up overriding almost all of AbstractMap's methods.
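To see why counting operations per method matters, the sketch below tallies requests against a hypothetical CountingBucket (an in-memory stand-in, not the S3 library). NaiveMap implements only entrySet(), so it inherits AbstractMap's get(), which walks the entries in turn; because this sketch fetches each value as it iterates, looking up the last of four keys costs one listing plus four GETs. TunedMap overrides get() to fetch the single object directly.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical bucket stand-in that counts requests, since every S3 request is billed.
class CountingBucket {
    final Map<String, String> objects = new TreeMap<>(); // filled directly, without counting
    int requests = 0;

    String get(String key) { requests++; return objects.get(key); }              // one GET
    List<String> list() { requests++; return new ArrayList<>(objects.keySet()); } // one listing
}

// Read-only map implementing only entrySet(), so get() is AbstractMap's linear scan.
class NaiveMap extends AbstractMap<String, String> {
    final CountingBucket bucket;

    NaiveMap(CountingBucket bucket) { this.bucket = bucket; }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return new AbstractSet<Map.Entry<String, String>>() {
            @Override
            public int size() { return bucket.list().size(); }

            @Override
            public Iterator<Map.Entry<String, String>> iterator() {
                final Iterator<String> keys = bucket.list().iterator();
                return new Iterator<Map.Entry<String, String>>() {
                    @Override
                    public boolean hasNext() { return keys.hasNext(); }

                    @Override
                    public Map.Entry<String, String> next() {
                        String key = keys.next();
                        // Values are fetched eagerly: one GET for every entry visited.
                        return new AbstractMap.SimpleImmutableEntry<>(key, bucket.get(key));
                    }

                    @Override
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }
}

// Same map with get() overridden to fetch the one object directly.
class TunedMap extends NaiveMap {
    TunedMap(CountingBucket bucket) { super(bucket); }

    @Override
    public String get(Object key) {
        return (key instanceof String) ? bucket.get((String) key) : null;
    }
}

public class CostDemo {
    public static void main(String[] args) {
        CountingBucket bucket = new CountingBucket();
        for (String meal : new String[] {"breakfast", "dinner", "lunch", "supper"}) {
            bucket.objects.put(meal, "fruit for " + meal);
        }

        new NaiveMap(bucket).get("supper"); // "supper" sorts last, so every entry is visited
        System.out.println("default get() requests: " + bucket.requests);

        bucket.requests = 0;
        new TunedMap(bucket).get("supper");
        System.out.println("overridden get() requests: " + bucket.requests);
    }
}
```

Running CostDemo prints 5 requests for the inherited get() and 1 for the overridden one. Fetching values lazily (only when getValue() is called) would cut the inherited get() to one listing plus one GET, but every lookup would still list the whole bucket, which is why overriding is the more direct fix.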

So, what is it useful for? Well, I haven't actually had a use for it yet, but S3Map is essentially a non-transactional persistent datastore, so it could be used in many scenarios: from applets (if the applet is hosted on S3 too, since an unsigned applet may only connect back to the host it was loaded from), to desktop apps (to share user data between machines?), to server-side apps. If you're intrigued, you can try it out yourself by signing up for an Amazon S3 account and downloading S3Map (it's hosted on S3, of course).
