
Hadoop + EC2 + S3

Posted by tomwhite on July 20, 2007 at 1:10 AM PDT

I've raved in the past about the MapReduce parallel programming model, about Apache Hadoop (the framework for running MapReduce applications), and about Amazon's compute and storage web services (EC2 and S3). Now I've written an article, "Running Hadoop MapReduce on Amazon EC2 and Amazon S3", about using them all together to do some data crunching.

The nice thing is that you can fire up a fair-sized Hadoop cluster (20 nodes is the current limit on EC2) in minutes and run it for just as long as your job needs, since you pay by the hour. EC2 is still in limited beta and has had long waiting lists, but the backlog was recently cleared, so if you're interested in trying it out, now might be a good time.
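To get a feel for what paying by the hour means for a short-lived cluster, here is a rough back-of-the-envelope sketch. The function name `cluster_cost`, the hourly rate, and the rule that a partial hour is billed as a full hour are all assumptions made for illustration, not figures from Amazon; plug in the current pricing before trusting the number.

```python
import math


def cluster_cost(nodes: int, job_hours: float, rate_per_node_hour: float) -> float:
    """Estimate the bill for a cluster that exists only for one job.

    Assumption (not from the post): every started hour is billed
    as a full hour on every node.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if job_hours < 0 or rate_per_node_hour < 0:
        raise ValueError("hours and rate must be non-negative")
    billed_hours = math.ceil(job_hours)  # round partial hours up
    return nodes * billed_hours * rate_per_node_hour


if __name__ == "__main__":
    hypothetical_rate = 0.10  # placeholder price per node-hour, not a quoted figure
    # A 20-node cluster (the limit mentioned above) running a 90-minute job.
    cost = cluster_cost(nodes=20, job_hours=1.5, rate_per_node_hour=hypothetical_rate)
    print(f"Estimated cost: {cost:.2f}")
```

The point is simply that the cost scales with nodes times billed hours, so a big cluster that finishes quickly can cost about the same as a small one that runs for a long time.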
