
Instantly turning your Hudson cluster into a Hadoop cluster

Posted by kohsuke on March 15, 2009 at 8:36 PM PDT

Here at my work, I take care of a 30-40 node Hudson cluster for our group. This is probably on the larger side as Hudson clusters go, but I know people out there set up Hudson clusters of all sizes.

A Hudson cluster is used for doing builds, obviously, but I've been thinking it would be nice if the cluster could serve multiple purposes. There are a lot of things we could do better if we had a lot of computing resources available in a more accessible fashion, and setting up a separate cluster for each framework is tedious.

So over the past two weekends, I've worked on a hobby project that lets you turn your Hudson cluster into a Hadoop cluster.

The idea is simple: Hudson knows the shape of its cluster, so why not let it start a Hadoop JVM on every node and hook them all together? Hudson can also install the Hadoop binaries on each node as necessary, really making this a turn-key solution.
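For illustration, here is a minimal, hypothetical sketch of what that launch step could look like. The layout (a `conf` directory plus `hadoop-core.jar`, and the `DataNode` daemon class) follows Hadoop's stock distribution of the time, but the method and path names here are my own assumptions; the actual plugin drives the launch through Hudson's remoting channel rather than a plain `ProcessBuilder`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assembling the command line to start a Hadoop
// DataNode JVM on a build node. Not the plugin's actual code.
public class HadoopLauncher {

    // Build the argv for launching a DataNode. hadoopHome is where the
    // plugin unpacked the Hadoop binaries; the conf directory is assumed
    // to contain a hadoop-site.xml whose fs.default.name points at the
    // master node, which is how the daemon finds the NameNode.
    static List<String> dataNodeCommand(File hadoopHome) {
        List<String> argv = new ArrayList<>();
        argv.add("java");
        argv.add("-cp");
        // conf dir first so the daemon picks up the generated config.
        argv.add(new File(hadoopHome, "conf").getPath()
                + File.pathSeparator
                + new File(hadoopHome, "hadoop-core.jar").getPath());
        argv.add("org.apache.hadoop.hdfs.server.datanode.DataNode");
        return argv;
    }

    public static void main(String[] args) {
        List<String> cmd = dataNodeCommand(new File("/opt/hadoop"));
        // On a real node, the next step would be:
        //   new ProcessBuilder(cmd).start();
        System.out.println(String.join(" ", cmd));
    }
}
```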

This simplifies Hadoop installation drastically; all you need to do is (1) go to the Hudson plugin update center, (2) install the Hadoop plugin, and (3) restart Hudson. When Hudson comes back, you have a Hadoop cluster.

My initial motivation was to use Hadoop for analyzing the access logs of java.net, but eventually I think I could use this for Hudson itself, too. Imagine persisting lots and lots of test results and doing data analysis on them at scale. How about using Hadoop to store old artifacts, so that you can utilize the combined storage of the cluster? Or an extension of JUnit that distributes tests across a Hadoop cluster?
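To make the test-result idea concrete, here is a minimal sketch of the kind of map/reduce logic such an analysis would run. It is written in plain Java with no Hadoop dependency so it stands alone; a real job would put the same two phases in a Hadoop `Mapper` and `Reducer`. The input format (`suiteName,FAIL` lines) and class name are made up for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a failure-count analysis over test result records,
// structured as a map phase and a reduce phase.
public class FailureCounter {

    // "Map" phase: emit (suiteName, 1) for every failed test record.
    // Input lines are assumed to look like "com.example.FooTest,FAIL".
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length == 2 && parts[1].equals("FAIL")) {
                emitted.add(new AbstractMap.SimpleEntry<>(parts[0], 1));
            }
        }
        return emitted;
    }

    // "Reduce" phase: sum the counts emitted for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "com.example.FooTest,FAIL",
                "com.example.FooTest,PASS",
                "com.example.BarTest,FAIL",
                "com.example.FooTest,FAIL");
        System.out.println(reduce(map(lines)));
        // prints {com.example.BarTest=1, com.example.FooTest=2}
    }
}
```

The point of splitting the logic this way is that each phase is trivially parallel: the map runs independently on every chunk of the logs, and the framework handles shuffling the emitted pairs to the reducers.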


Comments

I need to allow people to set arbitrary Hadoop configuration on the Hadoop JVMs launched from Hudson, but once I have that, I don't think I'm making that many configuration assumptions (or am I?). In any case, in my mind this is primarily for people who are interested in Hadoop but never had a chance to play with it, by making it easier for them to get started. And thanks for the pointer to HADOOP-1257.

Nice! You must be making a lot of configuration assumptions. Regarding using Hadoop to distribute JUnit tests, this has been an open JIRA issue for some time: https://issues.apache.org/jira/browse/HADOOP-1257

No, right now I'm not doing either. For now, the plugin just starts a Hadoop cluster on top of a Hudson cluster; once started, the two are more or less independent. In the future, storing some of the Hudson data in the Hadoop cluster, especially big BLOBs like artifacts, could be interesting.

So does this mean that ~/.hudson basically becomes part of a Hadoop filesystem, or do you just save particular files (like artifacts and test result files) in HDFS?