Instantly turning your Hudson cluster into a Hadoop cluster
Posted by kohsuke on March 15, 2009 at 8:36 PM PDT
Here at work, I take care of a 30-40 node Hudson cluster for our group. That is probably on the larger side, but I know people out there set up Hudson clusters of various sizes. A Hudson cluster is used for doing builds, obviously, but I've been thinking it would be nice if this cluster could become multi-purpose: there are a lot of things we could do better if we had a lot of computing resources available in a more accessible fashion, and setting up a separate cluster for each framework is tedious.
So over the past two weekends, I've worked on a hobby project that lets you turn your Hudson cluster into a Hadoop cluster. The idea is simple: Hudson knows the shape of its cluster, so why not let it start a Hadoop JVM on every node and hook them all together? Hudson could also install the Hadoop binaries on all the nodes as necessary, making this a truly turn-key solution. This simplifies your Hadoop installation drastically; all you need to do is (1) go to the Hudson plugin update center, (2) install the Hadoop plugin, and (3) restart Hudson. When Hudson comes back, you have a Hadoop cluster.
My initial motivation was to use Hadoop for analyzing the access logs of java.net, but eventually, I think I could use this for Hudson itself, too. Imagine persisting lots and lots of test results and doing data analysis on them at massive scale. How about using Hadoop for storing old artifacts, so that you can utilize the combined storage of a cluster? Or how about an extension of JUnit for distributing tests across a Hadoop cluster?
Related Topics >> Java Tools
Comments
Comments are listed in date ascending order (oldest first)
Submitted by emilian on Mon, 2009-03-16 01:48.
So does this mean that ~/.hudson basically becomes part of a Hadoop filesystem, or do you just save some particular files (like artifacts and test result files) in HDFS?
Submitted by kohsuke on Mon, 2009-03-16 11:06.
No, right now I'm not doing either. For now, this just lets you start a Hadoop cluster on a Hudson cluster. Once started, the two are more or less independent.
In the future, storing some of the Hudson data in a Hadoop cluster, especially the big BLOBs like the artifacts, could be interesting.
Submitted by nidaley on Mon, 2009-03-16 16:23.
Nice! You must make a lot of configuration assumptions.
Regarding using Hadoop to distribute JUnit tests, this has been an open Jira issue for some time:
https://issues.apache.org/jira/browse/HADOOP-1257
Submitted by kohsuke on Mon, 2009-03-16 16:56.
I need to let people set arbitrary Hadoop configuration on the Hadoop JVMs that are launched from Hudson, but once I have that, I don't think I'm making that many configuration assumptions (or am I?)
In any case, in my mind this is primarily for those who are interested in Hadoop but have never had a chance to play with it; the goal is to make it easier for them to get started.
And thanks for the pointer to HADOOP-1257.