
Distributed Groovy computation across a Hudson cluster

Posted by kohsuke on July 15, 2009 at 9:02 PM PDT

Hudson Distributed Fork Plugin

Most of the tests written today are designed to be executed on a single system (in fact, many of them don't even try to use multiple threads). This tends to increase the time it takes to run tests, which in turn hurts our productivity. The way I see it, this single-system assumption persists (despite the fact that we are almost always connected to a network) partly because we don't really have a good way to let a build or test use multiple systems easily.

To solve that, I released the Hudson distributed fork plugin today as a first step. This plugin turns a Hudson cluster into a general purpose program execution environment, where people can submit files and programs, execute them on Hudson, and get the results back.

One of the interfaces to this functionality is a CLI modeled after SSH. So for example, the following command picks a node in a Hudson cluster and starts Tomcat, with port forwarding so that you can access this Tomcat without knowing where it's running. All the data needed for an execution (in this case just apache-tomcat-*.tar.gz) is sent to the slave, and everything gets cleaned up after you terminate this process.

% java -jar hudson-cli.jar -s http://hudson.sfbay.sun.com/ dist-fork -z ~/apache-tomcat-5.5.27.tar.gz \
    -L 9999:localhost:8080 apache-tomcat-5.5.27/bin/catalina.sh run
[distfork6313354709312415597.tmp] $ apache-tomcat-5.5.27/bin/catalina.sh run
Jul 15, 2009 12:25:31 PM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The Apache Tomcat Native library which allows optimal performance in production environments was not found on ...
Jul 15, 2009 12:25:31 PM org.apache.coyote.http11.Http11BaseProtocol init
...

I was talking to Vivek Pandey the other day, who told me that his GlassFish Scripting tests take a long time to run because each test needs to deploy an application and undeploy it. An obvious way to speed that up is to run those tests in parallel against multiple GlassFish server instances, but doing so on a single system doesn't give you a substantial gain because you can only run so many GlassFish instances on one PC.

But with something like this, you can suddenly start a whole lot more GlassFish servers somewhere on the network. Running two dozen GlassFish instances at once is now a piece of cake. And this functionality is available to every developer on our corporate network, not just to jobs on Hudson.

There are many other situations where remote batch job execution is convenient. For example, when we assemble a virtual machine image, we need to run the build in an environment that has all the necessary virtualization software. Setting up such an environment for everyone is a waste of time, so it's better to prepare a few such slaves on Hudson and let people submit builds there. Unlike the regular Hudson job/build model, people can submit whatever they have in their workspace, even a work-in-progress source tree.

Droovy

The CLI is useful for build/test automation, but in other cases it's more convenient to have a programmable API for this. It should let you lease a new system, run computation over there, bring data back and forth, and so on. This frees you from a "batch processing" model and allows you to do more coordinated/interactive computation that spans multiple systems.

So I took the remoting mechanism that Hudson itself uses to control its slaves and layered it on top of the distributed fork plugin. This allows your Java program to provision a new JVM on a remote server and execute code there seamlessly:

Droovy droovy = new Droovy(); // this talks to the Hudson identified by the HUDSON_URL env var
Server server1 = droovy.connect();
// ... provision more servers if you want ...

System.out.println(add(server1,1,1)); // compute 1+1 on a remote system


int add(Server server, final int x, final int y) {
  return server.getChannel().call(new Callable<Integer,RuntimeException>() {
    public Integer call() {
        return x+y; // this runs on the remote JVM; x and y are captured from the caller
    }
  });
}

Behind the scenes, all the necessary class files and so on are shipped to the remote JVM silently, so this just works even though your Hudson cluster doesn't know anything about your program. I can also easily make this fall back to a single-system environment if there's no Hudson available (although I haven't implemented that yet.)
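To make the class-shipping part concrete, here is a minimal self-contained sketch (mine, not part of the plugin), assuming the same Droovy/Server API used in the example above and Hudson's remoting Callable interface; it leases one node and brings a string computed there back to the caller:

import hudson.remoting.Callable;

public class WhereAmI {
  public static void main(String[] args) throws Exception {
    Droovy droovy = new Droovy();       // reads the HUDSON_URL env var, as above
    Server server = droovy.connect();   // lease one node from the cluster

    // This anonymous Callable is defined only in this program, yet it executes
    // on the remote JVM; its class file is shipped over the channel automatically.
    String where = server.getChannel().call(new Callable<String,RuntimeException>() {
      public String call() {
        return System.getProperty("user.name") + " on " + System.getProperty("os.name");
      }
    });
    System.out.println("executed as: " + where);
  }
}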

What the add() example above does is essentially send a closure to a remote JVM and execute it there. Unfortunately, the syntax for closures in Java is awkward, so I added a Groovy binding in Droovy to make this really easy to use (and hence the name.) This allows you to script a computation across multiple systems in a single Groovy script. droovy.jar has a main method that hooks this up with a CLI, which lets you do something like:

$ java -jar droovy-1.0-jar-with-dependencies.jar http://hudson.sfbay/ << EOF
def pid() {
  return new File("/proc/self").canonicalPath
}

println "connecting"
server1 = connect();
server2 = connect();

i=0
(0..<4).each {
  i++
  server1 {
    println "hello from ${pid()} (${i})"
  }
  server2 {
    println "hello from ${pid()} (${i})"
  }
  println "hello from ${pid()} (${i})"
}
EOF
connecting
channel started
channel started
hello from /proc/10244 (1)
hello from /proc/10267 (1)
hello from /proc/10222 (1)
hello from /proc/10244 (2)
hello from /proc/10267 (2)
hello from /proc/10222 (2)
hello from /proc/10244 (3)
hello from /proc/10267 (3)
hello from /proc/10222 (3)
hello from /proc/10244 (4)
hello from /proc/10267 (4)
hello from /proc/10222 (4)
channel stopped
channel stopped

... and you now have a program that spans three JVMs. Notice that closures capture their environment, so you see the correct value of "i" printed everywhere (and stdout is all forwarded back to your shell.)

Conclusion / Next Steps

What I'm really after is making it very easy to write tests that utilize tens of computers in a Hudson cluster. I hope I showed enough pieces to get some of you interested.

My next step is to expand the Parallel JUnit module so that it not only runs tests in parallel but can optionally do so across multiple systems. I'm also looking at the Maven Surefire plugin so that your Maven projects can take advantage of this by tweaking your POM a bit.


Comments

Awesome...

Frickin cool

@dsmiley: from what I know of Hadoop, I don't think you can maintain an interactive communication channel between map/reduce tasks running on Hadoop and the outside. Hadoop is really designed for batch processing. That said, I'd imagine it's not impossible to make it do something like this, so if you know of work done with Hadoop in this space, please let me know.

Cool! But I wonder if there was redundant work here that could be better implemented with hadoop.