
Hudson 1.72 and new remoting infrastructure

Posted by kohsuke on January 9, 2007 at 10:14 PM PST

I posted a new version of Hudson. This is 1.72, meaning it's the 73rd release of Hudson. Of all the projects that I work on, Hudson has by far the largest number of releases, and the count keeps going up.

One of the key features of Hudson is its support for distributed builds (AKA "master/slave mode"). As the number of projects grows, being able to distribute the load to multiple computers becomes crucial. (And once you start using Hudson to run tests on various configurations, you can easily end up with tens of jobs: at my work we have 150 or so jobs distributed over 15 or so slaves.)

Previously, Hudson mostly relied on ssh/rsh for launching processes remotely, and on NFS for accessing files remotely. This was fairly easy to implement, but it made it difficult to use slaves efficiently, because any work on a slave's files meant pulling the data over the network. For example, when Hudson wanted to copy "target/**/*.jar", it basically had to scan the entire subdirectory over NFS. Likewise, when it wanted to compute a changelog, it had to do that over NFS, too.

In 1.72, instead of using ssh+NFS, Hudson now runs a "slave agent" program on each slave machine, which maintains a single bi-directional stream with the master. The master can then send a program fragment to a slave (by serializing a Callable object). The slave executes it and sends the return value, or the exception, back to the master. All the necessary class files are sent to the slave on demand. Much of this is as transparent as it can get. For example, if I want to work on a remote file, I'd write a program like:

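FilePath file = ...;   // a FilePath object can represent either a local or a remote file
T s = file.act(new FileCallable<T>() {
  public T invoke(File f, ... ) {
    // do something with f, then return the result;
    // this runs on whichever machine actually has the file
  }
});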

and when the file is a remote file on a slave, the "invoke" method will run on the remote machine.
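The round trip described above (serialize a Callable, run it at the far end of one bi-directional stream, send back the value or the exception) can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Hudson's actual remoting code: the "slave" is just a thread at the other end of two in-memory pipes, both ends share one classpath (so the on-demand class shipping is not shown), and RemoteCallable, LoopbackChannel, SumTask and FailingTask are hypothetical names.

```java
import java.io.*;
import java.util.concurrent.Callable;

public class RemotingSketch {
    // A program fragment the master can ship must be both Callable and Serializable.
    interface RemoteCallable<V> extends Callable<V>, Serializable {}

    // What travels back to the master: the return value, or the exception the fragment threw.
    static final class Response implements Serializable {
        final Object value;
        final Exception error;
        Response(Object value, Exception error) { this.value = value; this.error = error; }
    }

    // Example fragment: computes 1 + 2 + ... + n on whichever side runs it.
    static final class SumTask implements RemoteCallable<Integer> {
        private final int n;
        SumTask(int n) { this.n = n; }
        public Integer call() {
            int sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum;
        }
    }
    // Example fragment that fails on the remote side.
    static final class FailingTask implements RemoteCallable<String> {
        public String call() throws IOException { throw new IOException("no such file"); }
    }

    // Slave loop: read a fragment, run it, write back its value or its exception.
    static void serve(ObjectInputStream in, ObjectOutputStream out) throws Exception {
        while (true) {
            Object request = in.readObject();
            if (request == null) return;  // null is this sketch's shutdown message
            Response reply;
            try {
                reply = new Response(((Callable<?>) request).call(), null);
            } catch (Exception e) {
                reply = new Response(null, e);
            }
            out.writeObject(reply);
            out.flush();
        }
    }

    // Hypothetical channel: the "slave" is a thread at the far end of two in-memory pipes.
    static final class LoopbackChannel implements Closeable {
        private final ObjectOutputStream out;
        private final ObjectInputStream in;
        private final Thread slave;
        LoopbackChannel() throws IOException {
            PipedOutputStream toSlave = new PipedOutputStream();
            PipedInputStream slaveIn = new PipedInputStream(toSlave, 1 << 16);
            PipedOutputStream toMaster = new PipedOutputStream();
            PipedInputStream masterIn = new PipedInputStream(toMaster, 1 << 16);
            slave = new Thread(() -> {
                try {
                    // Each end creates its output stream first, so the stream headers cannot deadlock.
                    ObjectOutputStream slaveOut = new ObjectOutputStream(toMaster);
                    slaveOut.flush();
                    serve(new ObjectInputStream(slaveIn), slaveOut);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            slave.start();
            out = new ObjectOutputStream(toSlave);
            out.flush();
            in = new ObjectInputStream(masterIn);
        }
        // Master side: ship the fragment, wait for the reply, rethrow remote failures locally.
        @SuppressWarnings("unchecked")
        <V> V call(RemoteCallable<V> task) throws Exception {
            out.writeObject(task);
            out.flush();
            Response reply = (Response) in.readObject();
            if (reply.error != null) throw reply.error;
            return (V) reply.value;
        }
        @Override
        public void close() throws IOException {
            out.writeObject(null);  // ask the slave loop to stop
            out.flush();
            try { slave.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) throws Exception {
        try (LoopbackChannel channel = new LoopbackChannel()) {
            System.out.println(channel.call(new SumTask(100)));  // prints 5050, computed by the slave thread
            try {
                channel.call(new FailingTask());
            } catch (IOException e) {
                System.out.println("remote failure: " + e.getMessage());  // prints "remote failure: no such file"
            }
        }
    }
}
```

A real agent would replace the pipes with the stream to the remote JVM (for example, the stdin/stdout of a process started over ssh) and would also have to ship class files the slave does not have yet; the sketch sidesteps that because both ends share one classpath.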

Put another way, this new remoting infrastructure lets me send a program to where the data is, as opposed to sending data to where the program is, and hence it can be much more efficient. It also reduces the amount of configuration needed, since NFS is no longer necessary. That makes it a lot easier to run Windows slaves, where setting up NFS was very difficult. For me, it should also let my master Hudson control an even larger number of slaves.
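To make "send the program to the data" concrete with the "target/**/*.jar" example, here is a sketch of a serializable fragment that walks a workspace on the machine that owns it and returns only the matching relative paths. It is an illustration under assumptions, not Hudson's implementation: GlobScan and roundTrip are hypothetical names, the trip across the channel is simulated with in-memory Java serialization, and the pattern is interpreted by java.nio's glob matcher, whose `**` semantics differ from Ant's (here "target/**/*.jar" does not match a jar sitting directly inside target).

```java
import java.io.*;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GlobScanSketch {
    // Hypothetical fragment: runs on the machine that owns baseDir and returns only the matches.
    static final class GlobScan implements Callable<ArrayList<String>>, Serializable {
        private final String baseDir;  // a String, because java.nio.file.Path is not Serializable
        private final String glob;
        GlobScan(String baseDir, String glob) { this.baseDir = baseDir; this.glob = glob; }

        @Override
        public ArrayList<String> call() throws IOException {
            Path base = Paths.get(baseDir);
            // java.nio glob semantics, not Ant's: "target/**/*.jar" needs at least one directory below target.
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
            try (Stream<Path> files = Files.walk(base)) {
                return files.filter(Files::isRegularFile)
                        .map(base::relativize)
                        .filter(matcher::matches)
                        .map(p -> p.toString().replace(File.separatorChar, '/'))
                        .sorted()
                        .collect(Collectors.toCollection(ArrayList::new));
            }
        }
    }

    // Serializes and deserializes an object, standing in for a trip across the channel.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A throwaway workspace (left in the temp directory) standing in for a slave's checkout.
        Path ws = Files.createTempDirectory("workspace");
        Files.createDirectories(ws.resolve("target/lib"));
        Files.createDirectories(ws.resolve("src"));
        Files.write(ws.resolve("target/lib/app.jar"), new byte[0]);
        Files.write(ws.resolve("target/lib/notes.txt"), new byte[0]);
        Files.write(ws.resolve("src/Main.java"), new byte[0]);

        // Master -> slave: the task itself is what crosses the wire.
        GlobScan shipped = (GlobScan) roundTrip(new GlobScan(ws.toString(), "target/**/*.jar"));
        // Slave side: the directory walk happens next to the files.
        ArrayList<String> matches = shipped.call();
        // Slave -> master: only the short list of matching paths comes back.
        @SuppressWarnings("unchecked")
        ArrayList<String> received = (ArrayList<String>) roundTrip(matches);
        System.out.println(received);  // [target/lib/app.jar] on a Unix-like file system
    }
}
```

The only data that crosses the simulated wire is the small task object going out and the list of matches coming back, however large the walked directory tree is.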

So far I still rely on ssh/rsh, because that's how the master starts the slave agent program. But since the only thing I need is a bi-directional stream, the obvious extension would be to let people start slaves through Java Web Start or something similar. Or maybe even use some P2P transport underneath. That would be fun.

Initially I was hoping to reuse some of the existing technologies, but surprisingly, I couldn't find a good one. So I wrote my own. This area of technology seems to have an interesting property: it's easier to write your own than to learn what someone else wrote, perhaps because everyone's needs are slightly different.

Anyway, if you are interested in this, you can see the javadoc here. While it has the name "hudson" on it, this module is available as its own jar and is reusable outside Hudson.