


Kelly O'Hair

Kelly O'Hair's Blog

JPRT: Build/Test System for the JDK

Posted by kellyohair on September 13, 2006 at 08:48 PM | Comments (5)

I did a little blogging on JPRT at http://blogs.sun.com/kto/entry/jprt_sun_hardware_is_so but that was mostly to talk about the COOL rack of Sun hardware that I used. Now I want to talk a little more about why we need something like JPRT, and what it does for us. I've been working on this JPRT project for quite some time now, so I've kind of lost touch with the real world lately. Ronald Reagan is still the President, isn't he? ;^) Anyway....

JPRT ("JDK Putback Reliability Testing", but ignore what the letters stand for, I change what they mean every day, just to annoy people :^) is a build and test system for the JDK, or any source base that has been configured for JPRT. As I mentioned in the above blog, JPRT is a major modification to a system called PRT that the HotSpot VM development team has been using for many years, very successfully I might add. Keeping the source base always buildable and reliable is the first step in the 12 steps of dealing with your product quality... or was that the 12 steps from Alcoholics Anonymous... oh well, anyway, it's the first of many steps. ;^)

Internally, when we make changes to any part of the JDK, there are certain procedures we are required to perform prior to any putback or commit of the changes. The procedures often vary from team to team, depending on many factors, such as whether native code is changed, or if the change could impact other areas of the JDK. But a common requirement is verifying that the source base with the changes (and merged with the very latest source base) will build on many if not all 8 platforms, using a full 'from scratch' build rather than an incremental build, which can hide full build problems. The testing needed varies, depending on what has been changed.

Anyone who has worked on a project where multiple engineers or groups are submitting changes to a shared source base knows how disruptive a 'bad commit' can be for everyone. How many times have you heard:
"So And So made a bunch of changes and now I can't build!".
But multiply that by 8 platforms, make all the platforms old and antiquated OS versions with bizarre system setup requirements, and you have a pretty complicated situation (see http://download.java.net/jdk6/docs/build/README-builds.html).

We don't tolerate bad commits, but our enforcement is somewhat lacking; usually it's an 'after the fact' correction. Luckily the Source Code Management system we use (another antique called TeamWare) allows for a tree of repositories, and 'bad commits' are usually isolated to a small team. Punishment to date has been pretty drastic. The Queen of Hearts in 'Alice in Wonderland' said 'Off With Their Heads'; well, trust me, you don't want to be the engineer doing a 'bad commit' to the JDK. With JPRT, hopefully this will become a thing of the past. Not that we have had many 'bad commits' to the master source base; in general the teams doing the integrations know how important their jobs are and they rarely make 'bad commits'. So for these JDK integrators, maybe what JPRT does is keep them from chewing their fingernails at night. ;^)

Over the years each team has accumulated a set of machines for building, or uses some of the shared machines available to all of us. But the hunt for build machines is just part of the job, or has been. And although the consistency of the build machines hasn't been a horrible problem, often you never know if the Solaris build machine you are using has all the right patches, or if the Linux machine has the right service pack, or if the Windows machine has its latest updates. Hopefully the JPRT system can solve this problem. When we ship the binary JDK bits, it is SO very important that the build machines are correct, and we know how difficult it is to get them set up. Sure, if you need to debug a JDK problem that only shows up on Windows XP or Solaris 9, you'll still need to hunt down a machine, but not as a regular everyday occurrence.

I'm a big fan of a regular nightly build and test system, constantly verifying that a source base builds and tests out. There are many examples of automated build/tests, some that trigger on any change to the source base, some that just run every night. Some provide a protection gateway to the 'golden' source base, which only gets changes that the nightly process has verified are good. The JPRT (and PRT) system is meant to guard the source base before anything is sent to it, guarding all source bases from the evil developer. Well, maybe 'evil' isn't the right word, I haven't met many 'evil' developers, more like 'error prone' developers. ;^) Hmm, come to think about it, I may be one from time to time. :^{ But the point is that by spreading the build over a set of machines, and getting the turnaround down to under an hour, it becomes realistic to completely build on all platforms and test it, on every putback. We have the technology, we can build and rebuild and rebuild, and it will be better than it was before, ha ha... Anybody remember the Six Million Dollar Man? Man, I gotta get out more often. Anyway, now the nightly build and test can become a 'fetch the latest JPRT build bits' and start extensive testing (the testing not done by JPRT, or the platforms not tested by JPRT).
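
To make the 'guard the source base before anything is sent to it' idea a bit more concrete, here is a rough sketch in Java. This is not JPRT code; the class name PutbackGate, the method allPlatformsPass, and the platform names are all made up for illustration, and the real build and test step is replaced by a stand-in function. It just shows the shape of the idea: run one full build and test per platform in parallel, and only allow the putback if every platform passes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical sketch of a pre-putback gate; NOT the actual JPRT implementation. */
public class PutbackGate {

    /**
     * Runs one build-and-test task per platform in parallel and returns
     * true only if every platform reports success.
     * The buildAndTest function is a stand-in for a real full build.
     */
    public static boolean allPlatformsPass(List<String> platforms,
                                           Predicate<String> buildAndTest) throws Exception {
        // One worker per platform, standing in for one build machine each.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, platforms.size()));
        try {
            Map<String, Future<Boolean>> results = new LinkedHashMap<>();
            for (String platform : platforms) {
                Callable<Boolean> task = () -> buildAndTest.test(platform);
                results.put(platform, pool.submit(task));
            }
            boolean ok = true;
            for (Map.Entry<String, Future<Boolean>> e : results.entrySet()) {
                boolean passed = e.getValue().get(); // wait for this platform to finish
                System.out.println(e.getKey() + ": " + (passed ? "PASS" : "FAIL"));
                ok &= passed;
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Platform names are assumptions for illustration only.
        List<String> platforms = List.of("solaris-sparc", "linux-i586", "windows-i586");
        // Pretend the change breaks the Windows build.
        boolean accepted = allPlatformsPass(platforms, p -> !p.startsWith("windows"));
        System.out.println(accepted ? "putback allowed" : "putback rejected");
    }
}
```

Because the platforms build at the same time, the wall-clock time in this sketch is roughly that of the slowest platform rather than the sum of all of them, which is the kind of thing that makes checking every putback realistic.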

Is it Open Source? No, not yet. Would you like it to be? Let me know. Or is it more important that you have the ability to use such a system for JDK changes?

So enough blabbering on about this JPRT system, tell me what you think.
And let me know if you want to hear more about it or not.

Stay tuned for the next episode, same Bloody Bat time, same Bloody Bat channel. ;^)

-kto


Comments
Comments are listed in date ascending order (oldest first)

  • I would be interested.
    It's very inefficient to expect developers to try their builds for several platforms. Automation is much better. If they can just check in the code and the system checks the checkin immediately, rejecting a bad submit, then we'd have a winner. It's a process that cries for automation.
    To reduce build times the builds could be distributed. Then the time to build can be reduced quite a bit.

    Posted by: dog on September 14, 2006 at 07:32 AM

  • Sounds like a fun tool, and those are always nice to have. :)

    Are you using 'live' boxes for testing, or running them from a simulator? I've been thinking about using qemu/vmware/xen/* images for different operating systems for testing Kaffe on the fly through Tinderbox, but haven't had the time to hook things up yet.

    Posted by: robilad on September 14, 2006 at 08:08 AM

  • The current boxes are all 'live', dedicated to the JPRT system, and actually dedicated to the one job at the head of the queue. We know we could do better about sharing the resources between jobs, but the overall reliability and simplicity of the system is better when a single job has all the resources to itself. Not that this is a 'simple' thing. We have discovered that we can load balance the system by adding more machines to the platforms that build or test slower, and avoiding some tests on some platforms that just don't perform (e.g. the older Linux boxes have trouble with highly MT tests). We will be looking at xen/vmware, but everything is just a matter of time. Thanks for the comments.

    Posted by: kellyohair on September 14, 2006 at 08:31 AM

  • Kelly, can you contrast this with the DTF system we use in SQE? Or the Grid Engine? My question is focusing just on the job distribution mechanism, not the rest of what JPRT does such as doing checkouts and managing buildability and managing whether a given putback passes the requirements. In the SQE team we've found DTF to be very useful, especially in that it automagically distributes a wide variety of jobs over the whole set of systems.

    Hmm... it seems now that I'm thinking about it ... you probably have only one kind of job to distribute to your systems. In SQE we have a wide variety of jobs, because each test execution run is different.

    Posted by: robogeek on September 14, 2006 at 11:19 AM

  • Hi there,

    Good to know about JPRT.

    Meanwhile.........
    I'm wondering if TeamWare is being open sourced any time? I know that a lot of ClearCase licensees use Solaris and will probably continue to do so in the future.

    But I'm thinking that since Sun is open sourcing a lot of its software (and its hardware, like OpenSPARC), it can engage more customers if it releases TeamWare, since it has features that CVS/Subversion do NOT have, which are similar to ClearCase and BitKeeper, both of which are fairly expensive.

    If you open source TeamWare, then the community could make it as fast as Perforce, which would be really cool.

    BR,
    ~A

    Posted by: anjanb2 on September 14, 2006 at 11:55 AM




