Skip to main content

Preparing to Compute Using the Fork/Join Framework, Part 1

Posted by editor on May 8, 2012 at 4:18 PM PDT

A couple weeks ago I figured out (more or less) what's wrong with Java threads. It's not that there's anything so wrong with Java threads per se -- it's just that they were designed to meet a need that is different from what developers face today: a world full of multicore processors powering everything from PCs to tablets to phones.

Now I'm working on converting a Java threads app that I developed a while back to utilize Java 7's new Fork/Join Framework. As the example illustrates, you can actually implement Fork/Join concurrency in a relatively small number of lines.

The fundamental component in the ForkBlur example is the ForkBlur class. Here's the key code that does the computation:

public class ForkBlur extends RecursiveAction {
  private int[] mSource;
  private int mStart;
  private int mLength;
  private int[] mDestination;
  private int mBlurWidth = 15; // Processing window size, should be odd.
  public ForkBlur(int[] src, int start, int length, int[] dst) {
    mSource = src;
    mStart = start;
    mLength = length;
    mDestination = dst;

  // Average pixels from source, write results into destination.
  protected void computeDirectly() {
    int sidePixels = (mBlurWidth - 1) / 2;
    for (int index = mStart; index < mStart + mLength; index++) {
      // Calculate average.
      float rt = 0, gt = 0, bt = 0;
      for (int mi = -sidePixels; mi <= sidePixels; mi++) {
        int mindex = Math.min(Math.max(mi + index, 0), mSource.length - 1);
        int pixel = mSource[mindex];
        rt += (float)((pixel & 0x00ff0000) >> 16) / mBlurWidth;
        gt += (float)((pixel & 0x0000ff00) >>  8) / mBlurWidth;
        bt += (float)((pixel & 0x000000ff) >>  0) / mBlurWidth;
      // Re-assemble destination pixel.
      int dpixel = (0xff000000     ) |
                   (((int)rt) << 16) |
                   (((int)gt) <<  8) |
                   (((int)bt) <<  0);
      mDestination[index] = dpixel;

  protected static int sThreshold = 10000;
  protected void compute() {
    if (mLength < sThreshold) {
    int split = mLength / 2;
    invokeAll(new ForkBlur(mSource, mStart,         split,           mDestination),
              new ForkBlur(mSource, mStart + split, mLength - split, mDestination));

RecursiveAction is an abstract class that extends ForkJoinTask. Looking at the documentation for ForkJoinTask, you can see once again that standard Java threads are considered inadequate for the true parallel computational processing that modern multicore processors invite:

A ForkJoinTask is a thread-like entity that is much lighter weight than a normal thread. Huge numbers of tasks and subtasks may be hosted by a small number of actual threads in a ForkJoinPool...

In the code snippet, the computeDirectly() method is what performs the actual computations (in this case, a simplified method of blurring an image). You'll note that the input data is in the mSource array, the output is written to the mDestination array, and variables mStart and mLength define the locations within the array where the processing will occur. So, if mStart is set to 0 and mLength is set to the length of the array (i.e., the number of pixels in the image), you could call computeDirectly() and simply execute the blurring without parallel processing.

But, that's not what we want to do on our fancy N-core machine, right? We don't have time for that! Or, rather, our user doesn't... and we don't want them to switch to our competitor's app that takes full advantage of multicore processors, do we?

So, now let's look at the compute() method. This is indeed rather tiny:

  protected void compute() {
    if (mLength < sThreshold) {

    int split = mLength / 2;
    invokeAll(new ForkBlur(mSource, mStart,         split,           mDestination),
              new ForkBlur(mSource, mStart + split, mLength - split, mDestination));

This says: if the current amount of the array to be processed is less than a threshold, call computeDirectly(); otherwise, divide the length to be processed in half, and utilize Java 5's invokeAll to execute the listed collection of tasks.

And what are the tasks we're going to execute if mLength was still larger than sThreshold? Two new ForkBlur() instantiations, one to process the first half of the array locations, the second to process the second half of the array locations. In other words, the work task is split into two tasks, each of half the size.

At this point you may be wondering: what about this sThreshold? How does that get selected? Do I just pull a value out of my hat? Call java.util.Random()?

This actually is one of my own primary questions regarding the Fork/Join Framework. If I have an N-core computer, I can divide a task into N equally-size pieces, launch N Java threads, and fully utilize my processor cores. So, if I'm utilizing the Fork/Join framework to do the same work, do I set a threshold value that accomplishes approximately the same thing? Or is that not the most efficient approach?

And, isn't it unreasonable to assume that a threshold value that optimizes performance on a particular device will also optimize performance on other devices that have different numbers of processing cores, different amounts of memory, etc. It doesn't seem like this threshold is going to be a "one-size-fits-all" proposition, to me...

This is definitely something I'll be researching as my experimentation continues. Weblogs

Since my last blog post, several people have posted new blogs:


Our current poll asks Do you read Java Magazine?. Voting will be open until Friday, May 11.

Java News

Here are the stories we've recently featured in our Java news section: