Skip to main content

Parallel Arrays in Scala

Posted by cayhorstmann on May 4, 2011 at 1:49 AM PDT

In my programming languages course, we have come to the one subject that
students fear more than continuations—concurrency. My inbox fills with
desperate queries about programs that yield mysterious results, simply hang, or
in the worst case, cause the computer to overheat and reset. Here in Vietnam,
that's surprisingly easy with a multi-core laptop whose fan was calibrated for
an air-conditioned room. When all those cores are busy and the room temperature
is 30° C, the fan can't keep up.

I am revising the lecture about the fork-join framework. A couple of years I
told my students that the framework was going to be included in Java 7, and
that ParallelArray would make it easy to parallelize common array
algorithms. Except that's not how it happened—ParallelArray
needs closures, and Java 7 isn't getting closures, so that part got dropped.

Scala is coming to the rescue—Scala 2.9 has parallel collections, and they
are easy to use. Try it out:

val a = new Array[BigInt](1000000)
for (i <- 0 until a.size) a(i) = i
val sum = a.par.sum

The .par method turns a regular collection into one whose
operations are parallelized.

Is it any faster? Sure it is. Use this function for timing a block of
code:

def time(f: => Unit) = {
  val start = System.currentTimeMillis
  f
  System.currentTimeMillis - start
}

Then run

time { a.par.sum } // I got 48 on a dual-core machine
time { a.sum } // I got 85

Scala has closures, of course, so we can easily make queries:

val evens = a.par.count(_ % 2 == 0)

Parallelization works nicely with for comprehensions. For
example,

for (i <- (1 to 10).par) { print(i + " ") } 
  // Prints something like 3 6 7 8 1 5 9 4 10 2

Of course, that's the same as

(1 to 10).par.foreach(print(i + "
"))
.

As you can see, multiple threads execute the block. In fact, that's what I
should have done to keep my cores busy when initializing the array:

for (i <- (0 until a.size).par) a(i) = i

What's not to like? As always, Scala lets us do things that can't be right,
such as

var count = 0
for (i <- (1 to 1000000).par) { if (i % 2 == 0) count += 1 }
count // I got 373812, not 500000

Here, multiple threads try to increment the counter—a classic race
conditon. Scala programmers have to know not to do this, but to call
the count method instead. It knows how to safely combine the
results.

In Clojure, this wouldn't be an issue. That will be the next lecture.

Related Topics >>