Skip to main content

Understanding Coercion

Posted by manning_pubs on January 30, 2013 at 7:49 AM PST



Understanding Coercion

by Michael Fogus and Chris Housen, authors of The Joy of Clojure

In many cases, Clojure's compiler will be able to highly optimize idiomatic Clojure source code. But there are times when the form of your functions, especially in interoperability scenarios, will prove to be ambiguous or even outright counter to compiler optimizations. In this article, based on chapter 12 of The Joy of Clojure, the authors lead you through one of the optimization techniques—coercion.

Although Clojure is a dynamically typed language, it does provide mechanisms for specifying value types. They are type hints and coercion. The purpose of coercion is to get at the primitive data type for a value, which we'll show next.

First rule of coercion: don't

Clojure's compiler is sophisticated enough that in many ways it'll be unnecessary to coerce values into primitives. It's often better to start with a function or code block devoid of coercions. Unless your specific application requires the utmost speed in execution, it's better to stick with the version that favors simplicity over the alternative.

But should you decide that coercion might be the choice for you, then this article will provide guidance.

Corollary: you're probably not doing it right

If you've determined that coercion can help, then it's worth stressing that you have to be careful when going down that road. In many cases with coercion, the act of adding it can actually slow your functions. The reason lies in the nature of Clojure. Functional composition leads to passing arguments back and forth between pieces, and in the circumstance of coercion you're just boxing and unboxing[1] from one call to the next.

This particular circumstance is especially devious within the body of a loop, and follows the same performance degradations observed with Java. Clojure's unboxing is an explicit[2] operation performed using the coercion functions, so there's a speck of light there. Unfortunately, autoboxing is still a danger and should be avoided if speed is a concern, as we'll explore now:

(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words] (swap! res assoc w (+ 1 (@res w 0))))
    @res))

(defn roll [n d]
  (reduce + (take n (repeatedly #(inc (rand-int d))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4055.505 msecs"

The function occur-count will return a map of the occurrence counts[3] found in a given sequence. This fairly straightforward implementation uses the function roll to populate a sequence with a million simulated rolls of three six-sided dice. But four seconds seems like a long time to wait, so perhaps we can speed things up by using coercions.

An initial attempt to gain speed may be to pull out the stored count from the table and coerce it into an int:

(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words]
      (let [v (int (@res w 0))]
        (swap! res assoc w (+ 1 v))))
    @res))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4385.8 msecs"

Well, that didn't work. The reason for a decrease in speed is that although we're specifying the type at the outer loop, we haven't reduced the need to box and unbox that value further downstream in the roll function. We might then be led to try and optimize the roll function too:

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(inc (rand-int p)))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4456.393 msecs"
;=> nil

Again we've made matters worse and have spread the problems over the surface of the entire code. Being adventurous, we grasp for straws and attempt to force integer arithmetic with roll by using the unchecked-inc function:

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(unchecked-inc (rand-int p)))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))

Go ahead and run that in the REPL, then go get some coffee and a bagel. Toast the bagel. Eat the bagel. Drink the coffee. By that time, you might've received a result. So what happened? In an attempt to be clever, we've confused the Clojure compiler into near unconsciousness. Instead of making direct calls to Clojure's math functions, we're now making calls indirectly via Java reflection! You can observe this by setting *warn-on-reflection* to true and reentering roll.

How can we speed things up? The problem isn't with coercion itself, but instead with the implementations of roll and occur-count. You can observe significant speed-ups by rethinking your original implementations first and then resorting to coercion second. The use of coercion should always be preceded by a reevaluation of your implementation, because often by doing so you can eliminate the need for coercion altogether, as shown next.

Listing 1 Using coercion to gain speed

(defn roll [n d]
  (loop [n (int n), sum 0]     

    (if (zero? n)
      sum
      (recur (dec n) (+ sum (inc (rand-int d)))))))

(defn occur-count [words]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} words))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 937.346 msecs"
;=> nil

By refactoring the original functions, we've gained a five-fold increase in speed and yet used only a single coercion. Additionally, we've managed to make the new implementation faster while also maintaining clarity. This should be a general goal when writing your Clojure code, and when forced to make a trade between the two, it might be a good idea to favor clarity.

Second rule of coercion: don't

In the previous example, there's too much noise in collection and sequence operations for primitive coercion to help much. This goes to show that it's important to remember that the Clojure compiler will often do a better job at optimization than you.

Third rule of coercion: coerce a stable local

When coercing a local to a primitive type, it's tempting to do so at the point of use, but this practice should be avoided. A good rule of thumb for coercion is to coerce only within a local context via let, binding, or loop. This provides a stable value point for the primitive, allowing you to reuse that same local elsewhere in the same function without having to again coerce at different points of use. This can be illustrated by the following:

(defn mean
  "Takes a sequence of integers and returns their mean value"
  [sq]
  (let [length (int (count sq))]
    (if (zero? length)
      0
      (/ (int (reduce + sq)) length))))

The length value has been bound in the let, allowing it to be reused twice within the body of the function. This allows for a cleaner implementation than the alternative, which coerces the results of (count sq) in multiple places. Using this advice and the fact that Clojure provides lexical scope by default, you can also avoid the need to define a name-mangled local by instead using let to rebind original argument names to coerced values (defn [x] (let [x (int x)] …)).

Fourth rule of coercion: watch your sizes

Primitive type coercions in Clojure act the same as type truncation in Java. If a given value is coerced into a type that can't hold a value of its magnitude, then data loss will occur, and in the case of integer overflow, exceptions will be thrown.

Fifth rule of coercion: truncate only as a goal

By default, Clojure doesn't limit the accuracy of mathematical operations, but this can occur when using coercion. There will be many instances in your own projects when speed is more important than accuracy in mathematical operations. Likewise, there will also be times when truncation is necessary, especially when dealing with Java library methods that take primitive types:

(Math/round 1.23897398798929929872987890030893796768727987138M)
; java.lang.IllegalArgumentException:
; No matching method found: round

When a method or function isn't overloaded, the Clojure compiler can determine whether an argument can be coerced to a primitive type and will do so if able. The preceding issue exception arises from the fact that Math/round is overloaded, taking either a float or double typed argument. Therefore, you have to explicitly use coercion to truncate the argument:

(Math/round (float 1.23897398798929929872987890030893796768727987138M))
;=> 1

Our goal in using the truncating operation float was to get a result that we knew wouldn't be affected by truncation. But many instances will arise when truncation will affect your results and will often do so to your detriment. Therefore, it's best to be wary when using coercion, because it propagates inaccuracies. It's best to limit its usage when truncation is desired and document vigorously when it's absolutely needed for speed.

Coercion can be an effective tool in your Clojure applications, but take care to be sure you understand the caveats. If you take away one lesson from this section, let it be this: do not rush to coercion.

Summary

Clojure provides numerous ways to gain speed in your applications. Using some combination of type hints, transients, chunked sequences, memoization, and coercion, you should be able to achieve noticeable performance gains. Like any powerful tool, these performance techniques should be used cautiously and thoughtfully. But once you've determined that performance can be gained, their use is minimally intrusive and often natural to the unadorned implementation.

1. Autoboxing is the automatic conversion the Java compiler makes between the primitive types and their corresponding object wrapper classes.

2. Except when directly or indirectly (via inlining or a macro body) calling a method.

3. Clojure has a function frequencies that does this, so we provide occur-count for illustrative purposes only.


Here are some other Manning titles you might be interested in:

The Real-World Functional Programming

The Real-World Functional Programming
Tomas Petricek with Jon Skeet

Scala

Scala in Action
Nilanjan Raychaudhuri

Play for Scala

IronPython in Action
Peter Hilton, Erik Bakker, and Francisco Canedo


AttachmentSize
jocloj001.jpg22.24 KB
jocloj002.jpg21.28 KB
jocloj003.jpg10.25 KB
joclojcover.jpg16.72 KB
cueball1.png1.19 KB