Skip to main content

Understanding Coercion

Posted by manning_pubs on January 30, 2013 at 7:49 AM PST

Understanding Coercion

by Michael Fogus and Chris Housen, authors of The Joy of Clojure

In many cases, Clojure's compiler will be able to highly optimize idiomatic Clojure source code. But there are times when the form of your functions, especially in interoperability scenarios, will prove to be ambiguous or even outright counter to compiler optimizations. In this article, based on chapter 12 of The Joy of Clojure, the authors lead you through one of the optimization techniques—coercion.

Although Clojure is a dynamically typed language, it does provide mechanisms for specifying value types. They are type hints and coercion. The purpose of coercion is to get at the primitive data type for a value, which we'll show next.

First rule of coercion: don't

Clojure's compiler is sophisticated enough that in many ways it'll be unnecessary to coerce values into primitives. It's often better to start with a function or code block devoid of coercions. Unless your specific application requires the utmost speed in execution, it's better to stick with the version that favors simplicity over the alternative.

But should you decide that coercion might be the choice for you, then this article will provide guidance.

Corollary: you're probably not doing it right

If you've determined that coercion can help, then it's worth stressing that you have to be careful when going down that road. In many cases with coercion, the act of adding it can actually slow your functions. The reason lies in the nature of Clojure. Functional composition leads to passing arguments back and forth between pieces, and in the circumstance of coercion you're just boxing and unboxing[1] from one call to the next.

This particular circumstance is especially devious within the body of a loop, and follows the same performance degradations observed with Java. Clojure's unboxing is an explicit[2] operation performed using the coercion functions, so there's a speck of light there. Unfortunately, autoboxing is still a danger and should be avoided if speed is a concern, as we'll explore now:

(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words] (swap! res assoc w (+ 1 (@res w 0))))

(defn roll [n d]
  (reduce + (take n (repeatedly #(inc (rand-int d))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4055.505 msecs"

The function occur-count will return a map of the occurrence counts[3] found in a given sequence. This fairly straightforward implementation uses the function roll to populate a sequence with a million simulated rolls of three six-sided dice. But four seconds seems like a long time to wait, so perhaps we can speed things up by using coercions.

An initial attempt to gain speed may be to pull out the stored count from the table and coerce it into an int:

(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words]
      (let [v (int (@res w 0))]
        (swap! res assoc w (+ 1 v))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4385.8 msecs"

Well, that didn't work. The reason for a decrease in speed is that although we're specifying the type at the outer loop, we haven't reduced the need to box and unbox that value further downstream in the roll function. We might then be led to try and optimize the roll function too:

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(inc (rand-int p)))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4456.393 msecs"
;=> nil

Again we've made matters worse and have spread the problems over the surface of the entire code. Being adventurous, we grasp for straws and attempt to force integer arithmetic with roll by using the unchecked-inc function:

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(unchecked-inc (rand-int p)))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))

Go ahead and run that in the REPL, then go get some coffee and a bagel. Toast the bagel. Eat the bagel. Drink the coffee. By that time, you might've received a result. So what happened? In an attempt to be clever, we've confused the Clojure compiler into near unconsciousness. Instead of making direct calls to Clojure's math functions, we're now making calls indirectly via Java reflection! You can observe this by setting *warn-on-reflection* to true and reentering roll.

How can we speed things up? The problem isn't with coercion itself, but instead with the implementations of roll and occur-count. You can observe significant speed-ups by rethinking your original implementations first and then resorting to coercion second. The use of coercion should always be preceded by a reevaluation of your implementation, because often by doing so you can eliminate the need for coercion altogether, as shown next.

Listing 1 Using coercion to gain speed

(defn roll [n d]
  (loop [n (int n), sum 0]     

    (if (zero? n)
      (recur (dec n) (+ sum (inc (rand-int d)))))))

(defn occur-count [words]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} words))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 937.346 msecs"
;=> nil

By refactoring the original functions, we've gained a five-fold increase in speed and yet used only a single coercion. Additionally, we've managed to make the new implementation faster while also maintaining clarity. This should be a general goal when writing your Clojure code, and when forced to make a trade between the two, it might be a good idea to favor clarity.

Second rule of coercion: don't

In the previous example, there's too much noise in collection and sequence operations for primitive coercion to help much. This goes to show that it's important to remember that the Clojure compiler will often do a better job at optimization than you.

Third rule of coercion: coerce a stable local

When coercing a local to a primitive type, it's tempting to do so at the point of use, but this practice should be avoided. A good rule of thumb for coercion is to coerce only within a local context via let, binding, or loop. This provides a stable value point for the primitive, allowing you to reuse that same local elsewhere in the same function without having to again coerce at different points of use. This can be illustrated by the following:

(defn mean
  "Takes a sequence of integers and returns their mean value"
  (let [length (int (count sq))]
    (if (zero? length)
      (/ (int (reduce + sq)) length))))

The length value has been bound in the let, allowing it to be reused twice within the body of the function. This allows for a cleaner implementation than the alternative, which coerces the results of (count sq) in multiple places. Using this advice and the fact that Clojure provides lexical scope by default, you can also avoid the need to define a name-mangled local by instead using let to rebind original argument names to coerced values (defn [x] (let [x (int x)]

jocloj001.jpg22.24 KB
jocloj002.jpg21.28 KB
jocloj003.jpg10.25 KB
joclojcover.jpg16.72 KB
cueball1.png1.19 KB