One of the topics for the November ThoughtWorks dojo was transducers (something I’ve looked at before and singularly failed to get working). Transducers will be coming to clojure.core in 1.7; the code is already in ClojureScript and core.async.
There were two teams looking at transducers, one looked more at the foundations of how transducers are implemented and the other at their performance. These are my notes of what they presented back at the dojo.
How do transducers work?
One of the key ideas underpinning transducers (and their forebears reducers) is that most of the sequence operations can be implemented in terms of reduce. Let’s look at map and filter.
(defn my-map-1 [f coll]
  (reduce (fn [acc el] (conj acc (f el))) [] coll))

(defn my-filter-1 [pred coll]
  (reduce (fn [acc el] (if (pred el) (conj acc el) acc)) [] coll))
Now these functions consist of two parts: the purpose of the function (transformation or selection of values) and the part that assembles the new sequence representing the output. Here I am using conj, but even conj could be replaced by an implementation built on reduce if you want to be purist about it.
If we replace conj with a reducing function (rf) that can be supplied to the rest of the function, we get these abstractions.
(defn my-map-2 [f]
  (fn [rf]
    (fn [acc el] (rf acc (f el)))))

(defn my-filter-2 [pred]
  (fn [rf]
    (fn [acc el] (if (pred el) (rf acc el) acc))))
And this is pretty much what happens when we call the single-arity versions of map and filter in transducers. We pass a function that expresses the main purpose of the operation, then a reducing function, and finally we do the actual transducing; here I am using reduce again, but transduce does the same thing.
((my-map-2 inc) conj) ; fn

(reduce ((my-map-2 inc) conj) [] (range 3)) ; [1 2 3]

(reduce ((my-filter-2 odd?) conj) [] (range 7)) ; [1 3 5]
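In Clojure 1.7 the same wiring is available directly: the single-arity calls to map and filter return transducers, comp composes them (applying left-to-right, in the same order as ->>), and transduce or into runs them. A quick sketch using only core functions:

```clojure
;; single-arity map returns a transducer; transduce supplies the
;; reducing function (conj) and the initial value
(transduce (map inc) conj [] (range 3))
;; => [1 2 3]

;; composed transducers apply left-to-right, like ->>:
;; each element is incremented, then tested for oddness
(into [] (comp (map inc) (filter odd?)) (range 6))
;; => [1 3 5]
```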
The team’s notes have been posted online.
How do transducers perform?
The team that was working on performance compared a transduced set of functions composed with comp to the same functions pipelined via the thread-last macro (->>).
The results were interesting: for two or three functions, performance was very similar between the two approaches. However, the more functions in the chain, the better the transduced version performed, until in the pathological case there was a massive difference.
That seems to fit the promises made for transducer performance: eliminating the intermediate sequences suggests that performance should stay flat as you add transforms.
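A minimal version of that kind of comparison (a hypothetical chain of transforms, not the team's actual benchmark) looks like this. The threaded version realizes a lazy intermediate sequence at each step, while the transduced version fuses the steps into a single pass:

```clojure
(def xs (vec (range 1000)))

;; thread-last: each step produces an intermediate lazy sequence
(->> xs
     (map inc)
     (filter even?)
     (map #(* % %))
     (reduce +))

;; transducers: the same steps fused into one reduction,
;; with no intermediate sequences allocated
(transduce (comp (map inc)
                 (filter even?)
                 (map #(* % %)))
           + 0 xs)
```

Both expressions return the same sum; the difference is purely in how much intermediate work is allocated, which is why the gap grows as more transforms are added to the chain.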
There was some discussion during the dojo as to whether rewriting the historical sequence functions was the right approach, or whether it would have been better either to make transducers the default or to let programmers opt into them explicitly by requiring a library, as you do for reducers. The team showed that performance was consistently better with transducers (if sometimes by small margins), but also that existing code does not really need to be modified unless you previously had performance issues, in which case transducers allow a simpler, more direct approach to chaining transformations than was previously possible.
I suggested the transducers topic as I had singularly failed to get to grips with them by myself, and I was glad it sparked so much investigation and discussion. I certainly got a much better understanding of the library as a result. My thanks go to the dojo participants, particularly James Henderson.