Clojure, Programming

Transducers at the November London Clojure Dojo 2014

One of the topics for the November ThoughtWorks dojo was transducers (something I’ve looked at before and singularly failed to get working). Tranducers will be coming to clojure.core in 1.7, the code is already in Clojurescript and core.async.

There were two teams looking at transducers, one looked more at the foundations of how transducers are implemented and the other at their performance. These are my notes of what they presented back at the dojo.

How do transducers work?

One of the key ideas underpinning transducers (and their forebears reducers) is that most of the sequence operations can be implemented in terms of reduce. Let’s look at map and filter.

(defn my-map-1 [f coll]
     (fn [acc el] (conj acc (f el))) [] coll))

(defn my-filter-1 [pred coll]
     (fn [acc el]
       (if (pred el)
         (conj acc el)
   [] coll))

Now these functions consist of two parts: the purpose of the function (transformation or selection of values) and the part that assembles the new sequence representing the output. Here I am using conj but conj can also be replaced by an implementation that uses reduce if you want to be purist about it.

If we replace conj with a reducing function (rf) that can supplied to the rest of the function we create these abstractions.

(defn my-map-2 [f]
  (fn [rf]
    (fn [acc el]
      (rf acc (f el))))

(defn my-filter-2 [pred]
  (fn [rf]
    (fn [acc el]
      (if (pred el)
        (rf acc el)

And this is pretty much what is happening when we call the single-arity versions of map and filter; in tranducers. We pass a function that is the main purpose of the operation, then a reducing function and then finally we need to do the actual transducing, here I am using reduce again but transduce does the same thing.

((my-map-2 inc) conj) ; fn
(reduce ((my-map-2 inc) conj) [] (range 3)) ; [1 2 3]

(reduce ((my-filter-2 odd?) conj) [] (range 7)) ; [1 3 5 7 9]

The team’s notes have been posted online.

How do transducers perform?

The team that was working on the performance checking compared a transduced set of functions that were composed with comp to the execution of the same functions pipelined via the right-threading macro (->>).

The results were interesting, for two or three functions performance was very similar between both approaches. However the more functions that are in the chain then the better the transduced version performs until in the pathological case there is a massive difference.

That seems to fit the promises of transducer performance as the elimination of intermediate sequences would suggest that performance stays flat as you add transforms.

There was some discussion during the dojo as to whether rewriting the historical sequence functions was the right approach and whether it would have been better to either make transducers the default or allow programmers to opt into them explicitly by importing the library like you do for reducers. The team showed that performance was consistently better with transducers (if sometimes by small margins) but also that existing code does not really need to be modified unless you previously had performance issues in which case transducers allows a simpler, direct approach to transformation chaining than was previously possible.

Closing thoughts

I suggested the transducers topic as I had singly failed to get to grips with them by myself and I was glad it sparked so much investigation and discussion. I certainly got a much better understanding of the library as a result. My thanks got to the dojo participants, particularly James Henderson.

Programming, ThoughtWorks

ThoughtWorks Code Assignments

When you enter the ThoughtWorks recruitment process you are asked to code a solution to one of three problems.

Googling for the answer is clever in the sense they can be hard. It is also really stupid in that the code will be reviewed by at least two people who if they decide that your application will be taken forward will interview about your code and why you have chosen to implement your solution in the way you have.

You won’t be able to tell them that the code is the way it is because you copied it off someone else! So, write it yourself! The feedback you get will be more all the more useful if you code things in your own way. Otherwise the feedback will be relevant only to the person who wrote the code.

One piece of help I can give is that there are no trick questions or mistakes in the assignment paper. If you think you have found an error in the problem statement you have probably misunderstood the problem.

Software, Work

What’s the difference between Simple and Stupid in interview code?

Kent Beck (I believe) said: “Do the simplest thing that works, not the stupidest”. What does that mean in the context of showcase code (i.e. code you have written as part of some kind of application process)?

Well firstly think of what the point of showcase code is. What is the reviewer looking for as part of the application process? They are looking for evidence of a train of thought or method of problem solving that chimes with the kind of thought processes that they think are effective. Ideally these would be the “best” processes but you have to accept that a lot of processes are about finding people who agree with what an organisation thinks is best practice rather than respecting originality and clever solutions. You should never try to tailor your showcase code to an organisation because it is too hard to double guess what someone is looking for. You can only express yourself as clearly as you can and hope that it fits well with what is being sought.

A good piece of showcase code will appropriately abstract the problem being set, illustrating the way the candidate finds solutions to problems. It will neither be too elaborate, abstracting too much of the problem, nor too literal, coding “between the lines” of the problem being set.

Let’s take an example. If the problem is to model a set of automobile configurations in an object then the problem set may explain that a car is sold with seat configurations of 2, 4 or 6. In my view a stupid solution would be to have a list of just 2, 4, 6. Clearly the reviewer is going to want some flexibility here, the solution should be able to deal with a new configuration of 5.

An obvious solution is to just model the number of seats as an integer value. You can encapsulate the data by providing an API that allows you query how many seats a configuration has and provide a way of setting that value in an immutable way (once a configuration is set then the number of seats it has is unlikely to change in the lifetime of the object, that’s simplicity not stupidity). This solution just exposes that a configuration has a number of seats which is an integer, which is logical. It is an entirely sensible solution that keeps it simple.

How might a nervous candidate overcomplicate this situation? Well the number of seats really is immutable unless the problem set says otherwise, you don’t need to supply a mutator. Similarly the number of seats is never going to be a Real number.

What about modelling the number of seats as a list of Seat objects? Well yes that works because the encapsulation should hide the implementation and the number of seats is now just the number of Seats in the Seat list. Domain-driven design fans might even say this is a better solution that an integer because it is a closer description of the domain. However I think that simplicity would tend to demand that until we have something else to model about a Seat (say, whether it has a seatbelt) then it is hard to justify having a class whose only purpose is to be counted in a list. For all the difference it makes I could put Turtle objects into the Seat list. In terms of showcase code using this technique actually gives the reviewer a bit of a headache because instead of just ticking the box and moving on the reviewer now has to ask whether the new level of abstraction introduced is valid or not, is it a good idea to have a class with no functionality or data? What if actually the seat configurations are better represented by enumerated constants? The effort of creating the Seat class is just wasted and has to be replaced.

This solution at least stays within the bounds of an existing paradigm. If a candidate starts to abstract wildly then reviewer is going to give up in frustration. If the candidate starts abstracting the model to the point where it could model a motorcycle or truck just as well as car then they have just failed the process. Unless the problem says something about having a motorcycle then you don’t want to see Vehicle With An Engine type classes. After all if you start to generalise then where do you stop? Sure you can model a motorcycle and a truck but why not a plane? Or a rocket?