Trading performance for asynchronicity

An unusual conversation came up at one of the discussion groups in the day job recently. One of the interesting things that the Javascript language specification provides is a very good description of asynchronous execution that is then embodied in execution environments like NodeJS. Asynchronicity on the JVM is  emulated by an event loop mechanism on top of the usual threaded execution environment. In general if you run JVM code in a single-thread environment bad things will happen I would prefer to do it on at least two cores.

So I made the argument that if you want asynchronous code you would be better off executing code on NodeJS rather than emulating via something like Akka.

Some of my colleagues shot back that execution on NodeJS would be inferior and I didn’t disagree. Just like Erlang sometimes you want to trade raw execution performance to get something more useful out of the execution environment.

However people felt that you would never really want to trade performance for a pure asynchronous environment, which I found very odd. Most of the apps we write in the Guardian are not that performant because they don’t really need to be. The majority of our volume is actually handled by caching and a lot of the internal workloads are handled by frameworks like Elasticsearch that we haven’t written.

In follow up discussion I realised that people hadn’t understood the fundamental advantage of asynchronous execution which is that it is easier to reason about than concurrent code. Asynchronous execution contexts on NodeJS provide a guarantee that only one scope is executing at a time so whenever you come to look at an individual function you know that scope is limited entirely the block you are looking at.

Not many programmers are good at parsing and understanding concurrent code. Having used things like Clojure I have come to the conclusion that I don’t want to do concurrency without excellent language support. In this context switching to asynchronous code can be massively helpful.

Another common situation is where you want to try and achieve data locality. With concurrent code it is really easy to actually end up with net poorly performing code due to contention on contexts. Performing a logical and cohesive unit of work is arguably a lot easier in asynchronous code blocks. It should be easier to establish a context, complete a set of operations and then throw away the whole context, knowing that you won’t need to reload that context again as the task will now be complete.

It is hard to make definite statements of what appropriate solutions are for in particular situations. I do know though that performance is a poor place to start in terms of solution design. Understanding the pros and cons of execution modes matters considerably more.


Clojure Exchange 2013: Tommy Hall on Concurrency versus Parallelism

One of the really interesting talks at Clojure Exchange 2013 was one by Tommy Hall with the (not good) title You came for the concurrency, right?.

The talk had two main threads. It served as a helpful review of Clojure’s state-handling, concurrency and parallel processing features. Useful for beginners but also a helpful recap for the more experienced.

The other was a discussion of what we mean by concurrency and parallelism. Something that I hadn’t really thought about before (although I was aware there was a difference). Tommy referenced this talk Concurrency is not Parallelism (it’s better) by Rob Pike.

In the talk Rob gives the following definitions:

Concurrency: programming as the composition of independently executing processes

Parallelism: programming as the simultaneous execution of (possibly related) computations

In the talk Tommy gives the future function as an example of programming to indicate concurrency boundaries and the reducers library as an example of parallelism.

Rob’s presentation is worth reading in full but his conclusion is that concurrency is not a guarantee of parallel execution but that achieving parallelism without concurrency is very hard or impossible. Tommy’s talk uses an Erlang example to make the same point.

Don’t fear the monoid

While discussing reducers Tommy finally explained something that I struggled before. A lot of type champions point at reducers and shout Monoids! As if that was some kind of argument.

During his talk Tommy explains that parallel combination function needs to be able to return an identity so the function always returns a value and that because ordering of values is not guaranteed (unlike the implied order in a regular reduce)  it needs to be associative.

That makes sense. Turns out that those are also the properties of a monoid.

Fans of type systems throw terms like Monad, Dual and Monoid not to help add understanding to a discussion but to use them as shibboleths. It was far more enlightening to see an example of where the needs of a problem were driving to a category of function with certain properties. If that is common enough to deserve a shorthand-name, fair enough, but the name itself is not magical and knowing the various function category names is a feat of learning rather akin to memorising all those software pattern names from the Gang of Four’s book.


Concurrency means performance, yes?

One thing I heard a lot at the Mostly Functional conference last week that concurrency is required for performance on multicore processors. Since Moore’s Law ended it is certainly true that the old trick of not writing performant code but letting hardware advances pick up the slack has been flagging (although things like SSD have still had their impact).

However equating concurrent code with performance is subtly wrong. If there was a direct relationship then we would have seen concurrent programming adopted swiftly by the games programmers. And yet there we still see an emphasis on ordered, predictable execution, cache structure and algorithmic efficiency.

Performance is one of those vague computing terms, like scale, that has many dimensions. Concurrency has no direct relation to performance as anyone who has managed to write a concurrent program with global resource contention can attest.

There are two relevant axes to considering performance and concurrency: throughput and capacity. Concurrency, through parallelism, allows you to greatly increase your utilisation of the available resources to provide a greater capacity for work.

However that work is not inherently performed faster and may actually result in lowered throughput due to the need to read data that is not in memory and the inability to predict the order of execution.

For things like webservices that are inherently stateless then often concurrency does massively increase performance because the capacity to serve request goes up and there is no need coordinate work. If the webservice is accessing a shared store where essentially all of the key data is in memory and what we need to do is read rather than mutate that data then concurrency becomes even more desirable.

On the other hand, if what we want to do is process work as quickly as possible, i.e. maximise throughput, then concurrency can be a very poor choice.

If we cannot predict the order that work will need to be executed in, due to things like having to distribute work across threads and retry work due to temporary errors then we may have to create the entire context for the work repeatedly and load it into local memory.

In circumstances like these concurrency hurts performance. After all the fastest processing is probably still pointer manipulation of a memory-mapped file, if your want to really fast.

So concurrency means performance and beating Moore’s Law if you can be stateless and value volume of processing over unit throughput.