Programming

Getting to grips with Generators in Javascript

One of the things that makes Python such an amazing language is its far-sighted inclusion of generators and first-order list comprehensions. Both of these things are now coming to Javascript with a healthy sense of homage in the syntax. All the examples in this post have been tested in the console of Firefox 23.

What are generators?

Generators are functions whose execution can be paused and resumed: each time you ask them for a value they pick up from the state they were in when they last stopped executing. To do this they use the special keyword yield instead of return.

Let’s consider the following trivial example:

function hello() { return "Hello"; }

hello(); // "Hello"

function lazyHello() { yield "Hello"; }

lazyHello(); // [object Generator]

var g = lazyHello(); g.next(); // "Hello"

This example might seem trivial but it actually captures all the important qualities of generators and lazy evaluation. The difference between hello and lazyHello is that lazyHello defers its own execution. Instead of executing, it returns a Generator object that can be evaluated for an answer later, rather like Deferreds.

This lazy evaluation means that you can describe very complex or expensive calculations without having to execute them until you need their results. For a function that returns a single value that isn't going to seem very exciting.

Writing infinite sequences with Generators

How can we describe the world of all the positive integers? Well, one way is to take 1 as the start of an arithmetic sum and simply add one to the previous value of the sum.

function positiveIntegers() {
  var i = 1;

  while(true) {
    yield i++;
  }
}

var g = positiveIntegers();
g.next(); // 1
g.next(); // 2

When we call next on the Generator we start executing the function: we initialise the variable and enter the loop. We then hit yield and the function stops running. When we call next again we do not start the function from the beginning; we resume from the point where we stopped last time, inside the while loop.

Generators provide strong encapsulation of their data since once we have obtained a generator we can no longer access its state, just the results of its calculations. If you wanted to do this sum computation with a regular function then you would need to hold the state of the last call to the function somewhere outside the function and pass it again when you wanted to calculate the next value.

However that infinite loop is still a bit tricky and if you get it wrong then you will lock up the browser. Here’s a safer version.

function positiveIntegers(maxValue) {
  var i = 1;

  while(i <= maxValue) {
    yield i++;
  }

}

var g = positiveIntegers(2);

g.next(); // 1
g.next(); // 2
g.next(); // throws StopIteration

Truly infinite sequences are really powerful but in practice you probably want to use weak-sauce versions that have some kind of bound.

Why are generators awesome?

Well firstly they allow the expression of things that would otherwise be impossible in computer code, things like very long or infinite sequences that would not fit in memory if every member had to be expressed or stored.

Because of this lazy evaluation generators are also very economical with resources: you can define a lot of rules and potential values but only realise the ones you are interested in, minimising the actual allocations.

Finally they are strongly encapsulated functions which hide their internal implementation without the need to resort to object semantics.

Programming

Concurrency means performance, yes?

One thing I heard a lot at the Mostly Functional conference last week was that concurrency is required for performance on multicore processors. Since Moore's Law ended it is certainly true that the old trick of not writing performant code but letting hardware advances pick up the slack has been flagging (although things like SSDs have still had their impact).

However equating concurrent code with performance is subtly wrong. If there were a direct relationship then we would have seen concurrent programming adopted swiftly by games programmers. And yet there we still see an emphasis on ordered, predictable execution, cache structure and algorithmic efficiency.

Performance is one of those vague computing terms, like scale, that has many dimensions. Concurrency has no direct relation to performance as anyone who has managed to write a concurrent program with global resource contention can attest.

There are two relevant axes to considering performance and concurrency: throughput and capacity. Concurrency, through parallelism, allows you to greatly increase your utilisation of the available resources to provide a greater capacity for work.

However that work is not inherently performed faster and may actually result in lowered throughput due to the need to read data that is not in memory and the inability to predict the order of execution.

For things like web services that are inherently stateless, concurrency often does massively increase performance because the capacity to serve requests goes up and there is no need to coordinate work. If the web service is accessing a shared store where essentially all of the key data is in memory, and what we need to do is read rather than mutate that data, then concurrency becomes even more desirable.

On the other hand, if what we want to do is process work as quickly as possible, i.e. maximise throughput, then concurrency can be a very poor choice.

If we cannot predict the order that work will need to be executed in, due to things like having to distribute work across threads and retry work after temporary errors, then we may have to create the entire context for the work repeatedly and load it into local memory.

In circumstances like these concurrency hurts performance. After all, the fastest processing is probably still pointer manipulation of a memory-mapped file, if you really want to go fast.

So concurrency does mean performance, and a way to keep beating Moore's Law, if you can be stateless and value volume of processing over unit throughput.

Clojure, Java, Programming

Clojure versus Java: Why use Clojure?

I just gave an introductory talk on Clojure and one of the questions after the event was when a Java programmer might want to switch to using Clojure.

Well, Java is a much more complex language than Clojure and requires a lot of expert knowledge to use properly. You really have to know Effective Java pretty well because Java still contains every wrinkle and crease from version 1 onwards. By comparison Clojure is a simpler and more consistent language with less chance of shooting yourself in the foot. However, as a new language Clojure does not have the same number of tutorials, FAQs and Stack Overflow answers. It also has a different structure from curly-brace languages, so it feels quite different to program in than Java. If you are a proficient Java programmer and you are adept with the standard build tools and IDEs then why would you consider changing?

One example is definitely concurrency: even if you were going to stick with Java you're probably going to let Doug Lea handle the details via java.util.concurrent. However Doug Lea didn't get to rewrite the whole of Java to be concurrent. In Clojure you are letting Rich Hickey handle your concurrency, and the whole language is designed around immutability and sensible ways of sharing state.

Another is implementing algorithms or mathematical programming. A lot of mathematical functions are easy to translate into LISP expressions, and Clojure supports variable-arity functions for recursion and stackless recursion via the recur form. In Java you either end up using poor man's functions via static class methods or modelling the expression of the rule as objects, which is a kind of context disconnect.
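
To make that concrete, here's a minimal sketch (the function and example values are my own, not from the talk) of a definition that translates almost directly from the maths, using recur so the recursion consumes no stack and an extra arity to accept any number of arguments:

;; greatest common divisor: the two-argument case recurs without growing
;; the stack, the variable-arity case folds the same function over the rest
(defn gcd
  ([a b]
   (if (zero? b)
     a
     (recur b (mod a b))))
  ([a b & more]
   (reduce gcd (gcd a b) more)))

(gcd 12 18)    ;; => 6
(gcd 12 18 30) ;; => 6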

Similarly, data processing and transformation work really well in Clojure (as you might expect of a list processing language!). If you want to take some source of data, read it, normalise it, apply some transform functions, and perhaps do some filtering, selection or aggregation, then you are going to find a lot of support in the common data functions of Clojure's sequence library. Java only gains support for applying lambda functions to collections in Java 8, and even then it has a more complex story than Clojure for chaining those applications together over a stream of data.
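
As a rough sketch of that read, normalise, filter, aggregate shape (the data and field names are invented for illustration):

(require '[clojure.string :as str])

(def orders
  [{:customer "alice" :total 120.0 :status "complete"}
   {:customer "bob"   :total  45.5 :status "pending"}
   {:customer "ALICE" :total  30.0 :status "complete"}])

(->> orders
     (map #(update-in % [:customer] str/lower-case)) ; normalise
     (filter #(= "complete" (:status %)))            ; select
     (map :total)                                    ; transform
     (reduce +))                                     ; aggregate
;; => 150.0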

You might also find Clojure’s lazy sequences helpful for dealing with streams based on large quantities of data. Totally Lazy offers a port of a lot of Clojure’s functionality to Java but it is often easier to just go direct to the source than try and jury-rig a series of ports together to recreate the same functionality.
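
As a small, hedged example of that laziness (the file name is a placeholder): line-seq gives you a lazy sequence over a reader, so only the lines you actually consume are ever realised, however big the file is.

(require '[clojure.java.io :as io])

(with-open [rdr (io/reader "big.log")]
  (->> (line-seq rdr)                   ; lazy sequence of lines
       (filter #(.contains % "ERROR"))
       (take 10)
       doall))                          ; realise the matches before the reader closes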

A final point to consider is how well your Java code is working out currently. If you are having a lot of problems with memory leaks, full GCs and so on then it might be easier to work with Clojure than to put in the effort to find which pieces of your code are causing the problems. Not that Clojure is a silver bullet, but if you use side-effect free functional programming your problems are going to be limited to the execution path of just the function. In terms of maintaining clear code and reasoning about what is happening at run time you may find it a lot easier. Again this isn't unique to Clojure, but Clojure makes it easier to write good code along these lines.

Clojure, Programming

Leiningen doesn’t compile Protocols and Records

I don't generally use records or protocols in my Clojure code, so the fact that the Clojure compiler doesn't seem to detect changes in the function bodies of either took me by surprise recently. Googling turned up this issue for Leiningen. Reading through the issue I ended up specifying all the namespaces containing these structures in the :aot definition in the Leiningen project.clj. This meant that those namespaces were re-compiled every time, but that seemed the lesser of two evils compared to the clean-and-build approach.
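
For reference, a minimal sketch of the workaround; the project and namespace names are placeholders for whichever namespaces define your records and protocols:

(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  ;; force ahead-of-time compilation of the namespaces containing
  ;; defrecord/defprotocol so body changes are picked up on each build
  :aot [my-app.protocols my-app.records])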

Where this issue really stung was in the method-like function specifications in the records; as usual it felt like structure and behaviour were getting muddled up again when ideally you want to keep them separate.

Programming

Keeping NPM local

I’m pretty sure that at one point NPM might not have required sudo privileges to do things but by default it now seems that people are sudo installing all over the place. It’s the equivalent of just clicking “yes” on every Windows permission dialog.

There is no particularly good reason to use sudo with a package manager. Virtualenv is tres local by default and even gem (via rbenv and rvm) can now run happily in the user space. Even if you want to share package downloads to minimise network calls then you still don’t need to install everything globally as root.

NPM is special in almost every sense of the word and its author recommends that you switch permissions on /usr/local to be owned by your login user. That seems kind of crazy and the kind of thing that would probably only work out for OSX users.

In fact if you are willing to build your own NodeJS it really isn't hard to get Node and NPM working locally while still retaining all the advantages of the "global" NPM install. Just use the standard build option of --prefix to set Node to use your home directory and add $HOME/bin to your PATH in .bashrc or the equivalent.

Then you should be able to use npm without ever having to sudo.

My view though is that you shouldn't have to force everything into the user space just to make sure a package manager doesn't need sudo privileges. It would be better for everyone if NPM used a directory in home for each user, since it seems to be aimed at single-user installs anyway and there is not a massive saving from having packages installed in /usr/local. For the few use-cases where that would be useful it should be an override option.

Programming

Cognitive bias and the difficulty of evolving strongly typed solutions

Functional Exchange 2013 featured an interesting talk by Paul Dale that had been mostly gutted but had a helpful introduction to cognitive bias. That reminded me of something I was trying to articulate in my own talk when discussing the difficulty of evolving typed solutions. I used the analogy of the big ball of mud, where small incremental changes to the model of the system result in an increasingly warped codebase. Eventually you need someone to come along and rationalise the changes or you end up with something that is too difficult to work with and gets rebuilt.

This of course is an example of anchoring where the initial type design for the system tends to remain no matter how far the domain and problem have moved from the initial circumstances of their creation.

Redesigning the type definition of a program is also expensive in terms of having to create a new mental model of what is happening rather than just adapting the existing one. Generally it is easier to adapt and incorporate an exception or set of exceptions to the existing model rather than recreating an entire system.

Dynamic or data-driven systems are more sympathetic to the adaptive approach. However they too have their limits, special cases need to be abstracted periodically and holistic data objects can bloat out of control into documents that are dragging the whole world along with them.

Type-based solutions on the other hand need to be periodically re-engineered and the difficulty is that the whole set of type definitions need to be worked on at the same time. Refactoring patterns of object-orientated code often focus on reorganisation that is easy, such as pulling out traits or extracting new structures. This is still anchoring on the original solution though.

If a type system is to be a help rather than a hindrance you need to be able to rework the overall structure. I think with most type systems this is actually impossible. Hence the pattern of recreating the application if you want to take a different approach.

The best typed solution currently seems to be type unions, where you can make substantial changes and abstractions but do not have to worry about things like type hierarchies or accidentally polluting the edge cases of the system.

Where these aren’t available then good strongly-typed solutions actually rely heavily on good, proactive technical leaders to regularly drive good change through the system and manage the consequences.

Clojure, Programming, Web Applications

A batteries included Clojure web stack

Inspired by the developer experience of the Play framework as well as that of Django and Ruby on Rails I’ve been giving some thought to what a “batteries included” experience might be for Clojure web development. Unlike things like Pedestal which focuses on trying to keep LISPers happy and writing LISP as much as possible I’m approaching this from the point of view of what would be attractive to frontend developers who choose between things like Rails, Sinatra or Express.

First let's focus on what we already have. Leiningen 2 gives us the ability to create application templates that define the necessary dependencies and directory structures, as well as providing an excellent REPL. This should allow us to build a suitable application with a single command. The Compojure plugin already does a lot of the setup necessary to quickstart an application. It downloads dependencies and fires up a server that auto-reloads as the application changes.

The big gap though is that the plugin creates a very bare bones application structure, useful for generating text on the web but not much else. To be able to create a basic (but conventional) web app I think we need to have some standard things like a templating system that works with conventional HTML templates and support for generating and consuming JSON.

Based on my experience and people’s feedback I think it would be worth basing our package on the Mustache templating language via Clostache and using Cheshire to generate and parse the JSON (I like core.data’s lack of dependencies but this is web programming for hackers so we should favour what hackers want to use).
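
As a rough sketch of the template and JSON round trip this would make trivial (using the namespaces Clostache and Cheshire publish; the template and data are invented):

(require '[clostache.parser :as mustache]
         '[cheshire.core :as json])

;; render a Mustache template against a map of data
(mustache/render "<h1>Hello {{name}}</h1>" {:name "world"})
;; => "<h1>Hello world</h1>"

;; generate JSON from Clojure data and parse it back with keyword keys
(json/generate-string {:status "ok" :count 2})
;; => "{\"status\":\"ok\",\"count\":2}"
(json/parse-string "{\"status\":\"ok\"}" true)
;; => {:status "ok"}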

I also think we need to set up some basic static resources within the app like Modernizr and jQuery. A simple, plain skin might also be a good idea unless we can offer a few variations within the plugin such as Bootstrap and Foundation which would be even better.

Supporting a datastore is probably too hard at the moment due to the lack of consensus about what a good all-round database is. However I think it would be sensible to offer some instructions on how to back the app with Postgres, Redis and MongoDB.

I would include Friend by default to make authentication easy, and because it's difficult to do that much interesting stuff without introducing some concept of a user. However I think it is important that by default the stack is essentially stateless, so authentication needs to be cookie-based by default with an easy way of switching between persistence schemes such as memory and memcache.

Since webapps often spend a lot of time consuming other web services I would include clj-http by default as well. Simple caching that can be backed by memcache also seems important, since wrapping Spymemcached is painful and the current Clojure wrappers over it don't seem to work well with the environment constraints of cloud platforms like Heroku.
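
For illustration, the kind of call clj-http gives you out of the box (the URL is a placeholder):

(require '[clj-http.client :as http])

(def response
  (http/get "http://api.example.com/articles"
            {:as :json                  ; decode a JSON body into Clojure data
             :query-params {"page" 1}
             :throw-exceptions false})) ; handle error statuses ourselves

(:status response) ;; => 200, say
(:body response)   ;; => the decoded JSON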

A more difficult requirement would be asset pipelining. I think by default the application should be capable of compiling and serving LESS and CoffeeScript, with reloading, for development purposes. However during deployment we ideally want to extract all our static resources and output the final compiled versions for serving out of a static handler or, alternatively, a static resource host. I hate asset fingerprinting due to the ugliness it introduces into URLs; I would prefer an ETag solution, but fingerprinting is going to work with everything under the sun, so I think it should be the default with an option to use ETags as an alternative.

If there was a lein plugin that allowed me to create an application like this with one command I would say that we’re starting to have a credible web development platform.

Clojure, Programming, Scala

Horses for courses: choosing Scala or Clojure

So one of the questions after my recent talk comparing Scala and Clojure (something that I suspect is going to be an ongoing project as I hone the message and the tone) was about whether the languages had problem domains they were more suited to. That's an interesting question because I think they do, and I thought it might be interesting to go through some of the decision-making process in a more considered fashion than answering questions after a talk allows you to do.

So some of the obvious applications are that if you want to leverage some Java frameworks and infrastructure then you definitely want to use Scala. Things like JPA, Spring-injection, Hibernate and bean-reflection are a lot easier with Scala; in Clojure you tend to be dancing around the expectations these frameworks have that they are working with concrete bean-like entities.

If you are going to work with concurrency or flexible data formats like CSV and JSON I think you definitely want to be using Clojure. Clojure has good multi-core concurrency that is pretty invisible to you as a programmer. The key thing is avoiding functions with side effects and making sure you update dependent state in a single function (transaction). After that you can rely on the language and its attendant frameworks to provide a lot of powerful concurrency.
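
A minimal sketch of what "update dependent state in a single transaction" looks like in practice (the account names are invented): two refs whose values must stay consistent are altered inside one dosync, so no reader ever sees a partial transfer.

(def checking (ref 100))
(def savings  (ref 0))

(defn transfer! [amount]
  (dosync                        ; both alterations commit together or not at all
    (alter checking - amount)
    (alter savings  + amount)))

(transfer! 25)
[@checking @savings] ;; => [75 25]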

Similarly LISP syntax and flexible data go hand in hand so writing powerful data transforms seems second nature because you are using fundamental concepts in the language syntax.

Algorithm and closed-domain problems are interesting. My personal view is that I find recursion easier in Clojure due to things like the explicit recur form and the support for variable-arity function definitions. Clojure's default lazy sequences also make it easier to explore very large problem spaces. On the other hand, if you have problems that can be expressed as state machines or transitions then you might be able to express the solution very effectively in a Scala case class hierarchy.

When it comes to exploring the capabilities of Java libraries I tend to use the Scala console but for general programming (slide code examples, exploratory programming) I do tend to find myself spending more time in LightTable's Instarepl.

When it comes to datastore programming both languages are actually pretty clunky because they devolve handling this down to various third-party libraries. Clojure does pretty well with document databases and key-value stores. Scala is great for interacting with the AWS Java libraries and neither deals particularly well with relational data.

For web programming neither is brilliant but Scala definitely has the edge in terms of mature and full-featured web frameworks. Clojure is definitely more in the log cabin phase of framework support currently.

Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work and should really help guide what we choose to do.

However I am not a product manager, user testing expert or statistician (that last part is a lie: I'm a statistician who hasn't done any stats for seventeen years). I am a dirty hacker programmer and I use Optimizely in a way that probably makes my colleagues weep, but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this kind of testing.

Note that you need to understand what you're doing here: I am not recommending this if you are new to the product or to multi-variate testing. You also need a good stream of traffic to work on. I do, and this is working out for me. One piece of good practice I would keep is: decide how you are going to judge the test before you start it and don't change your measure once you've started. If it is clear your initial metrics aren't helpful, design a new test. The knowledge you've gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand what the problem you are dealing with is and what responses you can take to the issues. If you have a question about what is happening in the test feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than speculate.

Changing a variation (no matter how tempting) is dangerous though as you’ll have to remember the differences and when you applied them. I prefer to spawn variations to changing an in-flight variation. Of course fixing bugs and unintentional consequences is fine. You’re looking at the long term rate not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth but I play around with traffic massively during the test. The great thing about Optimizely is that it takes care of the math so feel free to mix the allocation of traffic freely. If you have a run-away winner early on then don’t be afraid to feed the majority of traffic to it.

Make the test work for the whole audience

I don't believe in this: make the test work for the easiest audience segment to access. I frequently only test on modern browsers. If you find a trend then, shock horror, it often works for the whole audience. It's about fast feedback, not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to do bigger changes to the pages under test.

Don’t change the underlying content

If you take your best performing variation and apply it to the page then the "Original" variation should trend towards that variation's results. If it doesn't then you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests, until the Original variation is performing better than the individual variations.

Once Original is the top-performing variation you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues and you need to deal in big numbers. A/A testing can be helpful, but if you are working with five-digit numbers or double-digit percentages then don't worry about the noise.

Tests have to look good

If your theory is accurate it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals: get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off all variations looking a bit crappy and then seeing whether they can be outperformed by an improved appearance. Often the answer is no; there is a rule of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However by trying better looking variations in increments you know exactly how much effort to invest.

Programming

Refactoring abuse and strong type compiler systems

“Refactoring” is one of the most abused terms in programming. It has a formal meaning but when generally used it tends to mean rewriting or restructuring code (or as I like to refer to it: changing stuff). One interesting new use of refactoring I heard recently was to describe extracting common code. Creating some new codebase is perhaps the opposite of refactoring.

So refactoring tends to mean developers are just changing things they have already written. Real refactoring is of course done to code under test, so I was interested in a Stuart Halloway quote about compilation being the weakest form of unit testing. Scala is used a lot at the Guardian and it has a more powerful type system and compiler than Java, which means that if you play along with the type system you actually get a lot of that weak unit testing. In fact structuring your code to maximise the compiler guarantees and adding the various assertion methods to make sure that you fail fast at runtime are two of the things that help increase your productivity with Scala.

If you’ve seen the Coursera Scala videos you can see Martin Odersky doing some of this “weak refactoring” in his example code where he simplifies chained collection operations by moving or creating simple functionality in his types.

Of course just like regular refactoring there have to be a few rules to this. Firstly, weak refactoring absolutely requires that you use explicit function type declarations. Essentially in a weak refactor what you are doing is changing the body of a function while retaining its parameters and return type. If you can still compile after you've changed the code you are probably good.

However the other critical thing is how much covariance the return type has. A return type of Option, for example, is probably a bad candidate for weak refactoring, as it is probably critical whether your changed code still returns Some or None for a given set of parameters. Only conventional refactoring can determine whether that is true.
