Programming

Metrics and craftsmanship

Ever since we have had access to increasingly comprehensive and easy-to-comprehend metrics there has been a conflict between the craft and artisan side of software development and the data-driven viewpoint.

Things like code quality are seen as being difficult to express in terms of user-affecting metrics. I suspect that is because most of the craft concerns of software development do not affect the overall value of a product. That is not to say that I align myself fully with those driven by metrics.

There are lots of situations where two approaches result in the same metrics outcome. It is tempting to give in to the utilitarian argument that in such cases you should simply choose the lowest-cost option.

That is too reductionist though, and while it may lead to an optimised, margin-generating product, it feels to me that it is just as likely to create a spiral of compromise that jeopardises the ability to make further improvements.

It is here, at the point where the metrics are silent, that we are put to the test of making good decisions. Our routes forward are neutral from a data point of view, but good decisions will unlock better possibilities in the future. It is at this moment that I feel our preferences for things like craft and aesthetics, and our understanding of things like cost and consequence, matter. Someone who understands how to achieve beauty and simplicity in software can do so for the same cost as the compromise while achieving a very different outcome.

So we need to be metrics-first so that we know we are being honest with ourselves, but once we are working in that truthful environment our experience and discretion can make all the difference.

Programming

Effective dynamic programming: Don’t change variable types

A lot of dynamic languages actually use strong underlying type systems, which means that you cannot interchange strings and number types: you have to convert explicitly.

A typical web example is an HTTP URL or parameter that represents a number. HTTP is entirely string-based, but when acting on the value we often want to compare it numerically against other numbers.

One common technique is to convert the string value to a native type invisibly in the framework but I think that in practice what is more useful is to treat all HTTP content as strings, exactly as it comes off the wire.

Then, in the context of the response handler, you explicitly convert it to a number at the point where the numeric value is needed.

I think this makes it easier to read the code: parameters passed from the request are always strings. Once the string is bound to an identity that identity should never be rebound to a different type. If we need the value of that identity in a different form we perform an explicit conversion at the site where the different value is needed.

If we use that value more than once then DRY suggests that we create a different identity, say parameter_as_number (unless we have a better name), that also has a consistently typed value, number.
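As a rough sketch of what this looks like (the handler, parameter names and upper bound here are made up for illustration, with the parameter_as_number idea folded into JavaScript-style naming):

var MAX_PAGE = 100; // assumed upper bound, purely for the example

function handlePageRequest(params) {
  var page = params.page;                // bound once, always a string, e.g. "3"
  var pageAsNumber = parseInt(page, 10); // explicit conversion, new identity

  if (isNaN(pageAsNumber) || pageAsNumber > MAX_PAGE) {
    return "not found";
  }
  return "rendering page " + pageAsNumber;
}

handlePageRequest({ page: "3" }); // "rendering page 3"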

Programming

Getting to grips with Generators in Javascript

One of the things that makes Python such an amazing language is its far-sighted inclusion of generators and first-order list comprehensions. Both of these things are now coming to Javascript with a healthy sense of homage in the syntax. All the examples in this post have been tested in the console of Firefox 23.

What are generators?

Generators are functions that can be called multiple times and on each call they resume execution from the state they were in when they last stopped executing. To do this they have a special keyword called yield instead of return.

Let’s consider the following trivial example:

function hello() { return "Hello"; }

hello(); // "Hello"

function lazyHello() { yield "Hello"; }

lazyHello(); // [object Generator]

var g = lazyHello(); g.next(); // "Hello"

This example might seem trivial but it actually encapsulates all the important qualities of generators and lazy evaluation. The difference between hello and lazyHello is that lazyHello defers execution of itself. Instead of executing, it returns a Generator object that can then be evaluated for an answer later, rather like Deferreds.

This lazy evaluation means that you can describe very complex or expensive calculations without having to execute them until you need them. For a function that returns a single value, that isn't going to seem very exciting.

Writing infinite sequences with Generators

How can we describe the set of all the positive integers? Well, one way is to take 1 as the starting value and simply keep adding one to the previous value.

function positiveIntegers() {
  var i = 1;

  while(true) {
    yield i++;
  }
}

var g = positiveIntegers();
g.next(); // 1
g.next(); // 2

When we call next on the Generator we start executing the function, so we initialise the value of the variable and enter the loop. We then call yield and the function stops running. When we call next again we do not run the function again, we resume from the point where we stopped last time, inside the while loop.

Generators provide strong encapsulation of their data since once we have obtained a generator we can no longer access its state, just the results of its calculations. If you wanted to do this sum computation with a regular function then you would need to hold the state of the last call to the function somewhere outside the function and pass it again when you wanted to calculate the next value.
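For contrast, here is a rough sketch (made up for illustration) of the same idea without a generator, where the caller has to hold the state and pass it back in on every call:

function nextInteger(previous) {
  return previous + 1;
}

var state = 0;              // the caller owns the state
state = nextInteger(state); // 1
state = nextInteger(state); // 2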

However that infinite loop is still a bit tricky and if you get it wrong then you will lock up the browser. Here’s a safer version.

function positiveIntegers(maxValue) {
  var i = 1;

  while(i <= maxValue) {
    yield i++;
  }

}

var g = positiveIntegers(2);

g.next(); // 1
g.next(); // 2
g.next(); // throws StopIteration ([object StopIteration])

Truly infinite sequences are really powerful but in practice you probably want to use weak-sauce versions that have some kind of bound.
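One way to get that kind of bound without baking it into the sequence itself is a small wrapping helper. This is a sketch in the same legacy Firefox generator syntax as the examples above, and take is just a name I have made up:

function take(generator, n) {
  // yield at most n values from the wrapped generator
  for (var i = 0; i < n; i++) {
    yield generator.next();
  }
}

// wrap the original, unbounded positiveIntegers from earlier
var firstThree = take(positiveIntegers(), 3);

firstThree.next(); // 1
firstThree.next(); // 2
firstThree.next(); // 3
firstThree.next(); // throws StopIteration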

Why are generators awesome?

Well, firstly they allow you to express things that would otherwise be impossible in code, such as very long or infinite sequences that would not fit in memory if every member had to be computed and stored.

As part of this lazy evaluation, generators are very economical in terms of resources: you can define a lot of rules and potential values but then only express the ones you are interested in, minimising the actual allocations.

Finally they are strongly encapsulated functions which hide their internal implementation without the need to resort to object semantics.

Clojure, Programming

Leiningen doesn’t compile Protocols and Records

I don’t generally use records or protocols in my Clojure code, so the fact that the Clojure compiler doesn’t seem to detect changes in the function bodies of either took me by surprise recently. Googling turned up this issue for Leiningen. Reading through the issue, I ended up specifying all the namespaces containing these structures in the :aot definition in the Leiningen project.clj. This meant that those namespaces were re-compiled every time, but that seemed the lesser of two evils compared to the clean-and-build approach.
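For reference, the workaround looks roughly like this in project.clj (the project and namespace names are placeholders, not from a real project):

(defproject example "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  ;; namespaces that define records or protocols go here so they are
  ;; recompiled on every build instead of being served from stale class files
  :aot [example.records example.protocols])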

Where this issue really stung was in the method-like function specifications in the records, and as usual it felt that structure and behaviour were getting muddled up again when ideally you want to keep them separate.

Programming

Cognitive bias and the difficulty of evolving strongly typed solutions

Functional Exchange 2013 featured an interesting talk by Paul Dale that had been mostly gutted but had a helpful introduction to cognitive bias. That reminded me of something I was trying to articulate in my own talk about the difficulty of evolving typed solutions. I used the analogy of the big ball of mud, where small incremental changes to the model of the system result in an increasingly warped codebase. Ultimately you need someone to come along and rationalise the changes or you end up with something that is too difficult to work with and gets re-built.

This of course is an example of anchoring where the initial type design for the system tends to remain no matter how far the domain and problem have moved from the initial circumstances of their creation.

Redesigning the type definition of a program is also expensive in terms of having to create a new mental model of what is happening rather than just adapting the existing one. Generally it is easier to adapt and incorporate an exception or set of exceptions to the existing model rather than recreating an entire system.

Dynamic or data-driven systems are more sympathetic to the adaptive approach. However they too have their limits, special cases need to be abstracted periodically and holistic data objects can bloat out of control into documents that are dragging the whole world along with them.

Type-based solutions on the other hand need to be periodically re-engineered and the difficulty is that the whole set of type definitions need to be worked on at the same time. Refactoring patterns of object-orientated code often focus on reorganisation that is easy, such as pulling out traits or extracting new structures. This is still anchoring on the original solution though.

If a type system is to be a help rather than a hindrance you need to be able to rework the overall structure. I think with most type systems this is actually impossible. Hence the pattern of recreating the application if you want to take a different approach.

The best typed solution currently seems to be type unions, where you can make substantial changes and abstractions but not have to worry about things like type hierarchies or accidentally polluting the edge cases of the system.

Where these aren’t available then good strongly-typed solutions actually rely heavily on good, proactive technical leaders to regularly drive good change through the system and manage the consequences.

Web Applications, Work

Guardian May 2013 Hackday

You can see the reportage in these two liveblogs: Day 1 and Day 2 (note the terrible naming conventions). The theme of the hackday was “growth”. For the most part I took the theme to mean growth hacking and I did a lot of work along those lines which is difficult to talk publicly about.

However my prior lunchtime hacks had revealed to me that one of the fundamental problems the Guardian has is the volume of content it produces. This is not inherently a bad thing but the key thing to understand is that there is vastly more content than can fit onto what are called “fronts” in the jargon. A front is something like the front page of the site or the Environment section. These fronts produce a lot of traffic to content and for regular readers they are the essential navigation tool for the Guardian’s content.

Therefore I was interested in how we consider the dimension of time and perhaps use it to our advantage to help present content. This aspect of my hackday work is more open, because I actually need a lot of help to understand it and because I’ve made some effort to try and use the public Content API rather than our internal content.

I called this work the “Time Trilogy” because it consists of three web apps that each use time as a way of accessing Guardian content.

The three apps are Guardian Word Count, which was the original and gives you a sense of the challenge of navigating the content. It is also pretty fun to watch during the day and see the words tick up. The Word Count then spawned TickTickTick and Guardian In Review. TickTickTick is really a daily content explorer and was the first tool I needed to start sorting and exploring the breakdown of what we produce. It is at its heart a tool for exploring the daily news cycle. In Review is slightly different: it takes the one hundred most popular pieces of content over the last seven days and renders them. Initially I wanted it to be a kind of automatically generated magazine, but actually looking at what people liked meant that I couldn’t make my initial idea work. People really like videos of meteors and Russian car crashes. What it is now is a way to explore material in the medium term, for content that has perhaps left the news cycle but is still relevant.

Neither app is really finished and the way I work is that I am very reliant on having working software to understand what I am doing and what is wrong or right about my approach. TickTickTick is much closer to being a complete product than In Review and it is providing more insight into the nature of the content being produced. For example there is a massive cluster of material between three and five minutes long.

I am going to continue to work on the apps because they help give me feedback into my work and ultimately these prototypes and toys tend to graduate into working components or theory on the main site itself. I may blog a bit more about them individually as I move them closer to something that genuinely creates value. I’m curious about feedback but acting on it is limited by my aims for the apps and realistically the time I have available.

I also wanted to talk a little bit about how I was working this hack day, because I decided to reject advice and work solo rather than as part of a team (although I did a little bit of backseat driving on the online magazines product and I did come up with the idea that actually won the hackday (and will hopefully be implemented and awesome)). Working alone does mean that your creations are going to be quite rough, but it helps cover a lot of ground; I ended up doing five hacks and working on a total of seven. Working with other people means communicating well, whereas solo you just need to express what you want very quickly.

My preferred tool for these kinds of hacks is Python on App Engine, which is what I use for my lunchtime hacks and for which I have a standard application template. With each new application that I do I can start to move the common patterns into the template. To avoid having to faff around with testing I use a loosely functional paradigm that I’ve carried over from Wazoku. It generally works quite well but there are a lot of rules to doing it.

This time around I was doing a bit more frontend work than my day job requires because I was working solo. Again, having the startup experience was useful because I was more rediscovering a skillset than learning it. Hacking also means selecting your platform and choosing it for optimal output.

For that reason I only targeted Firefox and Chrome (Firefox was actually easier to develop for in terms of standards) and I made liberal use of client-side Less and Coffeescript. I was impressed with how good the error-handling was in both. An obscure bug can wipe out all the productivity gains of a higher-order language but both worked great for me.

On top of that I tried experimenting with the new departmental standard of SMACSS (or at least my cherry-picking of it) and I made a lot of use of both Knockout and Bacon.js.

When I say I made use of SMACSS, essentially what I did was namespace my classes to produce simple selectors. This did get me out of a problem I had in In Review, so while it is truly the ugliest CSS standard, and I suspect in time we may come to hate its rejection of rich functionality, I concede that it is effective. Expect to see some of it applied to the main website sometime soon.

Knockout isn’t that popular in the department due to performance issues at a particular level of complexity, but for me it did a brilliant job of simply syncing the visual DOM to the data feeds. I was really happy with it. Other people were using AngularJS for more dynamic applications, but they also had a lot more code than I did, and again, working solo, less is so much more.

Bacon.js was really interesting. A lot of my approach to Javascript is functional and event-based but so far the events have been manually worked via jQuery. Bacon made it easier to create event sources with generic handlers and I probably didn’t use 10% of its full features. I’m curious to see what the rest of the department thinks of it but for my hacks it has definitely earned a place.
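To give a flavour of what that looks like (a made-up sketch rather than code from the hacks, assuming Bacon’s jQuery integration so that asEventStream is available):

var clicks = $("#refresh").asEventStream("click");

clicks
  .map(function () { return Date.now(); })
  .onValue(function (timestamp) {
    console.log("refresh requested at", timestamp);
  });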

It was nice to do something outside the run of normal work and one thing that is quite cool about the hackday is that you can use it to tackle a technology that is entirely new to you and not have to worry about whether you succeed or fail.

Next time (May I believe) I think I want to learn about browser plugins as this is a way of producing better functionality for the Guardian without the hassle of having to make it work for the general population of browsers. Some people’s hacks this time around could have been released to the app/plugin stores and we could have been getting valuable user feedback by now.

Programming

Refactoring abuse and strong type compiler systems

“Refactoring” is one of the most abused terms in programming. It has a formal meaning but when generally used it tends to mean rewriting or restructuring code (or as I like to refer to it: changing stuff). One interesting new use of refactoring I heard recently was to describe extracting common code. Creating a new codebase is perhaps the opposite of refactoring.

So refactoring tends to mean developers are just changing things they have already written. Real refactoring is of course done to code under test, so I was interested in a Stuart Halloway quote about compilation being the weakest form of unit-testing. Scala is used a lot at the Guardian and it has a more powerful type system and compiler than Java, which means that if you play along with the type system you actually get a lot of that weak unit-testing. In fact, structuring your code to maximise the compiler guarantees and adding the various assertion methods to make sure that you fail fast at runtime are two of the things that help increase your productivity with Scala.

If you’ve seen the Coursera Scala videos you can see Martin Odersky doing some of this “weak refactoring” in his example code where he simplifies chained collection operations by moving or creating simple functionality in his types.

Of course, just like regular refactoring, there have to be a few rules to this. Firstly, weak refactoring absolutely requires that you use explicit function type declarations. Essentially, in a weak refactor what you are doing is changing the body of a function while retaining its parameters and return type. If you can still compile after you’ve changed the code you are probably good.
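As a sketch of what I mean (my own made-up example, not one from the course videos), the signature is pinned down and only the body changes:

// before: explicit parameter and return types anchor the contract
def headlineLengths(headlines: List[String]): List[Int] =
  headlines.map(h => h.trim).map(h => h.length)

// after the weak refactor (renamed here only so both versions can sit side by
// side): same parameters, same return type, simpler body; if this still
// compiles, the change is probably safe in the weak sense
def headlineLengthsRefactored(headlines: List[String]): List[Int] =
  headlines.map(_.trim.length)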

However the other critical thing is how much scope for variation the return type allows. A return type of Option, for example, is probably a bad candidate for weak refactoring, as it is probably critical whether your changed code still returns Some or None for a given set of parameters. Only conventional refactoring can determine whether that is true.
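For example (again a made-up sketch), both of these bodies compile against the same signature but they disagree about when None comes back:

// original behaviour: any code starting with "SAVE" gets a discount
def findDiscount(code: String): Option[Int] =
  if (code.startsWith("SAVE")) Some(10) else None

// still compiles with an identical signature, but now returns None in cases
// the old body returned Some(10); the compiler cannot tell you this changed
def findDiscountRefactored(code: String): Option[Int] =
  if (code == "SAVE10") Some(10) else None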
