Programming

Effective dynamic programming: Don’t change variable types

A lot of dynamic languages actually use strong underlying type systems, which means that you can’t silently interchange strings and numbers: you have to convert explicitly.

A typical web example is an HTTP URL or parameter that represents a number. HTTP is string-based, but when acting on the value we often want to compare its numeric value against other numbers.

One common technique is to convert the string value to a native type invisibly in the framework, but I think that in practice it is more useful to treat all HTTP content as strings, exactly as it comes off the wire.

Then, in the context of the response handler, you explicitly convert it to a number at the point where the numeric value is needed.

I think this makes the code easier to read: parameters passed from the request are always strings. Once the string is bound to an identity, that identity should never be rebound to a different type. If we need the value of that identity in a different form we perform an explicit conversion at the site where the different value is needed.

If we use that value more than once then DRY suggests that we create a different identity, say parameter_as_number (unless we have a better name), whose value is also consistently typed, in this case as a number.
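As a minimal sketch of the rule in practice (plain Javascript; the handler shape and the names params and pageAsNumber are mine rather than any particular framework’s):

function handlePage(params) {
  // Parameters from the request are always strings, exactly as they
  // came off the wire; this identity stays a string.
  var page = params.page;

  // Explicit conversion at the site where the numeric value is needed,
  // bound to a new identity with a consistent type.
  var pageAsNumber = parseInt(page, 10);

  if (isNaN(pageAsNumber) || pageAsNumber < 1) {
    return "Bad request: page must be a positive integer";
  }
  return "Showing page " + pageAsNumber;
}

handlePage({ page: "3" });   // "Showing page 3"
handlePage({ page: "abc" }); // "Bad request: page must be a positive integer"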

Culture

The Fifth Estate

Courtesy of a preview screening for Guardian staff I got to see The Fifth Estate last night. The biggest point of reference is The Social Network, with the script focusing on the relationship between Julian Assange and Daniel Berg. However, whereas The Social Network focuses on friendship, loneliness and betrayal, this film focuses on ambition, idealism and inspiration.

The film depicts Berg as someone in search of a purpose who finds a leader and a cause in Assange. The rest of the script plays out the consequences of success and the interplay of loyalty, obedience and belief in a radical political organisation.

The film tritely suggests that Assange’s childhood experience of being part of a cult, and his later trial for hacking (where his fellow defendants testified against him), are mirrored in how he structures WikiLeaks, with Assange as the undisputed leader of an organisation centred unquestioningly around him.

The better parts of the film do a more nuanced job of showing how the strength of personality required to change the world in a radical, political way also manifests as the personality flaws of paranoia and arrogance.

All radical political movements are charismatic, disruptive and unstable. The Fifth Estate tries to contrast political achievement with personal cost but it feels laboured, with bludgeoning visual metaphors and clunking dialogue.

There is also a massive problem in that “computer stuff” is just visually difficult to portray, like theoretical physics or philosophy: it is primarily the internal workings of thought.

Really the most enjoyable parts of the film are the central performances of Benedict Cumberbatch and Daniel Brühl. Some of the photography is pretty good as well.

The script uses some heavy-handed techniques for showing that the leaks were not “victimless” but actually affected real people in difficult situations. The disconnection between actions and consequences for cyber-activists was worth addressing.

As for the depiction of the Guardian: well, naturally anything you know about always seems to be travestied when outsiders write about it, and this is no different. Most of the journalism stuff seems clichéd. The film doesn’t capture any of the real debate about the nature of “citizen journalism” within the Guardian or the wider world of commercial journalism.

More weirdly, though, the presence of the Guardian feels irrelevant to the central themes of the film and therefore tends to drag on the plot. In the real world, however, the involvement of professional organisations was actually vital for turning the raw data into something explicable to a wider audience. That aspect of the story is somewhat glossed over or under-explained.

Programming

Getting to grips with Generators in Javascript

One of the things that makes Python such an amazing language is its far-sighted inclusion of generators and first-order list comprehensions. Both of these things are now coming to Javascript with a healthy sense of homage in the syntax. All the examples in this post have been tested in the console of Firefox 23.

What are generators?

Generators are functions whose execution can be suspended and resumed: each time you ask for the next value, the function picks up from the state it was in when it last stopped. To do this they use the special keyword yield instead of return.

Let’s consider the following trivial example:

function hello() { return "Hello"; }

hello(); // "Hello"

function lazyHello() { yield "Hello"; }

lazyHello(); // [object Generator]

var g = lazyHello(); g.next(); // "Hello"

This example might seem trivial but it actually encapsulates all the important values of generators and lazy evaluation. The difference between hello and lazyHello is that lazyHello defers execution of itself. Instead of executing it returns a Generator object that can then be evaluated for an answer later, rather like Deferreds.

This lazy evaluation means that you can describe very complex or expensive calculations without having to execute them until you need the results. Now for a function that returns a single value that isn’t going to seem very exciting.

Writing infinite sequences with Generators

How can we describe the set of all the positive integers? Well, one way is to take 1 as the start of an arithmetic sequence and simply add one to the previous value.

function positiveIntegers() {
  var i = 1;

  while (true) {
    yield i++;
  }
}

var g = positiveIntegers();
g.next(); // 1
g.next(); // 2

When we call next on the Generator we start executing the function: we initialise the variable and enter the loop. Execution then reaches the yield and the function stops running. When we call next again we do not start the function from scratch; we resume from the point where we stopped last time, inside the while loop.

Generators provide strong encapsulation of their data since, once we have obtained a generator, we can no longer access its state, just the results of its calculations. If you wanted to do this computation with a regular function then you would need to hold the state of the last call somewhere outside the function and pass it in again when you wanted to calculate the next value.
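For contrast, here is a sketch of what the regular-function version might look like (the nextInteger helper and the shape of its state are hypothetical):

function nextInteger(state) {
  // The caller owns the counter state and must thread it back in on
  // every call; nothing is hidden from them.
  return { value: state.i, state: { i: state.i + 1 } };
}

var r = nextInteger({ i: 1 });
r.value; // 1
r = nextInteger(r.state);
r.value; // 2

The generator version keeps that state internal; the caller only ever sees the results.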

However that infinite loop is still a bit tricky and if you get it wrong then you will lock up the browser. Here’s a safer version.

function positiveIntegers(maxValue) {
  var i = 1;

  while (i <= maxValue) {
    yield i++;
  }
}

var g = positiveIntegers(2);

g.next(); // 1
g.next(); // 2
g.next(); // throws StopIteration

Truly infinite sequences are really powerful but in practice you probably want to use weak-sauce versions that have some kind of bound.
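An alternative to baking the bound into the generator is to keep the sequence infinite and apply the bound at the call site. Here is a minimal sketch (take is my own helper, written against the legacy generator protocol used in this post, where next() returns the yielded value directly):

function take(generator, n) {
  // Pull n values out of a (possibly infinite) generator.
  var values = [];
  for (var j = 0; j < n; j++) {
    values.push(generator.next());
  }
  return values;
}

// Using the unbounded positiveIntegers from earlier:
take(positiveIntegers(), 5); // [1, 2, 3, 4, 5]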

Why are generators awesome?

Well firstly they allow the expression of things that would otherwise be impossible to express in code: things like very long or infinite sequences that would not fit in memory if every member had to be computed and stored.

As part of this lazy evaluation generators are actually very economical with resources: you can define a lot of rules and potential values but then realise only the ones you are interested in, minimising the actual allocations.

Finally they are strongly encapsulated functions which hide their internal implementation without the need to resort to object semantics.

Gadgets

Using Linux and WD’s My Book Live

For a while I’ve had a hankering to share content (mostly music) between my various laptops via a network drive, largely to avoid having to either attach a drive or waste laptop SSD space. A cursory look through Amazon gave me Western Digital’s My Book Live, which seemed compatible with Ubuntu while mostly being presented as an OS X/Windows product. However the official line is actually “if it works with Linux great, if not we don’t support it”.

Actually getting things working was harder than I had anticipated. If I had an Ethernet-connected computer switched on then the drive appeared normally; however, as soon as I switched off the computer the drive disappeared as well, presumably meaning that the drive was being shared via mesh networking rather than being available to the Wifi devices as a first-class network citizen.

Some online comments suggested that the issue was that the device was not using a static IP, so I went into the settings and changed that. In static IP mode the drive started to give a warning that it wasn’t connected to the internet, which presumably has something to do with port forwarding for the WD2GO service, which also requires some router config. Despite this, the drive was available once the static IP binding was done. However neither of the music players I tried (Rhythmbox and Banshee) could connect to the drive, and there didn’t seem to be a way to provide the required anonymous login.

The final stretch was helped by this post about mounting network drives: on mounting the drive manually it was possible to access it and generate playlists for it. I didn’t want to edit fstab for this so I’m thinking of creating aliases for the mounting and unmounting operations.

I am now able to share my music collection with my Ubuntu laptop, but it has not been a simple experience and I do think WD are short-sighted in not making the operation smoother. Linux may not be a massive market but it doesn’t seem that complex to support it better, if nothing else in the FAQ for the product.

Programming

Concurrency means performance, yes?

One thing I heard a lot at the Mostly Functional conference last week was that concurrency is required for performance on multicore processors. Since Moore’s Law ended it is certainly true that the old trick of not writing performant code and letting hardware advances pick up the slack has been flagging (although things like SSDs have still had their impact).

However equating concurrent code with performance is subtly wrong. If there were a direct relationship then we would have seen concurrent programming adopted swiftly by games programmers, and yet there we still see an emphasis on ordered, predictable execution, cache structure and algorithmic efficiency.

Performance is one of those vague computing terms, like scale, that has many dimensions. Concurrency has no direct relation to performance as anyone who has managed to write a concurrent program with global resource contention can attest.

There are two relevant axes to considering performance and concurrency: throughput and capacity. Concurrency, through parallelism, allows you to greatly increase your utilisation of the available resources to provide a greater capacity for work.

However that work is not inherently performed faster and may actually result in lowered throughput due to the need to read data that is not in memory and the inability to predict the order of execution.

For things like webservices that are inherently stateless, concurrency often does massively increase performance, because the capacity to serve requests goes up and there is no need to coordinate work. If the webservice is accessing a shared store where essentially all of the key data is in memory, and what we need to do is read rather than mutate that data, then concurrency becomes even more desirable.

On the other hand, if what we want to do is process work as quickly as possible, i.e. maximise throughput, then concurrency can be a very poor choice.

If we cannot predict the order that work will be executed in, due to things like having to distribute work across threads and retry work after temporary errors, then we may have to create the entire context for the work repeatedly and load it into local memory.

In circumstances like these concurrency hurts performance. After all, the fastest processing is probably still pointer manipulation of a memory-mapped file, if you want to go really fast.

So concurrency means performance and beating Moore’s Law if you can be stateless and value volume of processing over unit throughput.

Software

Switching Nvidia drivers from the command-line

The Steam client informed me today that there were more recent Nvidia drivers for Ubuntu available and that I should upgrade for stability, etc. etc. It seemed a fairly innocuous change compared to the beta drivers I was using, so I pressed the button and then restarted. The result was a failure to boot Unity and a chance to rediscover the joy of the command-line. I don’t know why Unity and the new drivers fail to mix so spectacularly; however, the simplest thing to do seemed to be to revert to the earlier drivers.

The problem with doing that is that I’ve only ever done it via the GUI tools. This AskUbuntu answer told me about Jockey, the software that underpins the proprietary driver control. Running Jockey at the command-line was very, very slow but it did indeed allow me to select the earlier drivers, and after a restart the GUI was booting again. Much easier than hand-editing an X config file.

Work

Agile: are scrummasters the masters?

One of the fault lines in modern Agile development remains the purpose and application of process. For me the fundamental conflict between a developer and a “scrummaster” comes down to the main purpose of that role. Scrummasters often profess a servant manager role for themselves while actually enacting a traditional, hierarchical master function.

The following is the acid test for me. The servant manager is one who takes the work I am doing and expresses it in a form that allows people outside the team to understand what I am doing, see the progress I have made and make predictions about when my work will be complete.

The traditional manager instead tries to control my work so that it fits neatly into the reporting tools that they want to use. They don’t hesitate to interfere, manipulate and control to make their life easier with their own superiors.

Calling yourself a servant manager but then telling people how to structure their work is paying lip service to a popular slogan while continuing a strand of managerial behaviour that has been proven to fail for decades.

Clojure, Java, Programming

Clojure versus Java: Why use Clojure?

I just gave an introductory talk on Clojure and one of the questions after the event was when a Java programmer might want to switch to using Clojure.

Well, Java is a much more complex language than Clojure and requires a lot of expert knowledge to use properly. You really have to know Effective Java pretty well; Java still contains every wrinkle and crease from version 1 onwards. By comparison Clojure is a simpler and more consistent language with less chance of shooting yourself in the foot. However, as a newer language Clojure does not have the same number of tutorials, FAQs and Stack Overflow answers. It also has a different structure to curly-brace languages, so it feels quite different to program in than Java. If you are a proficient Java programmer and you are adept with the standard build tools and IDEs then why would you consider changing?

One example is definitely concurrency. Even if you were going to do it in Java you’re probably going to let Doug Lea handle the details via java.util.concurrent; however, Doug Lea didn’t get to rewrite the whole of Java to be concurrent. In Clojure you are letting Rich Hickey handle your concurrency, and the whole language is designed around immutability and sensible ways of sharing state.

Another is implementing algorithms or mathematical programming. A lot of mathematical functions are easy to translate into LISP expressions, and Clojure supports variable-arity functions and stackless recursion via recur. In Java you either end up with poor-man’s functions via static class methods or you model the expression of the rule as objects, which is a kind of context disconnect.

Similarly, data processing and transformation work really well in Clojure (as you might expect of a list processing language!). If you want to take some source of data, read it, normalise it, apply some transform functions, and perhaps do some filtering, selection or aggregation, then you are going to find a lot of support for the common data functions in Clojure’s sequence library. Only Java 8 has support for applying lambda functions to collections, and even then it has a more complex story than Clojure for chaining those applications together over a stream of data.

You might also find Clojure’s lazy sequences helpful for dealing with streams based on large quantities of data. Totally Lazy offers a port of a lot of Clojure’s functionality to Java, but it is often easier to go direct to the source than to try and jury-rig a series of ports together to recreate the same functionality.

A final point to consider is how well your Java code is working out currently. If you are having a lot of problems with memory leaks, full GCs and so on then it might be easier to work with Clojure than to put in the effort to find which pieces of your code are causing the problems. Not that Clojure is a silver bullet, but if you use side-effect-free functional programming your problems are going to be limited to the execution path of just the function. In terms of maintaining clear code and reasoning about what is happening at run time you may find it a lot easier. Again, this isn’t unique to Clojure, but Clojure makes it easier to write good code along these lines.

Web Applications

Reviving RSS

Google’s announcement of the end of Reader had all kinds of interesting consequences. It gave a sense of the scale Google now prefers to operate at: as people migrated away from Reader they were literally bringing alternative services down with the volume of demand being created.

For me personally it made me think about RSS for the first time in quite a while. I have a Reader account and the accompanying Google app, but in reality I only really looked at it when I was bored. Given all the excitement and information flying around about alternative products I thought I would have a look at what was on offer.

The two I seriously kicked the tires on were Skimr and NewsBlur; I also looked at Feedly, but as I am more mobile web than mobile apps I wasn’t that taken with the pitch. I was also swayed by a NewsBlur blog post that pointed out that moving from freemium to freemium wasn’t exactly solving the problem, whereas an open source subscription model was more likely to avoid history repeating itself. Skimr was an interesting experiment, and for things like Reddit and Hacker News, where there isn’t really any body to the posts, it was as good as any other alternative. However I realised that for blogs and news sites I didn’t really want to read a summary, particularly as news sites frequently truncate the content in the RSS feed anyway.

NewsBlur seems heavy on the client-side and has put its hands up to scaling issues; initially it was clunky and slow, and I dared not run it on any browser other than Chrome due to its pig-like hogging of browser resources. However things have got better and the extremely rich interface has become more bearable, although there are still fundamental annoyances like hijacking right-click. Initial features that I didn’t like very much, such as site previewing, are actually useful in practice, and the product feels like it is going somewhere.

The most interesting thing about the exercise was actually re-engaging with RSS generally. I had been relying on skimming Twitter and Reddit to catch up on all the key issues; it works, and it isn’t a bad strategy for dealing with information overload. However as I started to subscribe to blogs from friends, or on the basis of enjoying a piece recommended socially, I started to enjoy that feeling of spontaneity. It turned out that my friends were posting more than I thought, and that in some areas, such as science, posting rates are slow but the quality is high, so subscribing was a sensible way of catching up.

Some sites also turned out to be doing a terrible job of presenting their content, and RSS actually revealed more pieces that I was interested in. Take Review31, whose feed is interesting and also very different to their front page (not intentionally, I would imagine).

In terms of the value of a newsfeed, I realised that I should have implemented RSS feeds (global and per-user) for Wazoku’s Idea Spotlight product. At the time I was obsessed with the fact that, for an app requiring authentication, there wasn’t a good fit between the idea of a public feed of data and a closed private app. In retrospect I should have seen RSS as a robust way of capturing an activity feed and allowing a user to browse it; as a machine-parsable format it would have made it easy to generate catch-up pages, and it is kind of irrelevant whether the feed is public or not. It feels good to see this sudden rebirth of interest and activity in RSS, and it shows that often change is something we need rather than want.
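As a minimal sketch of what I mean (plain Javascript, no particular framework; the buildFeed name and the shape of the activity items are hypothetical):

function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildFeed(title, link, activities) {
  // Map an activity log to a minimal RSS 2.0 document. Whether the
  // feed is then served publicly or behind authentication is
  // irrelevant to the format itself.
  var items = activities.map(function (activity) {
    return "<item>" +
      "<title>" + escapeXml(activity.title) + "</title>" +
      "<link>" + escapeXml(activity.link) + "</link>" +
      "<pubDate>" + new Date(activity.at).toUTCString() + "</pubDate>" +
      "</item>";
  }).join("");

  return '<?xml version="1.0"?>' +
    '<rss version="2.0"><channel>' +
    "<title>" + escapeXml(title) + "</title>" +
    "<link>" + escapeXml(link) + "</link>" +
    items +
    "</channel></rss>";
}

// e.g. a per-user catch-up page could be generated from:
// buildFeed("Activity", "https://example.com/activity",
//   [{ title: "New idea posted", link: "https://example.com/ideas/1", at: "2013-10-01" }]);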
