Programming

Why developers love the shiny

Developers are notorious neophiles who love adopting new technologies and designs. There is a natural bias between interest in working in emerging sectors and a general interest in novelty.

Technology as an industry also emphasises change and novelty as part of its identity with concepts like “disruption” and “innovation” being fundamental to the self-identity of the sector.

I also believe that developers are perversely incentivised to value the new over the established.

Perverse incentives

If you choose to learn an emerging language you have less history to deal with.

If you learn Java you have access to lambdas but you also have to deal with a decades worth of code that was written before lambdas were available. You’ll have to learn about interfaces, with and without implementations; inner classes; anonymous classes; dependency injection and inheritance.

If you choose to learn an emerging technology then chances are you won’t be told to RTFM as it won’t have been written yet.

Instead you’ll have the chance to interact with the technologies creators. If you make a mistake then it will be a learning experience not just for you but for the community as a whole. You’ll be united in your discovery and collaborators instead of an ignorant johnny-come-lately.

And if you learn emerging technologies you will find that conferences are eager to offer you a chance to talk about your experience. Employers will value your knowledge more than the people who added another year of experience with C#.

In every way you will find life easier and more enjoyable if you focus on emerging technology.

Falling out of love

Programmers are often like ardent lovers, in the first flush of their crush they see their new discovery as the answer to all their problems. They are evangelists and poets for their new discovery.

But over time familiarity replaces enchantment and soon we are all too aware of the flaws of tools and take for granted their utility.

Zac Tellman recently put this beautifully in a more general post about open source governance.

When our sense of possibility of what we can do with our tools is exceeded by our understanding of the constraints they have, we start to look elsewhere.

A mature legacy

Legacy is a funny word in technology. As has already been observed most of the time people are delighted to receive a legacy and the word generally has positive conotations.

For developers though the term is loaded with fear as what we are inheriting in the form of a legacy codebase is not a valuable treasure that is being handed down for safekeeping, to be enhanced and handed on to the next generation of stewards.

Instead the legacy codebase is seen as troublesome, an unwanted timebomb that is to be kept in some kind of running order and passed off to someone else as soon as possible.

Maintainers of legacy codebases are also unvalued by organisations. They are often seen as less skilled and less important than developers creating new systems. After all what they are doing is simply keeping a successful system running, they don’t need to use much imagination and they are dealing with solved problems.

Maintaining a legacy codebase often means being overlooked for promotion, pay rises or new opportunities.

In fact this view extends beyond developers, organisations do not tend to value their legacy codebases either. Product managers are equally incentivised to value the new over the old. If they add features to an existing system they are seen as less imaginative than those who create brand new systems or approaches to existing problems.

I have seen product managers deliberately sabotage legacy products so that the performance of the new solution looks better. Management often fails to look at the absolute performance of new system, just the relative numbers. People exploit that tendancy ruthlessly.

Love the new

There are more examples, however they might be better suited for dedicated blog posts in their own right. Overall though I hope I’ve illustrated that how as a profession we are encouraging people to jump on every bandwagon that is passing by.

I want to try and avoid passing a moral judgement on this. I too am a neophile, I too love novelty and would prefer to do something that I haven’t done before over perfecting my skills in a domain I know.

I just want to try and highlight the issue and move it from our unconscious to our conscious minds. Is this what we want for ourselves?

Should we rewarding maintainers of the software that pays our wages less than “rockstars” building prototypes?

Should we value simplification of our existing tools over maintaining backwards compatibility?

These are all inherently cultural issues, not technical ones. Currently we have a culture of novelty and literal innovation. I’m not sure how well it is serving us.

Standard
Programming

Using Google App Engine with Pyenv

I recently started using PyEnv to control my Python installations and make it easier to try to move more of my code to Python 3.

Google App Engine though is unapologetically Python 2.7. Google wants people to move away from the platform in favour of Google Compute custom environments and therefore has little incentive to upgrade the App Engine SDK and environments to support Python 3.

When I set my default Python to be Python 3 with PyEnv I found that despite setting a local version of Python 2.7 my App Engine instance was failing to run with an execfile is not defined exception.

The App Engine Python scripts use #!/usr/bin/env python to invoke the interpreter and for some reason PyEnv doesn’t seem to override the global setting for this despite it being correct when you check it in your shell.

After a lot of frustration and googling for an answer I haven’t found anything elegant. Instead I found this Stack Overflow answer which helpful explained that you can use #!/usr/bin/env/python2 to invoke a specific language interpreter.

Manually changing the shebang line in appcfg.py and dev_appserver.py solved the problem for me and got me running locally.

Obviously this is a pain if I upgrade and I feel it might be better for Google to change the scripts since they don’t have a plan to move to Python3.

Standard
Blogging, Programming

Clojure Exchange 2016

At one point during this year's Clojure Exchange I was reflecting on the numerous problems and setbacks there had been in organising the 2016 exchange with Bruce Durling and he simply replied: "Yeah it was a 2016 type of conference". So that's all I really want to say about the behind the scenes difficulties, despite the struggles I think it was a decent conference.

Personal highlights

James Reeves's talk on asynchronous Ring was an excellent update on how Ring is being adapted to enable asynchronous handlers now and non-blocking handlers in the future. I didn't know that there isn't an equivalent of the Servlet spec for Java NIO-based web frameworks.

The Klipse talk is both short and hilarious with a nicely structured double-act to illustrate the value of being able to evaluate code dynamically on a static page.

David Humphrey's talk, Log all the things was pretty comprehensive on the subject of logging from Clojure applications. It was one of those talks where you felt "well that's been sorted then".

Both Kris's keynote and Christian's Immutable back to front talked not just about the value of Clojure but how you can apply the principles of Clojure's design all across your solution.

One of the most interesting talks was a visualisation of prisoner's dilemma strategies in the browser. It was visual, experimental and informative.

Henry Garner's data science on Clojure talk was interesting again with some nice dynamic distributions and discussions of multi-arm bandit dynamic analysis. Sometimes I feel lots of the data science stuff is too esoteric with too little tangible output. This talk felt a little more relatable in terms of making dynamic variant testing less painful.

Disappointments

Not everything sings on the day. Daan van Berkel's talk on Rubik's Cubes suffered a technical failure that meant his presentation was not dynamically evaluating and therefore became very hard to follow. We should have tried to switch talks around or take a break and try and fix it.

The AV was a general rumbling problem with a few speakers having to have a mic switch in the middle of their talks.

Hans Hubner's talk on persistence was interesting but too quick and too subtle.

We should have had the two Spec talks closer together and earlier in the day. The things that people are doing with it are non-trivial and it is still a relatively new thing.

clojure.spec

Spec is kind of interesting generally for the community. It has become very popular, very quickly and it is being used for all kinds of things.

One theme that came up in the conference was the idea that people wanted to share their spec definitions across the codebase. This seems a bad idea and a classic example of overreach, if someone said they defined all their domain classes in a single Java jar and shared it all across the company then you'd probably thing that is a bad idea. It's not better here because it is Clojure.

The use of Spec was also kind of interesting from a community point of view as the heaviest users of Clojure seemed to be doing the most with it. The bigger the team and the codebase the quicker people have been to adopt Spec and in some cases seem to switch from using Schema to Spec.

On the other hand the people using Clojure for data processing, web programming and things like Clojurescript have not really adopted Spec, probably because it simply doesn't add a lot of benefit for them.

So for the first time in a while we have something that requires some introduction for those new and unfamiliar with it but is being used in really esoteric ways by those making the most use of it. There is a quite a big gap between the two parts of the community.

The corridor track

Out of the UK conferences I went to Clojure Exchange felt like it had the best social pooling of knowledge outside of Scale Summit. Maybe it was because I knew more people here but the talks also had all kinds of interesting little tips. For example during Christian's talk he mentioned that S3 and Cloudfront make for one of the most reliable web API deployment platforms you can choose to use. I ended up making a huge list of links of reminders and things to follow up on. I've also included links to lots of the Github repos that were referenced during the talks.

Next year

And so with a certain inevitability we are looking to the next Clojure Exchange. We're going to have a slightly bigger program committee which should make things easier.

The other thing that we didn't really do that well this year was to try and have some talks transfer from the community talk tracks to the event. In 2017 we'll hopefully be more organised around the community and also have a series of talks that are tied in to the conference itself. If you're interested in being involved in either the organising or the talks you can get involved via London Clojurians.

See you there!

Standard
Programming, Web Applications, Work

Why can’t Forms PUT?

HTML Forms can declare a method, the HTTP verb that is used when the form is submitted, the value of this method is GET or POST.

The HTML5 spec briefly had PUT and DELETE as valid methods for the form method but has now removed them. Firefox also added support and subsequently removed them.

Recently over the course of Brexit night at The Guardian we got into a discussion about why this was the case and what the “right” way to map a form into a REST-like resource system would be.

The first piece of research was to dig into why the additional methods had been added and then removed. The answer (via Ian Hickson) was simple: PUT and DELETE have implied idempotency, the nature of form submission is that it is inherently uncacheable and therefore cannot be properly mapped onto those verbs.

So, basic problem solved, it also implies the solution for the url design for a form. A form submission represents a user submitting an untrusted data payload to a resource, this resource in turn choose to make PUT or DELETE requests but it would be dangerous to have the form do this directly.

The resource therefore is one that represents the form submission. In terms of modelling the URL I would be tempted to say that it takes the form :entity/form/submission, so for example: contact/form/submission.

There may be an argument that POSTing to the form resource represents submission so the submission part of the structure is unnecessary. In my imagination though the form resource itself represents the metadata of the form while the submission is the resource that essentially models a valid sumbission and the resource that represents the outcome of the submission.

Standard
Programming, Web Applications

AngularJS migration: PhantomJS and Angular Mocks

I have recently been upgrading a project from Angular 1.3 to 1.5 in an attempt to get the majority of our projects to a state where a migration to Angular 2 might be more likely.

The upgrade from 1.4 to 1.5 was for the most part entirely painless as the migration notes had promised. The application built and ran and none of our code seemed to be relying on any of the breaking behaviour between the versions.

There was just one problem, all our tests were failing. All the mocks were coming back as undefined with an obscure error url that didn’t really help as the advice it gave was about implementing a provider which applied to none of the mock setup that was happening in the code.

It took a bit of Googling around the problem (and hence this blog post to try and improve the situation) to find a related issue in Github that finally clued me off to the solution that we needed to update the Karma PhantomJS runner and more crucially the version of PhantomJS we were using.

As far as I can tell switching Karma to use PhantomJS 2 is a good idea irrespective of what version of Angular you are using so I think it would probably sensible to do this before you start updating Angular itself.

Standard
Programming

CloudFormation fails to create specified user

I recently had a problem with some historic CloudFormation where the user and their home directory was not being created. The problem and the solution were not complicated but my Google search returned nothing directly related to the problem except the AWS docs on the configuration syntax and there were no errors in the init log.

The user and their associated home directory were not being created which then meant when the scripting in UserData ran (which relied on a certain directory structure) I was getting a “directory not found” error.

The problem and solution are ridiculously straight-forward. The cfn-init scripts were not being installed or run. Without them configuration data in Metadata is not run, which is what the documentation in AWS::CloudFormation::Init pretty much says. I adapted this gist to install cfn-init and everything sprang into life again.

The reason I struggled so much with the problem was that I was modifying existing CloudFormation that had generated a successfully running application previous to my changes.

It took me hours to figure out that while the current version of the CloudFormation made no mention of cfn-init and yet apparently worked was simply because the necessary changes had not been checked into the source repository. Without a simple way to go back and review the actual CloudFormation config that was used (hopefully something that might change in a future version of CloudFormation) I assumed that what was missing was my knowledge and there was some other way of getting the Metadata to execute.

 

Standard
Clojure

Data wrangling with Clojure

Clojure is a great language for wrangling data that is either awkwardly-sized or where data needs to be drawn from and stored in different locations.

What does awkward-sized data mean?

I am going to attribute the term “awkward-sized data” to Henry Garner and Bruce Durling. Awkward-sized data is neither big data nor small data and to avoid defining something by what it is not I would define it as bigger than would fit comfortably into a spreadsheet and irregular enough that it is not easy to map onto a relational schema.

It is about hundreds of thousands of data points and not millions, it is data sets that fit into the memory on a reasonably specified laptop.

It also means data where you need to reconcile data between multiple datastores, something that is more common in a microservice or scalable service world where monolithic data is distributed between more systems.

What makes Clojure a good fit for the problem?

Clojure picks up a lot of good data processing traits from its inheritance as a LISP. A LISP after all is a “list processor”, the fundamental structures of the language are data and its key functionality is parsing and processing those data structures into operations. You can serialise data structures to a flat-file and back into memory purely through the reader macro and without the need for parsing libraries.

Clojure has great immutable data structures with great performance, a robust set of data processing functions in its core library, along with parallel execution versions, it has well-defined transactions on data. It is, unusually, lazy be default which means it can do powerful calculations with a minimal amount of memory usage. It has a lot of great community libraries written and also Java compatibility if you want to use an existing Java library.

Clojure also has an awesome REPL which means you have a powerful way of directly interacting with your data and getting immediate feedback on the work you are doing.

Why not use a DSL or a specify datastore?

I will leave the argument as to why you need a general purpose programming language to Tommy Hall, his talk about cloud infrastructure DSLs is equally relevant here. There are things you reasonably want to do and you can either add them all to a DSL until it has every feature of poorly thought-out programming language or you can start directly with the programming language.

For me the key thing that I always want to do is read or write data, either from a datastore, file or HTTP/JSON API. I haven’t come across a single data DSL that makes it easier to read from one datastore and write to another.

Where can I find out more?

If you are interested in statistical analysis a good place to start is Bruce Durling’s talk on Incanter which he gave relatively early in his use of it.

Henry Garner’s talk Expressive Parallel Analytics with Clojure has a name that might scare the hell out of you but, trust me, this is actually a pretty good step-by-step guide to how you do data transformations and aggregations in Clojure and then make them run in parallel to improve performance.

Libraries I like

In my own work I lean on the following libraries a lot.

JSON is the lingua franca of computing and you are going to need a decent JSON parser and serialiser, I like Cheshire because it does everything I need, which is primarily produce sensible native data structures that are as close to native JSON structures as possible.

After JSON the other thing that I always need is access to HTTP. When you are mucking around with dirty data the biggest thing I’ve found frustrating are libraries that throw exceptions whenever you get something other than a status code of 200. clj-http is immensely powerful but you will want to switch off exceptions. clj-http-lite only uses what is in the JDK so makes for easier dependencies, you need to switch off exceptions again. Most of the time the lite library is perfectly usable, if you are just using well-behaved public APIs I would not bother with anything more complicated. For an asynchronous client there is http-kit, if you want to make simultaneous requests async can be a great choice but most of the time it adds a level of complexity and indirection that I don’t think you need. You don’t need to worry about exceptions but do remember to add a basic error handler to avoid debugging heartache.

For SQL I love yesql because it doesn’t do crazy things and instead lets you write and test normal SQL and then use inside Clojure programs. In my experience this is what you want to do 100% of the time and not use some weird abstraction layer. While I will admit to being lazy and frequently loading the queries into the default namespace it is far more sensible to load them via the require-sql syntax.

One thing I have had to do a bit of is parsing and cleaning HTML and I love the library Hickory for this. One of the nice things is that because it produces a standard Clojure map for the content you can use a lot of completely vanilla Clojure techniques to do interesting things with the content.

Example projects

I created a simple film data API that reads content from an Oracle database and simply publishes it as a JSON. This use Yesql and is really just a trivial data transform that makes the underlying data much more usable by other consumers.

id-to-url is a straight-forward piece of data munging but requires internal tier access to the Guardian Content API. Given a bunch of internal id numbers from an Oracle databases we need to check the publication status of the content and then extract the public url for the content and ultimately in the REPL I write the URLs to a flat file.

Asynchronous and Parallel processing

My work has generally been IO-bound so I haven’t really needed to use much parallel processing.

However if you need it then Rich Hickey does the best explanation of reducers and why reduce is the only function you really need in data processing. For transducers (in Clojure core from 1.7) I like Kyle Kingsbury’s talk a lot and he talks about Tesser which seems to be the ultimate library for multicore processing.

For async work Rich, again, does the best explanation of core.async. For IO async ironically is probably the best approach for making the most of your resources but I haven’t yet been in a situation where

Standard