Clojure, Programming

Leiningen doesn’t compile Protocols and Records

I don’t generally use records or protocols in my Clojure code so the fact that the Clojure compiler doesn’t seem to detect changes in the function bodies of either took me by surprise recently. Googling turned up this issue for Leiningen. Reading through the issue, I ended up specifying all the namespaces containing these structures in the :aot definition in the lein project.clj. This meant that those namespaces were re-compiled every time, but that seemed the lesser of two evils compared to the clean and build approach.

Where this issue really stung was in the method-like function specifications in the records, and as usual it felt that structure and behaviour were getting muddled up again when ideally you want to keep them separate.

Programming

Keeping NPM local

I’m pretty sure that at one point NPM might not have required sudo privileges to do things but by default it now seems that people are sudo installing all over the place. It’s the equivalent of just clicking “yes” on every Windows permission dialog.

There is no particularly good reason to use sudo with a package manager. Virtualenv is tres local by default and even gem (via rbenv and rvm) can now run happily in the user space. Even if you want to share package downloads to minimise network calls then you still don’t need to install everything globally as root.

NPM is special in almost every sense of the word and its author recommends that you change /usr/local to be owned by your login user. That seems kind of crazy and the kind of thing that probably would only work out for OSX users.

In fact if you are willing to build your own NodeJS it really isn’t hard to get Node and NPM working locally while still retaining all the advantages of the “global” NPM install. Just use the standard build option of --prefix to set Node to use your home directory and add $HOME/bin to your PATH in .bashrc or the equivalent.

Then you should be able to use npm without ever having to sudo.

My view though is that you shouldn’t have to force everything into the user space just to make sure a package manager doesn’t need sudo privileges. It would be better for everyone if NPM used a directory in home for each user. Since it seems to be aimed at single-user installs anyway, there is not a massive saving in having packages installed in /usr/local. For the few use-cases where that would be useful it should be an override option.

Programming

Cognitive bias and the difficulty of evolving strongly typed solutions

Functional Exchange 2013 featured an interesting talk by Paul Dale that had been mostly gutted but had a helpful introduction to cognitive bias. That reminded me of something I was trying to articulate in my own talk when I was talking about the difficulty of evolving typed solutions. I used the analogy of the big ball of mud, where small incremental changes to the model of the system result in an increasingly warped codebase. Ultimately you need someone to come along and rationalise the changes or you end up with something that is too difficult to work with and is re-built.

This of course is an example of anchoring where the initial type design for the system tends to remain no matter how far the domain and problem have moved from the initial circumstances of their creation.

Redesigning the type definition of a program is also expensive in terms of having to create a new mental model of what is happening rather than just adapting the existing one. Generally it is easier to adapt and incorporate an exception or set of exceptions to the existing model rather than recreating an entire system.

Dynamic or data-driven systems are more sympathetic to the adaptive approach. However they too have their limits: special cases need to be abstracted periodically and holistic data objects can bloat out of control into documents that are dragging the whole world along with them.

Type-based solutions on the other hand need to be periodically re-engineered and the difficulty is that the whole set of type definitions needs to be worked on at the same time. Refactoring patterns of object-orientated code often focus on reorganisation that is easy, such as pulling out traits or extracting new structures. This is still anchoring on the original solution though.

If a type system is to be a help rather than a hindrance you need to be able to rework the overall structure. I think with most type systems this is actually impossible. Hence the pattern of recreating the application if you want to take a different approach.

The best typed solution currently seems to be type unions, where you can make substantial changes and abstractions but not have to worry about things like type hierarchies or accidentally polluting the edge cases of the system.
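As a rough illustration, and assuming that "type unions" here means tagged unions (sum types), this is a minimal Scala sketch of the idea. The Payment domain and all of the names in it are hypothetical and only there to show the shape.

```scala
object PaymentExample {
  // Hypothetical domain: each case below is one member of the union.
  // "sealed" means every member must be declared in this file.
  sealed trait Payment
  final case class Card(last4: String) extends Payment
  final case class BankTransfer(iban: String) extends Payment
  case object Cash extends Payment

  // If a new member is added to Payment, the compiler warns about
  // every match like this one that no longer covers all the cases.
  def describe(p: Payment): String = p match {
    case Card(last4)        => s"card ending $last4"
    case BankTransfer(iban) => s"transfer from $iban"
    case Cash               => "cash"
  }
}
```

The members share nothing except membership of the union, so one can be added, removed or reshaped without having to reason about a deeper hierarchy.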

Where these aren’t available then good strongly-typed solutions actually rely heavily on good, proactive technical leaders to regularly drive good change through the system and manage the consequences.

Clojure, Programming, Web Applications

A batteries included Clojure web stack

Inspired by the developer experience of the Play framework as well as that of Django and Ruby on Rails, I’ve been giving some thought to what a “batteries included” experience might be for Clojure web development. Unlike things like Pedestal, which focus on trying to keep LISPers happy and writing LISP as much as possible, I’m approaching this from the point of view of what would be attractive to frontend developers who choose between things like Rails, Sinatra or Express.

First let’s focus on what we already have. Leiningen 2 gives us the ability to create application templates that define the necessary dependencies and directory structures as well as providing an excellent REPL. This should allow us to build a suitable application with a single command. The Compojure plugin already does a lot of the setup necessary to quickstart an application. It downloads dependencies and fires up a server that auto-reloads as the application changes.

The big gap though is that the plugin creates a very bare bones application structure, useful for generating text on the web but not much else. To be able to create a basic (but conventional) web app I think we need to have some standard things like a templating system that works with conventional HTML templates and support for generating and consuming JSON.

Based on my experience and people’s feedback I think it would be worth basing our package on the Mustache templating language via Clostache and using Cheshire to generate and parse the JSON (I like data.json’s lack of dependencies but this is web programming for hackers so we should favour what hackers want to use).

I also think we need to set up some basic static resources within the app like Modernizr and jQuery. A simple, plain skin might also be a good idea unless we can offer a few variations within the plugin such as Bootstrap and Foundation which would be even better.

Supporting a datastore is probably too hard at the moment due to the lack of consensus about what a good all-round database is. However I think it would be sensible to offer some instructions as to how to back the app with Postgres, Redis and MongoDB.

I would include Friend by default to make authentication easy and because it’s difficult to do that much interesting stuff without introducing some concept of a user. However I think it is important that the stack is essentially stateless by default, so authentication needs to be cookie-based with an easy way of switching between persistence schemes such as memory and memcache.

Since webapps often spend a lot of time consuming other web services I would include clj-http by default as well. Simple caching that can be backed by memcache also seems important since wrapping Spymemcached is painful and the current Clojure wrappers over it don’t seem to work well with the environment constraints of cloud platforms like Heroku.

A more difficult requirement would be asset pipelining. I think by default the application should be capable of compiling and serving LESS and CoffeeScript, with reloading, for development purposes. However ideally during deployment we want to extract all our static resources and output the final compiled versions for serving out of a static handler or alternatively a static resource host. I hate asset fingerprinting due to the ugliness it introduces into URLs and I would prefer an ETag solution, but fingerprinting is going to work with everything under the sun. I think it should be the default with an option to use ETags as an alternative.

If there was a lein plugin that allowed me to create an application like this with one command I would say that we’re starting to have a credible web development platform.

Clojure, Programming, Scala

Horses for courses: choosing Scala or Clojure

So one of the questions after my recent talk trying to compare Scala and Clojure (something that I suspect is going to be an ongoing project as I hone the message and the tone) was about whether the languages had problem domains they were more suited to. That’s an interesting question because I think they do and I thought it might be interesting to go through some of the decision making process in a more considered fashion than answering questions after a talk allows you to do.

So some of the obvious applications are that if you want to leverage some Java frameworks and infrastructure then you definitely want to use Scala. Things like JPA, Spring-injection, Hibernate and bean-reflection are a lot easier with Scala; in Clojure you tend to be dancing around the expectations these frameworks have that they are working with concrete bean-like entities.

If you are going to work with concurrency or flexible data formats like CSV and JSON I think you definitely want to be using Clojure. Clojure has good multi-core concurrency that is pretty invisible to you as a programmer. The key thing is avoiding functions with side effects and making sure you update dependent state in a single function (transaction). After that you can rely on the language and its attendant frameworks to provide a lot of powerful concurrency.

Similarly LISP syntax and flexible data go hand in hand so writing powerful data transforms seems second nature because you are using fundamental concepts in the language syntax.

Algorithm and closed-domain problems are interesting. My personal view is that I find recursion easier in Clojure due to things like the explicit recur form and the support for variable-arity function definitions. Clojure’s default lazy sequences also make it easier to explore very large problem spaces. On the other hand if you have problems that can be expressed by state machines or transitions then you might be able to express the solution to a problem very effectively in a Scala case class hierarchy.
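To make the state machine point a little more concrete, here is a minimal sketch of a sealed hierarchy modelling a coin-operated turnstile. The turnstile, its states and its events are all hypothetical and chosen only because they are small.

```scala
object Turnstile {
  // The states the machine can be in.
  sealed trait State
  case object Locked extends State
  case object Unlocked extends State

  // The events that can drive a transition.
  sealed trait Event
  case object Coin extends Event
  case object Push extends Event

  // The whole transition table lives in one pattern match.
  def step(state: State, event: Event): State = (state, event) match {
    case (Locked, Coin)   => Unlocked
    case (Unlocked, Push) => Locked
    case (s, _)           => s // any other event leaves the state unchanged
  }

  // Run a sequence of events from the initial Locked state.
  def run(events: List[Event]): State =
    events.foldLeft(Locked: State)(step)
}
```

The transitions read almost like the table you would draw on a whiteboard, which is the attraction of this style for problems of this shape.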

When it comes to exploring the capabilities of Java libraries I tend to use the Scala console but for general programming (slide code examples, exploratory programming) I do tend to find myself spending more time in LightTable’s Instarepl.

When it comes to datastore programming both languages are actually pretty clunky because they devolve handling this down to various third-party libraries. Clojure does pretty well with document databases and key-value stores. Scala is great for interacting with the AWS Java libraries and neither deals particularly well with relational data.

For web programming neither is brilliant but Scala definitely has the edge in terms of mature and full-featured web frameworks. Clojure is definitely more in the log cabin phase of framework support currently.

Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work and should really help guide us to decide what we choose to do.

However I am not a product manager, user testing expert or statistician (that last part is a lie, I’m a statistician who hasn’t done any stats for seventeen years). I am a dirty hacker programmer and I use Optimizely in a way that probably makes my colleagues weep but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this testing.

Note that you need to understand what you’re doing here; I am not recommending this if you are new to the product or multi-variate testing. You also need a good stream of traffic to work on. I have one, so this is working out for me. One piece of good practice I would keep is: decide how you are going to judge the test before you start it and don’t change your measure once you’ve started. If it is clear your initial metrics aren’t helpful, design a new test. The knowledge you’ve gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand what the problem you are dealing with is and what responses you can take to the issues. If you have a question about what is happening in the test feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than speculate.

Changing a variation (no matter how tempting) is dangerous though as you’ll have to remember the differences and when you applied them. I prefer spawning new variations to changing an in-flight variation. Of course fixing bugs and unintentional consequences is fine. You’re looking at the long term rate not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth but I play around with traffic massively during the test. The great thing about Optimizely is that it takes care of the math so feel free to mix the allocation of traffic freely. If you have a run-away winner early on then don’t be afraid to feed the majority of traffic to it.

Make the test work for the whole audience

I don’t believe in this: make the test work for the easiest audience segment to access. I frequently only test on modern browsers. If you find a trend then, shock horror, it often works for the whole audience. It’s about fast feedback not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to do bigger changes to the pages under test.

Don’t change the underlying content

If you take your best performing variation and apply it to the page then the “Original” variation should trend towards that variation. If it doesn’t then you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests, until the Original variation is performing better than the individual variations.

Once Original is the top performing variation you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues, you need to deal in big numbers. A/A can be helpful but if you are working in five digit numbers or double-digit percentages then don’t worry about the noise.

Tests have to look good

If your theory is accurate it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals: get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off all variations looking a bit crappy and then seeing whether they can be outperformed by an improved appearance. Often the answer is no; there is a rule of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However by trying better looking variations in increments you know exactly how much effort to invest.

Programming

Refactoring abuse and strong type compiler systems

“Refactoring” is one of the most abused terms in programming. It has a formal meaning but when generally used it tends to mean rewriting or restructuring code (or as I like to refer to it: changing stuff). One interesting new use of refactoring I heard recently was to describe extracting common code. Creating some new codebase is perhaps the opposite of refactoring.

So refactoring tends to mean developers are just changing things they have already written. Real refactoring is of course done to code under test so I was interested in a Stuart Halloway quote about compilation being the weakest form of unit-testing. Scala is used a lot at the Guardian and it has a more powerful type system and compiler than Java, which means if you play along with the type system you actually get a lot of that weak unit-testing. In fact structuring your code to maximise the compiler guarantees and adding the various assertion methods to make sure that you fail fast at runtime are two of the things that help increase your productivity with Scala.

If you’ve seen the Coursera Scala videos you can see Martin Odersky doing some of this “weak refactoring” in his example code where he simplifies chained collection operations by moving or creating simple functionality in his types.

Of course just like regular refactoring there have to be a few rules to this. Firstly weak refactoring absolutely requires you use explicit function type declarations. Essentially in a weak refactor what you are doing is changing the body of a function while retaining its parameters and return type. If you can still compile after you’ve changed code you are probably good.

However the other critical thing is how much variation the return type allows. A return type of Option for example is probably a bad candidate for weak refactoring as it is probably critical whether your changed code still returns Some or None for a given set of parameters. Only conventional refactoring can determine whether that is true.
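A small sketch of the Option problem, using a hypothetical parsePort function (the name and the behaviour are made up for illustration). Both versions have the same explicit signature and both compile, so the compiler is happy with the "weak refactor", but they disagree on blank input.

```scala
object WeakRefactor {
  // Original (hypothetical) version: blank input gives None.
  def parsePort(s: String): Option[Int] =
    if (s.trim.isEmpty) None else Some(s.trim.toInt)

  // "Refactored" version: same parameters and return type, so it
  // still compiles, but blank input now gives Some(0) instead of None.
  def parsePortRefactored(s: String): Option[Int] =
    Some(if (s.trim.isEmpty) 0 else s.trim.toInt)
}
```

Only a test that pins down what a blank string should return would catch the difference, which is the sense in which Option is a poor candidate for relying on the compiler alone.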

Programming

Value time to fix not quality

One thing that I think has been really good recently at the Guardian is the promotion of the idea that what matters is understanding what is critical to our business and valuing quality only in these critical areas. For everything else the thing we should strive to improve is the time to fix, i.e. how long it takes from a problem being reported to it being fixed in production.

If we have a short or tiny time to fix then we can relax a lot of the traditional fixtures of software development like regression testing and metrics like bugs found in production.

What is also interesting is that when you do fix problems you can also look at how long the problem took to be reported and who reported it. If a problem was reported very quickly by a user then you have an indicator that a feature is perhaps more important than you thought.

If on the other hand a problem was reported a week after it occurred and not by the assumed consumers of the feature but by another group or department you have saved yourself a lot of time and effort in having to verify a feature that perhaps does not deserve to exist at all.

Programming

Our tools are doing us a disservice

Do you like using IntelliJ or a similar IDE that allows you to navigate your code base easily and restructure freely? Do you like the fact that your code has a huge test suite that allows you to make changes with confidence?

These things seem like good things. Why would anyone have a problem with them?

Recently though at conferences and in discussions at work it is starting to seem to me and others that powerful tools have a dark and dangerous side to them. The more powerful the tools you have at your disposal the longer and longer you can work on a codebase without facing up to the issues that you have.

A powerful IDE allows you to have insanely complex projects with hundreds, possibly thousands, of files in them. I’m not sure that the Java love of abstraction across multiple classes would have happened if you had to navigate the resulting package structure with Vi.

Rather than working to simplify your code base you can continue to add in each special case and niche requirement, everyone can have a home with a Strategy pattern here and a class hierarchy there. Our test suite grows and grows to make sure that each overlapping requirement can be added safely and without consideration of its worth. We are perhaps proud that 60 to 80% of our codebase is test code that, in itself, is adding no value to our business.

Our rich dependency managers encourage us to add in libraries or, even worse, extract and share code across multiple projects. Until of course we start to burn in a transitive dependency hell of our own making.

We all love powerful tools and we all love powerful languages that are feature rich, but the more powerful our tools are the more they should help us find the simplicity in what we do and ensure that we deliver measurable value quicker rather than providing just a longer noose.

Programming, Software

Preferring Microservices to Unified Services

So I want to present an argument between two philosophies in service orientated design: microservices and what I am calling the Unified Service. I am a fan of microservices so I am worried about presenting a straw man argument for the other side. Originally I was going to call the unified service the One True Service, for example, but that seemed too snide.

Now there is an XKCD for everything and in this case it is this cartoon on standards that is relevant. However if argument via XKCD doesn’t float your boat let’s expand on the appeal of the unified service.

The desire for a comprehensive service is completely understandable and actually the first wave of service orientation was based around the benefits of centralising services and providing consistency to many clients. However it was during the first wave of implementations that I first became suspicious of the viability of comprehensive services.

If you create a unified service then you end up taking on all the complexity of all your clients and bringing it into one huge uber-complex place. Every requirement and need ends up in the central service that then becomes a slew of conditional code and special cases (unless you have a very brilliant team of coders).

I have ended up preferring the exact opposite approach, heavily influenced by the UNIX philosophy, with lots and lots of microservices. Recently this ended up in an apogee or nadir (depending on how you view it) of two whole webapps that differ only in that they expose different time periods of data.

I think this winds up many of my colleagues who regard it perhaps as an absurdly purist approach that actually reintroduces the complexity by having many services with their undocumented JSON formats and endpoints.

The reason I think the approach has merits is that I probably think more about maintaining code than creating it. I like the fact that while I have many services I only have to worry about the ones that are causing problems and when I am trying to fix them I have very small code surface areas to explore.

When I want to modify and change my service I don’t have to worry about taking out five service endpoints with one dodgy piece of shared code. The unified service is a terrifying thing to deploy because when you push out new code you need to verify everything is still working.

No problem right? We use automated testing to sort all this out. Well I think having to have a test suite for services is a bit of an anti-pattern. Something I will blog about later.

So okay, so now I have a test suite and I am no longer worried about breaking something when I push features out. The trouble is that I am now stuck in test land trying to figure out where the wires are crossed in the shared code and the bug is still in production and time is ticking away while I figure out how things relate and how my new requirement is conflicting with all the other requirements on the unified service.

My view is that it is okay to have massive code duplication and functionality overlap if you also have strong vertical separation and the ability to change small parts of a collaborating system. Systems are harder to manage than codebases and, while you want both to be as good as they can be, savings in codebases are wiped out if the resulting system is more complex and harder to change.
