Clojure

Getting started with Riemann stream processing

Riemann is a great application for dealing with event processing but it doesn’t have a lot of documentation or newbie friendly tutorials. There are some cool pictures that explain the principles of the app but nothing beyond that. At some point I want to try and contribute some better documentation to the official project but in the meantime here’s a few points that I think are useful for getting started.

I’m assuming that you’ve followed these instructions to get a working Riemann installation and you’ve followed the instructions on how to submit events to Riemann via the Ruby Riemann client interface.

At this point you want to start making your own processing rules and it is not clear how to start.

Well the starting point is the idea of streams when an event arrives in Riemann it is passed to each stream that has been registered with what is called the core. Essentially a stream is a function that takes an event and some child streams and these functions are stored in a list in the core atom under the symbol :streams.

Okay let’s look at an example. The first obvious thing you want to do is print out the events that you are sending to Riemann. If you’ve got the standard download open the etc/riemann.config file, set the syntax for the file to be Clojure, as this is read into Clojure environment in the riemann/config namespace and you can use full Clojure syntax in it. In the config file add the following at the end. Now either run the server or if it is running reload the config file with kill -HUP <Riemann PID>.

(streams prn)

prn is a built-in function that will print an event and pass it on to following streams.

In irb let’s issue an event:

r << {host: "rrees.me", service: "posts", metric:  5}

You should see some output in the Riemann log along the following lines.


#riemann.codec.Event{:host "rrees.me", :service "posts", :state nil, :description nil, :metric 5, :tags nil, :time 1366450306, :ttl nil}

I’m going to assume this has worked for you. So now let’s see how events get passed on further down the processing chain. If we change our streams function to the following and reload it.

(streams prn prn)

Now we send the event it should get printed twice! Simples!

Okay now let’s look at how you can have multiple processing streams working off the same event. If we add a second print stream we should get three prints of the event.

(streams prn prn)

(streams prn)

Each stream that is registered can effectively process the event in parallel so some streams can process an event and send it to another system while another can write it to the index.

Let’s change one of our prints slightly so we can see this happen.

(streams (with :state "normal" prn) prn)

(streams prn)

We should now get three prints of the event and in one we should see that the event has the state of “normal”. Okay great! Let’s break this down a bit.

Every parameter of streams is a stream and a stream can take an event and child streams. So when an event occurs it is passed to each stream, each stream might specify more streams that the transformed event should be passed to. That’s why we pass prn as the final parameter of the with statement. We’re saying add the key-value pair to the event and pass the modified event to the prn stream.

Let’s try implementing this by ourselves, there is a bit of magic left here, call-rescue is an in-built function that will send our event to other streams you can think of it as a variant of map:

(defn change-event [& children]
  (fn [event]
    (let [transformed-event (assoc event :hello :world)]
      (call-rescue transformed-event children))))

(streams (change-event prn))

If this works then we should see an event printed out that has the “hello world” key-value pair in it. change-event is a stream handler that takes a list of “children” streams and returns a function that handles an event. If the function does not pass the event onto the children streams then the event stream stops processing, which is a bit like a filter. The event is really just a map of data like all good Clojure.

At this point you actual have a good handle on how to construct your own streams. Everything else is going to be a variation on this pattern of creating a stream function that returns an event handler. The next thing to do now is go and have a look at the source code for things like withprn and call-rescue. Peeking behind the curtain will take a certain amount Clojure experience but it really won’t be too painful, I promise, the code is reasonable and magic is minimal. Most of the functions are entirely self-contained with no magic so everything you need to know is in the function code itself.

Standard
Clojure, Programming

Leiningen doesn’t compile Protocols and Records

I don’t generally use records or protocols in my Clojure code so the fact that Clojure compiler doesn’t seem to detect changes in the function bodies of either took me by surprise recently. Googling turned up this issue for Leiningen. Reading through the issue I ended up specifying all the namespaces containing these structures in the :aot definition in the lein project.clj. This meant that the namespace was re-compiled every time but that seemed the lesser of two evils compared to the clean and build approach.

Where this issue really stung was in the method-like function specifications in the records and as usual it felt that structure and behaviour was getting muddled up again when ideally you want to keep them separate.

Standard
Programming

Keeping NPM local

I’m pretty sure that at one point NPM might not have required sudo privileges to do things but by default it now seems that people are sudo installing all over the place. It’s the equivalent of just clicking “yes” on every Windows permission dialog.

There is no particularly good reason to use sudo with a package manager. Virtualenv is tres local by default and even gem (via rbenv and rvm) can now run happily in the user space. Even if you want to share package downloads to minimise network calls then you still don’t need to install everything globally as root.

NPM is special in almost every sense of the word and it’s author recommends that you switch permissions on /usr/local to be owned by your login user. That seems kind of crazy and the kind of thing that probably would only work out for OSX users.

In fact if you are willing to build your own NodeJS it really isn’t hard to bet Node and NPM working locally and still retaining all the advantages of the “global” NPM install. Just use the standard build option of –prefix to set Node to use your home directory and just add $HOME/bin to your PATH in .bashrc or the equivalent.

Then you should be able to use npm without ever having to sudo.

My view though is that you shouldn’t have to force everything into the user space just to make sure a package manager doesn’t need sudo privileges. It would be better for everyone if NPM used a directory in home for each user since it seems to be aimed at single-user installs anyway there is not a massive saving by having packages installed in /usr/local. For the few use-cases where that would be useful then it should be an override option.

Standard
Programming

Cognitive bias and the difficulty of evolving strongly typed solutions

Functional Exchange 2013 featured an interesting talk by Paul Dale that had been mostly gutted but had a helpful introduction to cognitive bias. That reminded me of something I was trying to articulate about in my own talk when I was talking about the difficulty of evolving typed solutions. I used the analogy of the big ball of mud where small incremental changes to the model of the system result in an increasing warped codebase. Ultimately you need someone to come along and rationalise the changes or you end up with something that is ultimately too difficult to work with and is re-built.

This of course is an example of anchoring where the initial type design for the system tends to remain no matter how far the domain and problem have moved from the initial circumstances of their creation.

Redesigning the type definition of a program is also expensive in terms of having to create a new mental model of what is happening rather than just adapting the existing one. Generally it is easier to adapt and incorporate an exception or set of exceptions to the existing model rather than recreating an entire system.

Dynamic or data-driven systems are more sympathetic to the adaptive approach. However they too have their limits, special cases need to be abstracted periodically and holistic data objects can bloat out of control into documents that are dragging the whole world along with them.

Type-based solutions on the other hand need to be periodically re-engineered and the difficulty is that the whole set of type definitions need to be worked on at the same time. Refactoring patterns of object-orientated code often focus on reorganisation that is easy, such as pulling out traits or extracting new structures. This is still anchoring on the original solution though.

If a type system is to be help rather than a hindrance you need to be rework the overall structure. I think with most type-systems this is actually impossible. Hence the pattern of recreating the application if you want to take a different approach.

The best typed solution currently seems to be type unions where you can make substantial changes and abstractions but not have to work about things like type hierarchies or accidentally polluting the edge cases of the system.

Where these aren’t available then good strongly-typed solutions actually rely heavily on good, proactive technical leaders to regularly drive good change through the system and manage the consequences.

Standard
Clojure, Programming, Web Applications

A batteries included Clojure web stack

Inspired by the developer experience of the Play framework as well as that of Django and Ruby on Rails I’ve been giving some thought to what a “batteries included” experience might be for Clojure web development. Unlike things like Pedestal which focuses on trying to keep LISPers happy and writing LISP as much as possible I’m approaching this from the point of view of what would be attractive to frontend developers who choose between things like Rails, Sinatra or Express.

First lets focus on what we already have. Leiningen 2 gives us the ability to create application templates that define the necessary dependencies and directory structures as well as providing an excellent REPL. This should allow us to build a suitable application with a single command. The Compojure plugin already does a lot of the setup necessary to quickstart an application. It downloads dependencies and fires up a server that auto-reloads as the application changes.

The big gap though is that the plugin creates a very bare bones application structure, useful for generating text on the web but not much else. To be able to create a basic (but conventional) web app I think we need to have some standard things like a templating system that works with conventional HTML templates and support for generating and consuming JSON.

Based on my experience and people’s feedback I think it would be worth basing our package on the Mustache templating language via Clostache and using Cheshire to generate and parse the JSON (I like core.data’s lack of dependencies but this is web programming for hackers so we should favour what hackers want to use).

I also think we need to set up some basic static resources within the app like Modernizr and jQuery. A simple, plain skin might also be a good idea unless we can offer a few variations within the plugin such as Bootstrap and Foundation which would be even better.

Supporting a datastore is probably too hard at the moment due to the lack of consensus about what a good allround database is. However I think it would be sensible to offer some instructions as to how to back the app with Postgres, Redis and MongoDB.

I would include Friend by default to make authentication easy and because its difficult to to do that much interesting stuff without introducing some concept of a user. However I think it is important that by default the stack is essentially stateless so authentication needs to be cookie-based by default with an easy way of switching between persistence schemes such as memory and memcache.

Since webapps often spend a lot of time consuming other web services I would include clj-http by default as well. Simple caching that can be backed by memcache also seems important since wrapping Spymemcache is painful and the current Clojure wrappers over it don’t seem to work well with the environment constraints of cloud platforms like Heroku.

A more difficult requirement would be asset pipelining. I think by default the application should be capable of compiling and serving LESS and Coffeescript, with reloading, for development purposes. However ideally during deployment we want to extract all our static resources and output the final compiled versions for serving out of a static handler or alternatively a static resource host. I hate asset fingerprinting due to the ugliness it introduces into urls, I would prefer an ETag solution but fingerprinting is going to work with everything under the sun. I think it should be the default with an option to use ETags as an alternative.

If there was a lein plugin that allowed me to create an application like this with one command I would say that we’re starting to have a credible web development platform.

Standard
Clojure, Programming, Scala

Horses for courses: choosing Scala or Clojure

So one of the questions after my recent talk trying to compare Scala and Clojure (something that I suspect is going to be an ongoing project as I hone the message and the tone) was about whether the languages had problem domains they were more suited too. That’s an interesting question because I think they do and I thought I might be interesting to go through some of the decision making process in a more considered fashion than answering questions after a talk allows you to do.

So some of the obvious applications are that if you want to leverage some Java frameworks and infrastructure then you definitely want to use Scala. Things like JPA, Spring-injection, Hibernate and bean-reflection are a lot easier with Scala; in Clojure you tend to be dancing around the expectations these frameworks have that they are working with concrete bean-like entities.

If you are going to work with concurrency or flexible data formats like CSV and JSON I think you definitely want to be using Clojure. Clojure has good multi-core concurrency that is pretty invisible to you as a programmer. The key thing is avoiding functions with side effects and making sure you update dependent state in a single function (transaction). After that you can rely on the language and its attendant frameworks to provide a lot of powerful concurrency.

Similarly LISP syntax and flexible data go hand in hand so writing powerful data transforms seems second nature because you are using fundamental concepts in the language syntax.

Algorithm and closed-domain problems are interesting. My personal view is that I find recursion easier in Clojure due to things like the explicit recur function and the support for variable-arity function definitions. Clojure’s default lazy sequences also make it easier to explore very large problem spaces. On the other hand if you have problems that can be expressed by state machines or transitions then you might be able to express the solution to a problem very effectively in a Scala case class hierarchy.

When it comes to exploring the capabilities of Java libraries I tend to use the Scala console but for general programming (slide code examples, exploratory programming) I do tend to find myself spending more time in LightTable‘s Instarepl.

When it comes to datastore programming both languages are actually pretty clunky because they devolve handling this down to various third-party libraries. Clojure does pretty well with document databases and key-value stores. Scala is great for interacting with the AWS Java libraries and neither deals particularly well with relational data.

For web programming neither is brilliant but Scala definitely has the edge in terms of mature and full-featured web frameworks. Clojure is definitely more in the log cabin phase of framework support currently.

Standard
Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work and should really help guide us to decide what we choose to do.

However I am not a product manager, user testing expert or statistician (that last part is a lie, I’m a statistician who hasn’t done any stats for seventeen years) I am a dirty hacker programmer and I use Optimizely in a way that probably makes my colleagues weep but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this testing.

Note that you need to understand what you’re doing here, I am not recommending this if you are new to the product or multi-variate testing. You also need a good stream of traffic to work on. I do, this is working out for me. One piece of good practice I would keep is: decide how you are going to judge the test before you start it and don’t change your measure once you’ve started. If it is clear your initial metrics aren’t helpful, design a new test. The knowledge you’ve gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand what the problem you are dealing with is and what responses you can take to the issues. If you have a question about what is happening in the test feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than speculate.

Changing a variation (no matter how tempting) is dangerous though as you’ll have to remember the differences and when you applied them. I prefer to spawn variations to changing an in-flight variation. Of course fixing bugs and unintentional consequences is fine. You’re looking at the long term rate not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth but I play around with traffic massively during the test. The great thing about Optimizely is that it takes care of the math so feel free to mix the allocation of traffic freely. If you have a run-away winner early on then don’t be afraid to feed the majority of traffic to it.

Make the test work for the whole audience

I don’t believe in this, make the test work for the easiest audience segment to access. I frequently only test on modern browsers. If you find a trend then shock, horror it often works for the whole audience. It’s about fast feedback not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to do bigger changes to the pages under test.

Don’t change the underlying content

If you take your best performing variation and apply it to the page then the “Original” variation should trend to the variation. If it doesn’t then you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests until the Original variation is performing better than the individual variations.

Once Original is top performing variation you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues, you need to deal in big numbers. A/A can be helpful but if you are working in five digit numbers or double-digit percentages then don’t worry about the noise.

Tests have to look good

If your theory is accurate it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals: get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off all variations looking a bit crappy and then seeing whether they can be outperformed by an improved appearance. Often the answer is no; there is a rule of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However by trying better looking variations in increments you know exactly how much effort to invest.

Standard