Programming

Clojure Exchange 2013: Tommy Hall on Concurrency versus Parallelism

One of the really interesting talks at Clojure Exchange 2013 was one by Tommy Hall with the (not good) title You came for the concurrency, right?.

The talk had two main threads. It served as a helpful review of Clojure’s state-handling, concurrency and parallel processing features. Useful for beginners but also a helpful recap for the more experienced.

The other was a discussion of what we mean by concurrency and parallelism. Something that I hadn’t really thought about before (although I was aware there was a difference). Tommy referenced this talk Concurrency is not Parallelism (it’s better) by Rob Pike.

In the talk Rob gives the following definitions:

Concurrency: programming as the composition of independently executing processes

Parallelism: programming as the simultaneous execution of (possibly related) computations

In the talk Tommy gives the future function as an example of programming to indicate concurrency boundaries and the reducers library as an example of parallelism.

Rob’s presentation is worth reading in full but his conclusion is that concurrency is not a guarantee of parallel execution but that achieving parallelism without concurrency is very hard or impossible. Tommy’s talk uses an Erlang example to make the same point.

Don’t fear the monoid

While discussing reducers Tommy finally explained something that I struggled before. A lot of type champions point at reducers and shout Monoids! As if that was some kind of argument.

During his talk Tommy explains that parallel combination function needs to be able to return an identity so the function always returns a value and that because ordering of values is not guaranteed (unlike the implied order in a regular reduce)  it needs to be associative.

That makes sense. Turns out that those are also the properties of a monoid.

Fans of type systems throw terms like Monad, Dual and Monoid not to help add understanding to a discussion but to use them as shibboleths. It was far more enlightening to see an example of where the needs of a problem were driving to a category of function with certain properties. If that is common enough to deserve a shorthand-name, fair enough, but the name itself is not magical and knowing the various function category names is a feat of learning rather akin to memorising all those software pattern names from the Gang of Four’s book.

Standard
Software, Work

Comparing Jabber servers

I recently had a trawl around the available Jabber servers looking for something that was suitable for use as a messaging system for a website. My first job was to do a quick review of what is available out there. The first thing that was quite clear that is ejabberd has massive mindshare. There was a definite feeling of “why would you want to try other servers when you could just be running ejabberd?”.

Well there are kind of two answers; first there is inevitable Erlang objection. This time from sysops who felt uncomfortable with monitoring and support. It is a fair point, I feel Erlang can be particularly obtuse when it’s failing. The second was that ejabberd stubbornly refused to start up on my Fedora test box. Neither the yum copy nor a hand-built version cut the mustard. Ironically I was able to get an instance running in minutes on my own Ubuntu-based virtual machine (hand-built Erlang and ejabberd).

Looking a JVM-based alternatives I looked at OpenFire and Tigase. Tigase has lovely imagery but also seemed to have spam over its comments and the installation was a pig that I gave up on quickly. OpenFire is one of those old-school Java webapps where you are meant to manage everything via a web gui. This makes some kinds of  tasks easy but you have to write your own plugins to get programatic access to the server. I didn’t want to setup up an external database (why on earth would I want to do that?) but the internal HSQL store left me with no way of easily tweaking the setup of the server, a command-line tool would have been ultra-helpful because… OpenFire is annoyingly buggy, by this I mean it doesn’t have a lot of bugs but they are really annoying. After running through the gui setup process I tried to login. This was the wrong thing to do. What I needed to do was stop the server and restart it. Now the server was fucked and I needed to manually delete data and tweak config files to allow me to do the setup again.

Once it was running there was also some fun and games getting the BOSH endpoint to work. Do you need a trailing slash or not? I don’t remember but get it wrong and it doesn’t work. If you try to HTTP GET the endpoint then you get an error, this is technically correct but leaves you wondering whether your config is correct or not (if you know the error message you are looking for perhaps it is helpful but I didn’t so it wasn’t). ejabberd (when I got it working) is more helpful in giving an OHAI message that at least confirms you have something to point the client at.

OpenFire did what I needed but seems to make an implicit assumption that there is going to be an admin working at screens to configure and monitor everything. It feels more like a workgroup tool than a workhorse piece of infrastructure.

One oddity I found was Prosody, it sensibly used the same defaults, urls and conventions as ejabberd but is written in Lua and was gloriously lightweight. It absolutely hit the spot for development work and actually felt fun. I was able to script everything I needed to do via its ctl script. Of course if Erlang (a big, serious infrastructure language) is a bit of an issue the hipster scripting language might be too.

What the hell, if it was about doing what I need to do quickly and without fuss I would use Prosody and worry about any infrastructure issues later.

Standard
Software

Get your own Couch

At Erlounge in London I recently had the chance to catch up with the Couch.io guys J Chris Anderson and Jan Lehnardt. The conversation was interesting as ever and I was intrigued by JChris’s ideas that CouchDb has twisted conventional data storage logic on it’s head by making it easy to create many databases with relatively small amounts of information; the one database per user concept.

More importantly though I discovered it was okay to talk about the hosted Couch instances that Couch.io (who also do CouchDb consulting if you need some help and advice) are offering. The free service offers you a Basic authentication account with 100Mb of storage to play around with Couch to your heart’s content. Paying brings more space but also more sophisticated authentication options.

The service is the perfect way to play around with Couch and learn how you could use it go get an account today! It’s on the Cloud as well: schema-less data on the Cloud, how buzzword compliant is that?!

On a more serious note, this is an excellent service and one I have been asking for as it allows people who have no desire to build and maintain Erlang and Couch to use the datastore as a web service rather than as a managed infrastructure component.

Standard
Programming, Ruby, Scripting

CouchDB

What is CouchDB?

CouchDB is a dedicated document based database that kind of puts it in the same space as Exist, Xindice and Oracle Berkeley XML Db. What makes it very different is that instead of building around XML it is built around Javascript. The document storage format is JSON and the query language is Javascript. Something like Exist uses a minimum of well-formed XML for data and XPath or XQuery for querying.

Now the merits and flaws of the various markup languages is really holy war and I don’t want to get into it too much. I think it is enough to say that CouchDB aims to be lightweight in implementation without being lightweight in features or performance. JSON is very popular in the web space and by focussing on making common cases easier it is less work to use than XML. The more complex the data or specification requirements though the less viable it is as a solution.

Using JSON means that stored data is extremely flexible in structure, unlike relational data you can have very gappy or bitty information and not have a problem. Take something like a customer relationship management system. This is often a poor fit to relational data because you tend to discover information in small tranches. Initially a lead might be a first name and a telephone number. As the relationship develops you add a surname, a company name, a position, an collection of issues or queries and so on. JSON is a really good way of capturing this evolving picture of things.

Using Javascript to query this data surprised me initially, I feel pretty comfortable with my limited Javascript skills and therefore didn’t have a problem with it as a choice. I’ll come back to this later in the post but actually the idea sounds strange but is a natural fit when you use it. Again it is about the right tool for the right job but in the context of the data you are using in CouchDB creating a query by combing iterators, if statements and the functional programming map function is a better fit than trying to adapt SQL.

The final component of CouchDb is the functional language Erlang which is used to service the HTTP REST interface that CouchDB uses to provide an interface to the database. Erlang provides very cheap, lightweight concurrency that provides a good fit for handling lots of HTTP requests.

The interesting thing is that CouchDB automatically provides the REST front end that you tend to build on top of Java object stores like JBoss Cache or Coherence (which I have also been looking at recently). With those products you tend to stay native until you need to share data with other languages or systems and then you tend to REST serve it out as XML with a lightweight HTTP server. If you can see that need ahead of time then CouchDB might well be compelling due to the built-in support.

Installing

After reading about CouchDB I wanted to take a look at this new document database in a bit more detail. I managed to install the base tarball from the Google Code page easily. I then cut and pasted the Ruby sample REST code from the CouchDb Wiki into irb and on running the server I found that I could store and retrieve documents really easily. The combination of interactive Ruby and the REST interface made it easy to play around with data objects.

However when trying to run the tests via the administration web page only four of the tests would pass. The only error message was just an Error Code of 136. Since I seemed able to store and pull documents (and could confirm that the datafile had been generated and had data) I put it down to alpha flakiness and pushed on.

It was only when I tried executing ad-hoc queries and kept having them fail that I realised what was linking my problems. Every time the main Erlang server handed off a query to the Javascript interpreter the process died. Erlang is really resilient to thread death so there was zero impact from this.

CouchDB uses the Mozilla Spidermonkey libraries to construct its own View Server program which is configured via the couch.ini file. Having double-checked that the interpreter was there, was executable and could read the main.js script it was passed I was baffled. I decided to pull down the code base from SVN and then rebuild it. Building from SVN is a bit more involved than the tarfile and its worth double-checking the documentation on the wiki before you get stuck in. Having built and installed myself a new copy I still had the same problem and was feeling pretty cheesed off. With no other source of help I headed to IRC and checked out the Freenode #couchdb channel. There Christopher Lenz gave me a good steer that he had had a problem with Erlang’s HIPE (high performance virtual machine) on Debian.

I had been using Erlang 5.5.5 which I think I used apt-get to obtain. Downloading the latest Erlang release (5.6.1, they release pretty frequently) I tried building that from source and hung when configure tried to check for floating-point exceptions. Running configure with disable-hipe allowed configure to complete and then Erlang was straight-forward to build and install.

Restarting the CouchDB server I found my queries were working and all but one test (“conflicts” fails on an error assertion and this is apparantly a known issue) passing, hurrah!

The machine I had been working on was a Feisty Fawn Ubuntu instance so I loaded up Gutsy Gibbon in VM Ware Player and tried building Erlang and CouchDb there. Erlang was fine, even with Hipe. For CouchDB I used the latest Erlang I had built and used apt-get to obtain the Spidermonkey and ICU depdencies. Everything worked out of the box with this combination.

This problem highlights a few of CouchDB’s weaknesses. Firstly it is all bleeding edge stuff, Erlang is bounding along and so is CouchDB. Secondly there is no test suite in the source distribution that can help troubleshoot issues. Finally the error logging is tricky to understand for a neonate. I’m certain that having an EUnit test suite would have directed me towards the fact that the Javascript was failing much quicker than using trial and error and IRC.

The final issue was where CouchDB stores its logs and datafiles by default. These default to /usr/local/var which is not really where I want to store data like this. /var is a more natural choice and FHS seems to agree. You can change this when building in the etc/couch.ini.tpl file of the source directory but it would be nicer to have a more natural choice by default.

I also tried to install CouchDB on OSX without using a packaging mechanism but instead referring to an instance of Spidermonkey 1.7 I had on my system. While I could get configure to accept the Javascript library it wouldn’t recognise the jsapi.h header, maybe because configure doesn’t define the right Macros when it tries to build the test file.

Working with CouchDB

My first experience of CouchDB’s web interface was that it was good-looking but that the error messages, unresponsive links, Firebug warnings and occasional Javascript pop-ups were all signs that it was a work in progress. On fixing my passthrough to Javascript I had a very different experience of a slick and good looking interface that uses amazingly responsive AJAX to provide a really good experience.

Learning how to generate queries for example is easy with the in-built query browser (I’ll save the actual syntax for a later post) as the feedback from the system is very quick. Similarly creating new databases, documents and fields is actually not torturous but slick and quick.

I’m certain this is because Erlang and AJAX form a natural partnership in creating and servicing small requests that need to be handled quickly. I may be wrong but this is certainly my most positive AJAX webservice experience to date.

The CouchDB wiki contains information on getting started with various languages and I choose Ruby. It was a simple cut and paste into irb and then it was straight-forward to interact with database from the shell and the web client.

One of CouchDB’s more unusual feature seems to stem from Erlang’s concurrency model. Data is assigned a revision and in addition to applying multiple changes in order you can also view previous revisions. That’s a pretty weird feature compared to most of the current datastores. You also need to refer to a revision if you want to update a record. If the revision of the target of the update has changed when you submit your changes your changes get refused. The revision mechanism is also the basis of the replication mechanism although there isn’t enough documentation to understand when the replication pairs check their revision ids.

Incremental data-construction is easy via the web-gui but you need to fiddle a little bit to get the revision number to target if doing it programatically. Presumably a library would make that easier going. For information that has little inherent structure or is very gappy or incremental then CouchDB is a great data store and is currently occupying a niche as far as I know.

Standard