What is CouchDB?
Now the merits and flaws of the various markup languages is really holy war and I don’t want to get into it too much. I think it is enough to say that CouchDB aims to be lightweight in implementation without being lightweight in features or performance. JSON is very popular in the web space and by focussing on making common cases easier it is less work to use than XML. The more complex the data or specification requirements though the less viable it is as a solution.
Using JSON means that stored data is extremely flexible in structure, unlike relational data you can have very gappy or bitty information and not have a problem. Take something like a customer relationship management system. This is often a poor fit to relational data because you tend to discover information in small tranches. Initially a lead might be a first name and a telephone number. As the relationship develops you add a surname, a company name, a position, an collection of issues or queries and so on. JSON is a really good way of capturing this evolving picture of things.
The final component of CouchDb is the functional language Erlang which is used to service the HTTP REST interface that CouchDB uses to provide an interface to the database. Erlang provides very cheap, lightweight concurrency that provides a good fit for handling lots of HTTP requests.
The interesting thing is that CouchDB automatically provides the REST front end that you tend to build on top of Java object stores like JBoss Cache or Coherence (which I have also been looking at recently). With those products you tend to stay native until you need to share data with other languages or systems and then you tend to REST serve it out as XML with a lightweight HTTP server. If you can see that need ahead of time then CouchDB might well be compelling due to the built-in support.
After reading about CouchDB I wanted to take a look at this new document database in a bit more detail. I managed to install the base tarball from the Google Code page easily. I then cut and pasted the Ruby sample REST code from the CouchDb Wiki into irb and on running the server I found that I could store and retrieve documents really easily. The combination of interactive Ruby and the REST interface made it easy to play around with data objects.
However when trying to run the tests via the administration web page only four of the tests would pass. The only error message was just an Error Code of 136. Since I seemed able to store and pull documents (and could confirm that the datafile had been generated and had data) I put it down to alpha flakiness and pushed on.
CouchDB uses the Mozilla Spidermonkey libraries to construct its own View Server program which is configured via the couch.ini file. Having double-checked that the interpreter was there, was executable and could read the main.js script it was passed I was baffled. I decided to pull down the code base from SVN and then rebuild it. Building from SVN is a bit more involved than the tarfile and its worth double-checking the documentation on the wiki before you get stuck in. Having built and installed myself a new copy I still had the same problem and was feeling pretty cheesed off. With no other source of help I headed to IRC and checked out the Freenode #couchdb channel. There Christopher Lenz gave me a good steer that he had had a problem with Erlang’s HIPE (high performance virtual machine) on Debian.
I had been using Erlang 5.5.5 which I think I used apt-get to obtain. Downloading the latest Erlang release (5.6.1, they release pretty frequently) I tried building that from source and hung when configure tried to check for floating-point exceptions. Running configure with disable-hipe allowed configure to complete and then Erlang was straight-forward to build and install.
Restarting the CouchDB server I found my queries were working and all but one test (“conflicts” fails on an error assertion and this is apparantly a known issue) passing, hurrah!
The machine I had been working on was a Feisty Fawn Ubuntu instance so I loaded up Gutsy Gibbon in VM Ware Player and tried building Erlang and CouchDb there. Erlang was fine, even with Hipe. For CouchDB I used the latest Erlang I had built and used apt-get to obtain the Spidermonkey and ICU depdencies. Everything worked out of the box with this combination.
The final issue was where CouchDB stores its logs and datafiles by default. These default to /usr/local/var which is not really where I want to store data like this. /var is a more natural choice and FHS seems to agree. You can change this when building in the etc/couch.ini.tpl file of the source directory but it would be nicer to have a more natural choice by default.
Working with CouchDB
Learning how to generate queries for example is easy with the in-built query browser (I’ll save the actual syntax for a later post) as the feedback from the system is very quick. Similarly creating new databases, documents and fields is actually not torturous but slick and quick.
I’m certain this is because Erlang and AJAX form a natural partnership in creating and servicing small requests that need to be handled quickly. I may be wrong but this is certainly my most positive AJAX webservice experience to date.
The CouchDB wiki contains information on getting started with various languages and I choose Ruby. It was a simple cut and paste into irb and then it was straight-forward to interact with database from the shell and the web client.
One of CouchDB’s more unusual feature seems to stem from Erlang’s concurrency model. Data is assigned a revision and in addition to applying multiple changes in order you can also view previous revisions. That’s a pretty weird feature compared to most of the current datastores. You also need to refer to a revision if you want to update a record. If the revision of the target of the update has changed when you submit your changes your changes get refused. The revision mechanism is also the basis of the replication mechanism although there isn’t enough documentation to understand when the replication pairs check their revision ids.
Incremental data-construction is easy via the web-gui but you need to fiddle a little bit to get the revision number to target if doing it programatically. Presumably a library would make that easier going. For information that has little inherent structure or is very gappy or incremental then CouchDB is a great data store and is currently occupying a niche as far as I know.