Programming

How do I query data with CouchDB?

This question comes up a lot when dealing with Couch and I have given various answers before but my latest answer is simply that you don’t. In reality what you want to do in Couch, like a lot of the NoSql databases, is look for key lookups.

Now the key lookup may be a range of keys you are interested in but in reality there is nothing in Couch that is similar to the SQL “WHERE” clause.

So if you cannot do queries then how do you relate data? Well that’s the thing about storing documents instead of rows, if you have related data then you have to ask whether that data has any meaningful existence outside of its parent. In relational terms it is like asking whether you ever access the content of table outside of JOIN with its parent.

Initially you might think: of course I do! But often data is often explicitly related to its parent’s primary key by things like ORDER and GROUP BY. In these kind of cases then you move the related data into the parent record, effectively denormalising to avoid a lookup.

If the data does have a meaningful existence outside the parent (for example in Wazoku comments are an example of a piece of data that exists separately from the thing they are a comment on) then you have a few options but essentially instead of querying you are still trying to do a direct key lookup.

The first simple case is to include a reference to key of the related data in the associated document. Then from one key lookup you can go direct to the next. As an example we store a list of comment document ids on any document that can be commented on and then we can load the comments as needed (often the count of the comments can be as relevant as the full content). I describe the ids used this way as “forward references” as they lead you on to the related document.

The second, slightly more involved approach, is the creation of a view that allows the document to be looked up via an alternative key. For example if we store the document id of the thing being commented on in the comment document under the key comment_on we can then create a mapping view of all comment documents to their comment_on key. Then given any document we can simply do a direct lookup on the key in the view to determine whether it has any associated comments.

The final common technique I use is something I refer to as “unrolling” of collections. So again we create a CouchDB view that consists just of a map job and in it we take each item in an array of “forward references” (related document ids) and emit a document in each view mapping the id to the current document id.

So if an idea document has five comment forward references the resulting view will have five documents, each relating a comment document id to the idea document id.

If things get more complicated then I also have the Couch databases indexed in Elasticsearch and in Neo4J and these alternative views of the data give me powerful adhoc queries on properties or relationships in the data.

In general though I am always trying to think ahead as to how my documents relate and then express that in terms of a key lookup so that I am always working with the simplest case.

Standard

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s