Someone is wrong on the internet and, unfortunately, that person is me. In one version of my NoSQL article for ThoughtWorks I put Riak under the heading of a column store. Twitter feels this is wrong and that Riak is better classified as a distributed key-value store. I agree this description is more accurate than mine, but it feels like calling limes, lemons, oranges and grapefruit citrus fruits.
When I was putting together the article I knew I would have to look at the BigTable and Dynamo derivatives that were generally available. Now BigTable self-identifies as a column store, whereas Dynamo describes itself as a distributed key-value store. However, both systems have a lot of properties in common:
- Peer-aware
- Fault-tolerant
- Distributed
- Scalable to demand
Now for me these common characteristics also explain why something like Redis isn’t the same as Riak, and why it’s an over-simplification to just call Riak and Cassandra distributed key-value stores.
So I saw this similarity between the two (and here’s where the mistake occurs) and thought: if you look at things like Riak that allow metadata on their bucket, you can consider them to be a column store, because the columns are the keys on the metadata and the bucket is actually just a key of a special type. Each key can have a variable number of columns but must have at least one, the bucket. Then, when you query the metadata, you are kind of just doing a column sort and search. Brilliant! I can just lump all these different systems under this term.
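To make that (mistaken) line of thinking concrete, here is a minimal Python sketch of the mental model. It is not Riak's client API or its actual storage layout; `StoredObject`, `as_row` and `query` are hypothetical names invented purely for illustration, and the sample data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical stand-in for a value stored under a bucket and key."""
    bucket: str
    key: str
    value: bytes
    metadata: dict = field(default_factory=dict)


def as_row(obj):
    """View an object as a 'row' whose columns are its metadata keys.

    The bucket is treated as the one column every row must have.
    """
    row = dict(obj.metadata)
    row["bucket"] = obj.bucket  # mandatory column, set last so it always wins
    return row


def query(objects, column, wanted):
    """'Column search and sort': keys whose row has column == wanted."""
    return sorted(o.key for o in objects if as_row(o).get(column) == wanted)


objects = [
    StoredObject("users", "u2", b"...", {"country": "uk"}),
    StoredObject("users", "u1", b"...", {"country": "uk", "plan": "pro"}),
    StoredObject("orders", "o1", b"..."),
]

print(query(objects, "country", "uk"))
print(query(objects, "bucket", "orders"))
```

Seen this way, every object has a variable set of columns plus the bucket, which is exactly what tempted me to file these systems under column stores.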
Now I don’t want to jump to another label; I want to try and get the description of the group more or less correct. Looking at the things these systems have in common, I feel that something along the lines of “P2P clustered datastores” might be more accurate.