Programming

Why do you think Riak is a column-store?

Someone is wrong on the internet, the unfortunate thing is, that person is me. In one version of my NoSql article for ThoughtWorks I put Riak under the heading of a column store. Twitter feels this is wrong and that it is better classified as a distributed key-value store. I agree this description is more accurate than mine but it feels like calling limes, lemons, oranges and grapefruit citrus fruits.

When I was putting together the article I knew I would have to look at BigTable and Dynamo derivatives that were generally available. Now BigTable self-identifies as a column store where as Dynamo describes itself as a distributed key-value store. However both systems have a lot of properties in common:

They are peer-aware
Fault-tolerant
Distributed
Can be scaled to demand

Now for me these common characteristics also explain why something like Redis isn’t the same as Riak and it’s an over-simplification to just call Riak and Cassandra distributed key-value stores.

So I saw this similarity between the situations (and here’s where the mistake occurs) and thought: well you know if you look at things like Riak that allow metadata on their bucket you can consider them to be a column-store because the columns are the keys on the metadata and the bucket is actually just a key of a special type. Each key can have a variable number of columns but must have one, the bucket. Then when you query the metadata you are kind of just doing a column sort and search. Brilliant! I can just lump all these different systems under this term.

Now I don’t want to jump to another label, I want to try and get the description of the group more or less correct. Looking at the things that are common I feel that something along the lines of “P2P clustered datastores” might be more accurate.

Programming, Web Applications

Can you use NoSql?

I think the answer is yes. The reason is that traditionally relational datastores have ended up as being the dumping ground for data. Everything has ended up there and with the advent of new data storage technology there is a chance to rummage around the various piles of data and ask whether things are in the right home or not.

One thing I’ve been doing a lot recently is data-driving HTML Form components. That’s a lot easier when you are just reading the data out of documents and lists rather than out of tables. The first advantage is that you don’t have to size your option text for example. Variable text labels? No problem. The second is that you can move away from numeric values to having text-based slug keys or even use existing conventions like ISO language short codes.

You don’t have to use numbers with relational data of course but it tends to happen due to leaky ORM solutions that are orientated around the Long Primary Key.

Another area where you can probably take advantage of a NoSql store is in the small bits of text that occurs around your site but which should be maintained by business owners rather than the front-end team. Thing of those straplines, boxed text and success stories. Maybe they are stored in CLOBs somewhere in the database perhaps in a table called something cryptic like user_text. Let’s liberate that data into a key-store!

I find myself using a lot of Textile and Markdown text in my sites and it is an almost trivial exercise to process and display it from a NoSql database. I would encourage you to give it a go, it’s low risk but it should illustrate some of the benefits of the new stores and suggest the kinds of other problems you have in your application that some NoSql could solve.

Clojure

Control expressions are functions too

I am kind of simultaneously learning Clojure and Scala, the former through the London Clojure Dojo (come and join us) and the latter through work. It is an interesting experience but I am very glad that Clojure is part of the mix as it helps understand a lot of things that are seem strange in Scala.

In Clojure everything is essentially either a function or a sequence. It is therefore not surprising to see that an if statement works like a function.

(if false 1 2)

Evaluating the above in a Clojure REPL results in the answer 2: if is a function that evaluates its first argument returning the second parameter if the evaluation is true, the third if false.

The same is true of Scala, with the additional wrinkle that if the different clauses evaluate to different types you could be in type hell. Despite its superficial similarity to Java syntax it is in fact quite different and you can compose it just as you would any other function.

1 + (if(true) 2 else 3)

Evaluated in a Scala REPL gives the result 3.

Scala 2.8 seems much better about making everything return a value (and hence act like a function), val and for will both now return Unit, which is not useful but prevents the compiler moaning.

This kind of thing is much easier to appreciate and learn in a pure functional language than in Scala where you never really know whether something is going to operate in a functional or object-orientated way.

Python, Web Applications, Work

Declare for simplicity

I was struggling with an issue today on a template that was blowing up due to missing keys in a template data hash. At first I tried to write some conditional code in the template. That ended up being quite ugly so then I was trying to find a way to condition the template on whether the key was present.

Then it struck me that the issue was really the missing key. As the hash is prepared in Python code the original implementation frugally avoided creating the entry if the underlying data wasn’t present. While this is clever and minimal it actually just pushes the complexity out of the Presenter logic and into the template where it is much worse because you can only interact with the Presentation’s abstraction of the original data.

The problem here is that dynamic languages sometimes allow you to be too clever. In a declared structure system you have to declare all your data and provide sensible defaults. This makes writing the templating solution at lot easier as you never have a missing data problem to deal with (you still have headaches with duff defaults but that is a different post).

So I went back to the presenter and declared an empty list under the key that had been causing me grief. Bingo! My failure went away and the default behaviour of not generating any content for the empty list kicked off, reducing my template back to the unconditional single-line I had originally.

Programming

The Cathedral and the Lemonade Stand

Software is big, hard and complicated. It has also traditionally been long-lived, often lasting beyond all the reasonable expectations of its creators.

These realities have driven a lot of software “best practices” in the last few years. Suites of tests to make sure the software is easy to change with confidence, the understanding that software needs to be complete in it itself rather than accompanied by auto-generated documentation files or 200 page manuals, the understanding that software is written to be read not to do things.

It has also led to more difficult arguments that have yet to be won. Things like the fact that software needs to be considered more like infrastructure, with ongoing costs. Most people realise that if they don’t clean their buildings they get dirty and if they don’t service their cars then eventually they stop working. Most people though are happy to pay large sums creating software only to avoid paying anything further and thereby creating a system that slowly slips into weed-ridden obsolescence.

It is this tendency to regard software as something you purchase once rather than an ongoing investment that I want to talk about here.

An interesting thing is starting to happen just now with the arrival of high-productivity languages such as Python and Ruby and flexible NoSql data solutions like CouchDb and key-value stores. Software is probably easier to create (without compromising the practices we’ve come to understand are important) now than ever before. It is also getting easier and easier to deploy applications with cloud services like Google App Engine and Heroku for the web and EC2 for raw machines.

In short we can now turn around software very quickly if chose to. This creates an interesting avenue for tackling the maintenance problem by caving in to what budget holders tend to do naturally. Budgets are generally spent on specific items of functionality. Maintaining that functionality as service tends to come out of a generalised pot, if at all. Normally arguments about solving this problem have focused on trying to include the true cost of the product into the initial budget.

But why? Why don’t we just create a piece of software and then later when we want it do something else through it away and start again. Take a lemonade stand. You don’t build a lemonade stand out of stone and marble with a team of master masons. When the hot weather comes round you grab some wood and cardboard and make a stand that will be good enough for the weekend or evening that you need it for. If next week it is still hot and you want to sell lemonade again you just make it again.

Websites are a lot like this, they get old very quickly and on the front end, they probably have a six-month lifecycle before design trends changes or new browser standards are implemented or some new way of using the web is discovered. Building a website to last four years (or the 10 or 20 years that key infrastructure software tends to be used for) is a waste of time, it’s very unlikely to repay your effort over it’s actual lifetime of, maybe, two years.

Even in software that does last 10 years there is often a feeling of regret that due to tremendous sunk costs in developing it it is not viable to move it to commodity hardware or the cloud or in the case of banking mainframe code make any change to it of any significance.

We can now develop really powerful applications in the timeframe of four to ten weeks. If that application lasts three months with no additional effort and we then spend another four to ten weeks replacing it, are we not actually better off? We are able to implement our lessons learnt sooner. Solutions are much more flexible and mistakes do not come with long-running costs attached. A bad idea can just be left as is.

I’m not arguing for a complete rewrite every three months, just like the lemonade stand we can reuse the good bits, our sign or the good piece of wood for the counter perhaps. Things like Guerilla SOA are taking us in these directions anyway with loosely coupled services with standard-based protocols and interchange formats. If we write a great authentication service we can keep that, or we could replace it with OAuth or OpenId. Our options are open and wider when we play in the short term rather than the long term.

Programming

Continuous Testing

Continuous testing is one of those things that has crept up on me slowly. About two years ago I was aware of people using a trigger on their TextMate save to run tests and, if green, commit to git. At the time it felt too much effort for too little gain but it was a cool trick.

Now as we forage out into the post-Java world we are starting to get some pretty cool revisions of familiar tools and one of the most engaging for me are continuous build tools. The daddy is clearly SBT, which while simple is also tremendously sophisticated. Adding a REPL to a build is a simple change but has all kinds of nice consequences, my favourite of which currently is the continuous test (~test) target that detects changes in your source and test files, compiles them and runs your tests. SBT cuts out the whole compile-link-run cycle for you, you just make a change, hit save, see the consequences and code again. It’s very fast and far more effective in giving feedback than any of the current IDEs (all of which need to get on this bandwagon fast is they want to stay relevant).

Clojure by comparison has been suffering in this regard with Leiningen becoming an unfortunately early defacto standard despite it standing shoulder to shoulder with the benighted Maven. The key thing that Leiningen does wrong is stay at the command-line and force you to cold-boot a JVM with each new command (the second is dependency resolution, SBT favours Ivy). Fortunately Lazytest by Stuart Sierra can hopefully save us here. Although still alpha Lazytest is an awesome way of developing Clojure and it’s hard to beat that feeling of smug satisfaction as the tests go green.

It is these kind of step change enhancements in development that are going to carry us forward more than shopping list of features that the new languages have or lack.

Clojure, Java

Metaprogramming with Clojure

So at one of the recent Clojure dojos we had a situation where we had a number of functions to do with moving that essentially involved passing different keys to a map under different symbols that could be used in the REPL.

Well in Ruby this is the kind of thing you would do by metaprogramming abstracting the shared code into a single function that gets mapped under many names.

During the dojo Tom Crayford provided a map function that seemed to correctly generate the functions we wanted but did so under anonymous names. We couldn’t lick the problem during the dojo and it really drove me a little mad as it seemed we were very close. Some people at the dojo were talking about macros but it seemed that was too strong for the short distance we had to cover to complete the task. It also felt like there was an issue with the language if it couldn’t do this.

After four hours, much consulting of the Halloway book and some fierce googling, a question at Stack Overflow put me on the trail on intern. Adding this to Tom’s function resulted in the right functionality. Here’s a simplified example. However when I ran the code into the REPL the functions failed to appear.

Clojure namespaces are very dynamic and it seems they are very amenable to the kind of manipulation I wanted to do. ns-interns gives you a list of the functions that are currently in a namespace so it was possible to confirm that the functions really had failed to appear. Running the code in the REPL did add them though. So the code was correct.

The final piece of the puzzle was the lazy sequence issue that had been mentioned at the dojo. The map function was being immediately evaluated in the REPL but I had to add a doall to make the sequence unwind when it was invoked during the namespace loading.

The relevant code is part of my Clork fork.

The worst thing in trying to make this work is the old school nature of the Clojure documentation. A lot of the functions tells you literally what the function does but not why you might want to use or more critically give examples of the expected usage. Clojure supports documenting metadata but I think it really needs to be used more effectively.

Gadgets

A Kindle and not an iPad?

I had been waiting for the Apple Tablet for a long time. Although I had been mulling the Sony E-Reader for a while there was no way I was going to buy anything until I saw what was behind all the rumours.

Then came the big iPhone… well it may be a sound consumer electronic device but it wasn’t what I was looking for. Namely I wanted to read books and ideally I wanted to recharge less than once a week.

But in the end I didn’t go back to the Sony, instead I ordered a Kindle from across the pond, complete with US power adaptor. So what was I thinking?

Well firstly I know the Kindle is DRM and platform-lock-in bad. Strangely the iPad being based on the iPhone OS and being locked down made this an easier choice. Sony devices are no less proprietary if they are actually the market leader.

In these terms I think consumers actually need to put pressure on both Apple and Amazon to play fair with them. After all Amazon opened up the music market from Apple’s iTunes DRM, hopefully with the advent of the iPad Amazon may have to consider opening up the Kindle to ePub books.

In the end the thing that really convinced me about the Kindle was actually the Kindle for PC software. Being able to download the software and play around with the Amazon ordering system really helped, in fact if they already had a Kindle of OSX I may not have even ordered the physical device because the software actually delivered a reading experience I enjoyed.

Friends had complained about the poor experience of buying books for the Sony reader and as a heavy Amazon consumer already I am already in tune with the Amazon experience and have the necessary accounts.

Since getting the Kindle I have been impressed with the idea of having the books licensed (unfortunately) to an account that is then synchronised between all the Kindles you’re reading. I like being able to read a few pages on the PC and then carry on where I left off on the tube. That’s a nice feature that is a bit unique at the moment (although I am open to the idea of having multiple sets of ebook software).

I bought the smaller Kindle (not the DX) as frankly the iPad’s full colour and touch screen mean that I didn’t fancy paying the same amount for something that is much more limited. It does mean that PDFs don’t have enough screen space and the lack of a zoom is frustrating. However a big plus is the free wireless, paying a subscription for the iPad was unappealing and the ability to go and look things up in Wikipedia from the Kindle is not getting old yet.

Java

Scala and Voldemort: Types matter

So I have been taking a good look at Java object and key-value stores recently. One of the more interesting (because it is very aligned with Java rather than being language-agnostic) is Voldemort. However I got it into my head that it would be a good idea to actually play around a bit using the Scala console. That involved a lot of classpath tweaking, made simpler because the Voldemort distribution comes bundled with its dependencies. However I still rather strangely needed the Voldemort Test jar to be able to access Upper Case View class. I’m not whether I am doing something wrong here or whether the packaging has gone wrong during the build.

With the classpath complete I followed the quickstart instructions and quickly found that actually Scala’s strict type-system was revealing something interesting about the Voldemort implementation. If you translate the Quickstart example to Scala literally then you end up with a Store Client that cannot store anything as it is a Client of type [Nothing, Nothing] and if you supply a value you get an error. I realised that I needed to interact with a lower level version of the API and was surprised to find that the signature of the constructor seemed to need a Resolver instead of defaulting one and that I had to supply a Factory instead of the Config that the Factory consumes.

This is the code I came up with (note that you need to paste this into an interactive Scala REPL rather than putting it in a script). However I’m a nub with Voldemort and I’m not exactly fly with Scala either. Any advice or input is appreciated.

After writing this I was wondering how to try and store arbitrary objects, initially I thought it would be as simple as declaring the Client to be [String, Any] but that failed due to a Class Cast exception in the String Serializer, it looks like the server config comes into play here and says how a value should be serialised and stored. It’s an interesting experience, I have also been looking at Oracle’s BDB but Voldemort puts a much nicer (read Java 1.5) interface on top of it.

Web Applications

The Browser or the App?

There is an interesting little issue occurring in web development at the moment and it is all being caused by the rise in mobile browsing.

The mobile device has always been a bit a challenge (anyone remember WAP?) but until the iPhone it was an issue that was pretty irrelevant. Web browsing on a phone was so painful that no-one did it. Right now if you check your logs most sites are only going to have a tiny amount of iPhone traffic. However if you live in a major city and you own a phone with a decent screen and browser then you are probably aware of how quickly you start to expect to have access to the same level of information you have at home when on the move. Why should I know more about delayed trains in bed than on the platform?

So the technology is finally reaching the point where the consumer is starting to expect to be able to browse the web on mobile devices. In parallel there is the rise of the app. The prime advantage of the app over mobile web browsing is that the app can sensibly cache data locally and therefore provide a degree of offline tolerant behaviour. You can also simplify some of the UI if your user interacts with your site instead of just consuming the content.

So what do you do? Do you invest in creating a mobile version of your site or instead create an app? Obviously if you can afford it you may want to pursue a web/mobile web/iPhone/Android strategy, i.e. do everything. For most companies though the practicalities lean towards trying to have a main website that more or less works with modern mobile browsers (mostly the iPhone). There are a lot of things that go into that like page load but in essence is a “code and hope” strategy where you hammer any nails that stick up after the event.

What’s interesting about this strategy is that HTML5 has some offline capabilities that allow you to provide some offline capability as long as you as you are happy to ignore the non-Opera/Safari/Chrome users (and frankly why not). This means that can stick to pure web development and have a reasonable mobile experience. Adding a JSON API to your site’s content also opens you up to third-parties developing apps for you.

So what kind of reasons should drive you to develop an app? The first reason is really the customised rich user experience, an app should radically simplify accessing your content. For example for a timetable site a mobile app can make it easier to enter and refine queries, defining search options on mobile web tends to be a pain and you often want to save and reuse search settings rather than re-enter them. Sites with user-generated content may also want to have a rich UI to encourage the user to keep supplying material.

An app should also always be trying to cache content when online and pre-empt the user’s needs. For example it makes sense to try and download timetable information and travel updates for locations I use frequently in my search. If I open my timetable search app and the first thing I see is whether there are delays on my route home are and when the next bus or train is going to be then I may not need to do anything else.

Apps also open sophisticated location service options, although location has been broadened to web browsers too your ability to respond to location based information is very limited on your home PC. A reviews website for example has a strong incentive to invest in an app so it can supply location-based reviews.

It is not clear at the moment whether it is more important to develop a mobile web or a mobile application capability. The emergence of capable, enjoyable web browsing on handhelds in an important development, there are many more mobiles than computers. Thinking about how your web content works on those devices is suddenly very relevant. Developing a mobile-capable site is a good defensive strategy but for some businesses being able to enter the mobile app market earlier is going to be much more important as it could potentially form an audience for them that is greater than their current web users.

Echo One

Sequentially arranged sentences composed of words (and punctuation)

Why do you think Riak is a column-store?

Can you use NoSql?

Control expressions are functions too

Declare for simplicity

The Cathedral and the Lemonade Stand

Continuous Testing

Metaprogramming with Clojure

A Kindle and not an iPad?

Scala and Voldemort: Types matter

The Browser or the App?