Programming, Python

Django and JSON stores: a match made in heaven

My current project is using CouchDB as its store and Django to implement the web frontend. When you have a JSON store such as CouchDB, Python is a natural complement due to its brilliant parsing of JSON into native data structures and its powerful dictionary data type that makes it really easy to work with maps.

In a previous project using Python and Mongo we used Presentation objects to provide domain logic on top of the raw data maps but this time around I wanted to try and cut out a layer and just work with maps as much as possible (perhaps the influence of the Clojure programming I’ve been doing on the side).

However this still leaves two problems that need addressing. Firstly, Django templates generally handle dictionaries like a dream, allowing you to address them with the standard dot syntax. However, both Mongo and Couch use leading underscores to indicate “special” variables, and this clashes with the Python convention of having a leading underscore indicate a private member of the class. The most immediate place you encounter this is when you want to use the id of a document in a URL and the naive doc._id does not work.

The other problem is the issue of legitimate domain logic. At Wazoku we want to use people’s names if they have supplied them and fall back to their email if they haven’t.

The answer to both of these problems (without resorting to an intermediary object) is Django’s filters. The necessary logic can be written in a few lines of Python that simply examine the dictionary and do the necessary transformation to derive the id or the user’s name. This is much lighter than the corresponding Presentation solution.
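As a rough sketch (the filter names, module layout and dictionary keys here are my own illustrative choices, not our actual code), the two filters might look something like this:

from django import template

register = template.Library()

@register.filter
def doc_id(doc):
    # Django templates refuse to resolve names with a leading
    # underscore, so pull the document id out explicitly
    return doc.get('_id')

@register.filter
def display_name(user):
    # Prefer the supplied name, falling back to the email address
    return user.get('name') or user.get('email')

After a {% load %} of the filter module a template can then say {{ doc|doc_id }} or {{ member|display_name }} in the places where the naive dot syntax fails.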

Programming

Thoughts on Neo4J REST server 1.3

So the new Neo4J GPL release 1.3 (with added REST) is the release that moves Neo4J out of its niche and into the wider arena. There are fewer barriers to experimenting with it in production situations, and now that it is no longer necessary to embed it there are no language barriers either: it can just be another piece of infrastructure.

Last week I used the new release to prototype an application and these are my observations. The big thing is that the REST server is easy to use out of the box, the API is well thought out and practical, and there are already some nice wrappers for various languages (I used this one for Python).

However, compared to other web-based stores there are still a few wrinkles. The first is that it would be nice to be able to clearly mark nodes that you consider “roots” in the graph. It’s almost the first thing you want to do and there should be a facility for it in the API.

Indexing is still cronky. I think there is a reasonable expectation that Neo4J should be able to maintain a dynamic index that indexes the properties of nodes as they change, without having to explicitly add the indexable fields to the index. I don’t mind doing some explicit work to switch this mode on, but having to manage it all by hand is a huge pain, particularly because without root node support you are almost inevitably going to want to search before moving over to relationship following.

It would be nice to have list and map properties on nodes. At the moment we’re getting around that by storing JSON in a string property, but again it feels like a value-add that’s necessary if you want to use Neo4J for a big chunk of your data processing.
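For illustration, the workaround looks something like this (a sketch that talks to the REST API directly with Python’s requests library rather than a wrapper; the server URL and property names are my own assumptions):

import json
import requests

BASE = 'http://localhost:7474/db/data'

# Values that aren't plain primitives get smuggled in as JSON strings
tags = ['python', 'graphs']
created = requests.post(BASE + '/node',
                        json={'name': 'example',
                              'tags_json': json.dumps(tags)})
node_url = created.json()['self']

# Reading the list back means parsing the string on the way out
props = requests.get(node_url + '/properties').json()
tags = json.loads(props['tags_json'])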

This is a personal peeve but I do think it would be nice to be able to override the id of a node. One use case I can definitely think of is when you’re translating a document database into graph form and you want to cross-reference the two (particularly if you’re keeping the richer data on the document rather than transferring it into the node).

One other feature that is a bit strange (but I might be doing this wrong) is that the server seems to correspond to just one database. It would seem pretty easy to incorporate a database-style data-partition name into the REST scheme and allow one server to serve many different databases.

Now that the GPL version is available I am also looking forward to Neo4J entering the Linux package distributions in the same way that CouchDB has become a ubiquitous deployment choice.

So in short, Neo4J has left its niche but it still hasn’t fully left the ghetto; I have high hopes for the next release.

Programming

Many cores, many threads?

There are roughly two approaches to solving the multicore problem: better ways of managing threads, and making multiple processes easier to invoke and manage. Each has its pros and cons; threads, for example, are going to be easier to work with if you are managing shared data, or if you have expensive resource creation costs that you would prefer to confine to a single initialisation event rather than repeating for every job you want to create.

However, if you can make your job a parallel process, avoid the need for co-ordination and keep the spin-up lightweight, then there is actually an interesting question as to whether you should favour processes over threads to boost throughput.

Recently I have been working a lot with various REST-like webservices, and these have some interesting idempotent and horizontally scalable properties. Essentially you can partition the data, and at that point you can execute the HTTP requests in parallel.

I’ve used two techniques for doing this. The most general is GNU Parallel, which is an awesome tool (and one I would recommend you incorporate into your general shell use), combined with Curl, which is also awesome. The other is Python’s multiprocessing library, which was very helpful when I wanted to use a datasource like MongoDB to generate my initial dataset. Even there I could make a system call to Curl if I didn’t want to use Python’s httplib.
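To sketch the multiprocessing version (the URLs and pool size are purely illustrative, and this is a present-day Python rendering rather than the exact code I used):

from multiprocessing import Pool
from urllib.request import urlopen

def fetch(url):
    # Each call runs in its own process: no shared state and a simple
    # lifecycle that ends when the result comes back
    with urlopen(url) as response:
        return url, response.status

if __name__ == '__main__':
    urls = ['https://example.com/'] * 8  # stand-in for the partitioned data
    with Pool(processes=4) as pool:
        for url, status in pool.map(fetch, urls):
            print(url, status)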

So why use processes rather than threads to do parallel work? The first answer is very UNIX-specific: processes get a tremendous amount of OS support for management and monitoring, whereas the tooling for analysing thread activity and usage is comparatively weak.

The second is conceptual: the creation of a process to do a task involves a simple lifecycle, and the separation between the resources used by processes is absolute, compared to the situation with threads. Once a process has gone you know it completed either successfully or unsuccessfully, and upon completion it no longer affects any other running process.

The third is practical: I think it is easier to divide work into parallel streams using a pipeline approach than to try to encode the same logic in a thread manager.

So if the conditions apply, take a good look at the multi-process approach, because it might be a lot easier to implement.

Programming

Intuitive versus Reasoning Programmers

During the last year I’ve been helping run a monthly series of dojos for the London Clojure User Group, and in the course of it I have had the chance to watch a lot of people grapple with functional programming. As a result of this, and from looking at the way a lot of my colleagues at ThoughtWorks work, I think programmers can roughly be divided into two groups: Intuitive and Reasoning.

To characterise both of them a little: Intuitive programmers tend to use domain language a lot; they rely heavily on tests and TDD; they often find it difficult to articulate what they are doing in their work; they like small source files because they want to see everything a file does at a glance; they prefer outside-in problem solving; and they like the opportunity to go back and revise their work.

Reasoning programmers like to discuss a problem before coding and explore edge cases and what-ifs; they like to work in the REPL or have in-editor code evaluation; they quickly move from the domain to an abstraction and then work in that abstraction; they don’t mind a thousand-line source file as long as it is logically structured (that’s what the search function is there for); and they prefer “bottom-up” coding, where they distil their abstraction to its essence and, having implemented that, create the required behaviour by composing their abstractions.

Both sets of programmers can be great at their work, but if they are unaware of how they each like to work there can be massive amounts of tension as the two wrestle back and forth. The Intuitive developers feel angsty when the Reasoning programmers start hacking out code rather than refactoring; the Reasoning developers get frustrated at the aimlessness and game-playing of the Intuitives’ attempts to generate the minimum amount of code to make their tests pass.

Reasoning programmers probably feel that they have the superior style of communication, as they are able to advance arguments and logical constructs that can be interrogated. Those articulated models, though, can feel arid and irrelevant to Intuitive programmers: what does some obscure mathematical formula have to do with making sure the frog doesn’t get run over when it crosses the road?

Intuitive programmers seem to be better at switching contexts and adapting to change, as they can quickly see the “outlines” of any problem, and their techniques are about honing that initial perception into a functional solution. By contrast a Reasoning programmer is wary of uncertainty and unhappy drawing unjustified analogies between different situations.

In FP terms, Reasoning programmers have their behaviour emerge as a logical consequence of the operation of lower-level abstractions; their code looks like algebra and domain data is passed into the top of their processing chain. Intuitive programmers, on the other hand, fix behaviour in their tests and fill their code with domain language that aims to match the natural language of the organisation; the depth of their function calls is usually smaller and they are swifter to bind calculations to intermediate variables.

I think I am an Intuitive programmer (and therefore I worry that I am miscasting Reasoning programmers through my lack of understanding), and in my work most of my colleagues are as well; it is in the nature of consultancy to have to adapt to constantly changing domains, code bases and expectations. We do have a few Reasoning programmers, though, who do exactly the same work but do it by thinking deeply through problems and drawing logical inferences.

If there are issues between the two groups, they occur when neither side makes concessions; common flashpoints are testing, writing defensive/guard code, and giving time to discussing problems and problem-solving strategies. The two also agree on a lot of things for very different reasons: for example, both like to rewrite code, but refactoring is a formalised practice for Intuitive developers, whereas Reasoning developers often want to apply new insight or newly learned ideas (“This could all just be monads!”).

An important point is that neither approach is “right”: both types of programmer arrive at the same results if their experience, ability and other factors are equal. They are purely styles of working.

culture

Is this hardcore?

The UK mobile operators have been indulging in a bit of commercial skulduggery recently in the name of “protecting the children”. I recently bought an Orange data SIM and was surprised to see that access to a games discussion site was blocked due to its “extreme violence or pornography”. Of course, as with any half-arsed censorship strategy, it’s still perfectly possible to access hate, pr0n and violent sites.

The reasoning behind this protection becomes apparent when you try to get it removed from your SIM. O2 wants moniez; Orange wants you to register your phone and give them your name, address and other valuable demographic and advertising information (ironically, because I want to pay them by credit card they already know this information, but their information strategy is so screwed they can’t relate the various pieces). Talking to other people, this censorship also affects contract customers (the justification being that, like cigarettes and alcohol, irresponsible adults might be tempted to pass on unlocked interwebs to children), which is the ultimate in crazy.

I’m not sure where to start on how stupid all of this is, so let’s start with age restrictions. At 16 you are allowed to fuck who you want, but reading about fucking on a website via your phone would, in the view of the mobile networks, destroy your soul.

Next, how about an opt-in rather than an opt-out? Children with data phones whose parents wish to restrict access must surely be a minority of customers. Let concerned parents be responsible for their children rather than making every adult in the UK part of their anxiety trip.

Censorship is always bad, and website blocking is particularly bad because what gets blocked are legitimate but difficult sites. It is Lady Chatterley’s Lover that gets hit by censorship, not Cum Sluts Vol. 4. Extreme sites can duck and dive away from any blacklist.

Finally, if this is about protecting the vulnerable, there should be no opportunity for operators to benefit commercially from it. If I present myself, my ID and my phone at an operator’s store then the block should be removed right there and then. No charging of credit cards, no details taken.

I wanted to complain about this to someone, but it’s one of those things where no one seems to be responsible. The operators and regulators all seem to be hiding behind one another and, of course, behind the fig leaf of protecting children.

Clojure

Dojos: One conversation in the room not in the driving seats

Bruce Durling and I gave a conversational talk last week about the Clojure dojo we have been running here in London for the last year. One of the attendees was Nicholas Tollervey, the organiser of the Python dojo. In the course of discussing formats there was an interesting discussion about the concept of “one conversation”: people stand a better chance of understanding something if there is just one conversation going on, where everyone contributes to the shared understanding, rather than several conversations where nothing may be getting understood. Nicholas had read the original Paris dojo rules and interpreted them as saying that the conversation is entirely owned by the pilot and co-pilot and that the audience cannot contribute.

My understanding (this may be a Brazilian addition to the Paris method), and the way I would practice dojos, is that all the participants are in the conversation, but once a point or question has been raised by the audience that issue must be addressed before the coding pair continue. So, for example, if an audience member doesn’t understand a piece of code or a function, they should ask what it means; the coding pair then pause while an explanation is explored, until the person asking the question feels they understand what is happening. The coding pair then resume.

With that said, there are some important points of concern in terms of facilitation. Firstly, the coding pair should be shielded from barracking: “Why are you doing it like that?” can be a good question but is rarely helpful. A better question would be “Are you trying to solve the problem with method A? Would method B be better?”. But the best version of this question is often the one you don’t ask; there may be a better way to do something, but watching people explore an alternative route can be informative in its own right, even if it ultimately confirms your own views.

Questions should be “points of order”, ideally based on facts not opinion and aimed at clarifying understanding. Philosophical points of view are best expressed at natural break points in the coding or down the pub.

Once a question has been asked, the focus of the conversation moves to the person asking it. This often means that person feels a tremendous amount of pressure to say they understand something so the focus of conversation can move on. I know I have often been frustrated and embarrassed with myself for not getting a point. However, if you don’t really understand something it is important to say so, as it is likely that other people in the room feel the same way and that those trying to furnish an explanation have not done so satisfactorily and need to try again. Facilitators need to keep the safety up here if the group struggles to be supportive.

A final point is that every once in a while you get a really good question: one that is going to take a massive amount of effort to answer, that is fundamental to the problem, or that lacks a definitive answer. Some people regard these questions as rabbit holes, and while it is true that you can often kiss the coding goodbye for the rest of the night, it is often these questions that lead to the most memorable moments of a dojo. I remember this happening when Ola Bini gave a spontaneous lecture on Ruby’s object lifecycle event hooks. It started with someone using an unusual technique and ended up being a really enlightening trip through the guts of Ruby.

Dojos are primarily about learning and these side trips are as important as powering down the highway.

ThoughtWorks, Work

Why didn’t I get into ThoughtWorks?

I had an interesting conversation with a recruiter recently about the difficulty of the ThoughtWorks recruitment process. It’s true that the majority of applicants don’t make it through, but then that is true of most recruitment processes. The ThoughtWorks one is perhaps longer and more drawn out, so you feel that more effort was invested if you don’t get through.

One thing that is quite interesting is that being really smart is not the only criterion for working at ThoughtWorks. Candidates who fail the process often go on to join other very prestigious companies, which prompts the recruiter’s question: why did you turn down this person who went on to join this other company?

In most cases the answer is that the company they ended up joining was not a consultancy. Consultancy is, perhaps unfortunately, not just about raw smarts and programming ability. Unlike a lot of companies, ThoughtWorks doesn’t really have a codebase; the job is mostly about working with people and helping them improve their own codebases.

Coaching, persuading and analysing code are quite different from writing good code. My favourite example of this is a code submission that solved all three of our problems in probably one of the shortest sets of solutions I’ve seen. The candidate correctly identified each underlying abstraction and applied the standard solutions in a concise, powerful way.

An automatic hire then? No, sadly not, because while the candidate had the time to solve all three problems they didn’t bother to write a single test. Looking at what they did, they really just proved their own superiority and left nothing that would help explain, maintain or develop their codebase. You could already see that this person wanted to be the hero of the piece, not a collaborator.

In the three years I’ve been at ThoughtWorks there have been several times when I could have produced a much better solution to a problem than what was actually created in collaboration with my clients. I would like to think that, instead of the best solution, we were able to produce something that our client teams understood better, and that was more productive and more reliable, than what they had previously.

All these things are more valuable than an individual being a better coder than their peers.

Scala

Using function passing to separate concerns

So I had a difficult problem today: I needed to access a service but couldn’t inject it, because the service has slightly too many concerns and hence creates a circular dependency in the Dependency Injection. There is a right way of dealing with this, which is to refactor the concerns of the collaborator until the dependency goes away. Unfortunately there is also an incomplete user journey that is holding everyone up until a quick hacky fix is found.

Enter function passing to the rescue! In my case we are talking about an MVC web application where the Model is generating the circular dependency, but it is possible to inject the collaborator into the Controller. The trouble is that my code separates the concerns of the Controller and the Model such that the Controller asks the Model to perform an operation and the Model returns the result of executing it. This is clean to develop and read, but it also means that the Controller cannot access any of the internal operations of the Model, which is what I need for the invocation of the collaborator.

For a moment I thought I had to break the encapsulation between the two and pass the internal information of the Model operation out with the operation result. However, function passing is a more elegant solution that keeps my concerns separate: essentially the Controller asks the Model to do something and also provides a function for the Model to execute if it wishes to invoke the collaborator.

Let’s try to illustrate this in code:

class Controller(stuff: StuffRepository, messages: MessageDispatcher) {

  post("/handle/stuff/:stuffId") {
    // The callback closes over the injected MessageDispatcher, so the
    // Model never needs a direct reference to it
    def messageCallback(id: StuffId, message: Message) {
      messages.send(Messages.StuffUpdated, message, Some(id))
    }

    stuff.storeChange(params("stuffId"), messageCallback)
  }
}

class StuffRepository {

  def storeChange(id: String, callback: (StuffId, Message) => Unit) =
    makeChanges(id) match {
      // Only invoke the callback when the change actually succeeded
      case Success => callback(StuffId(id), Message("Changes completed")); Success
      case Failure => Incomplete
    }
}

Hopefully that’s all relatively obvious. You can also give the callback a named type for clarity, or make it an Option if you don’t always want it invoked. If you don’t like the closure you can re-write it as a function that partially applies the MessageDispatcher and then returns the function.

Some of you are going to be thinking this is totally obvious (particularly if you are a JavaScript fiend) and that’s cool, but I do think it is an interesting technique for keeping responsibilities separated without very complex patterns. It is also something that is only possible with proper first-class functions.

Scala

Scala can be the replacement for Enterprise Java

Last week I had the chance to confront one of my worst fears about Scala: really bad Scala code. As I thought it might, it did include a partial application, though I was very grateful it didn’t abuse apply. It also managed to include a lot of classic bad behaviour, like catching Exception and then swallowing it.

It has always been one of my fears that when someone went rogue with Scala the resulting mess would be so bad it would be a kind of code Chernobyl, its fearsome incomprehensibility rivalling the Lift code but without the brevity.

When I finally had to confront it I thought: this is actually no worse than really bad Java code.

I know it could potentially be worse, but it would take someone who was actually really good at Scala to construct the nightmarish worst-case scenario I had imagined. In most cases terrible Scala code is going to look a lot like terrible Java code.

So that is really my last concern out of the way. We might as well use Scala for the good things it offers, like immutability and first-class functions. The boost to productivity is going to outweigh the potential crap that might also be generated.

Web Applications

Replicating data in Cloudant and Heroku

Heroku allows you to use CouchDB via the Cloudant cloud service, which is great, but compared to the documentation for the relational stores it is not clear how you are meant to deal with backups and importing of data. I also couldn’t find a way to use Futon on the Heroku instance (which comes from the Heroku account; you can’t use your own Cloudant account with the plugin) or to share the database instance with my personal Cloudant account.

This post from Cloudant helps a lot. Essentially you can get your Heroku instance’s URL, and the cool thing about Couch’s painless replication is that once you have a Couch URL you can replicate that database to a local instance or even back into Cloudant.


heroku config --long    # reveals the CLOUDANT_URL for your app

curl CLOUDANT_URL/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source" : "CLOUDANT_URL", "target" : "TARGET_URL"}'

You can edit the database locally and then replicate back to the Heroku instance by just swapping the URLs in the Curl above.
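If you would rather drive the replication from Python than Curl, it is the same POST to _replicate (a sketch assuming the requests library; the account URL and database names are illustrative placeholders, not my real ones):

import requests

cloudant_url = 'https://user:password@account.cloudant.com'  # from heroku config
local_couch = 'http://127.0.0.1:5984'

# Pull the Heroku/Cloudant database down into a local CouchDB
requests.post(local_couch + '/_replicate',
              json={'source': cloudant_url + '/mydb',
                    'target': 'mydb',
                    'create_target': True})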

That seems to pretty much be it. I’ve replicated my data out of Cloudant and then back into it, which feels bizarre, but it’s all symmetrical with Couch and it makes for a handy cloud-based backup mechanism.
