Java

The dark side of the entrepreneur

The recent disgrace of Peter Cruddas is particularly interesting to me as a way of illustrating the dangers of bringing the vaunted “skills” of the private sector into public life. I actually feel sorry for Cruddas; I consulted at his company once and met him at a project conclusion meeting. When he talks about “bluster” and the fact that he was not in a position to deliver what he was claiming to be able to, I actually believe him.

A lot of “enterprising” sorts (not just the ones born in the East End) are in the habit of “over-promising” what they can deliver. Less politely, a lot of them lie about their achievements, support and capabilities. Partly out of insecurity, like Cruddas, partly out of habit, partly out of ego. The thing is that in their natural environment they mix mostly with people who are doing the same thing and are never subject to any kind of meaningful scrutiny. Everyone can swim along in their happy bubble of hyperbole.

Stepping outside of that cosy world into the more brutal and unforgiving political one is fraught with dangers. Talking a load of shit to an investor or customer is unlikely to result in the conversation being plastered over the Internet. In fact these conversations often stay private because of the collusion of the listeners; everyone is after something after all.

Entrepreneurs often regard journalists and politicians with scorn and they often find it hard to break the swaggering, boastful, exaggerating habits of their private world. Working in public life requires a very different approach: despite the best efforts of vested interests, public business is a lot more transparent than the private sector.

Private individuals have a lot to offer the state but I suspect that direct employment is the worst way to access those abilities. Contracting deliverable products would seem far more sensible. Advice is far better taken for the price of a coffee than a “premier league” donation.

Programming

Keep the focus on the read

One of the interesting things about Wazoku’s Startup Challenge app is that a lot of the functionality is created via “out of the box” CouchDB features. In fact it is often where we haven’t leant heavily on the features of our store and frameworks that we have issues.

One of the interesting things we decided to do in the app relatively late in the day was provide little encouragements to say how many more votes an entry needed to get to the next place in the ladder. As this was a late feature we didn’t really think through where this feature would sit. We had code that re-ranks entries when their vote ordering changes and so when an entry was being re-ranked it also acquired the target to beat at the same time.

With a store like CouchDB you are really aiming to keep on reading data and to minimise writes. That’s done via denormalisation and also via strategies that generate related and derived data at the point where you are changing the parent data.

So this placement made sense from that point of view. It was only later that I began to realise that we were choosing the wrong point to read. With hindsight it is actually only necessary to calculate the target entry when someone looks at the entry. This is because the views of the entries are distributed unequally and the vote totals already exist as a CouchDB view, so we can do a key lookup to find all entries with more votes than the current entry when needed.
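As a rough illustration of the read-time calculation, here is a minimal Python sketch. It does not talk to CouchDB: the sorted list of (votes, entry id) rows is an in-memory stand-in for a view keyed on vote total, and the names VOTES_VIEW and next_target are made up for the example.

```python
from bisect import bisect_right

# In-memory stand-in for a view keyed by vote total: (votes, entry_id) rows,
# kept sorted by key in the way a view index is.
VOTES_VIEW = sorted([
    (3, "entry-a"),
    (7, "entry-b"),
    (7, "entry-c"),
    (12, "entry-d"),
])


def next_target(view_rows, votes):
    """Return (entry_id, gap) for the nearest entry with more votes,
    or None when nothing is ahead of this vote total."""
    keys = [row_votes for row_votes, _ in view_rows]
    # Equivalent of a key range read starting just above the current total.
    index = bisect_right(keys, votes)
    if index == len(view_rows):
        return None
    target_votes, target_id = view_rows[index]
    return target_id, target_votes - votes
```

Calling next_target(VOTES_VIEW, 7) skips the other entry tied on seven votes and reports the gap to the entry above. Nothing is written back to the entry documents; the answer is worked out only when somebody asks for it.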

If we wanted to cache that result to avoid needless recalculation we would be better off storing the information in a front-side cache like Memcached or Redis, but in practice key reads in CouchDB are pretty damn fast and low load.

So we thought we were saving ourselves problems by denormalising derived data but in fact we were creating a lot more work at a point where it is uncertain that the additional data will ever be consumed.

Sometimes it can be hard to pick the right point to read!

Web Applications

Good magic, bad magic

Philip Potter pinged me his post on Sinatra magic during the week. Mark Needham’s comment and code on solving the mocking problem is good advice for the problem as posed.

At Wazoku where we use the often equally magical Bottle framework we don’t use top-down TDD but instead outside-in functional tests (with no funky runners as we don’t need CI). This solves the whole magic issue by shifting the attention to what the public interactions of the application are. This is one of the massive benefits of using a microapp HTTP/JSON/REST-like architecture. I could flip the API from Bottle to Django or Compojure or Sinatra and my test suite can keep on rocking and telling me whether the behaviour my consumers are relying on is correct.
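To make the outside-in idea concrete, here is a small Python sketch. It is not our actual suite: the app function is a hypothetical stand-in for the service, and the get helper drives the WSGI callable directly so the example needs nothing beyond the standard library. A real suite would more likely go over HTTP, but the point is the same: the assertions only know about paths, status codes and JSON.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical stand-in for the service; any WSGI framework could sit here."""
    if environ["PATH_INFO"] == "/":
        body = json.dumps({"message": "Hello world"}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
    else:
        body = json.dumps({"error": "not found"}).encode("utf-8")
        start_response("404 Not Found", [("Content-Type", "application/json")])
    return [body]


def get(wsgi_app, path):
    """Issue a GET against a WSGI callable and return (status, parsed JSON)."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], json.loads(body.decode("utf-8"))
```

Because get only depends on the public interface, swapping the implementation behind app for another framework should leave the checks untouched.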

The major thing I felt when reading through Philip’s post was the massive amount of effort that was going into testing relatively simple behaviour. This is a bit of an anti-pattern with Agile developers (or perhaps it is part of the mastery thing where rote “correct” behaviour is modified by experience and judgement). One of the massive advantages of using something like Sinatra is that you can get a whole web app with rich behaviour into less than 200 lines. If you then create thousands of lines of test code and battle with the magic for hours on end you’ve completely destroyed your productivity.

If you have a code base that you expect to be large and highly contested by a large development team you need good, layered testing and to use frameworks that support that. If you have an app that is small and when it’s done it is done then there is no need to agonise as to whether it was done “right”.

The idea that top-down TDD is the only correct way to write software is corrosive. When faced with a generally poorly skilled and educated workforce it is good to have rules. I have imposed a certain style of TDD on a group myself because it gives a good framework for work and achieves very consistent output.

However with skilled people on small scale projects you can kill yourself by imposing arbitrary rules. I love Sinatra and while I might be equivocal about magic I think it is ridiculous to moan about it if you are using something as unicorn-packed as Ruby. For example Philip was trying to use RSpec mocks and stubs to do his TDD. The result is kind of saying that you’re disappointed that your “good” magic for testing didn’t work with the “bad” magic of a DSL for web applications. Even if your RSpec code passed its tests you still haven’t said anything particularly deep about the production behaviour of your application as your unit testing environment was severely compromised by the manipulations of your mocking framework.

So my rule of thumb is: if it’s simple, do it; if it was simple, functionally test it; if it was never really simple then test-drive it with suitable tools.

Clojure, Programming

January’s London Clojure Dojo

January meant Battleships. More specifically battling battleships. Five teams created players and duked it out during the dojo with a tremendously narrow margin of victory. So what did we learn?

Well first of all randomly placing ships and shooting is actually a pretty good strategy. This is what the default player does and any deviation from it can be pretty badly punished by it.

One simple thing that people did to start improving over the random start was restricting placement of ships to a single half or quarter of the board. Doing this allowed most teams to start beating the initial strategy.

However clustering your ships is only effective against random shot placement, so when people start implementing targeting you actually become more vulnerable. The first effective targeting strategy was surprisingly simple: if you hit something, choose an adjacent square as your next target.

The team that squeezed to the top refined this by choosing an adjacent square that hadn’t already been fired at. The next level of improvement would probably be a non-trivial look at the probability that another ship square lay in the adjacent squares by looking at the information surrounding them.

There was a lot of work around the concepts of adjacency and whether the square had been fired at and the teams all seemed to converge towards the clojure.set library (if they were aware of it).
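The dojo code was Clojure, but the set-based shape of the targeting logic is easy to sketch in Python. This is only an illustration: the board size and the function names are assumptions, and squares are modelled as (row, column) tuples.

```python
BOARD_SIZE = 10  # assumed board dimensions


def adjacent(square):
    """Return the on-board squares directly above, below, left and right."""
    row, col = square
    candidates = {(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)}
    return {(r, c) for r, c in candidates
            if 0 <= r < BOARD_SIZE and 0 <= c < BOARD_SIZE}


def next_targets(hits, fired):
    """Squares next to any hit that have not been fired at yet."""
    around_hits = set()
    for hit in hits:
        around_hits |= adjacent(hit)  # set union
    return around_hits - fired        # set difference
```

Adjacency becomes a union and "not already fired at" becomes a difference, which is presumably why the teams drifted towards a set library.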

I’m now thinking of what fiendish problem would force an exploration of this library as it seems incredibly powerful for all different kinds of problems.

Programming, Software, Web Applications, Work

Names are like genders

One thing I slightly regret in the data modelling that is done for users in Wazoku is that I bowed to marketing pressure and “conventional wisdom” and created a pair of first and last name fields. If gender is a text field then how much more so is the unique indicator of identity that is a name?

The primary driver for the split was so that email communications could start “Hey Joe” rather than “Hey Joe Porridge Oats McGyvarri-Billy-Spaulding”. Interestingly as it turns out this is definitely the minority usage case and 95% of the time we actually put our fields back together to form a single string because we are displaying the name to someone other than the user. It would have been much easier to have a single name field and then extract the first “word” from the string for the rare case that we want to try and informally greet the user.
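Taking the first “word” from a single name field is about as small as code gets. A sketch in Python, where greeting_name is a made-up helper name:

```python
def greeting_name(full_name):
    """Return the first whitespace-separated word of a name,
    or the input unchanged when it contains no words."""
    words = full_name.split()
    return words[0] if words else full_name
```

It will not always pick the name the user would choose for themselves, which is the usual caveat with guessing at name structure, but it only has to be good enough for the rare informal greeting.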

My more general lesson is that wherever I (or we more generally as a business) have tried to pre-empt the structure of a data entity we have generally gotten it wrong. So far, however, we have not had to turn a free text field into a stricter structure.

culture

Why I’m sticking with Diaspora

Diaspora’s Kickstarter crowdfunded kickoff has led from euphoric hype to snarky unhappiness, the emotional highs and lows of which really have had nothing to do with the product and the proposal but rather with the perception and anticipation of a social network that would finally be right for everyone.

I use Diaspora and I recently contributed again to help fund the next phase of development. Diaspora feels right for me for the following reasons…

A customer not an audience

It has a clear funding model: it allows you to be a customer of a service rather than an audience for advertising or a source of demographic data. This isn’t a minor thing, it is actually a unique feature. Whether it is sustainable or not remains to be seen. Will people value a social network in the way they do Wikipedia? My feeling is that certain people do and others might and that could be enough to fund the network for everyone.

It acknowledges the primacy of the user as the creator of content

The other social networks allow you to extract your content to some extent but Diaspora correctly puts the user and the content they create centrally and makes it straightforward to extract and use yourself. The ability to federate and even pull your content and publishing entirely under your control should you wish to clearly goes further than any provider today.

It returns control to the user

It allows you to put some measure of control back on your online social life. Although this has now gone more mainstream with things like Google’s Circles, Diaspora was the first to properly implement it and go through the real-world feedback loop. Diaspora’s Aspects allow you to segment your network by audience and interest. They are a surprisingly powerful tool.

Is this enough?

Diaspora may not succeed. Network effects rely almost entirely on volume of users and therefore it is critical that Diaspora has just enough use that there is some kind of feedback loop and you do not feel like everything you are doing is just being fired off into a void. However it does not have to be as successful as Google+ or Facebook to succeed in providing a valuable service to those who have concerns about control and trust.

Programming

How do I query data with CouchDB?

This question comes up a lot when dealing with Couch and I have given various answers before but my latest answer is simply that you don’t. In reality what you want to do in Couch, like a lot of the NoSQL databases, is think in terms of key lookups.

Now the key lookup may be a range of keys you are interested in but in reality there is nothing in Couch that is similar to the SQL “WHERE” clause.

So if you cannot do queries then how do you relate data? Well that’s the thing about storing documents instead of rows: if you have related data then you have to ask whether that data has any meaningful existence outside of its parent. In relational terms it is like asking whether you ever access the content of a table outside of a JOIN with its parent.

Initially you might think: of course I do! But often data is explicitly related to its parent’s primary key by things like ORDER and GROUP BY. In these kinds of cases you move the related data into the parent record, effectively denormalising to avoid a lookup.

If the data does have a meaningful existence outside the parent (for example in Wazoku comments are an example of a piece of data that exists separately from the thing they are a comment on) then you have a few options but essentially instead of querying you are still trying to do a direct key lookup.

The first simple case is to include a reference to the key of the related data in the associated document. Then from one key lookup you can go direct to the next. As an example we store a list of comment document ids on any document that can be commented on and then we can load the comments as needed (often the count of the comments can be as relevant as the full content). I describe the ids used this way as “forward references” as they lead you on to the related document.

The second, slightly more involved approach, is the creation of a view that allows the document to be looked up via an alternative key. For example if we store the document id of the thing being commented on in the comment document under the key comment_on we can then create a mapping view of all comment documents to their comment_on key. Then given any document we can simply do a direct lookup on the key in the view to determine whether it has any associated comments.
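A real CouchDB map function would be written in JavaScript inside a design document; the Python below only simulates the idea in memory so the shape of the alternative-key view is visible. The sample documents, the type field and the helper names are all assumptions made for the sketch.

```python
from collections import defaultdict

# Sample documents; the "type" field is an assumption for this sketch.
DOCS = [
    {"_id": "idea-1", "type": "idea"},
    {"_id": "comment-1", "type": "comment", "comment_on": "idea-1"},
    {"_id": "comment-2", "type": "comment", "comment_on": "idea-1"},
    {"_id": "comment-3", "type": "comment", "comment_on": "idea-2"},
]


def map_comment_on(doc):
    """Map step: emit (comment_on, comment id) for comment documents only."""
    if doc.get("type") == "comment" and "comment_on" in doc:
        yield doc["comment_on"], doc["_id"]


def build_view(docs, map_fn):
    """Run the map step over every document and group emitted values by key."""
    view = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            view[key].append(value)
    return dict(view)


COMMENTS_BY_TARGET = build_view(DOCS, map_comment_on)
```

Given the id of any document, COMMENTS_BY_TARGET.get(doc_id, []) is then a direct key lookup that answers whether it has comments and which ones.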

The final common technique I use is something I refer to as “unrolling” of collections. So again we create a CouchDB view that consists just of a map job and in it we take each item in an array of “forward references” (related document ids) and emit a document in the view mapping that id to the current document id.

So if an idea document has five comment forward references the resulting view will have five documents, each relating a comment document id to the idea document id.
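The same in-memory style of sketch works for unrolling. Again this is Python standing in for what would be a JavaScript map function in Couch, and the comments field name and sample ids are assumptions.

```python
# A parent document holding five forward references (assumed field name).
IDEA = {
    "_id": "idea-1",
    "comments": ["comment-1", "comment-2", "comment-3", "comment-4", "comment-5"],
}


def map_unroll_comments(doc):
    """Map step: emit one (comment id, parent id) row per forward reference."""
    for comment_id in doc.get("comments", []):
        yield comment_id, doc["_id"]


ROWS = list(map_unroll_comments(IDEA))  # one row per forward reference
PARENT_OF = dict(ROWS)                  # lookup from comment id to parent id
```

Starting from a comment id you can now get back to the idea with a single key lookup, without the comment document having to carry a back reference itself.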

If things get more complicated then I also have the Couch databases indexed in Elasticsearch and in Neo4J and these alternative views of the data give me powerful adhoc queries on properties or relationships in the data.

In general though I am always trying to think ahead as to how my documents relate and then express that in terms of a key lookup so that I am always working with the simplest case.

Programming, Python

How does the patch decorator in Mock work?

I tend to use Mock more as a stubbing library rather than for mocking. The patch decorator is pretty handy in terms of this as it takes care of all the resetting once your stubbed test has run making it easy to have a test where a dependency returns an empty list, followed by a single-entry list and so on.

However I often forget how exactly it works so I’ve decided to write up my latest remembering of how to do this (via John Hartley’s help and reminders) so I have something to look up next time I forget.

The first thing is that the patch decorator takes a string that represents the fully qualified name of the stub/mock you want to create. In a Django app for example that means you should include the app name at the root. The name also reflects the local name of an imported item. Something I commonly do wrong is to bind to the absolute name, say ‘random.choice’ rather than ‘myapp.mymodule.random.choice’. If you are in the situation where your stub is correct when you call it directly but never takes effect when you run the code under test, I am pretty sure that naming will be at the root of your problems 95% of the time.

For each string argument you have in patch you also need to define a parameter to the test function, this will contain the actual Mock object and is what you use to actually stub the value to what you want it to be for the test. Use names that make sense here, stub_db, fake_file_reader not just mock1, mock2 and so on.
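Here is a self-contained sketch of the naming issue, using unittest.mock (the standard library home of Mock in Python 3). To keep it to one file it builds a throwaway module called mymodule on the fly; that module, pick_winner and the check functions are all hypothetical names. The module does a "from random import choice", which is the case where the local name really matters.

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway module so the example is self-contained.
# "mymodule" and pick_winner are hypothetical names.
mymodule = types.ModuleType("mymodule")
exec(
    "from random import choice\n"
    "def pick_winner(entries):\n"
    "    return choice(entries) if entries else None\n",
    mymodule.__dict__,
)
sys.modules["mymodule"] = mymodule


@patch("mymodule.choice")
def check_local_name(stub_choice):
    """Patch the name as the code under test sees it: the stub is used."""
    stub_choice.return_value = "idea-1"
    return mymodule.pick_winner(["idea-1", "idea-2", "idea-3"])


@patch("random.choice")
def check_absolute_name(stub_choice):
    """Patch the absolute name: mymodule still holds the real function."""
    stub_choice.return_value = "idea-1"
    result = mymodule.pick_winner(["idea-2"])
    return result, stub_choice.called
```

The first check gets the stubbed value back; the second gets the real choice from a one-item list and the stub reports that it was never called. In both cases patch restores the original once the decorated function returns.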

With these relatively few reminders in place you should now be in a position to stub simply with Mock!

Java

Bottle on Epio

As Bottle is a WSGI-based framework you can get it running on ep.io. However it isn’t part of the documentation as yet.

The basic setup is an app.py, requirements.txt and the epio.ini. Requirements obviously just has bottle.

Epio.ini

[wsgi]

entrypoint = app:app

requirements = requirements.txt

The app.py file should be:

import bottle

app = bottle.app()

@bottle.route('/')
def home():
  return {"message" : "Hello world"}

That should give you a basic JSON service running quickly.

Programming

Elasticsearch “More like this” example

Elasticsearch is an amazing tool but the documentation does not always give that much help and advice on how to get going with it. Today was more_like_this (or mlt for short) day so I thought I’d give a “get going” example. It isn’t that complex except that the default settings are likely to return no results if you have a small data set. That’s why here I have the minimum values turned down to one, so that if there is any match you will get some results. Once you know you have a working query you can then start to turn the requirements back up or to the defaults.

{
  "query" : {
    "more_like_this" : {
      "like_text" : "testing",
      "min_term_freq" : 1,
      "min_doc_freq" : 1
    }
  }
}

Echo One

Sequentially arranged sentences composed of words (and punctuation)
