Programming, Software, Web Applications, Work

Names are like genders

One thing I slightly regret in the data modelling that is done for users in Wazoku is that I bowed to marketing pressure and “conventional wisdom” and created a pair of first and last name fields. If gender is a text field then how much more so is the unique indicator of identity that is a name?

The primary driver for the split was so that email communications could start “Hey Joe” rather than “Hey Joe Porridge Oats McGyvarri-Billy-Spaulding”. Interestingly as it turns out this is definitely the minority usage case and 95% of the time we actually put our fields back together to form a single string because we are displaying the name to someone other than the user. It would have been much easier to have a single name field and then extract the first “word” from the string for the rare case that we want to try and informally greet the user.

My more general lesson is that wherever I (or we more generally as a business) have tried to pre-empt the structure of a data entity we have generally gotten it wrong, however so far we have not had to turn a free text field into a stricter structure.

Software, Work

Comparing Jabber servers

I recently had a trawl around the available Jabber servers looking for something that was suitable for use as a messaging system for a website. My first job was to do a quick review of what is available out there. The first thing that was quite clear that is ejabberd has massive mindshare. There was a definite feeling of “why would you want to try other servers when you could just be running ejabberd?”.

Well there are kind of two answers; first there is inevitable Erlang objection. This time from sysops who felt uncomfortable with monitoring and support. It is a fair point, I feel Erlang can be particularly obtuse when it’s failing. The second was that ejabberd stubbornly refused to start up on my Fedora test box. Neither the yum copy nor a hand-built version cut the mustard. Ironically I was able to get an instance running in minutes on my own Ubuntu-based virtual machine (hand-built Erlang and ejabberd).

Looking a JVM-based alternatives I looked at OpenFire and Tigase. Tigase has lovely imagery but also seemed to have spam over its comments and the installation was a pig that I gave up on quickly. OpenFire is one of those old-school Java webapps where you are meant to manage everything via a web gui. This makes some kinds of  tasks easy but you have to write your own plugins to get programatic access to the server. I didn’t want to setup up an external database (why on earth would I want to do that?) but the internal HSQL store left me with no way of easily tweaking the setup of the server, a command-line tool would have been ultra-helpful because… OpenFire is annoyingly buggy, by this I mean it doesn’t have a lot of bugs but they are really annoying. After running through the gui setup process I tried to login. This was the wrong thing to do. What I needed to do was stop the server and restart it. Now the server was fucked and I needed to manually delete data and tweak config files to allow me to do the setup again.

Once it was running there was also some fun and games getting the BOSH endpoint to work. Do you need a trailing slash or not? I don’t remember but get it wrong and it doesn’t work. If you try to HTTP GET the endpoint then you get an error, this is technically correct but leaves you wondering whether your config is correct or not (if you know the error message you are looking for perhaps it is helpful but I didn’t so it wasn’t). ejabberd (when I got it working) is more helpful in giving an OHAI message that at least confirms you have something to point the client at.

OpenFire did what I needed but seems to make an implicit assumption that there is going to be an admin working at screens to configure and monitor everything. It feels more like a workgroup tool than a workhorse piece of infrastructure.

One oddity I found was Prosody, it sensibly used the same defaults, urls and conventions as ejabberd but is written in Lua and was gloriously lightweight. It absolutely hit the spot for development work and actually felt fun. I was able to script everything I needed to do via its ctl script. Of course if Erlang (a big, serious infrastructure language) is a bit of an issue the hipster scripting language might be too.

What the hell, if it was about doing what I need to do quickly and without fuss I would use Prosody and worry about any infrastructure issues later.

Software, Work

The Joy of Acceptance Testing: Is my bug fixed yet?

Here’s a question that should be a blast from the past: “Is my bug fixed yet?”.

I don’t know, is your acceptance test for the bug passing yet?

Acceptance tests are often sold as being the way that stakeholders know that their signed-off feature is not going to regress during iterative development. That’s true, they probably do that. What they also do though is dramatically improve communication between developers and testers. Instead of having to faf around with bug tracking, commit comment buggery and build artifact change lists you can have your test runner of choice tell you the current status of all the bugs as well as the features.

Running acceptance tests is one example where keeping the build green is not mandatory. This creates a need for a slightly more complicated build result visualisation. I like to see a simple bar split into green and red with the number of failing tests. There may be a good day or two when that bar is totally green but in general your test team should be way ahead of the developers so the red bar represents the technical deficit you are running at any given moment.

If it starts to grow you have a prompt to decide whether you need to change your priorities or developer quality bar. Asking whether a bug has been fixed or when the fix will be delivered are the wrong questions. For me the right questions are: should we fix it and how important is it?

If we are going to fix a bug we should have an acceptance test for it and its importance indicates what time frame that test should pass in.

Software, Work

Agile must be destroyed

Let’s stop talking about the post-Agile world and start creating it.

Agile must be destroyed not because it has failed, not because we want to return to Waterfall or to the unstructured, failing, flailing projects of the past but because it’s maxims, principles and fixed ideas have started to strangle new ideas. All we need to take with us are the four principles of the Agile Manifesto. Everything else are just techniques that were better than what preceded them but are not the final answer to our problems.

There are new ideas that help us to think about our problems: a philosophy of flowing and pulling (but not the training courses and books that purport to solve our problems), the principles of software craftmanship, “constellation” architecture, principles of Value that put customers first not our clients. There are many more that I don’t know yet.

We need to adopt them, experiment with them, change and adapt them. And when they too ossify into unchallengeable truisms we will need to destroy them too.


Fine grained access control is a waste of time

One of the things I hate developing most in the world (there are many others) is fine grained control systems. The kind of thing where you have to set option customer.view_customer.customer_delivery_options.changes.change_customer_home_delivery_flag to true if you want a role to be able to click a checkbox on and off.

There are two main reasons for this:

  • Early in my career I helped implement a fine grained system, it took a lot of effort and time. It was never used because configuring the options were too difficult and time consuming. Essentially the system was switched to always be on.
  • Secondly, when working in a small company I discovered that people that do the job of dealing with customers, buying stock or arranging short term finance actually did a better job when the IT department didn’t control how they did they worked. Having IT implement “controls” on their systems is like selling a screwdriver that only allows you to turn it in one direction.

Therefore I was very happy to hear Cyndi Mitchell on Thursday talking about the decision not to implement fine level ACL in Mingle. If you record who did what on the system and you make it possible to recover previous revisions of data then you do not need control at level much finer than user and superuser.

Instead you can encourage people to use your system as a business tool and if they decide to use that screwdriver to open paint tins or jam open doors, then good on them.

Programming, Software

The One True Layer Anti-Pattern

A common SQL database anti-pattern is the One True Lookup Table (OTLT). Though laughable the same anti-pattern often occurs at the application development layer. It commonly occurs as part of the mid-life crisis phase of an application.

Initially all objects and representations are coded as needed and to fit the circumstances at hand. Of course the dynamics of the Big Ball of Mud anti-pattern are such that soon you will have many varying descriptions of the same concept and data. Before long you get the desire to clean up and rationalise all these repetitions, which is a good example of refactoring for simplicity. However, at this point danger looms.

Someone will point out eventually that having one clean data model works so well that perhaps there should be one shared data model that all applications will use. This is superficially appealing and is almost inevitably implemented with a lot of fighting and fussing to ensure that everyone is using the one true data model (incidentally I’m using data models here but it might be services or anything where several applications are meant to drive through a single component).

How happy are we then? We have created a consistent component that is used across all our applications in a great horizontal band. The people who proposed it get promoted and everyone is using the One True Way.

What we have actually done is recreated the n-tier application architecture. Hurrah! Now what is the problem with that? Why does no-one talk about n-tier application architecture anymore? Well the issue is Middleware and the One True Layer will inevitably hit the same rocks that Middleware did and get dashed to pieces.

The problem with the One True Layer is the fundamental fact that you cannot be all things to all men. From the moment it is introduced the OTL must either bloat and expand to cover all possible Use Cases or otherwise hideously hamstring development of the application. If there was a happy medium between the two then someone would have written a library to do the job by now.

There is no consistency between which of the two choices will be made; I have seen both and neither of them have happy outcomes. Either way from this point on the layer is doomed: it becomes unusable and before long the developers will be trying to work their way around the OTL as much as possible, using it only when threatened with dismissal.

If the codebase continues for long enough then usually what happens is the OTL sprouts a number of wrappers around its objects that allow the various consumers of its data to do what they need to. When eventually the initial creators of the OTL are unable to force the teams to use the layer then the wrappers tend to suck up all the functionality of the OTL and the library dependency is removed.

In some ways this may seem regressive, we are back at the anarchy of objects. In fact what has been created is a set of vertical slices that represent the data in the way that makes sense for the context they appear in. These slices then collaborate via external API interfaces that are usually presented via platform neutral data transfer standards (HTTP/JSON for example) rather than via binary compatibility.

My advice is to try to avoid binary dependent interactions between components and try to avoid creating very broad layers of software. Tiers are fine but keep them narrow and try to avoid any tier reaching across more than a few slices (this particularly applies to databases).


Get your own Couch

At Erlounge in London I recently had the chance to catch up with the guys J Chris Anderson and Jan Lehnardt. The conversation was interesting as ever and I was intrigued by JChris’s ideas that CouchDb has twisted conventional data storage logic on it’s head by making it easy to create many databases with relatively small amounts of information; the one database per user concept.

More importantly though I discovered it was okay to talk about the hosted Couch instances that (who also do CouchDb consulting if you need some help and advice) are offering. The free service offers you a Basic authentication account with 100Mb of storage to play around with Couch to your heart’s content. Paying brings more space but also more sophisticated authentication options.

The service is the perfect way to play around with Couch and learn how you could use it go get an account today! It’s on the Cloud as well: schema-less data on the Cloud, how buzzword compliant is that?!

On a more serious note, this is an excellent service and one I have been asking for as it allows people who have no desire to build and maintain Erlang and Couch to use the datastore as a web service rather than as a managed infrastructure component.

Programming, Software

Even if I’m happy with Subversion?

I’m not going to try and make the case for the next generation of DVCS (Git, Bazaar, Mercurial). There are a host of posts on the subject and you can search for them via the magic of Google searches. In addition you have thriving communities around BitBucket and GitHub and you have major repositories for huge open source projects like Linux and OpenJDK. All of this means that the tools are good, mature and support sophisticated development processes.

However I have been surprised that a number of people feel that Subversion is enough for their needs. Now this might be true for some people but I genuinely think that DVCS is a step change in source control and that ultimately it will adopted almost universally and Subversion will end up in the same place as CVS is now.

I have been trying to explain why I think this to people and I sometimes get the feeling that Subversion has ended up defining source control best practice by enshrining the limitations of the tools. For example a lot of people think that branches of the code should be short-lived. This is true for Subversion, branching and merging is painful. However in Mercurial and Git I fill that branching and to a lesser extent merging is pretty painless. In fact you often find yourself doing it without really thinking. Therefore why limit yourself to short-lived branches? Since it is easy to take change sets into branches it is easy to push canonical updates into derivative branches as you need to keep them up to date.

In truth some branches, like maintenance releases and conditional changes, tended to be long-lived in Subversion repositories anyway. They were just painful to manage. Also you had this set of conventions around branches, tags and trunk in the Subversion repository that really didn’t help manage the history and intent of the code they contained. In the DVCS model those repository concepts become repositories in their own right and are easier to manage in my view.

What about continuous integration? Many people have expressed an opinion that they don’t like to be more than five minutes away from the canonical source tree. However checking in to the parent source line this frequently means that intermediate work is frequently checked into “trunk” or its equivalent. I think that DVCS opens the possibility of having a very clean and consistent trunk because intermediate products and continuous integration can be pushed closer to the source of change. You still need to push to the canonical code stream frequently but I think in a DVCS world you actually pull a lot more than you push.

It is certainly my anecdotal experience that code “breaks” are rarer in DVCS as you tend to synchronise in your personal branches and resolve issues there so the view of the code as pushed to the canonical stream is much more consistent.

The recording of change in code files is also much more sophisticated in the changesets of DVCS’s. The ability to drill into change on all the leading contenders is amazing, as is the ability to track it between different repositories. Switching from the linearity of revisions to the non-linear compilation of changesets can be headfuck but once you’ve mastered it you begin to see the possibilities of constructing the same sets of changes in different ways to result in outcomes that would traditionally be called “branches”. Your worldview changes.

Subversion is not what I want to work with anymore and every time I have to pull an SVN archive across the world via HTTP/WebDAV or I have to wait for the server to be available before I can create a new copy of trunk or commit a set of changes then I wonder why people are still so happy with Subversion.

Groovy, Java, Programming, Scripting, Software

Working with Groovy Tests

For my new project Xapper I decided to try and write my tests in Groovy. Previously I had used Groovy scripts to generate data for Java tests but I was curious as to whether it would be easier to write the entire test in Groovy instead of Java.

Overall the experience was a qualified “yes”. When I was initially working with the tests it was possible to invoke them within Eclipse via the GUnit Runner. Trying again with the more recent 1.5.7 plugin, the runner now seems to be the JUnit4 one and it says that it sees no tests, to paraphrase a famous admiral. Without being able to use the runner I ended up running the entire suite via Gant, which was less than ideal, because there is a certain amount of spin-up time compared to using something like RSpec’s spec runner.

I would really like all the major IDEs to get smarter about mixing different languages in the same project. At the moment I think Eclipse is the closest to getting this to work. NetBeans and Intellij have good stories around this but it seems to me to be a real pain to get it working in practice. I want to be able to use Groovy and Java in the same project without having Groovy classes be one of the “final products”. NetBeans use of pre-canned Ant files to build projects is a real pain here.

Despite the pain of running them though I think writing the tests in Groovy is a fantastic idea. It really brought it home to me, how much ceremony there is in conventional Java unit test writing. I felt like my tests improved when I could forget about the type of a result and just assert things about the result.

Since I tend to do TDD it was great to have the test run without having to satisfy the compiler’s demand that methods and classes be there. Instead I was able to work in a Ruby style of backfilling code to satisfy the runtime errors. Now some may regard this a ridiculous technique but it really does allow you to provide a minimum of code to meet the requirement and it does give you the sense that you are making progress as one error after another is squashed.

So why use Groovy rather than JRuby and RSpec (the world’s most enjoyable specification framework)? Well the answer lies in the fact that Groovy is really meant to work with Java. Ruby is a great language and JRuby is a great implementation but Groovy does a better job of dealing with things like annotations and making the most of your existing test libraries.

You also don’t have the same issue of context-switching between different languages. Both Groovy and Scala are similar enough to Java that you can work with them and Java without losing your flow. In Ruby, even simple things like puts versus println can throw you off. Groovy was created to do exactly this kind of job.

If the IDE integration can be sorted out then I don’t see any reason why we should write tests in Java anymore.

Java, Programming, Software

Programming to Interfaces Anti-Pattern

Here’s a personal bete noir. Interfaces are good m’kay? They allow you to separate function and implementation, you can mock them, inject them. You use them to indicate roles and interactions between objects. That’s all smashing and super.

The problem comes when you don’t really have a role that you are describing, you have an implementation. A good example is a Persister type of class that saves data to a store. In production you want to save to a database while during test you save to an in-memory store.

So you have a Persister interface with the method store and you implement a DatabaseStorePersister class and a InMemoryPersister class both implementing Persister. You inject the appropriate Persister instance and you’re rolling.

Or are you? Because to my mind there’s an interface too many in this implementation. In the production code there is only one implementation of the interface, the data only ever gets stored to a DatabaseStorePersister. The InMemory version only appears in the test code and has no purpose other than to test the interaction between the rest of the code base and the Persister.

It would probably be more honest to create a single DatabaseStorePersister and then sub-class it to create the InMemory version by overriding store.

On the other hand if your data can be stored in both a graph database and a relational database then you can legitmately say there are two implementations that share the same role. At this point it is fair to create an interface. I would prefer therefore to refactor to interfaces rather than program to them. Once the interaction has been revealed, then create the interface to capture the discovery.

A typical expression of this anti-pattern in Java is a package that is full of interfaces that have the logical Domain Language name for the object and an accompanying single implementation for which there is no valid name and instead shares the name of the interface with “Impl” added on. So for example Pointless and PointlessImpl.

If something cannot be given a meaningful name then it is a sure sign that it isn’t carrying its weight in an application. Its purpose and meaning is unclear as are its concerns.

Creating interfaces purely because it is convenient to work with them (perhaps because your mock or injection framework can only work with interfaces) is a weak reason for indulging this anti-pattern. Generally if you reunite an interface with its single implementation you can see how to work with. Often if there is only a single implementation of something there is no need to inject it, you can define the instance directly in the class that makes use of it. In terms of mocking there are mock tools that mock concrete classes and there is an argument that mocking is not appropriate here and instead the concrete result of the call should be asserted and tested.

Do the right thing; kill the Impl.