Programming

Why Repositories and not Services?

This is a good question. Why do people like ThoughtWorks make a lot of fuss about things like Services but then want to use things like the Repository pattern when writing code?

The short answer is that Service Orientation and Domain Driven Design have two slightly different concerns.

For example, in a transportation domain you don’t get a Truck from a Truck Service you get a Truck from the Garage or the Truck Manufacturer depending on whether you own it or are buying it. The point being that a Truck Service in Domain Design terms is meaningless, it is just something that programmers introduce to make their code easier for them to use.

If however you want to track where a Consignment is then it makes sense to offer this as a service. For a start it has different audiences; I might want to offer tracking to customers via the web and a slightly more detailed version of the service to the Customer Service Department.

In this sense the Tracking Service is actually a Domain item, people actually talk about the Tracking Service and the Service has been organised around the transactions and expectations that happen in the course of transporting goods.

I am not sure if it makes any sense to talk about a Service in your codebase if it does not have an external consumer for its functionality. Usually Service objects that only interact with your own code can be broken up and have their concerns divided in a different way so as to eliminate them. A Truck Finder for example might make sense, it would handle finding out whether a Truck was on the Road, in the Depot or in the Repair Shop. The Depot might then tell you whether the Truck was being loaded while the Truck could tell you how full it currently was.

Once you have identified external consumers for a service then you get into the question of Service Contracts and a lot of the good things about Service Architectures begin to apply. Limiting concerns, platform neutrality and service composition for example; but this involves a lot more than just tacking the word “Service” on the end of a class.

Programming

Data Modelling with Representations

One of my colleagues at ThoughtWorks, Ian Robinson, is doing something really interesting with business and data modelling. His ideas are making me think again about some of my views on data modelling.

Data modelling is something that is criminally under-valued in software design. It is actual the crux of most IT problems and yet it often dealt with as an afterthought or from the perspective of making application development easier.

Unlike a lot of agile development tasks data modelling is often something that repays a lot of upfront thought and design. The point being that it is often easier to change a data model when it is still on the whiteboard or paper than when a schema has been constructed and is being managed. The cost of change is far greater the later it is made in the process than with most software development.

The key thing I have taken away from Ian’s work is that where the analysis goes too far is in specifying what data entities are composed of. This tips over into design and is where wasted and unnecessary work begins to appear. Instead if we can agree the language of the entities in our design we can actually have many different representations of the same entity in different parts of our system.

This may initially not seem that exciting but actually its quite subversive and challenges one of the ideas in Domain Driven Design that speaking in terms of data everything is symmetrical (i.e. the table has the same columns as the corresponding object has fields and so on…).

Saying that there is one conceptual element, say a Customer, but that there may be many Representations of a Customer is a really powerful tool particularly when trying to evolve already established systems. What this implies is that there is no single point of truth about what data a given Entity should hold and that instead we can construct representations that are suitable for the context we want to use.

The single conceptual framework allows us to introduce Ubiquitous Language but without the need to have symmetry. It also avoids making the database the single point of truth, which is the default position of most system designs, particularly where the database is also doing double-duty as an integration bus.

I think the language of Representations follows this pattern “the X as a Y”. So the Customer as a Participant in a Transaction can be different from the Customer as a Recipient of an Order.

If we think in set terms our different Representations are sets that intersect with our conceptual Customer. In most cases the Representations are going to be subsets, pairing down the information required to just that that is needed to fulfil the transaction the Representation participates in. In rarer cases we are going to have Representations that are supersets where maybe for a point the Representation carries information that will ultimately reside elsewhere once the transaction is complete but in my mind these are going to be quite rare.

So to summarise the advantages I think Representations will bring:

simplified data modelling diagrams, there is no need to record any information about what an entity contains, the only relevant information is its relation to other entities
a solution to Fat Objects that hoover up all possible functionality an entity can have over time
no need to produce a canonical version of an entity, the only relevant information is how one Representation gets mapped to another
a rational way to deal with existing data structures that cannot be changed in the current system
a framework for evolving data entities that operate in multiple roles

Programming, Ruby, Web Applications

Why does Heroku matter?

Heroku is an amazing cloud computing service that I think is incredibly important but which really isn’t getting the visibility it should be at the moment.

Why does Heroku matter?

Firstly it gets deployment right. What is deployment in Heroku? It is pushing a version of your app to the cloud. That’s it. The whole transformation into a deployable application slug is handled by the Heroku, pushing it out to your dynamos is handled by Heroku. Everything is handled by Heroku except writing your application.

Secondly, it is the first cloud/scalable computing targeted directly at Ruby that has the right pricing structure to encourage experimentation and innovation. It has the low barrier to entry that Google App Engine has but requires no special code and imposes no special restrictions.

Thirdly it does the right thing with your data. It respects your right to access, mine and manipulate your data and provides a mechanism for doing this easily.

For me Heroku has set a new benchmark for Cloud Computing and build and deployment in the enterprise as well.

Java, Programming

The DAO Anti-patterns

DAO is a venerable pattern and one that managed to escape out of J2EE and continues to be used a lot in Java development particularly where Spring is also in use. I have never been fan, either the first time around in the Spring and later incarnations. It’s worth having a read through the original blueprint. One thing I particularly love is the proposed combination of Factory and DAO, I find it representative of pattern thinking: if at first you don’t succeed, try excess. In fairness though probably no-one had the problem that the Factory/DAO set out to solve.

I feel the DAO actually creates more problems than it solves. Firstly there is the issue of what the DAO is actually encapsulating. Originally it was meant to encapsulate different data access methods, it would provide the same data retrieval irrespective of whether it was using JDBC, EJB, homebrew ORM or so on. However I do not think I have ever seen it implemented that way, in fact I don’t really recall ever seeing a DAO that had more than one implementation.

What DAO quickly came to be in practice was a replacement for EJBs. The most common implementation maps a single table to a DAO class that implements finder methods that return Collections of a Domain object (if you’re lucky and Lists if not). I think this accounts for 100% of Java DAOs I have encountered in my career. This is exactly what EJB definitions used to be like, arguably the only step forward is that you are now configuring in code. The step back is that you are now implementing all those finders by hand.

So a DAO abstracts something that is irrelevant (data access strategy), is linked directly to the underlying data model (usually the table) and provides an API that is just as useless.

The problem with those finders is that unless the DAO author thought of the query you want to perform then you are out of luck. Without the metaprogamming heavy lifting of a Ruby or Groovy then finders are a dead-end implementation strategy for querying data.

So for me DAOs are an anti-pattern that suck design energy out of the development team in the form of interminable discussions of what particular DAO a finder method belongs to and how many finders should be provided by the DAO. They provide zero decoupling from the underlying data and actually hold development teams back by introducing an unnecessary abstraction layer that needs to be understood but which adds no value.

So what are the alternatives? Well my preferred pattern is DataMapper, this doesn’t introduce unnecessary abstraction and shows a bit of respect for the underlying data. It allows you to do some vertical Domain modelling but the mapping gives you the flexibility to deal with legacy data schemes.

Another good alternative is to ditch the finders and introduce SearchCriteria and Repository. I thought this was a pattern too but it doesn’t seem to have a formal write up, the best example is the Hibernate Criteria but I would urge you to judiciously adapt it to your code rather than just straight up copying the the Hibernate model.

Java, Programming, Work

The Java Developer’s Dilemmia

I believe that Java developers are under a tremendous amount of pressure at the moment. However you may not feel it if you believe that Java is going to be around for a long time and you are happy to be the one maintaining the legacy apps in their twilight. Elliotte Rusty Harold has it right in the comments when someone says that there are a lot of Java jobs still being posted. If you enjoy feasting off the corpse then feel free to ignore the rest of this post because it is going to say nothing to you.

Java is in a tricky situation due to competition on all fronts. C# has managed to rally a lot of support. Some people talk nonsense about C# being what Java will look like in the future. C# is what Java would like if you could break backwards compatibility and indeed even runtime and development compatibility in some cases (with Service Packs). C# is getting mind share by leapfrogging ahead technology-wise at the expense of its early adoptors. Microsoft also does a far better job of selling to IDE dependent developers and risk-adverse managers.

Ruby and Python have also eaten Java’s lunch in the web space. When I am working on web project for fun I work with things like Sinatra, Django and Google App Engine. That’s because they are actually fun to work with and highly productive. You focus on your problem a lot sooner than you do in Java.

The scripting languages have also done a far better job of providing solutions to the small constant problems you face in programming. Automating tasks, building and deployment, prototyping. All these things are far easier to do in your favourite scripting language than they are in Java which will have to wait for JDK7 for a decent Filesystem abstraction for example.

Where does this leave Java? Well in the Enterprise server-side niche, where I first started to use it. Even there though issues of concurrency and performance are making people look to things like Erlang and JVM alternatives like Scala and Clojure.

While, like COBOL and Fortran there will always be a market for Java skills and development. The truth is that for Java developers who want to create new applications that lead in their field; a choice about what to do next is fast approaching. For myself I find my Java projects starting to contain more and more Groovy and I am very frustrated about the lack of support for mixed Java/Groovy projects in IDEs (although I know SpringSource is putting a lot of funding into the Eclipse effort to solve the problem).

If a client asks for an application using the now treadworn combination of Spring MVC and Hibernate I think there needs to be a good answer as to why they don’t want to use Grails which I think would increase productivity a lot without sacrificing the good things about the Java stack. Companies doing heavy lifting in Java ought to be investigating languages like Scala, particularly if they are arguing for the inclusion of properties and closures in the Java language spec.

Oracle’s purchase of Sun makes this an opportune moment to assess where Java might be going and whether you are going to be on the ride with it. It is hard to predict what Oracle will do, except that they will act in their perceived economic interest. The painful thing is that whatever you decide to do there is no clear answer at the moment and no bandwagon seems to be gaining discernible momentum. It is a tough time to be a Java developer.

Programming, Software

Even if I’m happy with Subversion?

I’m not going to try and make the case for the next generation of DVCS (Git, Bazaar, Mercurial). There are a host of posts on the subject and you can search for them via the magic of Google searches. In addition you have thriving communities around BitBucket and GitHub and you have major repositories for huge open source projects like Linux and OpenJDK. All of this means that the tools are good, mature and support sophisticated development processes.

However I have been surprised that a number of people feel that Subversion is enough for their needs. Now this might be true for some people but I genuinely think that DVCS is a step change in source control and that ultimately it will adopted almost universally and Subversion will end up in the same place as CVS is now.

I have been trying to explain why I think this to people and I sometimes get the feeling that Subversion has ended up defining source control best practice by enshrining the limitations of the tools. For example a lot of people think that branches of the code should be short-lived. This is true for Subversion, branching and merging is painful. However in Mercurial and Git I fill that branching and to a lesser extent merging is pretty painless. In fact you often find yourself doing it without really thinking. Therefore why limit yourself to short-lived branches? Since it is easy to take change sets into branches it is easy to push canonical updates into derivative branches as you need to keep them up to date.

In truth some branches, like maintenance releases and conditional changes, tended to be long-lived in Subversion repositories anyway. They were just painful to manage. Also you had this set of conventions around branches, tags and trunk in the Subversion repository that really didn’t help manage the history and intent of the code they contained. In the DVCS model those repository concepts become repositories in their own right and are easier to manage in my view.

What about continuous integration? Many people have expressed an opinion that they don’t like to be more than five minutes away from the canonical source tree. However checking in to the parent source line this frequently means that intermediate work is frequently checked into “trunk” or its equivalent. I think that DVCS opens the possibility of having a very clean and consistent trunk because intermediate products and continuous integration can be pushed closer to the source of change. You still need to push to the canonical code stream frequently but I think in a DVCS world you actually pull a lot more than you push.

It is certainly my anecdotal experience that code “breaks” are rarer in DVCS as you tend to synchronise in your personal branches and resolve issues there so the view of the code as pushed to the canonical stream is much more consistent.

The recording of change in code files is also much more sophisticated in the changesets of DVCS’s. The ability to drill into change on all the leading contenders is amazing, as is the ability to track it between different repositories. Switching from the linearity of revisions to the non-linear compilation of changesets can be headfuck but once you’ve mastered it you begin to see the possibilities of constructing the same sets of changes in different ways to result in outcomes that would traditionally be called “branches”. Your worldview changes.

Subversion is not what I want to work with anymore and every time I have to pull an SVN archive across the world via HTTP/WebDAV or I have to wait for the server to be available before I can create a new copy of trunk or commit a set of changes then I wonder why people are still so happy with Subversion.

Programming

Who hard-coded the VAT?

With the recent change in UK VAT rates there has been an intriguing chance to see which retailers have a decent data model and those who just assumed that VAT would never change.

If you look at your receipt then the companies that have a decent data model will probably just have a very ordinary looking receipt. Maybe if they are showing off then they will have a little message telling you the current rate.

However the companies that have a bad data model will have an additional line on your receipt that reads something like “VAT Discount” that subtracts 2% from your net bill. This means they have happily hardcoded the VAT rate throughout their system in the mistaken belief that it was a constant, like gravity.

In fact, so mild have the economic climes been recently that it seemed a reasonable assumption. I used to work on a retail system and while we did correctly data drive the VAT rates the data model had no way of tracking the change in VAT rate over time. If we changed the standard VAT rate it would also change it for all historical invoices prior to the switchover. For this reason when an invoice line was calculated the VAT also had to be calculated and written at the time order was finalised. It was obscure but the VAT rate was always the current rate and if you wanted to figure out the historical rate applied you had to do the work yourself.

Who would think that such a simple tax would cause such problems? Or allow you to see so much of the internals of big firm’s systems?

Groovy, Java, Programming, Scripting, Software

Working with Groovy Tests

For my new project Xapper I decided to try and write my tests in Groovy. Previously I had used Groovy scripts to generate data for Java tests but I was curious as to whether it would be easier to write the entire test in Groovy instead of Java.

Overall the experience was a qualified “yes”. When I was initially working with the tests it was possible to invoke them within Eclipse via the GUnit Runner. Trying again with the more recent 1.5.7 plugin, the runner now seems to be the JUnit4 one and it says that it sees no tests, to paraphrase a famous admiral. Without being able to use the runner I ended up running the entire suite via Gant, which was less than ideal, because there is a certain amount of spin-up time compared to using something like RSpec’s spec runner.

I would really like all the major IDEs to get smarter about mixing different languages in the same project. At the moment I think Eclipse is the closest to getting this to work. NetBeans and Intellij have good stories around this but it seems to me to be a real pain to get it working in practice. I want to be able to use Groovy and Java in the same project without having Groovy classes be one of the “final products”. NetBeans use of pre-canned Ant files to build projects is a real pain here.

Despite the pain of running them though I think writing the tests in Groovy is a fantastic idea. It really brought it home to me, how much ceremony there is in conventional Java unit test writing. I felt like my tests improved when I could forget about the type of a result and just assert things about the result.

Since I tend to do TDD it was great to have the test run without having to satisfy the compiler’s demand that methods and classes be there. Instead I was able to work in a Ruby style of backfilling code to satisfy the runtime errors. Now some may regard this a ridiculous technique but it really does allow you to provide a minimum of code to meet the requirement and it does give you the sense that you are making progress as one error after another is squashed.

So why use Groovy rather than JRuby and RSpec (the world’s most enjoyable specification framework)? Well the answer lies in the fact that Groovy is really meant to work with Java. Ruby is a great language and JRuby is a great implementation but Groovy does a better job of dealing with things like annotations and making the most of your existing test libraries.

You also don’t have the same issue of context-switching between different languages. Both Groovy and Scala are similar enough to Java that you can work with them and Java without losing your flow. In Ruby, even simple things like puts versus println can throw you off. Groovy was created to do exactly this kind of job.

If the IDE integration can be sorted out then I don’t see any reason why we should write tests in Java anymore.

Java, Programming

Three Little Mockers

At my last client I got the unusual chance to try three Java mocking frameworks within the same project. The project had started to use EasyMock as the project team felt that Mockito didn’t really have any decent documentation (it does but it’s in the Mockito codebase, not on the Google Code wiki). However as a ThoughtWorks team was working next door there was a strong push to use that.

My personal preference is still for JMock so I also selfishly pushed that into the project to round out the selection. With all three there; we were ready for a Mock Off!

The first distinctive thing is that EasyMock and Mockito can mock concrete classes not just interfaces. The second thing that all three have very different methods of constructing and verifying mocks. EasyMock is record-replay, JMock constructs expectations that are automatically verified by a JUnit runner when the test stops executing, Mockito uses a TestSpy where you can verify what happened to the mock whenever you want.

The record-replay paradigm lead to EasyMock being discarded early on. It has two kinds of problems as far as I was concerned. Firstly you have the feeling of “inside out” where the test is copying the internals of the class under test. Secondly I was confused several times as to what state my mock was in and having to switch the mocks to replay mode felt noisy, particularly where you were setting multiple collaborators in the test (do you switch them to replay once their recording is done or do you record all mocks then switch all of the them to replay?).

Mockito’s easy mocking of concrete classes made it a valuable addition to the toolkit and it does deliver the promised noise free setup and verificiation. However its use of global state was frustrating, particularly in that you can create a compiling use of the API that fails at runtime. If you call verify without a following method then you get an error about illegal use of the framework. Although this is meant to happen in the test that follows the test with the illegal construction, in practice this is hideous pain when the test suite is a non-trivial size. It also meant that a test was appearing to pass when actually nothing was being verified and the error appeared in another pair’s test (the one that implemented the next test) making them think that something was wrong with their code.

Mockito also uses a lot of static imports (which I do have a weaknesses for) but it also means that you have to remember the entry points into the framework.

JMock by comparision to Mockito feels quite verbose, the price for having all that IDE support and a discoverable API is that you have to have a Mockery or two in all your classes and you are defining all your Expectations for the Mockery. There’s no denying that even with the IDE autocomplete you’re doing a lot more typing with JMock. From a lazy programmer point of view you are going to go with Mockito every time.

And that is pretty much what happened in the project. Which I think is a shame. Now in explaining this I am going to go into a bit of ThoughtWorks testing geekery so if you are lazy feel free to go off and use Mockito because the pain I’m talking about will happen in the future not now.

I feel that Mockito is a framework for Test Driven Development and JMock is a framework for Test Driven Design. A lot of times you want the former: tight-deadline work and brownfield work. You want to verify the work you are doing but often design is getting pushed to the sidelines or you actually lack the ability to change the design as you might want. Mockito is great for that.

However Mockito doesn’t let the code speak to you. It takes care of so much of the detail that you often lose feedback on your code. One thing that making you type out your Expectations in JMock does is make you think, really think, about the way your code is collaborating. I think you are much more likely to design your code with JMock because it offers lots of opportunities to refactor. For example if two services are constantly being mocked together then maybe they contain services that are so closely related they should be unified, logically or physically. You don’t see that opportunity for change with Mockito.

By using both JMock and Mockito I was able to see that Mockito did tend to make us a bit lazy and our classes tended to grow fatter in terms of functionality and collaborators because there was no penalty for doing so. I was also concerned that sometimes Mockito’s ability to mock concrete classes meant that sometimes we mocked objects that should have been real or stub implementations.

Mockito is a powerful framework and I suspect that Java developers will take to it as soon as they encounter it. However for genuine Test Driven Design I think Mockito suffocates the code and requires developers with a lot more discipline and design-nous, almost certainly not the people that wanted an easy to use framework in the first place.

Java, Programming

Java Library Silver Bullet/Golden Hammer Syndrome

One thing I notice a lot with Java projects is that there is this strong desire to just have One Thing in the code base. One web framework, one testing framework, one mocking library, one logging library, one templating engine and so on and so on.

There is the understandable desire to reduce the complexity required to comprehend the codebase but that often flows over into One True Way-ism. There is One Library because it is The Library that we should all use to Fix Our Problems.

One reason why Java developers argue so fiercely and nervously at the outset of a project is that when they are defining the application stack there is the unspoken assumption that these choices of framework are fixed forever once made. Once we chose Spring then we will use Spring forever with no chance of introducing Guice or PicoContainer. If we make the wrong choice then we will Cripple Our Project Forever.

I actually like Slutty Architectures because they take away this anxiety. If we start out using JUnit 4 and we suddenly get to this point where TestNG has a feature that JUnit doesn’t and having that feature will really help us and save a lot of time; well, let’s use TestNG and JUnit4 and let’s use each where they are strongest.

Similarly if we are using Wicket and we suddenly want some REST-ful Web APIs should we warp our use of Wicket so we get those features? Probably not; let’s chose a framework like JAX-RS that is going to do a lot of the work for us in the feature we want to create.

Of course Slutty Architecture introduces problems too. Instead of the One True Way and hideous API abuse we do have an issue around communication. If we have two test frameworks then when someone joins the team then we are going to have to explain what we test with JUnit 4 and what we test with Test NG. There is a level of complexity but we are going to deal with it explicitly instead of giving a simple statement like “we use JUnit4” but which has been poisoned by all these unspoken caveats (“but we write all our tests like JUnit3 and you have to extend the right base test class or it won’t work”).

We also need to review our choices: when a new version of a library gets released. Does it include functionality that was previously missing? Perhaps in the light of the new release we should migrate the entire test code to TestNG for example. That kind of continual reassessment takes time but also stops an early technology choice becoming a millstone for everyone involved.

But please, let’s get over the idea that there has to be one thing that is both floor polish and desert topping. It’s making us ill.

Echo One

Sequentially arranged sentences composed of words (and punctuation)

Category Archives: Programming

Why Repositories and not Services?

Data Modelling with Representations

The DAO Anti-patterns

The Java Developer’s Dilemmia

Even if I’m happy with Subversion?

Who hard-coded the VAT?

Working with Groovy Tests

Three Little Mockers

Java Library Silver Bullet/Golden Hammer Syndrome