Programming

Data Modelling with Representations

One of my colleagues at ThoughtWorks, Ian Robinson,  is doing something really interesting with business and data modelling. His ideas are making me think again about some of my views on data modelling.

Data modelling is something that is criminally under-valued in software design. It is actual the crux of most IT problems and yet it often dealt with as an afterthought or from the perspective of making application development easier.

Unlike a lot of agile development tasks data modelling is often something that repays a lot of upfront thought and design. The point being that it is often easier to change a data model when it is still on the whiteboard or paper than when a schema has been constructed and is being managed. The cost of change is far greater the later it is made in the process than with most software development.

The key thing I have taken away from Ian’s work is that where the analysis goes too far is in specifying what data entities are composed of. This tips over into design and is where wasted and unnecessary work begins to appear. Instead if we can agree the language of the entities in our design we can actually have many different representations of the same entity in different parts of our system.

This may initially not seem that exciting but actually its quite subversive and challenges one of the ideas in Domain Driven Design that speaking in terms of data everything is symmetrical (i.e. the table has the same columns as the corresponding object has fields and so on…).

Saying that there is one conceptual element, say a Customer, but that there may be many Representations of a Customer is a really powerful tool particularly when trying to evolve already established systems. What this implies is that there is no single point of truth about what data a given Entity should hold and that instead we can construct representations that are suitable for the context we want to use.

The single conceptual framework allows us to introduce Ubiquitous Language but without the need to have symmetry. It also avoids making the database the single point of truth, which is the default position of most system designs, particularly where the database is also doing double-duty as an integration bus.

I think the language of Representations follows this pattern “the X as a Y”. So the Customer as a Participant in a Transaction can be different from the Customer as a Recipient of an Order.

If we think in set terms our different Representations are sets that intersect with our conceptual Customer. In most cases the Representations are going to be subsets, pairing down the information required to just that that is needed to fulfil the transaction the Representation participates in. In rarer cases we are going to have Representations that are supersets where maybe for a point the Representation carries information that will ultimately reside elsewhere once the transaction is complete but in my mind these are going to be quite rare.

So to summarise the advantages I think Representations will bring:

  • simplified data modelling diagrams, there is no need to record any information about what an entity contains, the only relevant information is its relation to other entities
  • a solution to Fat Objects that hoover up all possible functionality an entity can have over time
  • no need to produce a canonical version of an entity, the only relevant information is how one Representation gets mapped to another
  • a rational way to deal with existing data structures that cannot be changed in the current system
  • a framework for evolving data entities that operate in multiple roles
Standard
Programming, Ruby, Web Applications

Why does Heroku matter?

Heroku is an amazing cloud computing service that I think is incredibly important but which really isn’t getting the visibility it should be at the moment.

Why does Heroku matter?

Firstly it gets deployment right. What is deployment in Heroku? It is pushing a version of your app to the cloud. That’s it. The whole transformation into a deployable application slug is handled by the Heroku, pushing it out to your dynamos is handled by Heroku. Everything is handled by Heroku except writing your application.

Secondly, it is the first cloud/scalable computing targeted directly at Ruby that has the right pricing structure to encourage experimentation and innovation. It has the low barrier to entry that Google App Engine has but requires no special code and imposes no special restrictions.

Thirdly it does the right thing with your data. It respects your right to access, mine and manipulate your data and provides a mechanism for doing this easily.

For me Heroku has set a new benchmark for Cloud Computing and build and deployment in the enterprise as well.

Standard
Software

Get your own Couch

At Erlounge in London I recently had the chance to catch up with the Couch.io guys J Chris Anderson and Jan Lehnardt. The conversation was interesting as ever and I was intrigued by JChris’s ideas that CouchDb has twisted conventional data storage logic on it’s head by making it easy to create many databases with relatively small amounts of information; the one database per user concept.

More importantly though I discovered it was okay to talk about the hosted Couch instances that Couch.io (who also do CouchDb consulting if you need some help and advice) are offering. The free service offers you a Basic authentication account with 100Mb of storage to play around with Couch to your heart’s content. Paying brings more space but also more sophisticated authentication options.

The service is the perfect way to play around with Couch and learn how you could use it go get an account today! It’s on the Cloud as well: schema-less data on the Cloud, how buzzword compliant is that?!

On a more serious note, this is an excellent service and one I have been asking for as it allows people who have no desire to build and maintain Erlang and Couch to use the datastore as a web service rather than as a managed infrastructure component.

Standard
Java, Programming

The DAO Anti-patterns

DAO is a venerable pattern and one that managed to escape out of J2EE and continues to be used a lot in Java development particularly where Spring is also in use. I have never been fan, either the first time around in the Spring and later incarnations. It’s worth having a read through the original blueprint. One thing I particularly love is the proposed combination of Factory and DAO, I find it representative of pattern thinking: if at first you don’t succeed, try excess. In fairness though probably no-one had the problem that the Factory/DAO set out to solve.

I feel the DAO actually creates more problems than it solves. Firstly there is the issue of what the DAO is actually encapsulating. Originally it was meant to encapsulate different data access methods, it would provide the same data retrieval irrespective of whether it was using JDBC, EJB, homebrew ORM or so on. However I do not think I have ever seen it implemented that way, in fact I don’t really recall ever seeing a DAO that had more than one implementation.

What DAO quickly came to be in practice was a replacement for EJBs. The most common implementation maps a single table to a DAO class that implements finder methods that return Collections of a Domain object (if you’re lucky and Lists if not). I think this accounts for 100% of Java DAOs I have encountered in my career. This is exactly what EJB definitions used to be like, arguably the only step forward is that you are now configuring in code. The step back is that you are now implementing all those finders by hand.

So a DAO abstracts something that is irrelevant (data access strategy), is linked directly to the underlying data model (usually the table) and provides an API that is just as useless.

The problem with those finders is that unless the DAO author thought of the query you want to perform then you are out of luck. Without the metaprogamming heavy lifting of a Ruby or Groovy then finders are a dead-end implementation strategy for querying data.

So for me DAOs are an anti-pattern that suck design energy out of the development team in the form of interminable discussions of what particular DAO a finder method belongs to and how many finders should be provided by the DAO. They provide zero decoupling from the underlying data and actually hold development teams back by introducing an unnecessary abstraction layer that needs to be understood but which adds no value.

So what are the alternatives? Well my preferred pattern is DataMapper, this doesn’t introduce unnecessary abstraction and shows a bit of respect for the underlying data. It allows you to do some vertical Domain modelling but the mapping gives you the flexibility to deal with legacy data schemes.

Another good alternative is to ditch the finders and introduce SearchCriteria and Repository. I thought this was a pattern too but it doesn’t seem to have a formal write up, the best example is the Hibernate Criteria but I would urge you to judiciously adapt it to your code rather than just straight up copying the the Hibernate model.

Standard
Ruby, Scripting, Web Applications

Semi-static CMSs

You know you have a good idea when there are half a dozen implementations of it. All essentially the same but filled with subtle nuance. The latest idea I have stumbled across is the “semi-static” website generator. The idea is so painfully obvious I cannot believe that it has taken us until 2009 to do it properly.

What is a semi-static website? Well it’s actually something like this blog. I spend ages writing and editing the articles but once I publish something I don’t generally change it unless it has factual errors. I need something that allows me to make changes but my content easily but my content doesn’t really benefit in anyway from being stored in a database or being generated dynamically on the fly because it is going to spend most of its life not being changed.

If you look at a lot of CMS systems the same is true. People want to be able to edit and reorganise material easily but they are rarely changing it more than once a week. The majority of the time the pages are static.

If you look at a number of CMS or Wiki systems they are actually quite complicated. Before you can get going and generate your content (which remember is the core of what you want to do) you might have to setup a database and get the application deployed to a webserver.

What the semi-static website generator does is provide a set of scripts that allow you to edit and create content and then “compile” your site. The result is set of static HTML, Javascript and CSS files that can be deployed to your static webserver (which is generally pretty easy to setup, most servers will come with Apache all ready to go).

When you combine it with a bit of DVCS magic deployment is even easier. You just checkin your new pages and then update the webserver copy of the content.

So if you want to get into this kind of content management what options do you have? Well you have a lot of choice indeed. Almost all of them are Ruby projects and they have a certain similarity to the Rails concept of code generation. You will be issuing commands like “create page”, “build site” and so on. You operate at one remove from it.

There are many, many, many choices but for myself I chose three to look at Nanoc, Webby and Jekyll. Nanoc I came across via Proggit and was struck by the simplicity and clarity of the documentation.

Installation of all three is via gems but there are some differences here. Nanoc is a very simple Gem and works with HTML and Erb by default, if you want to use a markup language or different templating language you will have to install the gem yourself. This means it also installs easily under JRuby too.

Webby by comparison is a very noisy install, I think it may have its dependencies slightly wrong as it installs a version of rake, flexmock and rspec which are surely development dependencies rather than anything I as a user would need to use for the framework. It also uses hpricot which means you can’t install it under JRuby.

Jekyll has a similarly demanding set of requirements but will install under JRuby. Jekyll did feel a lot like a personal solution as I was installing it. Textile is great but if I am going to be using Markdown why do I need to install RedCloth?

Overall I liked the nanoc install experience most.

Moving on to using them. Nanoc and Webby are very similar in that they use generators to create new elements in the site. Jekyll seems to use convention instead. At this point it is worth mentioning that there are some big differences between the three products in their intentions. Nanoc is a general website creator but it does have a certain amount of opinion as to how to do that. For example it generates very nice clean website urls by putting each page in a directory by itself and then creating the page as an index.html page within that directory. Make sure you have your webserver configured appropriately.

Webby is pretty flexible but comes with some built-in templates that will be build you a basic page, a tumblog style site, a normal blog and a presentation. That was a pretty neat feature and one the other frameworks didn’t have.

However the negative side to this is that the templating is based on Blueprint; which is both excellent as Blueprint is a great way of designing pages and a pain as Blueprint is actually pretty involved in terms of understanding the grid concept and what the various span-x css classes translate to. I felt that Webby did a great job of offering cross-browser styling that looked good but that in doing so it didn’t make the simple stuff simple. With my trial site my sidebar came out on the bottom of the page and I didn’t really know how to get on the right-hand side of the page. In nanoc I could just float: right it, in Webby I felt I had to understand the grid that had been created for me. I’m not sure it really allowed you to use any CSS knowledge someone might already have but the benefit is you know your site is going to look good across browsers and in print and so on. For more professional site development this might actually sway you to select Webby.

Jekyll is primarily a tool for publishing a blog style site so it has a lot of built-in support for special information relating to posts but in terms of layouts it uses general HTML templating like Nanoc does. It allows  for the creation of little chunks of reusable content across pages via includes but lacks some of the flexibility of Nanoc’s tag helpers. While focussing on blog publishing Jekyll actually encompasses a powerful set of processing tools for content that effectively allows you to use it as CMS. Jekyll is used to process GitPages on GitHub, a real demonstration of its power beyond just blogging.

For what I wanted to do I initially felt that Nanoc was the best choice, it’s flexible, powerful and generates nice sites. Webby is powerful but didn’t expose that power to me in a way that made me feel in-control. However when I started to get to grips with Jekyll I was really amazed at how powerful the use of a few conventions were. Jekyll processes your site’s pages effectively in-place. It can choose to process your files if they match a naming convention (such as having a content extension like textile or markdown) and it uses an embedded YAML section in the file to do things like provide the title of the page or select the layout that the content should be displayed within. While Nanoc produces fancy URLs Jekyll made it easy to build more conventional websites consisting of clusters of pages.

Jekyll uses a small number of special directories (indicated by a leading underscore) to hold things like layouts, cross-page snippets and posts (if you are using the dated posting functionality). However for a simple site the number of directories and configuration files was significantly less than Nanoc. In fact you could get by with just a master page layout, which would require just one special directory and page within it. Everything that should have been simple was with Jekyll and the result was very close to a conventional website you might have coded by hand but extremely fast to create. It also uses just one command to press the pages and fire up a local server to view the pages. Both Webby and Nanoc seemed to make pressing and previewing slightly harder than it needed to be and at least once I forget to execute the Nanoc commands in the right order.

So, conclusions… Going forward I will be using both Nanoc and Jekyll. Webby I would be willing to consider but I don’t really have a task for it that would reward the additional effort I have to invest in its capabilities. If I just want to bang out pages for a website but minimise my effort and make it easy to abstract common content then I will definitely be using Jekyll. It is a joy to use. For sites that are maybe more based around articles or grouped content then I think I will be using Nanoc. The default URL construction is nice for hierarchical data (as long as you can stand the many directories) and there’s a sense that the configuration files make it easy to make changes that affect the site as a whole. However it does use a lot of scaffolding to achieve the effect and there are a lot of exposed options that could have been reduced by adopting some conventions Jekyll style.

Standard
Java, Programming, Work

The Java Developer’s Dilemmia

I believe that Java developers are under a tremendous amount of pressure at the moment. However you may not feel it if you believe that Java is going to be around for a long time and you are happy to be the one maintaining the legacy apps in their twilight. Elliotte Rusty Harold has it right in the comments when someone says that there are a lot of Java jobs still being posted. If you enjoy feasting off the corpse then feel free to ignore the rest of this post because it is going to say nothing to you.

Java is in a tricky situation due to competition on all fronts. C# has managed to rally a lot of support. Some people talk nonsense about C# being what Java will look like in the future. C# is what Java would like if you could break backwards compatibility and indeed even runtime and development compatibility in some cases (with Service Packs). C# is getting mind share by leapfrogging ahead technology-wise at the expense of its early adoptors. Microsoft also does a far better job of selling to IDE dependent developers and risk-adverse managers.

Ruby and Python have also eaten Java’s lunch in the web space. When I am working on web project for fun I work with things like Sinatra, Django and Google App Engine. That’s because they are actually fun to work with and highly productive. You focus on your problem a lot sooner than you do in Java.

The scripting languages have also done a far better job of providing solutions to the small constant problems you face in programming. Automating tasks, building and deployment, prototyping. All these things are far easier to do in your favourite scripting language than they are in Java which will have to wait for JDK7 for a decent Filesystem abstraction for example.

Where does this leave Java? Well in the Enterprise server-side niche, where I first started to use it. Even there though issues of concurrency and performance are making people look to things like Erlang and JVM alternatives like Scala and Clojure.

While, like COBOL and Fortran there will always be a market for Java skills and development. The truth is that for Java developers who want to create new applications that lead in their field; a choice about what to do next is fast approaching. For myself I find my Java projects starting to contain more and more Groovy and I am very frustrated about the lack of support for mixed Java/Groovy projects in IDEs (although I know SpringSource is putting a lot of funding into the Eclipse effort to solve the problem).

If a client asks for an application using the now treadworn combination of Spring MVC and Hibernate I think there needs to be a good answer as to why they don’t want to use Grails which I think would increase productivity a lot without sacrificing the good things about the Java stack. Companies doing heavy lifting in Java ought to be investigating languages like Scala, particularly if they are arguing for the inclusion of properties and closures in the Java language spec.

Oracle’s purchase of Sun makes this an opportune moment to assess where Java might be going and whether you are going to be on the ride with it. It is hard to predict what Oracle will do, except that they will act in their perceived economic interest. The painful thing is that whatever you decide to do there is no clear answer at the moment and no bandwagon seems to be gaining discernible momentum. It is a tough time to be a Java developer.

Standard
Programming, Software

Even if I’m happy with Subversion?

I’m not going to try and make the case for the next generation of DVCS (Git, Bazaar, Mercurial). There are a host of posts on the subject and you can search for them via the magic of Google searches. In addition you have thriving communities around BitBucket and GitHub and you have major repositories for huge open source projects like Linux and OpenJDK. All of this means that the tools are good, mature and support sophisticated development processes.

However I have been surprised that a number of people feel that Subversion is enough for their needs. Now this might be true for some people but I genuinely think that DVCS is a step change in source control and that ultimately it will adopted almost universally and Subversion will end up in the same place as CVS is now.

I have been trying to explain why I think this to people and I sometimes get the feeling that Subversion has ended up defining source control best practice by enshrining the limitations of the tools. For example a lot of people think that branches of the code should be short-lived. This is true for Subversion, branching and merging is painful. However in Mercurial and Git I fill that branching and to a lesser extent merging is pretty painless. In fact you often find yourself doing it without really thinking. Therefore why limit yourself to short-lived branches? Since it is easy to take change sets into branches it is easy to push canonical updates into derivative branches as you need to keep them up to date.

In truth some branches, like maintenance releases and conditional changes, tended to be long-lived in Subversion repositories anyway. They were just painful to manage. Also you had this set of conventions around branches, tags and trunk in the Subversion repository that really didn’t help manage the history and intent of the code they contained. In the DVCS model those repository concepts become repositories in their own right and are easier to manage in my view.

What about continuous integration? Many people have expressed an opinion that they don’t like to be more than five minutes away from the canonical source tree. However checking in to the parent source line this frequently means that intermediate work is frequently checked into “trunk” or its equivalent. I think that DVCS opens the possibility of having a very clean and consistent trunk because intermediate products and continuous integration can be pushed closer to the source of change. You still need to push to the canonical code stream frequently but I think in a DVCS world you actually pull a lot more than you push.

It is certainly my anecdotal experience that code “breaks” are rarer in DVCS as you tend to synchronise in your personal branches and resolve issues there so the view of the code as pushed to the canonical stream is much more consistent.

The recording of change in code files is also much more sophisticated in the changesets of DVCS’s. The ability to drill into change on all the leading contenders is amazing, as is the ability to track it between different repositories. Switching from the linearity of revisions to the non-linear compilation of changesets can be headfuck but once you’ve mastered it you begin to see the possibilities of constructing the same sets of changes in different ways to result in outcomes that would traditionally be called “branches”. Your worldview changes.

Subversion is not what I want to work with anymore and every time I have to pull an SVN archive across the world via HTTP/WebDAV or I have to wait for the server to be available before I can create a new copy of trunk or commit a set of changes then I wonder why people are still so happy with Subversion.

Standard