One of my colleagues at ThoughtWorks, Ian Robinson, is doing something really interesting with business and data modelling. His ideas are making me think again about some of my views on data modelling.
Data modelling is criminally under-valued in software design. It is actually the crux of most IT problems, and yet it is often dealt with as an afterthought or from the perspective of making application development easier.
Unlike a lot of agile development tasks, data modelling often repays a lot of upfront thought and design. It is far easier to change a data model while it is still on the whiteboard or on paper than once a schema has been constructed and is being managed. Compared with most software development, the cost of a change rises much more steeply the later in the process it is made.
The key thing I have taken away from Ian’s work is that analysis goes too far when it specifies what data entities are composed of. This tips over into design and is where wasted and unnecessary work begins to appear. Instead, if we can agree the language of the entities in our design, we can have many different representations of the same entity in different parts of our system.
This may not initially seem that exciting, but it’s actually quite subversive: it challenges one of the ideas in Domain Driven Design, that in data terms everything is symmetrical (i.e. the table has the same columns as the corresponding object has fields, and so on…).
Saying that there is one conceptual element, say a Customer, but that there may be many Representations of a Customer is a really powerful tool, particularly when trying to evolve already established systems. What this implies is that there is no single point of truth about what data a given Entity should hold; instead we can construct representations that suit the context we want to use them in.
The single conceptual framework allows us to introduce Ubiquitous Language but without the need to have symmetry. It also avoids making the database the single point of truth, which is the default position of most system designs, particularly where the database is also doing double-duty as an integration bus.
I think the language of Representations follows this pattern: “the X as a Y”. So the Customer as a Participant in a Transaction can be different from the Customer as a Recipient of an Order.
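As a minimal sketch of "the X as a Y", here are two Representations of the same conceptual Customer written as Python dataclasses. The class names follow the two examples above; the field names (`customer_id`, `payment_reference`, `delivery_address` and so on) are my own assumptions for illustration and are not taken from Ian’s work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionParticipant:
    """The Customer as a Participant in a Transaction (hypothetical fields)."""
    customer_id: str
    payment_reference: str


@dataclass(frozen=True)
class OrderRecipient:
    """The Customer as a Recipient of an Order (hypothetical fields)."""
    customer_id: str
    name: str
    delivery_address: str


participant = TransactionParticipant("c-42", "card-on-file")
recipient = OrderRecipient("c-42", "A. Customer", "1 Example Street")

# Both refer to the same conceptual Customer, but neither class is
# "the" Customer and neither needs the other's fields.
same_customer = participant.customer_id == recipient.customer_id
```

Note that there is deliberately no `Customer` class here: the shared identifier and the shared language are all that tie the two shapes together.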
If we think in set terms, our different Representations are sets that intersect with our conceptual Customer. In most cases the Representations are going to be subsets, paring down the information to just what is needed to fulfil the transaction the Representation participates in. Occasionally we will have Representations that are supersets, where for a time the Representation carries information that will ultimately reside elsewhere once the transaction is complete, but to my mind these are going to be quite rare.
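The set view can be sketched directly with Python sets of attribute names. Every attribute name below is hypothetical, and writing the conceptual Customer down as a set is only a device for the illustration; the argument of this post is that in practice nobody needs to maintain such a list.

```python
# Hypothetical attributes of the conceptual Customer (illustration only).
conceptual_customer = {
    "customer_id", "name", "email", "delivery_address", "payment_reference",
}

# The common case: a Representation pared down to what one transaction needs.
order_recipient = {"customer_id", "name", "delivery_address"}

# The rare case: a Representation that temporarily carries something extra
# which will live elsewhere once the transaction completes.
in_flight_customer = conceptual_customer | {"pending_voucher_code"}

is_subset = order_recipient <= conceptual_customer
is_superset = in_flight_customer >= conceptual_customer
carried_extra = in_flight_customer - conceptual_customer
```

Here `is_subset` and `is_superset` are both true, and `carried_extra` holds just the one transient attribute.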
So to summarise the advantages I think Representations will bring:
- simplified data modelling diagrams: there is no need to record any information about what an entity contains, as the only relevant information is its relation to other entities
- a solution to Fat Objects that hoover up all possible functionality an entity can have over time
- no need to produce a canonical version of an entity: the only relevant information is how one Representation gets mapped to another
- a rational way to deal with existing data structures that cannot be changed in the current system
- a framework for evolving data entities that operate in multiple roles
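
To make the points about mapping and about unchangeable existing structures a little more concrete, here is a rough sketch that treats a legacy database row as just another Representation and maps it to the one an ordering context wants. The column names (`CUST_NO`, `FNAME`, `LNAME`, `ADDR1`, `CREDIT_LIMIT`) and the `OrderRecipient` fields are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRecipient:
    """The Customer as a Recipient of an Order (hypothetical fields)."""
    customer_id: str
    name: str
    delivery_address: str


def recipient_from_legacy_row(row: dict) -> OrderRecipient:
    """Map a legacy row (hypothetical column names) to an OrderRecipient.

    The legacy structure is left untouched; columns this context does not
    need, such as CREDIT_LIMIT, are simply ignored.
    """
    return OrderRecipient(
        customer_id=str(row["CUST_NO"]),
        name=f'{row["FNAME"]} {row["LNAME"]}'.strip(),
        delivery_address=row["ADDR1"],
    )


legacy_row = {
    "CUST_NO": 42,
    "FNAME": "Ada",
    "LNAME": "Example",
    "ADDR1": "1 Example Street",
    "CREDIT_LIMIT": 500,
}
recipient = recipient_from_legacy_row(legacy_row)
```

The only thing that has to be agreed and maintained is the mapping function; there is no canonical Customer that both sides must conform to.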