Software, Work

Don’t hate the RDBMS; hate the implementation

I read through this post with the usual sinking feeling of despair. How can people get so muddled in their thinking? I am not sure I can even bear to go through the arguments again.

More programmers treat the database as a dumb store than there are situations where such treatment is appropriate. No-one is saying that Twitter data is deep, relational and worthy of data mining. However not all data is like Twitter micro blog posts.

The comments on the post were for the most part very good and say a lot of what I would have said. However looking at CouchDB documentation I noticed that the authors make far less dramatic claims for their product that the blog post does. A buggy alpha release of a hashtable datastore is not going to bring the enterprise RDBMS to its knees.

I actually set up and ran Couch DB but I will save my thoughts for another day, it’s an interesting application. What I actually want to talk about is how we can get more sophisticated with our datastores. It is becoming apparent to me that ORM technologies are really making a dreadful hash of data. The object representation is getting shafted because inheritance is never properly translated into a relational schema. The relational data is getting screwed by the fact the rules for object attributes is at odds with the normal forms.

The outcome is that you end up with the bastard hybrid worst of all worlds solution. What’s the answer?

Well the first thing is to admit that application programmers think the database is a big dumb datastore and will never stop thinking that. The second is that relational data is not the one true way to represent all data. They are the best tool we have at the moment for representing rich data sets that contain a lot of relational aspects. Customer orders in a supply system is the classic example. From a data mining point of view you are going to be dead on your feet if you do not have customers and their orders in a relational data store. You cannot operate if you cannot say who is buying how much of what.

If you let developers reimplement a data mining solution in their application for anything other than your very edge and niche interests then you are going to be wasting a lot of time and money for no good reason. You simply want a relational datastore, a metadata overlay to reinterpret the normalised data in terms of domain models and a standard piece of charting and reporting software.

However the application programmers have a point. The system that takes the order should not really have to decompose an order taken at a store level into its component parts. What the front end needs to do is take and confirm the order as quickly as possible. From this point of view the database is just a dumb datastore. Or rather what we need is a simple datastore that can do what is needed now and defer and delegate the processing into a richer data set in the future. From this point of view the application may store the data in something as transient as a message queue (although realistically we are talking about something like an object cache so the customer can view and adjust their order).

Having data distributed in different forms across different systems creates something of headache as it is hard to get an overall picture of what is happening on the system at any given moment. However creating a single datastore (implemented by an enterprise RDBMS) as a single point of reference is something of an anti-pattern. It is making one thing easier, the big picture. However to provide this data is being bashed by layering technologies into all kinds inappropriate shapes and various groups within the IT department are frequently in bitter conflict.

There needs to be a step back and IT people need to accept the complexity and start talking about the whole system comprising of many components. All of which need to be synced and queried if you want the total information picture. Instead of wasting effort in fitting however many square pegs into round holes we need to be thinking about how we use the best persistence solution for a given solution and how we report and coordinate these many systems.

It is the only way we can move forward.

Standard