Blogging

How many microblogging sites can there be?

Last time I was on Identi.ca I noticed that most of the messages were being posted from Ping.fm. This means that people are effectively broadcasting there but who is listening? Possibly no-one.

Tomorrow the Today Programme on Radio 4 is going to ask whether Twitter is replacing blogging. I know that because they Tweeted about it.

Twitter might not be the best service or the first but it certainly seems to have hit some critical mass where it is now crossing over into the mainstream and before long it seems likely that it will be synonymous with microblogging in the way that Flickr and online photos are.

I’m currently following Stephen Fry’s wildlife documentary making on Twitter and even John Cleese is on there. When you have that kind of penetration I think most of your rivals can run up the white flag and retreat to the niche areas where they excel.

Software

Refactoring RDBMS: Adding a new column

So, the requirements change and you now need to record a new piece of information in one of your tables. That’s a pretty easy one, right? Just add a column and you’re laughing.

Well this is probably the way I see this problem solved 100% of the time (Rails migrations and Grails GORM for example) and I actually think it is the worst solution.

When you add a column to a table you should always consider whether you can add a sensible value. Adding a nullable column to a table is a really bad idea. Firstly, it isn’t relational (and it is almost guaranteed to reduce your normal form). Secondly, it is likely to introduce performance problems if the column needs to be searchable. Finally, nullable columns have a higher cost of maintenance as they imply logic (the null state) but don’t define it. The logic as to when the column may be null will always exist outside the schema.

If you can define a sensible default then you might be able to add the column to the table. For example boolean flag fields could be handled this way as long as the historic values can be accurately mapped to, say, false.

This method however falls down if your historical data cannot be truthfully mapped to a default value. If for example the new value represents something you have started to track rather than some brand new piece of information then you cannot truthfully map the historic data to the default value as you don’t know whether the default really applies or not.

For example if you send goods to customers using a regular or overnight delivery method and you need to start tracking the delivery method it would be wrong to map all historic data to the regular delivery. The truth is you don’t know whether a historical order was an overnight delivery or not. If you do map it incorrectly then you will end up skewing all your data and ultimately poison the very data you want to be tracking.

In this case it is far better to simply introduce a relational child table to hold the new information. Using child tables is a more accurate record of the data as a child row will only exist where the data is known. For unknown historic records there will be no child entry and your queries will not need any special cases.

When using child table data like this you can easily separate your data via the EXISTS predicate and in most databases EXISTS is very performant.
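As a rough sketch of what that looks like, here is the delivery example using Python's sqlite3 module. The table and column names (orders, order_delivery, method) are made up for illustration, and SQLite is only standing in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
    -- Child table: a row exists only where the delivery method is known.
    CREATE TABLE order_delivery (
        order_id INTEGER PRIMARY KEY REFERENCES orders(id),
        method TEXT NOT NULL CHECK (method IN ('regular', 'overnight'))
    );
    -- Order 1 predates tracking, so it gets no child row at all.
    INSERT INTO orders (id, customer)
        VALUES (1, 'historic'), (2, 'alice'), (3, 'bob');
    INSERT INTO order_delivery (order_id, method)
        VALUES (2, 'regular'), (3, 'overnight');
""")

# Orders where the delivery method is actually known.
tracked = conn.execute("""
    SELECT o.id FROM orders o
    WHERE EXISTS (SELECT 1 FROM order_delivery d WHERE d.order_id = o.id)
    ORDER BY o.id
""").fetchall()

# Orders from before tracking started: no guess, no default, just no row.
untracked = conn.execute("""
    SELECT o.id FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM order_delivery d WHERE d.order_id = o.id)
    ORDER BY o.id
""").fetchall()
```

Neither column in the child table is nullable, and any report on delivery methods only ever sees orders where the method was genuinely recorded.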

I think using child tables to record new information is the relational way to solve the problem of adding new columns but it is usually turned down for one of two reasons (there naturally may be others but I’ve only ever heard these used in anger).

Firstly there is the argument that this technique leads to a proliferation of child tables. This is an accurate criticism but unfortunately if you want your database to be accurate and the new information is introduced piecemeal then you do not have a lot of choice. Pretending that you have historic data you don’t actually have doesn’t solve the problem.

One thing that can help this situation is to separate your database into its transaction processing element and its warehousing or archive aspect. In the warehouse the structure of the database might be quite involved but since the accuracy of the data is more important than the simplicity of the data model and the queries tend to be of a reporting or aggregating nature there tend to be fewer issues with having many tables. Often views can allow you to present a high-level view of the data and introducing null values into the heading of a query, while ugly, is more acceptable than breaking normal form in the raw data itself (though god help you if you start joining aggregate views on the gappy data rather than the underlying true state of the data).
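A minimal sketch of the view idea, again with Python's sqlite3 and hypothetical names (orders, order_delivery, order_report). The point is only that the nulls live in the query result while the stored tables stay free of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_delivery (
        order_id INTEGER PRIMARY KEY REFERENCES orders(id),
        method TEXT NOT NULL
    );
    -- The NULLs appear only in the view's output, never in the stored rows.
    CREATE VIEW order_report AS
        SELECT o.id, o.customer, d.method AS delivery_method
        FROM orders o
        LEFT JOIN order_delivery d ON d.order_id = o.id;
    INSERT INTO orders (id, customer) VALUES (1, 'historic'), (2, 'alice');
    INSERT INTO order_delivery (order_id, method) VALUES (2, 'overnight');
""")

# The historic order shows up with a NULL (None) delivery method.
report = conn.execute(
    "SELECT id, delivery_method FROM order_report ORDER BY id"
).fetchall()
```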

The transactional version of the database is then free to reflect the state of the current data model alone rather than the historical permutations. Here the model can be clean and reflect more of the Domain Model although again you want to retain the Relational aspect of the data.

Having separate datastores with different schemas often makes app developers’ lives easier as well. Usually an application is only concerned with current interactions and not historical transactions. As long as historical data can be obtained as a Service then you have no need to reflect historical schema concerns in your application.

However even if you cannot separate the data there is still little reason to introduce null value columns into the database. ORM often makes the simple assumption that one data entity is one object, but that is simply to make ORM easier. The application persistence layer shouldn’t be allowed to distort the underlying relational data model. If your data isn’t relational or you can’t define a fixed schema, don’t use an RDBMS. Look into things like CouchDB.

In fact some persistence schemes, like JPA, actually allow you to reflect the reality of the relational data by allowing embedded entities in objects. This is a leaky abstraction because from an object point of view there often is very little reason why, say, an Order object contains a Delivery Dispatch Type object.

If you want to avoid the leak then use the Repository pattern to create a true Domain object and be prepared to handle the complexity of transforming the relational data to the object-orientated data within the Repository. Remember that the Repository represents a true boundary in the system: you are hoping to derive real benefits from both the relational representation and the object representation, not screw one in favour of the other.
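To make that a bit more concrete, here is a small sketch of a Repository sitting over the parent and child tables. Everything here is assumed for the example: the Order class, the OrderRepository name and the schema are my own, and a real implementation would have rather more to worry about.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    """Domain object: nothing here hints that delivery lives in its own table."""
    id: int
    customer: str
    delivery_method: Optional[str] = None  # None means "never recorded"


class OrderRepository:
    """The boundary where relational rows become domain objects and back."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, order: Order) -> None:
        with self._conn:  # one transaction for the parent and child rows
            self._conn.execute(
                "INSERT INTO orders (id, customer) VALUES (?, ?)",
                (order.id, order.customer),
            )
            # Only write a child row when the method is actually known.
            if order.delivery_method is not None:
                self._conn.execute(
                    "INSERT INTO order_delivery (order_id, method) VALUES (?, ?)",
                    (order.id, order.delivery_method),
                )

    def get(self, order_id: int) -> Optional[Order]:
        row = self._conn.execute(
            """
            SELECT o.id, o.customer, d.method
            FROM orders o
            LEFT JOIN order_delivery d ON d.order_id = o.id
            WHERE o.id = ?
            """,
            (order_id,),
        ).fetchone()
        return Order(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_delivery (
        order_id INTEGER PRIMARY KEY REFERENCES orders(id),
        method TEXT NOT NULL
    );
""")

repo = OrderRepository(conn)
repo.add(Order(1, "historic"))
repo.add(Order(2, "alice", "overnight"))
child_rows = conn.execute("SELECT COUNT(*) FROM order_delivery").fetchone()[0]
```

The object side gets a single Order with an optional attribute, the relational side keeps its child table, and the translation between the two is confined to one class.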

So please; don’t just whack in another nullable column and then wonder why queries on your database are slow.

Web Applications

Rounded Corners: Die! Die! Die!

You may have noticed that we don’t seem to have a lot of websites designed in screaming primary colours. We also don’t have a lot of pastel coloured websites anymore (I liked them though). That is because every three years the design schools kick out a new wave of designers who spend five years regurgitating the received wisdom they have just learned.

What is the most insidious recent web trend? Rounded corners. What started as a kind of joke and an impressive hack to differentiate websites is now the stifling, boring norm.

Rounded corners are boring. If you add them to your website they make it look just like all the other websites in the world. If you create an image hack to implement rounded corners so they work on IE then you are actually mad. You are actually forcing users to download at least two images for the privilege of looking just the same as everyone else.

Currently the most exciting web design is Twistori which is an exercise in retro colours but take a close look at it.

Where are the underlined words to show me I can click on things? How can it create entirely different ways to read text using font size alone? What is it doing?! I think, I think… it might be, the future.

I also quite like the look of GitHub, look at those panels, look at that negative space. Even the tab bar is square.

Tab bars are at the end of their lives too, there are too many and they look too alike. What you want to do is take that tab bar and turn it into massive words that are clickable. That would be awesome.

At least for a few years.

Software, Web Applications, Work

String Templates, or what I learned from Python and doing nothing

It’s an ill wind that blows no-one any good. The same is true of projects (although money generally helps more here; it’s an ill project that is making no-one any money).

I’m currently meant to be doing some work on Accessibility for some new HTML pages. I thought it would be pretty easy but I was really wrong and it is changing the whole way I look at the View part of the (deceased) MVC web paradigm.

On my last project I was looking at things like Groovy’s Markup Builder and marvelling at how my colleagues managed to put together a 30 line Freemarker template that did some pretty complex HTML assembly. In my spare time I have been looking at Haml as a way of escaping the verbosity and monotony of XHTML and to have the code guarantee the correctness of my page structure to avoid validation grind.

That’s because in those projects I was a developer/web designer. I wanted accurate, compliant HTML with minimum effort and which was easy to style without having awful CSS hacks.

On my current project I’m in the utterly baffling (for me anyway) world of .Net. There is no way that I can understand the huge variety of C#, XML and templating overrides that make up my current project. Having code generate HTML is a massive barrier to me being productive because, while I know a fair bit of HTML, having to root around an entire Visual Studio project to find the fragment that generates the problematic Div element I actually want to work with means I spend the whole day knowing nothing about .Net rather than applying the knowledge I do have.

Now some people are going to say that having a wacky Component model is different from having a nice templating language but look at something like Haml or Freemarker. The former is concise and fun and full of obscure rules; the latter is tremendously powerful, more firmly rooted in HTML and not much less obscure. For power users I agree, they are the bomb. They are a massive barrier to entry though, in a way that HTML just isn’t. People may do HTML badly but they rarely don’t do it at all.

This experience combined with using Django/Jinja/Google App Engine is leading me to have a huge rethink about the way templating and views are put together. Passing a map of parameters to a template that is essentially exactly the way the output will look on the final device is obviously the way this problem should be tackled.

To try and get the HTML to generate in the current project I spent a day trying to get: SQLServer, BizTalk, Active Directory and Windows MQ to work together. This is utter madness and can only have been created by programmers who have no idea how to collaborate with non-programmers.

Why should I be trying to install BizTalk when what I want to do is actually generate some sample HTML so we can have a quick check of WAI standards? It should be possible to define some fixture data and then just generate HTML from the templates. It really shouldn’t be hard.
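Something like the following is what I have in mind. It is only a sketch: it uses Python's standard-library string.Template (not the StringTemplate library itself), and the template, the fixture keys and the values are all invented for the example.

```python
from html import escape
from string import Template

# The template is just HTML with named holes, so it reads like the output.
PAGE = Template("""\
<div class="order">
  <h1>$title</h1>
  <p>Dispatched to <span class="customer">$customer</span></p>
</div>
""")

# Fixture data: no database, no message queue, no directory server.
fixture = {"title": "Order #42", "customer": "Fry & Laurie"}

# string.Template does no escaping of its own, so escape the values first.
html_out = PAGE.substitute({key: escape(value) for key, value in fixture.items()})
```

Anyone who can read HTML can read the template, and producing a sample page for an accessibility check needs nothing more than a dictionary.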

This experience is really changing the way I think about web frameworks. I am already determined to learn String Template, then I am going to look at whether my current favourite frameworks allow me to use it. I’m going to look at frameworks that ask you to put the HTML template next to the Java code. I want to know if I can put those templates in the same hierarchy that the actual website uses.

In short if I need to work with people outside the project team on a web project again, how can I get all the good things about templates and combine them with both simplicity and intuition as to how a website is organised?

Web Applications

Dropbox, how did you ever live without it?

If Distributed Version Control and their related source sites improved my coding life massively this year then Dropbox was a little piece of everything magic that dropped into my life.

Dropbox is a distributed file share system, for someone with multiple machines running multiple OS that’s pretty handy. Previously I had been using Jets3t and Amazon S3 to try and keep files in the cloud. That’s a great system but it has a major problem that Dropbox fixes effortlessly.

Data objects in your S3 buckets have no revision history, there’s only one copy of the object and when you upload the wrong revision of the file you clobber the correct version and then if you don’t notice you can quite happily overwrite later revisions of the file on your other machines.

Dropbox versions your files and allows you to revisit previous revisions of your data. It’s like a file share and source control in one awesome package. Dropbox also handles the synchronisation for you seamlessly. As soon as a file has changed and is accessible it is whisked away to the net. Your other machines synchronise on startup or when they reconnect to the web. The file copy is very fast for me but I tend to have lots of bitty files rather than monster sets of images I am constantly working on.

It also has some features that I haven’t tried yet like creating public URLs to data for general download.

Using Dropbox has radically simplified my life, my encrypted password files now synch between home and work machines without my having to do anything. I can work on documents as I get inspired without having to wonder whether I uploaded the latest copy to S3. All my worries about clobbering files are over and even if I can’t run the nifty client application I still get to access my Dropbox via the web interface.

Dropbox, for me, has been an unqualified success.

Groovy, Programming, Software

Using Gant to build Java Projects

I know I said I would be taking a step by step introduction to Gant in my last post on the subject but sometimes the devil drives and you need things done in a less systematic way.

Recently I have been building Java and Scala projects with Gant. I think it has been a successful exercise so I am just going to jump on and show you some example buildfiles. The first one is going to be a Java project. This project is obviously toy code but I think if you just download the sample project and start filling in your own code (it’s an Eclipse project, IntelliJ and NetBeans should both import it successfully) you will be happy with how little you have to change the build file.

First have a look at the Gant file itself and then I am going to talk about the things that I think make Gant such a powerful and productive tool. The first thing to point out is the line count, this represents a complete build file for a Java SE project in under 100 lines. That includes the Hello World target I’ve left in as an echo test. Gant might have a slightly weird syntax if you are unfamiliar with Groovy but it isn’t verbose.

Targets and dependencies were in the last post so this time the new thing is using AntBuilder. Any method call that starts with Ant is an invocation of an Ant task. These are nothing but thin wrappers around the normal Ant task (and in fact I usually write them using the Ant Task documentation). Things that are attributes in Ant XML become hash properties in the parameter list. Things that would normally be nested elements are calls to the enclosing builder.

One area where Gant wins big is the way you can mix normal variables, Groovy string interpolation and Ant properties. Declaring the directory paths as Groovy variables near the head of the file allows me to create new paths via interpolation and assign the variable to the properties of an Ant task. Ant XML has properties and macro interpolation but this is both clearer and easier.

I am also using the Gant built-in clean task and in the course of using it, Groovy’s operator overloading for lists. I’m not a huge fan of operator overloading but if it’s clear enough here then it is great to have a DRY list assignment.

I also like the way that the directories are created in the init task. This kind of closure looping over a list again shows some of the power and conciseness that can be achieved by a language rather than a configuration file. If you don’t read Groovy then the each method iterates over the list it’s attached to and each item resulting from the iteration is stored in the whimsical “it” local variable.
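If you don’t read Groovy, the shape of those last few ideas can be approximated in Python. This is only an analogy and not Gant or Ant code: the directory names are made up, a temporary directory stands in for the project root, and Python's + on lists stands in for Groovy's overloaded list operators.

```python
import tempfile
from pathlib import Path

# A throwaway directory stands in for the project root.
root = Path(tempfile.mkdtemp())

# Paths declared once near the top, with new ones built by interpolation.
build_dir = f"{root}/build"
classes_dir = f"{build_dir}/classes"
dist_dir = f"{build_dir}/dist"

# One list of output directories, reused instead of repeated.
output_dirs = [classes_dir, dist_dir]
clean_dirs = [build_dir] + output_dirs

# The "init" step: loop over the list and create each directory.
for directory in output_dirs:
    Path(directory).mkdir(parents=True, exist_ok=True)
```

In the Groovy version the loop body is a closure and the loop variable is the implicit “it”, but the idea of driving the build from ordinary variables and lists is the same.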

Programming, Ruby, Scripting, Web Applications

Sinatra example

Since the Sinatra Project website currently seems to have been hijacked and directed to spam (the RubyForge page seems to be fine and is still a good introduction in its own right) I wanted to post an example of how to get going with Sinatra and also to highlight what makes it such a different approach to web frameworks. Here’s the example code on CodeDumper.

I wrote this with JRuby (for the cross-platform win). You’ll need to install the Sinatra, Haml and Mongrel Gems to get it to run.

It uses two styles, first there is the REST-ful extraction of parameters from the url and then there is the form POST submission. In both cases the code is pretty much the same as Sinatra extracts all parameters into the params hash.

I’ve inlined the Haml to make the example simple but normally the view templates would be extracted out of the code into separate files.

Sinatra is sometimes described as being a web DSL rather than a web framework, it seems apt as it eschews MVC separation but instead by attaching code to routes directly it allows Controller code to be tiny and to delegate appropriately rather than putting in a heavyweight structure that might result in more frameworking code than actual “doing” code.
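For anyone who can’t get at the example, the idea of hanging handlers directly off routes can be sketched in a few lines of Python. This is not Sinatra and not how Sinatra is implemented; the route decorator, the dispatch function and the /hello routes are all invented to show the two styles (a parameter in the URL and a form POST) ending up in the same params dictionary.

```python
import re
from urllib.parse import parse_qs

ROUTES = []


def route(method, pattern):
    """Attach a handler to an HTTP method and a pattern like '/hello/:name'."""
    # Turn each ':param' segment into a named regex group.
    regex = re.compile("^" + re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern) + "$")

    def register(handler):
        ROUTES.append((method, regex, handler))
        return handler

    return register


def dispatch(method, path, body=""):
    """Find the first matching route and call it with a merged params dict."""
    for route_method, regex, handler in ROUTES:
        match = regex.match(path)
        if route_method == method and match:
            # Form fields and URL segments land in the same dictionary.
            params = {key: values[0] for key, values in parse_qs(body).items()}
            params.update(match.groupdict())
            return handler(params)
    return "404 Not Found"


@route("GET", "/hello/:name")
def hello_from_url(params):
    return f"Hello {params['name']}"


@route("POST", "/hello")
def hello_from_form(params):
    return f"Hello {params['name']}"


url_response = dispatch("GET", "/hello/world")
form_response = dispatch("POST", "/hello", body="name=world")
```

The handlers are identical apart from the route they hang off, which is the point: where the parameter came from is the router's problem, not the handler's.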

However one of the things I really like is something you can’t show in a code fragment and that is the ability to dynamically interact with your web application on the fly. Start up Sinatra, change the code of the file you’re running and your changes are reflected immediately. End the compile, deploy, view cycle! It kind of makes web programming fun again.

Java, Software

Announcing: Xapper

Okay so here’s my first proper (i.e. finished) open source project. It is an implementation of an XHTML Builder on top of the excellent XOM library.

It is called Xapper (XHTML Wrapper) and it’s quite lightweight and (I think) easy to use.

Computer Games, Games

Gamer Intelligence Fail

A popup telling you that the Last Name textfield should contain your last name

O Rly?

Warhammer: Age of Reckoning. There’s a lot more where this came from… EA are truly teh suck as are Mythic the developers.

A Flash application to register that gives you the most insane popups and then fails to complete the registration process because it cannot connect to its database. What do you think the error message is?

OMG, you’re right! “Please check your internet connection”.

If I’m not connected to the internet then how am I filling out your form?!

Software

Google Chrome: what browses what?

Okay, so nearly a month after it launched how is Google Chrome changing the way we browse? Well for Linux and OSX users, not very much. However on Windows, Chrome is finding a place in my day to day browsing. Firstly I have started to tend to use it with Google products. There’s nothing rational about this, it seems to be just brand fetishism.

I have also started to use it for any site that uses Gears. Since Gears is built in to the browser it just seems to make sense. I liked Gears a lot before Chrome and although I have it installed in Firefox I figure it is easier to use the features when they are integrated into the browser and have the advantage of the V8 Javascript engine.

I also use Google Chrome on sites where I actually expect a lot of Flash, script and Fail. Being able to kill poorly programmed sites while keeping on trucking with the browser is a pretty killer feature.

Finally I also use it to view links where I want to look at something briefly and then do nothing more with it. I don’t know whether it really makes a difference but I always wonder how much stuff Firefox caches when I am briefly checking a link for something like a blog post.

Echo One

Sequentially arranged sentences composed of words (and punctuation)