Work

What next for Carbon Co-op?

A few people have asked me about what is happening next with Carbon Co-op. The short answer is that I have no idea. I had never heard of it before Saturday and I am not very personally invested in it. Probably the best thing to do if you are interested in the future of the business idea is to follow the founder on Twitter and ask him.

As far as the code posted on Launchpad is concerned, I’m going to leave the SiCamp branch exactly as it was when I pushed at 2pm on Sunday. That’s where we got to.

I don’t really like the thought of it but I am tempted to do a screencast to explain the design of the app and show the site and basically all the stuff that wasn’t included in the actual SiCamp presentation. We did a lot of work! We should have shown it!

In terms of the trunk branch, I have a few technical questions about Django left that I might use the project to experiment on. These are:

  • Markdown support in descriptions
  • AJAX support
  • Using MySQL in the deployed mod_python version (with that done I could open up the app with the understanding that it doesn’t do anything)
  • Introducing mocked testing as you would for a Rails app and, off the back of this, some refactoring of the views
  • Replacing the Django template loader with the Jinja2 equivalent

In terms of the domain problems we set out to solve on the weekend itself, there are two that didn’t get resolved. Firstly there is the Locality issue: who is near whom. For the weekend I was going to simply hack it so that if the first segment of the postcode was the same then those people counted as being in the same area. I’d like to do something more elegant and in depth but that’s another post.
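The weekend hack for Locality (same first postcode segment means same area) is small enough to sketch. This is an illustration, not the project’s code. The Member type and the outwardCode, sameLocality and byLocality names are hypothetical, and the sketch assumes UK-style postcodes where the first segment is the part before the space.

```scala
object Locality {
  // Hypothetical member record; the real app's model is not shown in the post.
  final case class Member(name: String, postcode: String)

  // First segment of a UK-style postcode, e.g. "M4 5JD" -> "M4".
  // Trims and upper-cases first so "m4 5jd" matches too.
  def outwardCode(postcode: String): String =
    postcode.trim.toUpperCase.split("\\s+").head

  // The weekend rule: same first segment means same Locality.
  def sameLocality(a: Member, b: Member): Boolean =
    outwardCode(a.postcode) == outwardCode(b.postcode)

  // Group everyone by Locality so each group can be counted later.
  def byLocality(members: List[Member]): Map[String, List[Member]] =
    members.groupBy(m => outwardCode(m.postcode))
}
```

Postcodes typed without the space (for example "M45JD") would not match under this rule. That is one reason the hack is only a stopgap.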

Secondly there is the “ratcheting” idea that was kind of fundamental but I’m not sure really got highlighted in the pitch. The idea is that the app should always be exhorting you to get more people to join and giving you milestones to go for. This means that once you have identified all the people in a Locality you need to figure out what Actions they are taking and how many people in each Locality are Devoted to the Action.

You then need to see if the Action has Tiers with higher Thresholds and tell the user to try and get their friends and neighbours to commit to the next Tier.

Once you have all the numbers of Ongoing Actions, Devotees and people in the Locality you can then start doing some awesome stuff, like altering the position of the Actions to promote those that are close to crossing the next Threshold, and so on.
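The ratcheting rules above can be read as follows. Each Action has Tiers with ascending Thresholds. For one Locality you find the next Threshold not yet reached and how many more Devotees it needs. Actions closest to that next Threshold are then listed first. The sketch below assumes those semantics, including that a Threshold counts as reached once Devotees are at or above it. The Action shape and the nextThreshold, shortfall, promotionOrder and nudge names are invented for illustration.

```scala
object Ratchet {
  // Hypothetical shape: an Action with its Tier Thresholds and the
  // number of Devotees it currently has in one Locality.
  final case class Action(name: String, thresholds: List[Int], devotees: Int)

  // The lowest Threshold not reached yet (reached means devotees >= threshold).
  def nextThreshold(a: Action): Option[Int] =
    a.thresholds.sorted.find(_ > a.devotees)

  // How many more Devotees are needed to reach the next Threshold.
  def shortfall(a: Action): Option[Int] =
    nextThreshold(a).map(_ - a.devotees)

  // Actions closest to their next Threshold come first;
  // Actions that have cleared every Tier go to the end.
  def promotionOrder(actions: List[Action]): List[Action] =
    actions.sortBy(a => shortfall(a).getOrElse(Int.MaxValue))

  // The exhortation shown to the user.
  def nudge(a: Action): String = shortfall(a) match {
    case Some(n) => s"Get $n more neighbours to commit to ${a.name} to reach the next Tier"
    case None    => s"${a.name} has reached its top Tier"
  }
}
```

Sorting by shortfall is only one possible promotion rule. A ratio of shortfall to Threshold would be an equally plausible choice.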

If all that doesn’t make sense then don’t worry because I probably need to explain the design somewhere.

Standard
Python, Web Applications

Django in 24 hours

Last Saturday afternoon I decided to learn Django. It was 2pm on the first day of SiCamp 2008 in London and being the only developer in the room at that point I decided that I should do whatever I felt would be the best option to get an application running by 2pm the next day.

Previously I have done some Google App Engine and the experience convinced me to give Django a go after I found myself, by intuition, creating a GAE project structure of handlers.py (views), models.py and a directory called templates that contained templates. I was then disappointed to find the whole world had got there before me.

So, Django in 24 hours, baptism of fire. What do I think now looking back on the experience?

Everyone has told me that the Django documentation is good and I think I have to concur. Not everything is so clear that when you’re speed reading in one window and typing in another it works first time but importantly nothing in the documentation is actually wrong. When stuff is not working a second, careful look at the documentation got me back on track.

Importantly, Django’s core model of web development is sound and intuitive. My editor had around ten files open for the project and adding something to the application naturally flowed from URL to handler to view to model. Maybe the only quibble I have is that the views.py file is deceptively named in MVC terms: Django’s views do the job MVC calls a controller, while the templates are closer to MVC views.

The core of the framework is amazingly concise; I spent the majority of my time thinking about the problem, not about the framework API. Binding a URL to a function made sense. Having to specify a template instead of having one inferred from the method name was maybe my one criticism of the request handlers, but on the plus side it does allow for flexibility in handling requests. Handing off from one request handler to another was very easy.

Django templates are both amazing and annoying. The syntax and principles are amazing, it was easy to play around with the pages and the template inheritance was really powerful for avoiding duplication. However when I transferred the application from self-serving to mod_python the template generation was very wobbly when compiling changes from the file system. Of course this could also have been mod_python but it was the latest 3 series stable source compiled for the machine. I’ve used Jinja2 previously when generating HTML in Python and might be tempted to stick with it in future.

Django models are great. I hate ORM but I really liked the syntax for defining persistence properties, and I liked that the fact that you are really dealing with SQL isn’t hidden away from you. It genuinely seemed a more convenient way of expressing the data model rather than an OO wallpaper over relational data storage. I didn’t feel the need to add domain logic to the models but I felt like it wasn’t really polluting the model to do that either.

One thing that didn’t work at all was changing the relations between models; it took me two or three attempts to finally model the relationships between the data concepts. Each time I changed a Foreign Key or Many to Many relationship I ended up deleting the database (SQLite3) as I couldn’t figure out how to migrate from the old schema to the new (syncdb only creates tables that don’t exist yet; it never alters existing ones).

One reason for choosing Django was the idea that I wouldn’t have to write the backend code as the admin stuff would be right there for me. It took me a while to get the 1.0 admin to fire up but once it was running it did perform as advertised. One of the attractive things about the application was that the data model followed the conceptual language of the solution in a really powerful way. You could use the admin interface to have a Devotee perform Devotion to an Action. My geek excitement peaked anyway, YMMV.

So them’s the highlights of the experience. Overall Django delivered me a rapid web development process in an intuitive, powerful way and lived up to nearly all of the claims made on the tin. Deploying to Apache/mod_python was painful but most of the pain surrounded the infrastructure of my box (multiple versions of Python, Apache config files) and my lack of mad Apache admin skillz.

I would happily tackle another project in it.

Perhaps of interest is how the Django development experience matches up against Rails or GAE, which would have been the other obvious choices. GAE would have been very similar but the deployment would have been better and I wouldn’t have had any automatic admin. In retrospect it may have been a better choice for a hack party type event. It certainly would have been my choice for personal projects for ease of deployment, but now that I have one Django app running perhaps that isn’t as relevant any more. Certainly the thing that has kept me from GAE before, the pain of data migration, doesn’t seem much better in Django (except that you control the datastore and its contents).

Compared to Rails?

  • Admin is much more awesome than scaffolding.
  • Django ORM is much less complex than Active Record, all the data required to create, deploy and use the object is in one place. Django doesn’t have Migrations but has its own brand of database versioning pain.
  • RSpec is awesome (despite its monkey patching of Object) so you aren’t going to beat Rails for easy testing.
  • Django templates are more powerful and easier to use than ERB, but you have a lot of Ruby templating options so it’s hard to make a complete comparison. They are probably similar in the sense that you can find a template library that suits your preferences; Django’s is the purer solution out of the box.
  • Routing and Controllers are much less involved in Django than Rails.
  • Django is less opinionated about how you structure your application directories, which I like.
  • Django doesn’t bake in AJAX components despite being “batteries included”; Rails probably generates better Web 2.0 style apps for less effort.
  • Finally Django uses only a few code generators because its basic structure is far less involved. It also generates far less “stuff” for each MVC element which I quite like as I don’t tend to use everything Rails generates.

Okay, detailed analysis over; what’s the high-level view? Django and Rails are similar experiences, and I think the major differences between them are much the same as the differences between Python and Ruby. In Django you are going to get simplicity, clarity and a real choice of how you plug your infrastructure components together. In Rails you are going to get magic up front, which is cool, but also magic at the back end, which is not cool. Ultimately I think the answer is how opinionated you like your software. Well punk? How opinionated do you like it?

Standard
Software

SiCamp 2008: Working with the Carbon Co-op

Okay so it’s Monday morning, I’m tired as hell and I’m kind of wondering what the point of spending my whole weekend at SiCamp was.

The weekend was really a game of two halves. Saturday was generally pretty good: I got there, learnt about the idea, pushed back on it, the group kicked around an idea that we thought we could get done in a weekend and then we set about making it. The team I was in was Carbon Co-op, which is about using collective buying power to reduce the initial cost of buying and installing energy saving or renewable energy products.

Sunday, though, was a general fail fest for the team. On Sunday morning the team consisted of just four members and we had to beg for some help to get the webpages for the site done. Then during the pitch we had a complete fail: the AV system was screwed, the project sponsor seemed nervous about his pitch and there wasn’t enough time to switch laptops and show the site, which meant that all the time we had spent on it was completely wasted. About the only thing that wasn’t a fail was the lunch, which was excellent and far beyond what I was expecting for this kind of thing.

The idea for the project is good and actually unlike a lot of the ideas at the event it had a genuine business model. However the team failed to communicate that or show any of the potential behind the concept. It failed to distil simple messages that could be quickly absorbed, in short: it failed to impress.

Part of the failure was not having gone to one of these events before and not knowing the format, what was expected and what you should be doing. So let’s try and rectify that for the future.

Firstly, the event is advertised as an X-Camp, fully buzzword compliant. However, for all the talk of “self-organising”, it is actually a competition. Everything will boil down to 10 minutes in front of a panel of judges. For me the participants in an X-Camp should decide the criteria for success, not the organisers; Camp Fail.

As you are in a competition: find out who the judges are. I did speak to the judges and what struck me was that they were for the most part concerned about business model and development. Innovation for them meant responding to changing circumstances in the economy. None of them seemed to care about the tech side of things except as an illustration of what the final product might look like. It would have been more cost-effective for me to have hammered out some HTML and JavaScript mocks of the site that could have been zipped up and put on the pitcher’s machine. All the working code I had was kind of a vanity project in the end (it is out there as open source though, if you are interested). Misguided Effort Fail.

Another important aspect of the judges’ background is that every “mentor” and “adviser” that came through the project door gave really bad advice in the context of winning the competition. This year, if you wanted to win SiCamp, your project should have found something in the existing economy and done it better. Every team seemed to want to “disrupt” things and that message won no friends. If the advisers do not have the same background as the judges: don’t listen to them. If you think they will be helpful to the business: schedule a post-Camp meeting. Focus Fail.

Get a big team and assign roles quickly. I had assumed that everyone was in a similar boat in struggling to get enough people to cover the work (and indeed the winning team was quite small). However at least two of the teams had 10 to 15 people involved. You need people to run your blog, someone to create the presentation, someone to give the pitch, someone to facilitate and someone to project manage. Perform a skills audit quickly and sort out when everyone is available, don’t leave it to Saturday evening to ask whether people will be back tomorrow.

If you decide you are going to try and build a working site on the weekend then you will need a couple of developers with complementary skills, some web designers (nothing fancy: HTML/JavaScript/CSS will do, but make sure it’s practical experience that includes some cross-browser work), an infrastructure person (who can be shared with other teams), someone to generate content for the site and ideally someone to test the user experience.

You also need to have done some planning prior to the event. At the very least buy your hostname and some hosting, and don’t be afraid to make a technology choice if it means your hosting is going to be simpler. Be responsible: Carbon Co-op, for example, requires people to submit an email and postcode, which means data protection issues. You simply can’t ask someone you randomly meet at an event to open their credit card, buy you a name and hosting and then start collecting people’s details. I know everyone at SiCamp is going to encourage you to do this but it’s a really terrible idea, and this is meant to be your business, not a final year student project. What happens if you fall out with that random person after the event? How are you going to get your data back?

If after the skills audit you decide you don’t have a viable team then don’t just plough on. Take it on the chin that your idea hasn’t attracted enough interest, disband the team and go and help make someone else’s idea awesome. Alternatively scale back to what you can achieve, which will usually be a really nice slideshow. Accept that a slideshow isn’t going to knock anyone dead when other teams will be launching websites.

Finally a practical point: don’t make videos for your project and presentation unless you are trying to make work for idle hands. Making videos is time-consuming and they don’t impress people. Actually getting someone who had never heard of the project before the weekend to come and talk during your presentation would have totally slain the judges. Think of people who videotape acceptance speeches; the undertone is: “Your event doesn’t matter to me”. If you want to have someone speak who can’t physically get to the venue then use a video chat instead.

If you are a project founder then don’t be tempted to take an active role in the weekend’s work. Your role is to be the project visionary, don’t even be tempted to give the pitch yourself. Get someone else to give the pitch and then have them invite you to speak during the presentation. It’ll make you seem much more important and insightful. If you give yourself a role in the project then remember that all the time you spend on slides or talking to potential investors or managing tasks is time that you are unavailable to your team. Your team needs you to inspire them and make choices about what you want. This is your job.

What about the organisation of SiCamp itself? Well the venue was fantastic, the internet provision was first-rate, catering was excellent. Kudos on this.

Things that were not so good were the lack of facilitators and runners for teams. Each team should have had an experienced Camp hand to provide an idea of what the event was about, what was expected and when things were going wrong. This person should also have mediated who could visit the team rooms. They should also have compared notes with the other Team Camp hands to see whether all the teams were equally balanced.

Although in principle you were meant to be able to swap between teams and look into the other project rooms, I’m not sure who the organisers thought was going to do the work for your team if you went off and had a wander round the other project rooms. I would have liked to have seen the other teams’ efforts and how they organised themselves, but most of the weekend was spent looking at an editor on a MacBook screen.

When teams needed help or advice they should have been able to ask their facilitator to send a runner round the other teams’ facilitators to find out whether what they needed was available in another team. There was real confusion around whether people were competing in teams or collaborating in a single event.

However my number one issue was the AV at the final show and tell. The sound didn’t work, the microphone kept cutting out due to low batteries and the VGA connection to the OHP wasn’t working, so everything projected purple. In short absolutely nothing worked and the stress for our team was massive as we struggled to get something reasonable going.

My checklist of prep would involve: paying for a decent DVI projector with Mac compatibility, doing a sound check prior to the event, letting the teams run through 2 minutes of their pitches in the actual venue and sorting out a running order ahead of time. In short, don’t make those 10 minutes more painful than they have to be.

So okay, praising and moaning over, do I think it was worth going? Well it was an interesting challenge and I wanted to see what could be done in two days. I’ve also got at least three blog posts out of it so I guess I learned a lot. My first thought on trying it again would be to assemble a team prior to the event so that you could be guaranteed the range of skills you need to really build something in 8 hours.

That’s really focussing on the competition aspect though and thinking outside the competition there are a lot of intangibles that the entrepreneurs involved gained. Some projects got new names, Carbon Co-op got a cool logo. In some ways assembling the shock troopers of project execution would mean that you were not really taking into account what people need to push their projects forward. You would also exclude some people from teams who could benefit from the experience of working in cross-discipline close-knit teams with immediate feedback loops.

I think I would want a more relaxed role next time, with time to take in more of the event. I would also rein back from treating the event like a Hack Day with an emphasis on cool stuff working. Faking it before you make it is absolutely fine. To be honest I could also do with a 10:30am start on the weekend. With that in mind I would consider doing it again.

Standard
Java

Playing around with Neo4J and Groovy

After hearing about Neo4J at Ruby Manor I decided to have a play around with the graph database but for me playing doesn’t mean creating a whole Java project anymore. Since using Python/Ruby/Scala I want to be doing it in an interactive session.

I had quite a few issues getting Neo4J to run but the summary is that for convenience you want all three jars from the distribution in your classpath and you pass a String representing a directory path to the EmbeddedNeo constructor. Once you do have it running, make sure to shut down the database on an exception, otherwise you will have to shut down the JVM (i.e. close the whole Groovy Console session) to unlock the underlying file resources.

Okay so once you have the right jars you can now start playing around with Neo4J. I immediately felt that the library is actually quite heavy with a lot of ceremony to get things done. Some of the feedback I have been hearing from Neo Technology and the Neo4J list is that Neo4J is more of a low-level infrastructure component that is meant to be wrapped up in higher-level APIs.

Working with Groovy it should be possible to cut that ceremony down a bit and put a nicer front-end on things. The first thing I’ve tried is using closures to execute code in Neo database and transaction contexts.

If you have the three jars in your .groovy/lib directory you should be able to run this script from the Groovy Console and have it create a node in your directory. It will be the same node each time but I have some ideas for using builders for both nodes and traversers (which allow you to search the graphs) and I am going to work on (and post) them later.
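The same closure trick carries over to Scala as the “loan pattern”. A helper opens the resource, lends it to a block and always cleans up in a finally clause. That also covers the shutdown-on-exception problem above. The sketch below is not the Neo4J API. GraphDb and Tx are hypothetical stand-in traits with just enough surface to show the pattern, and withDatabase and withTransaction are invented names.

```scala
object GraphContexts {
  // Hypothetical stand-ins for an embedded graph database; these are
  // NOT the Neo4J classes, just enough surface to show the pattern.
  trait Tx {
    def success(): Unit
    def finish(): Unit
  }
  trait GraphDb {
    def beginTx(): Tx
    def shutdown(): Unit
  }

  // Open a database, run the block, and always shut down, even when the
  // block throws, so the store's files are not left locked.
  def withDatabase[A](open: => GraphDb)(body: GraphDb => A): A = {
    val db = open
    try body(db)
    finally db.shutdown()
  }

  // Run the block in a transaction: mark it successful only if the
  // block returns normally, and always finish it.
  def withTransaction[A](db: GraphDb)(body: => A): A = {
    val tx = db.beginTx()
    try {
      val result = body
      tx.success()
      result
    } finally tx.finish()
  }
}
```

Nesting the two helpers, as in withDatabase(open) { db => withTransaction(db) { ... } }, gives roughly the shape of the Groovy closures, with cleanup that cannot be forgotten.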

Standard
Ruby

Ruby Manor

So first things first. Ruby Manor was a huge success and a real credit to James and Murray who created and ran it. Thank you guys very much for doing it.

The talks really varied a lot, and ironically I went away most interested in RabbitMQ and Neo4J, both of which were only tangentially related to Ruby. George Palmer’s talk on Nanite was probably the most interesting of the day, but during it I picked up a sense that the audience was mostly focussed on web development within a Rails space.

This reached some kind of nadir in Alex Maccaw’s frankly unreal implementation of recommendation analysis as a Rails plugin, which really did look like someone hammering a square peg into a round hole. It did however generate some interesting post-conference blogging.

I also enjoyed the Shoes and Monkeybars talk although they did end up confirming my reservations about both libraries. Monkeybars really seems quite a complex setup and kind of lays an MVC framework on top of Swing which already has an MVC like framework. However it does use Matisse which is still an amazing constraints based GUI builder. Shoes is awesome, until you get stuck – then there isn’t enough feedback to understand why your awesome has disappeared. Seriously fun to play with though.

So it has been proven you can have a cheap, community-led conference in London for Ruby. Is it now time for someone to step forward and organise the Python Snake Pit?

Standard
Programming

Toy Scala Static Webserver in less than 100 lines

Doh! The original link to the code was for the wrong file! Sorry. Corrected below.

Okay, so there has been a craze recently for writing static webservers in new languages in less than 100 lines. I am not claiming that this code is anything special, but I wanted to give the NetBeans Scala plugin a go (in NetBeans 6.5), so here’s my version of a Scala static webserver (in less than 100 lines, natch).

The code is more of a Toy at the moment as it assumes a very happy path. However it does work and it did provide some useful learning about Scala.

The good stuff includes: compact code, class imports, XML literals, map literal syntax and the NetBeans plugin does a good job of providing codealong feedback from the compiler.

Confusing stuff:

  • accessing array indexes with () not []: it makes sense but you have to get used to it when coming from Java
  • implementing Java interfaces in Scala: still not totally sure how you do that
  • getting the right type to allow a Java API to be called: Array[Byte] seemed to take a long time to get right, and not having type coercion from Scala Lists to Java Lists means there is an anemic list variable in play
  • functions that have many parameters make for confusing IDE errors; do you want Int, Int, Int, Int, String, String or Int, Int, Int, Int?
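The two interop points above can be handled as follows. A Java interface is implemented in Scala by extending it like a trait and overriding its abstract methods. asJava from scala.jdk.CollectionConverters (Scala 2.13 and later) converts a Scala List at the call site, so no separate Java list variable is needed. The minimal sketch below uses the JDK’s built-in com.sun.net.httpserver API. Interop, setAllow and HelloHandler are made-up names, and this is not the code from the post.

```scala
import com.sun.net.httpserver.{Headers, HttpExchange, HttpHandler}
import java.nio.charset.StandardCharsets
import scala.jdk.CollectionConverters._

object Interop {
  // Convert the Scala List once, at the boundary; Headers.put expects a
  // java.util.List[String].
  def setAllow(headers: Headers, methods: List[String]): Unit =
    headers.put("Allow", methods.asJava)

  // Implementing a Java interface: extend it like a trait and define its
  // abstract method with a matching signature.
  class HelloHandler extends HttpHandler {
    override def handle(exchange: HttpExchange): Unit = {
      // Java's byte[] is Array[Byte] in Scala.
      val body: Array[Byte] = "hello".getBytes(StandardCharsets.UTF_8)
      setAllow(exchange.getResponseHeaders, List("GET", "HEAD"))
      exchange.sendResponseHeaders(200, body.length.toLong)
      val out = exchange.getResponseBody
      try out.write(body) finally out.close()
    }
  }
}
```

An anonymous instance, new HttpHandler { override def handle(exchange: HttpExchange): Unit = ... }, works just as well for one-off handlers.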

And finally the bad:

  • the streaming IO for binary files is entirely imperative and is basically Java code, I’ve been told that Scalax can help put a prettier front on that
  • nothing to do with Scala, but the built-in Java HttpServer should have had its public API in interfaces and there should have been an NIO-based HttpExchange
  • I.cant.stop.using.periods even.if.I.dont.have.to
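The imperative binary streaming mentioned above can be avoided on Java 7 and later: java.nio.file.Files.copy(Path, OutputStream) streams a file straight into the response without a hand-written read/write loop. The sketch below is a static handler for the built-in com.sun.net.httpserver API under a happy-path assumption. It only returns 200s for existing files and 404s for everything else. StaticFiles, resolve, FileHandler and start are made-up names, and this is not the code from the post.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress
import java.nio.file.{Files, Path}

object StaticFiles {
  // Map a request path onto a regular file under root, rejecting
  // anything that escapes root (e.g. "/../secret").
  def resolve(root: Path, requestPath: String): Option[Path] = {
    val base = root.toAbsolutePath.normalize
    val candidate = base.resolve(requestPath.stripPrefix("/")).normalize
    if (candidate.startsWith(base) && Files.isRegularFile(candidate)) Some(candidate)
    else None
  }

  class FileHandler(root: Path) extends HttpHandler {
    override def handle(exchange: HttpExchange): Unit =
      try {
        resolve(root, exchange.getRequestURI.getPath) match {
          case Some(file) =>
            exchange.sendResponseHeaders(200, Files.size(file))
            // Streams the file into the response body; no read loop needed.
            Files.copy(file, exchange.getResponseBody)
          case None =>
            // A length of -1 tells HttpServer there is no response body.
            exchange.sendResponseHeaders(404, -1)
        }
      } finally exchange.close()
  }

  def start(root: Path, port: Int): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress(port), 0)
    server.createContext("/", new FileHandler(root))
    server.start()
    server
  }
}
```

Calling StaticFiles.start(java.nio.file.Paths.get("public"), 8080) would serve the public directory. There is no directory index or content-type handling; both are left out to keep the sketch short.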

Overall I am pretty happy with the quality of the final code and I feel I’m finding the balance of the language more and seeing Scala as more of an extension of Java than something entirely unique.

Standard
Java, Programming

Splitting Loot with Scala

Over the last couple of days I have been trying to implement the Sharing Problem (Ruby Quiz #65: Splitting the Loot) in Scala. So far I have implemented the greedy pick algorithm and still need to implement a recursive solution that will brute force the edge cases.
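The recursive brute-force search mentioned above is not shown in the post. The minimal backtracking sketch below assumes gems are reduced to plain Int values; the project’s Gem and GemBag classes are not shown here, and BruteForceSplitter is a made-up name. It places each gem, largest first, into any bag that still has room under the target share, and backs out when a branch dead-ends. Values 3, 3, 2, 2, 2 split two ways are a case the largest-first greedy pass gets wrong (it ends with 7 and 5), while this search finds 3 + 3 and 2 + 2 + 2.

```scala
object BruteForceSplitter {
  // Returns one fair split, or None when no fair split exists.
  def split(values: List[Int], shares: Int): Option[List[List[Int]]] = {
    val total = values.sum
    if (shares <= 0 || total % shares != 0) None
    else {
      val target = total / shares

      // Bags never exceed the target, so once every gem is placed
      // each bag holds exactly the target.
      def place(remaining: List[Int], bags: Vector[List[Int]]): Option[Vector[List[Int]]] =
        remaining match {
          case Nil => Some(bags)
          case gem :: rest =>
            // Try each bag with room; orElse only evaluates the next
            // attempt if the previous one failed.
            def tryBags(indexes: List[Int]): Option[Vector[List[Int]]] = indexes match {
              case Nil => None
              case i :: more =>
                place(rest, bags.updated(i, gem :: bags(i))) orElse tryBags(more)
            }
            tryBags(bags.indices.toList.filter(i => bags(i).sum + gem <= target))
        }

      // Largest gems first prunes dead ends sooner.
      place(values.sortWith(_ > _), Vector.fill(shares)(List.empty[Int])).map(_.toList)
    }
  }
}
```

The search is exponential in the worst case, so it suits the small hauls in the quiz rather than large inputs.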

However on the way I think I have picked up some important lessons. You can see the code for yourself in the GitHub project related to the problem and the solution (it’s in the Scala directory) but I will be highlighting some of the code in this post.

This was the first Scala I’ve written that was actually an attempt to use some of the functional aspects of Scala rather than just porting Java code in a more or less literal and imperative way. After struggling a little bit with the implicit return and list concatenation I implemented the greedy heuristic. This is what it looked like (link to the code file).

import gems._

package splitter {

  class LootSplitter

  object LootSplitter {
    // Greedy heuristic: hand out the gems largest first, always to the
    // bag with the smallest running total.
    def splitLoot(gembag: GemBag, shares: Int): List[GemBag] = {

      // A total that doesn't divide evenly can never be split fairly.
      if ((gembag.totalValue % shares) != 0) {
        return List[GemBag]()
      }

      // Not checked below: the greedy pass can still leave uneven bags.
      val individualShareValue = gembag.totalValue / shares

      var partShares: List[GemBag] = List.fill(shares)(new GemBag(List()))

      gembag.gems.sortWith(_.value > _.value).foreach((gem: Gem) => {
        partShares = partShares.sortWith(_.totalValue < _.totalValue)
        partShares = List(partShares(0) add gem) ::: (partShares drop 1)
      })

      partShares
    }
  }
}

I was struck initially by how compact this code was; you are looking at about 28 lines of code. However as I was looking at it I began to wonder whether it was concise or just terse. How is someone meant to interpret something like _.value > _.value? (It is shorthand for (a, b) => a.value > b.value, with each underscore standing for the next parameter.) I showed it to a colleague and his first reaction was “I don’t know functional languages so I wouldn’t know what this means”.

This was exactly the kind of reaction I was afraid of, because I have been converted heavily to the principle of readable code. Someone should be able to scan code and understand, in principle, what is happening. If they can’t, then the cost of maintaining that code is going to be higher and we have actually lost something in the concise syntax.

I decided to try and implement a readable version of the same file which added about 6 lines (only two according to GitHub!). You can read this version here but now I want to throw it open to the public. Is this version actually easier to read? Are things like the underscore variable actually part of the price of comprehending Scala?

In my rewritten version I use some of the nice features of Scala such as first-class functions, but you still have lines like this:

List(sortedBags(0) add gem) ::: (sortedBags drop 1)

I hope you are reading this as “add a gem to the first sortedBag and make a List of it and then add all the other bags except the first one to the new list” but I am worried that this is far from obvious. Is that because I’ve done something that isn’t idiomatic or is it because actually the operators and the library API are too obscure?
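One way to make that line read more like the sentence above is to name the pieces and let a pattern match split the sorted list into its head and tail. The sketch below uses minimal, hypothetical re-creations of Gem and GemBag (with add and totalValue) because the project’s own classes are not shown. ReadableSplitter, addToPoorestBag and the comparator names are invented for illustration.

```scala
object ReadableSplitter {
  // Hypothetical stand-ins for the project's gem classes.
  final case class Gem(value: Int)
  final case class GemBag(gems: List[Gem]) {
    def totalValue: Int = gems.map(_.value).sum
    def add(gem: Gem): GemBag = GemBag(gem :: gems)
  }

  private def byValueDescending(a: Gem, b: Gem): Boolean = a.value > b.value
  private def byTotalAscending(a: GemBag, b: GemBag): Boolean = a.totalValue < b.totalValue

  // Put the gem into the bag that currently holds the least value.
  def addToPoorestBag(bags: List[GemBag], gem: Gem): List[GemBag] =
    bags.sortWith(byTotalAscending) match {
      case poorest :: others => (poorest add gem) :: others
      case Nil               => Nil
    }

  // Same greedy heuristic, written as a fold over the sorted gems.
  def splitLoot(loot: GemBag, shares: Int): List[GemBag] =
    if (shares <= 0 || loot.totalValue % shares != 0) Nil
    else {
      val emptyBags = List.fill(shares)(GemBag(Nil))
      loot.gems.sortWith(byValueDescending).foldLeft(emptyBags)(addToPoorestBag)
    }
}
```

The pattern match gives names (poorest, others) to the two halves that sortedBags(0) and sortedBags drop 1 leave anonymous, and :: replaces the List(...) ::: wrapping. Whether that is actually easier to read is exactly the open question.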

Scala represents a significant evolution of Java in terms of absorbing all the lessons learned during the evolution of the language. When porting Java code I feel far more comfortable with it than when I am trying to create new code that uses the core language libraries. I don’t want to evolve to a new set of problems and best practices that try to avoid them.

Standard
Java, Web Applications, Work

Better, Faster, Weaker

I went to the Developers and Startups Meetup last week (look it up on Meetup if you’re interested; it was a good event). One thing that struck me was that the entrepreneurs were mostly interested in PHP and Ruby on Rails (there was a significant rump of ASP and .Net, which I can only put down to C#’s language hotness at the moment). Java in particular was poorly regarded.

I was intrigued by what was causing the negative vibe: while I can understand that Ruby on Rails is the new tech hotness, I couldn’t really see the appeal of PHP over something like Java that has a decent stack that is practically free. Talking to a few people I got the sense that PHP was seen as a “getting things done” language, something that got you to a tangible real product very quickly. Rails seemed to have much the same quality, but I think a lot of people felt that Rails skills were more expensive and in a lot of cases they wanted a thin web layer over a service backend that already existed.

Of course PHP, Rails and the Java web stack all make various compromises to do what they do. Where Rails is opinionated, Java probably isn’t opinionated enough. If you put all your code into the page (Model 1 stylee) then I’m pretty certain that writing JSPs in Java EE 5 is as quick a way to put something together.

Of course one thing that hugely hampers Java as a productive web stack is the compile and deploy loop. There is none of the immediacy of something like RoR that allows you to make changes very quickly in response to feedback. The tradeoff is of course that the code needs to be interpreted and you have to add some kind of caching to avoid the performance hit of always having to evaluate code.

But given that there is a group of people who need to have code quickly so they can sell something to their customers or investors, and that they are willing to make all kinds of compromises about scale and maintenance costs, is the Java platform really out of the question? I ask this because I have been involved in quite responsive Java web teams where feature changes have been a matter of days, not weeks or months.

I think that perhaps the issue is partially just mental. Enterprise software development tends to take place in a risk-averse environment where certainty and process are valued over rapid delivery. “Best practice” in this environment tends to sacrifice timeliness. In the smaller business space this might well be a practice turned into dogma.

That said though some Java Frameworks introduce major barriers to just getting stuff done. Any framework that forces you to produce XML based interaction flow is probably distracting you from generating functionality that people want and will pay for. Routing isn’t a trivial part of an application but that’s all the more reason to box clever with it. Any framework that doesn’t make it simple to inject services into classes is making life too hard. Ditto any framework that forces you to work with a particular Expression Language or Templating product.

I am wondering if what I am looking for is actually the weakest Java framework. Rather than having functionality for all circumstances (and having to configure and set it all up) I want a very basic REST framework to which I can then add templating and persistence according to my needs. It still wouldn’t have the nice scaffolding of RoR, but it would mean that each element of the web app would be absolutely vital to its function. Weakness would generate the power of vitality and reduce the obfuscation of bloat.
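To make the “weakest framework” idea concrete, the sketch below reduces the core to a request, a response and a table of routes. Templating is just a function passed in, and services are injected through constructors rather than configured. It is a thought experiment under those assumptions, not an existing framework. Every name in it (WeakFramework, Router, Renderer, GreetingService, GreetingController) is hypothetical, and routes match exact paths only.

```scala
object WeakFramework {
  // The whole core: a request, a response and a table of routes.
  final case class Request(method: String, path: String)
  final case class Response(status: Int, body: String)

  type Handler = Request => Response

  final class Router(routes: Map[(String, String), Handler]) {
    private val notFound: Handler = _ => Response(404, "Not Found")

    // Exact (method, path) lookup; no pattern matching on paths.
    def dispatch(req: Request): Response = {
      val handler = routes.getOrElse((req.method, req.path), notFound)
      handler(req)
    }
  }

  // Templating is optional: any function from a model to a String.
  type Renderer[M] = M => String

  // A hypothetical service, injected through the constructor.
  trait GreetingService { def greeting(name: String): String }

  final class GreetingController(service: GreetingService, render: Renderer[String]) {
    val hello: Handler = _ => Response(200, render(service.greeting("world")))
  }
}
```

Persistence would slot in the same way, as another constructor argument, so each piece is present only because the app actually uses it.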

Standard
Web Applications

Important! !important is a danger sign

Until recently I had never seen the CSS keyword !important used in a production site. However just recently I have seen it in use and also had to use it myself to fix a few cascade issues.

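CSS selectors work by assigning a “magic number”, the specificity, that indicates how specific the selector is in relation to the other selectors. It is built from the count of ID selectors, then class, attribute and pseudo-class selectors, then element names, compared in that order. !important doesn’t boost that number at all. Instead it moves the declaration into a higher-priority group that beats every normal declaration regardless of specificity, and specificity only matters again between two !important declarations. You can read the exact rules in the specification.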
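The ordering described above can be modelled in a few lines. Specificity is a triple compared left to right, and importance is compared before it. This is a deliberately simplified sketch. It covers declarations from a single author stylesheet, ignores inline styles, the universal selector and :not(), and uses a regex tokenizer rather than a real selector parser. Cascade, Spec, Decl and winner are made-up names.

```scala
object Cascade {
  // Specificity as three counts compared left to right:
  // (ids, classes + attributes + pseudo-classes, elements + pseudo-elements).
  final case class Spec(a: Int, b: Int, c: Int)
  object Spec {
    implicit val ordering: Ordering[Spec] = Ordering.by((s: Spec) => (s.a, s.b, s.c))
  }

  // Simplified tokenizer: #id, .class, [attr], ::pseudo-element,
  // :pseudo-class and element names. Combinators are skipped.
  private val Token = """(#[\w-]+)|(\.[\w-]+)|(\[[^\]]*\])|(::[\w-]+)|(:[\w-]+)|([a-zA-Z][\w-]*)""".r

  def specificity(selector: String): Spec =
    Token.findAllMatchIn(selector).foldLeft(Spec(0, 0, 0)) { (s, m) =>
      if (m.group(1) != null) s.copy(a = s.a + 1)
      else if (m.group(2) != null || m.group(3) != null || m.group(5) != null) s.copy(b = s.b + 1)
      else s.copy(c = s.c + 1) // pseudo-element or element name
    }

  // One declaration of a single property; order is its position in the source.
  final case class Decl(selector: String, value: String, important: Boolean, order: Int)

  // !important is compared before specificity; later source order breaks ties.
  // Assumes at least one declaration.
  def winner(decls: List[Decl]): Decl =
    decls.maxBy(d => (d.important, specificity(d.selector), d.order))
}
```

Real cascades also weigh origin. In CSS 2.1 a user stylesheet’s !important beats an author’s !important, which fits the client-styles exception the post makes.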

!important is really powerful, and as a result you never want to use it. It’s kind of like the CSS A-bomb: if you ever have to use it, something has gone wrong. The biggest problem with !important is that it can “lock” a style rule and make it hard to override later in the cascade. Inevitably this becomes a problem, because there is pretty much nothing in CSS that can be regarded as universal in the appearance and rendering of a website.

This then leads to other stylesheets also using !important in their selectors to overcome the earlier !important. This limits their reuse as they now, in turn, export their overly powerful rules and therefore require yet more !important use and so on and so on until every selector has !important on it.

Stylesheets should use the weakest selectors they can (without going overboard and applying some styling information too liberally). This makes them more generally useful, as often people only dislike a few elements of a style, or an individual page only has a few components that do not gel with the general style.

I think !important should never be used when creating CSS. There are perhaps two exceptions I can think of: firstly, client styles (you know best, so fill your boots); secondly, stylesheets that you know represent the real bottom of a cascade. For example, an optional stylesheet that renders the site in monochrome can reasonably be expected to have the final word in a cascade.

Standard
Software

Nulls mean nothing, or something…

My last post touched on something that is a real issue for me: using Null in a relational database. Now, I’m not Fabian Pascal, but I have had enough problems with Nulls to feel that if you are going to use a relational database it is well worth making the effort to eliminate Nulls from your data model.

You really haven’t suffered until you’ve had to add a DECODE to a query to layer logic onto a column that never should have been Null in the first place. And then there is learning the special logic for Nulls, usually when a predicate has failed to evaluate as expected. Could there be anything finer than saying X = Y OR X IS NULL?
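Here is a small demonstration of that special logic, using SQLite through Python’s standard sqlite3 module simply because it is self-contained; the `item` table, its rows and the `names` helper are made up for illustration. A Null makes both a comparison and its apparent opposite evaluate to “unknown”, so the row silently drops out of both. SQLite has no DECODE, so COALESCE stands in for the kind of patching DECODE or NVL does in Oracle.

```python
import sqlite3

# In-memory database with one nullable column (hypothetical example table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT NOT NULL, colour TEXT)")
conn.executemany(
    "INSERT INTO item (name, colour) VALUES (?, ?)",
    [("apple", "red"), ("pear", "green"), ("mystery", None)],
)


def names(where: str) -> list:
    """Return item names matching a fixed, trusted WHERE clause, sorted."""
    rows = conn.execute(f"SELECT name FROM item WHERE {where} ORDER BY name")
    return [name for (name,) in rows]


# Neither the predicate nor its apparent negation returns the Null row:
# colour = 'red' and colour <> 'red' are both UNKNOWN when colour is NULL.
print(names("colour = 'red'"))   # ['apple']
print(names("colour <> 'red'"))  # ['pear']

# To include it you have to ask for it explicitly...
print(names("colour <> 'red' OR colour IS NULL"))  # ['mystery', 'pear']

# ...or patch the value inside the query.
print(names("COALESCE(colour, 'unknown') <> 'red'"))  # ['mystery', 'pear']
```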

Usually, though, my problems with Nulls come down to the following:

(,,,3,)

(,2,3,,5) intersection (2, 5)

What do these things mean? In data terms they make my head explode. They don’t really make sense as tuples, vectors or indeed anything other than lists. In the case of lists you can charitably interpret the empty values as positions that have no assigned value.

Null in programming languages usually has a strong definition. In Java a null reference is one that points to no object at all; in Ruby nil is itself an object, the single instance of NilClass. In databases people always struggle to say what NULL is.
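Even SQL itself doesn’t give NULL one consistent meaning. In a comparison NULL is “unknown”, so NULL = NULL is not true; but set operations and GROUP BY treat all NULLs as the same value. The sketch below shows this in SQLite via Python’s sqlite3 module; the table `t` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparison: NULL = NULL is not true, it is itself NULL ("unknown").
print(conn.execute("SELECT NULL = NULL").fetchone())   # (None,)
print(conn.execute("SELECT NULL IS NULL").fetchone())  # (1,)

# Set operations: two NULLs count as the same value, so they intersect.
print(conn.execute("SELECT NULL INTERSECT SELECT NULL").fetchall())  # [(None,)]

# Grouping also lumps every NULL together into one group
# (SQLite sorts NULLs before all other values).
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (2,)])
print(conn.execute("SELECT x, COUNT(*) FROM t GROUP BY x ORDER BY x").fetchall())
# [(None, 2), (2, 1)]
```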

In the comments on the previous blog post, the commenter came up with at least three interpretations of a Null value: unknown, unset and Not A Number (NAN). Really, though, Null is an absence of information; it’s a kind of anti-fact. Its presence tells you nothing, and as soon as you try to interpret it the number of potential meanings multiplies exponentially.

Trying to store NAN as a concept in a database is just misguided. If I can say a NUMERIC field is, in fact, NAN, then why can’t I store the idea that a Number is not a Lemon? Or a road sign? If it really is not a number, why are you trying to store it in a numeric field? If it’s the result of a calculation that either overflows the allocated storage or is infinite, why not store that information some other way?
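One possible “other way”, sketched here as an assumption rather than a prescription, is to record the outcome of each calculation as a fact of its own and to store a number only when there is one. The table names `calculation` and `calculation_value`, the outcome labels and the `record` helper are all hypothetical; SQLite via Python’s sqlite3 module is used only to keep the example self-contained.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Every calculation has exactly one outcome, and it is never NULL.
    CREATE TABLE calculation (
        id      INTEGER PRIMARY KEY,
        outcome TEXT NOT NULL CHECK (outcome IN ('finite', 'infinite', 'undefined'))
    );
    -- Only finite outcomes have a number, so only they get a row here.
    CREATE TABLE calculation_value (
        calculation_id INTEGER PRIMARY KEY REFERENCES calculation (id),
        value          REAL NOT NULL
    );
""")


def record(calc_id: int, result: float) -> None:
    """Store a float result as facts instead of as a NaN or infinite value."""
    if math.isnan(result):
        outcome = "undefined"
    elif math.isinf(result):
        outcome = "infinite"
    else:
        outcome = "finite"
    conn.execute("INSERT INTO calculation (id, outcome) VALUES (?, ?)", (calc_id, outcome))
    if outcome == "finite":
        conn.execute(
            "INSERT INTO calculation_value (calculation_id, value) VALUES (?, ?)",
            (calc_id, result),
        )


record(1, 2.5)
record(2, float("inf"))
record(3, float("nan"))

# Outcomes and values are separate sets of facts: no NULL, no NaN.
print(conn.execute("SELECT id, outcome FROM calculation ORDER BY id").fetchall())
# [(1, 'finite'), (2, 'infinite'), (3, 'undefined')]
print(conn.execute("SELECT calculation_id, value FROM calculation_value").fetchall())
# [(1, 2.5)]
```

The design choice is that “this calculation had no numeric answer” becomes something you can query directly (WHERE outcome = 'undefined') instead of something you have to infer from a special value.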

Relational databases work best without Nulls, and usually the data model works best if it is seen as a collection of facts that can be stored in a normalised form with strong relations between the sets of facts.

However, the way they often get used is as surrogate (and usually underperforming) hash stores, where each entry consists of a set of keys that may or may not have values. This is great for object serialisation, and instead of relational queries you can introduce highly parallelisable search algorithms. However, it kind of sucks at everything else. Firstly, because a typical RDBMS like MySQL or Oracle isn’t expecting to be used as a hash store, so it is doing a lot of things that you aren’t going to need in this model. Secondly, because the irregular nature of the hashed data means that getting sets of rows back out of the database can underperform: the system is often forced to brute-force its way through the data serially.
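To show the shape of the problem, here is the same question asked of a hash-store-style table and of a plain relational table. It is a minimal sketch with made-up tables (`attribute`, `shirt`) in SQLite via Python’s sqlite3 module, and it says nothing about performance by itself; the point is that in the hash store every extra condition becomes another self-join over the whole attribute table, while the relational version is one predicate over one row per shirt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hash-store style: one row per (entity, key, value).
    -- Nothing stops an entity from missing a key or having it twice.
    CREATE TABLE attribute (entity_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL);
    INSERT INTO attribute VALUES
        (1, 'colour', 'red'),  (1, 'size', 'L'),
        (2, 'colour', 'red'),  (2, 'size', 'M'),
        (3, 'colour', 'blue'), (3, 'size', 'L');

    -- Relational style: the same facts as typed, required columns.
    CREATE TABLE shirt (
        id     INTEGER PRIMARY KEY,
        colour TEXT NOT NULL,
        size   TEXT NOT NULL
    );
    INSERT INTO shirt VALUES (1, 'red', 'L'), (2, 'red', 'M'), (3, 'blue', 'L');
""")

# "Red shirts in large": each condition needs its own copy of attribute.
hash_store_query = """
    SELECT c.entity_id
    FROM attribute AS c
    JOIN attribute AS s ON s.entity_id = c.entity_id
    WHERE c.key = 'colour' AND c.value = 'red'
      AND s.key = 'size'   AND s.value = 'L'
    ORDER BY c.entity_id
"""

# The relational version is a single predicate over one table.
relational_query = """
    SELECT id FROM shirt
    WHERE colour = 'red' AND size = 'L'
    ORDER BY id
"""

print(conn.execute(hash_store_query).fetchall())  # [(1,)]
print(conn.execute(relational_query).fetchall())  # [(1,)]
```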

The whole point of creating tuple arithmetic is so that you can optimise for relational processing and query big sets of data quickly. Completely ignoring it or, worse still, crippling it so that serialising objects is easy is like shooting yourself in the foot… with a shotgun… for no reason.

Standard