Web Applications, Work

The myth of “published” content

Working at the Guardian you often end up having conversations with people about the challenges you face in scaling to meet the often spiky traffic you get in online media. One thing that comes up again and again is the idea that content, once published, is essentially static. Now there is a lot to be said for this, as digital journalism sticks pretty close to a lot of the conventions of print media; copy is often culled from the print version and follows the 24-hour media cycle quite strongly.

However, what is often surprising is the number of edits a piece of content receives, particularly if it is not a print feature article. The initial version of an article is often the mandatory information and a few paragraphs sufficient to get across the basic story. It then goes through a number of revisions that often happen while the article is still in draft. Often, but not always.

Once the article gets published online, though, it triggers a new wave of edits as language gets cleaned up and readers, editors and lawyers all descend on it. Editors now have a lot more tools to see how the audience is reacting to a piece of content and how it is playing in social media. Articles also get picked up externally, and that means making sure the article works as a landing page.

Naturally stories often develop their own momentum, which requires you to switch from a single piece to a set of stories that approach different aspects of the overall reporting. You then need to link the different pieces of content together to form a logical package of content.

One thing that is interesting is looking at how many articles are changed after seven days. It is a surprising number, as new stories often create a need for historical context, and older stories often look dusty in the light of breaking events. We have also had strange things happen with social news, where aggregating sites pick up some story that was overlooked at the time.

All of this means that you cannot naively treat content as static. Instead you have an interesting decaching problem: it is true that content doesn’t change much, until it does start changing, and then the published version needs to reflect the changes reasonably rapidly if you want to be picked up by things like Google.
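One way to picture the decaching problem is a cache lifetime that shrinks when an article has been edited recently and grows again once it goes quiet. The following is only a minimal sketch under assumptions of mine: the `Article` type, the `cache_ttl` and `on_edit` names and every threshold are hypothetical illustrations, not a description of how the Guardian actually caches content.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class Article:
    path: str
    last_modified: float  # Unix timestamp of the most recent edit


def cache_ttl(article: Article, now: float) -> int:
    """Return a cache lifetime in seconds based on how recently the article changed.

    The thresholds below are illustrative assumptions, not measured values.
    """
    age = now - article.last_modified
    if age < HOUR:       # edited within the last hour: expect more edits soon
        return 60
    if age < 7 * DAY:    # edited within the last week: still fairly live
        return 600
    return DAY           # quiet for over a week: cache for a day


def on_edit(article: Article, now: float) -> None:
    """Record an edit so the next TTL calculation drops back to the short window."""
    article.last_modified = now
```

An article untouched for a month gets the long lifetime, but the moment `on_edit` records a change it is treated as live again, which is the "static until it isn't" behaviour described above. A real system would also need to actively purge the copy that is already cached.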

Java

EuroClojure 2012 Day 1

So there were definitely two big themes in the talks on the first day of the conference.

The first was how to use event-based systems to create flexible aggregate data models. All the speakers seem to have settled on a reduce or foldLeft approach for creating the aggregate, but two models have already been put forward: CQRS and a kind of aggregate query bus. Really, though, it seems that accepting event data and allowing querying of and access to the aggregated views are responsibilities of the same system.
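The reduce approach is easy to sketch. The talks were about Clojure, so treat this Python version as my own illustration: the event shapes, `apply_event` and `aggregate` are hypothetical names and not taken from any of the presentations.

```python
from functools import reduce

# The empty aggregate that the fold starts from.
EMPTY = {"items": {}}


def apply_event(state: dict, event: dict) -> dict:
    """Return a new aggregate with one event applied; the input state is not mutated."""
    kind = event["type"]
    if kind == "item_added":
        current = state["items"].get(event["sku"], 0)
        items = {**state["items"], event["sku"]: current + event["qty"]}
        return {**state, "items": items}
    if kind == "item_removed":
        items = {k: v for k, v in state["items"].items() if k != event["sku"]}
        return {**state, "items": items}
    return state  # unknown event types are ignored


def aggregate(events):
    """Fold the event stream, oldest first, into the current aggregate view."""
    return reduce(apply_event, events, EMPTY)
```

Because the aggregate is just a fold over the event log, a different view of the same data is only a different `apply_event` function run over the same events.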

The other theme was creating query systems using logical predicates. There were no fewer than three generic query systems put forward: core.logic for low-level, flexible implementations that identify either data or results, and two general query libraries, one from Datomic and the other from Cascalog.

Web Applications

Good magic, bad magic

Philip Potter pinged me his post on Sinatra magic during the week. Mark Needham’s comment and code on solving the mocking problem is good advice for the problem as posed.

At Wazoku, where we use the often equally magical Bottle framework, we don’t use top-down TDD but instead outside-in functional tests (with no funky runners as we don’t need CI). This solves the whole magic issue by shifting the attention to what the public interactions of the application are. This is one of the massive benefits of using a microapp HTTP/JSON/REST-like architecture. I could flip the API from Bottle to Django or Compojure or Sinatra and my test suite can keep on rocking and telling me whether the behaviour my consumers are relying on is correct.
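To make the outside-in idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the `/ideas` endpoint, its JSON shape and the stand-in `IdeasHandler` are hypothetical and are not our real API. The point is that `check_ideas_endpoint` only knows a base URL and the public HTTP contract, so it would not care which framework is serving the response.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class IdeasHandler(BaseHTTPRequestHandler):
    """Stand-in application; any framework serving the same JSON would do."""

    def do_GET(self):
        if self.path == "/ideas":
            body = json.dumps({"ideas": []}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def check_ideas_endpoint(base_url: str) -> None:
    """Functional test that depends only on the public HTTP/JSON behaviour."""
    with urlopen(base_url + "/ideas") as response:
        assert response.status == 200
        assert response.headers.get_content_type() == "application/json"
        payload = json.loads(response.read().decode("utf-8"))
    assert payload == {"ideas": []}


def run_against_local_server() -> bool:
    """Start the stand-in app on a free local port and run the check against it."""
    server = HTTPServer(("127.0.0.1", 0), IdeasHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        check_ideas_endpoint(f"http://127.0.0.1:{server.server_port}")
        return True
    finally:
        server.shutdown()
        server.server_close()
```

Pointing `check_ideas_endpoint` at a staging URL instead of the local stand-in exercises the same contract against the real application, whatever it happens to be written in.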

The major thing I felt when reading through Philip’s post was the massive amount of effort that was going into testing relatively simple behaviour. This is a bit of an anti-pattern with Agile developers (or perhaps it is part of the mastery thing where rote “correct” behaviour is modified by experience and judgement). One of the massive advantages of using something like Sinatra is that you can get a whole web app with rich behaviour into less than 200 lines. If you then create thousands of lines of test code and battle with the magic for hours on end you’ve completely destroyed your productivity.

If you have a code base that you expect to be large and highly contested by a large development team, you need good, layered testing and frameworks that support that. If you have an app that is small, and when it’s done it is done, then there is no need to agonise as to whether it was done “right”.

The idea that top-down TDD is the only correct way to write software is corrosive. When faced with a generally poorly skilled and educated workforce it is good to have rules. I have imposed a certain style of TDD on a group myself because it gives a good framework for work and achieves very consistent output.

However, with skilled people on small-scale projects you can kill yourself by imposing arbitrary rules. I love Sinatra and while I might be equivocal about magic I think it is ridiculous to moan about it if you are using something as unicorn-packed as Ruby. For example Philip was trying to use RSpec mocks and stubs to do his TDD. The result is kind of saying that you’re disappointed that your “good” magic for testing didn’t work with the “bad” magic of a DSL for web applications. Even if your RSpec code passed its tests you still haven’t said anything particularly deep about the production behaviour of your application, as your unit testing environment was severely compromised by the manipulations of your mocking framework.

So my rule of thumb is: if it’s simple, do it; if it was simple, functionally test it; if it was never really simple then test-drive it with suitable tools.

Java, Programming

Python as a post-Java language

I’m a UNIX-based developer and since 2000 I have been working mainly with Java and then JVM languages. When Java 7 slipped I made no real secret of the fact that Java was in a lot of trouble. The post-Oracle world though looks even worse, with a lack of clarity about what in the core ecosystem is free, open source and liability-free.

Clojure and, to a lesser extent, Scala are great steps forward so I don’t feel the burning need for a Java 7/8 whatever. However a moribund or tainted JVM is a major problem and so I’m now thinking about what the post-Java escape route looks like. On the web front it is pretty obvious: Python and Ruby are great languages with great frameworks for developing web-based applications. For the server-side heavy lifting it is a lot less clear. People are talking about Google Go but that does feel quite low-level; I’m not sure I’m ready to go back to pointer wrangling, even with memory management. It feels like something you’d build a tool out of, not an application. Mono feels like more of the same problems of wrestling with big companies with vested interests; if you are going to do that then why not try and sort out the OpenJDK?

As the title of the post suggests the language I am most inclined towards right now is Python. It is a really concise but clear language that on UNIX systems comes with an amazingly comprehensive set of libraries and which has a virtual environment and dependency management that is on a par with RVM and gem.

The single issue that comes up is performance. What I have been finding is that for 80% of the work I am doing performance is okay and I’m producing a fraction of the code I would normally have to create. For that last 20% maybe I am going to have to look at something like Go or (god forbid) go back to C, but I would much prefer to see a Clojure or Scala that could run on top of something like LLVM. I also have some hope of smarter people than me making progress on a JIT for Python that might take that 20% down to a figure where performance would matter so much to me that I wouldn’t mind sweating to make it happen.

Java, Ruby, Web Applications, Work

Working with Rails

I have recently struggled to try and get a prototype web application done with Ruby on Rails. It has been a really great experience and like all such experiences consists of both good and bad moments. Things that are particularly striking coming from a Java Web environment:

  • Integrated console
  • No recompile/build/deploy cycle
  • No mix and match components
  • No real programming language skill required

It is also remarkable how quickly people revert to the level of thought, planning and execution that you might use for a shell script. Even I have found myself getting caught up in “playing” with the app interactively rather than being focussed on creating behaviour and functionality in a structured way.

Elaborating on these points: in Java web development the first issue is usually building up your component stack. Some people just choose Spring but those people are idjits! If you are working purely in a web tier without any need for any backend interaction then it is probably a good bet, but generally you want to put some thought into how you are going to assemble your various stacks. You tend to have to choose your MVC web framework, your persistence framework and your service framework (if appropriate). You often have to give some thought to your caching and messaging frameworks if they are relevant.

With Rails you just use the appropriate Active Module or Rails built-in features. If there is something that is going to ease some pain for you then it is going to be operating as a plugin. That’s it, no framework holy war. It also means that there are a lot of applications that are just not going to be suitable for Rails, as all those Java components have various strengths for different environments. I certainly wouldn’t want to tackle a legacy database with ActiveRecord for example. Not that it couldn’t be done but I wouldn’t want to do it.

The zero-turnaround is impressive after the build-deploy-check cycle. It was a wow factor last year when I saw the Phobos framework being demonstrated by Sun and it’s not lost any of its shine. By separating its deployment environments so ruthlessly Rails is able to deliver a really positive developer experience.

The interactive console takes a bit of getting used to in terms of faking browser requests but once you get used to doing so it is another tool that you wonder how you’ve managed without for so long. It’s much more intuitive to use than setting up a remote Java debugging instance (although admittedly you do similar tasks in both). It allows you to scratch those itchy “why?” questions.

And finally that lack of programming skill? A bit controversial perhaps? Well I feel that in recent weeks what I have been learning is how to manipulate Rails. The fact that it is implemented on Ruby may allow a lot of features to be implemented in the way that they are, but you are very rarely called upon to show Ruby madskillz. Instead the majority of the time you are simply plugging little customisations into the Rails framework. It is called Rails for a reason after all: when you are on them you are amazingly productive, but if you can’t package your problem into a Rails solution then you are out of luck. You’re going to have to develop your own solution and that is going to be hard work. I have been trying to develop my Ruby skills (that is a whole other story) but the truth is you can bang out something that is acceptable with very little Ruby knowledge. You can go a lot of the way purely by mastering Ruby’s hash syntax.

So have I been converted? Well for all its problems I do have a hankering to get back to my great big Java applications with their holy wars and heavyweight processing. After all being on the Rails is fine for getting things done quickly but it can feel claustrophobic. I am also really glad that after getting distracted by the whole JSF controversy the challenge Rails presents to the status quo of web development means that a lot of the Java frameworks are starting to respond to the real problems faced in web development and ensuring that the easy stuff should be easy.
