Web Applications, Work

The myth of “published” content

Working at the Guardian you often end up having conversations with people about the challenges of scaling to meet the often spiky traffic you get in online media. One thing that comes up again and again is the idea that content, once published, is essentially static. There is a lot to be said for this, as digital journalism sticks pretty close to the conventions of print media; copy is often culled from the print version and follows the 24-hour media cycle quite strongly.

However, what is often surprising is the number of edits a piece of content receives, particularly if it is not a print feature article. The initial version of an article is often just the mandatory information and a few paragraphs, sufficient to get across the basic story. It then goes through a number of revisions, which usually (but not always) happen while the article is still a draft.

Once the article gets published online, though, it triggers a new wave of edits as the language gets cleaned up and readers, editors and lawyers all descend on it. Editors now have far more tools to see how the audience is reacting to a piece of content and how it is playing on social media. Articles also get picked up externally, which means making sure each one works as a landing page.

Naturally, stories often develop their own momentum, which requires you to switch from a single piece to a set of stories that approach different aspects of the overall reporting. You then need to link the different pieces of content together to form a logical package.

One interesting thing is to look at how many articles are changed after seven days. It is a surprising number, as new stories often create a need to provide historical context, and older stories can look dusty in the light of breaking events. We have also had strange things happen with social news, where aggregating sites pick up a story that was overlooked at the time.

All of this means that you cannot naively treat content as static. Instead you have an interesting decaching problem: it is true that content doesn't change much, until it does start changing, and then the published page needs to reflect the changes reasonably rapidly if you want to be picked up by things like Google.
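As a rough illustration of what I mean by the decaching problem, here is a minimal sketch (the cache structure and the edit hook are hypothetical, not how the Guardian stack actually works): serve published pages from a long-lived cache, but purge the cached copy the moment an edit comes through rather than waiting for a TTL to expire.

import time

CACHE = {}                 # article_id -> (rendered_html, cached_at)
LONG_TTL = 24 * 60 * 60    # published content rarely changes, so cache for a day

def get_article_html(article_id, render):
    """Serve from cache if fresh, otherwise render and cache."""
    entry = CACHE.get(article_id)
    if entry and time.time() - entry[1] < LONG_TTL:
        return entry[0]
    html = render(article_id)
    CACHE[article_id] = (html, time.time())
    return html

def on_article_edited(article_id):
    """Hook called by the editing workflow: drop the stale copy immediately."""
    CACHE.pop(article_id, None)
    # In a real setup you would also purge any CDN or front-end caches here.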

 

Standard
Web Applications

The web is a graph

Last week I gave a talk on how I have been creating web applications that very lightly wrap an underlying graph to provide not just content for a page but also the workflow and state of the user’s current interaction with the application.

As part of the talk I created two demo apps that are available on Heroku. Crumbly Castle is inspired by Dark Souls/Demon's Souls and allows you to explore a castle populated by the ghosts of everyone who has ever played it. The other offers a questionnaire system that generates characters in the style of the Elder Scrolls or Fallout games. The code for the applications is on GitHub so you can fork it and deploy it for yourself. Both use the hosted Neo4j add-on for Heroku, which provides hassle-free hosting but is currently only available to beta programme members.

You can obviously use both on your local machine.

Both of the demos are metaphors for more serious kinds of enterprise applications but I think it is often easier to produce prototypes or demos that are based on immediately engaging concepts. It certainly helps to have something that the audience can play with during the talk!

Briefly, I just wanted to summarise the points I try to make during the talk and explain why you might want to look at using a graph as your web application store. My major point is that web application development is usually page-centric: when you hit a page, the controller tends to examine the whole state of the application to find out why you came to the page. Are you logged in? Were you trying to look at something? Is there a session associated with you?

I posit that we should instead be looking at the journeys between the pages as the interesting things. Given where you are in the journey graph, where can you go next? Essentially I am taking the same logic a state machine or rule engine uses and instead expressing it as relationships in a graph.

The most common trick the applications use is to assign a fixed url to a user session that identifies a node in the graph. Then with each transition I change the relationships the node has to other data based on the user's actions, and then simply send a redirect back to the fixed url, which renders a different result based on the current state of the node.

This means that the web application becomes very simple to write and the controller simply has to select the template and the related nodes that are needed to generate links and actions.
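As a minimal sketch of the pattern (this is not the code from the demo apps; the handler names and node structure are made up, and a plain in-memory dict stands in for the graph store), the fixed-url-plus-redirect idea looks roughly like this in Flask:

from flask import Flask, redirect, render_template
import uuid

app = Flask(__name__)

# Stand-in for the graph: each session node records its current state and
# its outgoing relationships to other data.
NODES = {}

@app.route("/start")
def start():
    """Create a session node and hand the user its fixed url."""
    node_id = str(uuid.uuid4())
    NODES[node_id] = {"state": "entrance", "relationships": {}}
    return redirect(f"/session/{node_id}")

@app.route("/session/<node_id>")
def show(node_id):
    """Render whatever the node's current state says should be shown."""
    node = NODES[node_id]
    return render_template(f"{node['state']}.html", node=node)

@app.route("/session/<node_id>/go/<direction>")
def transition(node_id, direction):
    """A user action: rewire the node, then bounce back to the fixed url,
    which now renders a different result."""
    node = NODES[node_id]
    node["relationships"]["came_from"] = node["state"]
    node["state"] = direction
    return redirect(f"/session/{node_id}")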

I think it is a really interesting approach and a natural fit for simplifying a lot of session-state-heavy apps.

Standard
Web Applications

Good magic, bad magic

Philip Potter pinged me his post on Sinatra magic during the week. Mark Needham's comment and code on solving the mocking problem are good advice for the problem as posed.

At Wazoku, where we use the often equally magical Bottle framework, we don't use top-down TDD but instead outside-in functional tests (with no funky runners, as we don't need CI). This sidesteps the whole magic issue by shifting the attention to what the public interactions of the application are. This is one of the massive benefits of using a microapp HTTP/JSON/REST-like architecture. I could flip the API from Bottle to Django or Compojure or Sinatra and my test suite could keep on rocking and telling me whether the behaviour my consumers are relying on is correct.
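By way of illustration (this is not Wazoku's actual suite; the endpoint and fields are hypothetical), an outside-in functional test just exercises the running service over HTTP and never touches the framework's internals:

import unittest
import requests

BASE_URL = "http://localhost:8080"   # wherever the app under test is running

class IdeaApiTest(unittest.TestCase):
    def test_create_and_fetch_idea(self):
        # Talk to the app exactly as a consumer would: plain HTTP and JSON.
        created = requests.post(f"{BASE_URL}/ideas",
                                json={"title": "Test idea"}).json()
        fetched = requests.get(f"{BASE_URL}/ideas/{created['id']}").json()
        self.assertEqual(fetched["title"], "Test idea")

if __name__ == "__main__":
    unittest.main()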

The major thing I felt when reading through Philip's post was the massive amount of effort that was going into testing relatively simple behaviour. This is a bit of an anti-pattern among Agile developers (or perhaps it is part of the mastery thing where rote "correct" behaviour is modified by experience and judgement). One of the massive advantages of using something like Sinatra is that you can get a whole web app with rich behaviour in less than 200 lines. If you then create thousands of lines of test code and battle with the magic for hours on end, you've completely destroyed your productivity.

If you have a code base that you expect to be large and highly contested by a large development team, you need good, layered testing and frameworks that support it. If you have an app that is small and, when it's done, it is done, then there is no need to agonise over whether it was done "right".

The idea that top-down TDD is the only correct way to write software is corrosive. When faced with a generally poorly skilled and educated workforce it is good to have rules. I have imposed a certain style of TDD on a group myself because it gives a good framework for work and achieves very consistent output.

However, with skilled people on small-scale projects you can kill yourself by imposing arbitrary rules. I love Sinatra, and while I might be equivocal about magic I think it is ridiculous to moan about it if you are using something as unicorn-packed as Ruby. For example, Philip was trying to use RSpec mocks and stubs to do his TDD. The result amounts to saying that you're disappointed that your "good" magic for testing didn't work with the "bad" magic of a DSL for web applications. Even if your RSpec code had passed its tests, you still wouldn't have said anything particularly deep about the production behaviour of your application, as your unit testing environment was severely compromised by the manipulations of your mocking framework.

So my rule of thumb is: if it's simple, do it; if it was simple, functionally test it; if it was never really simple, then test-drive it with suitable tools.

Standard
Programming, Software, Web Applications, Work

Names are like genders

One thing I slightly regret in the data modelling that is done for users in Wazoku is that I bowed to marketing pressure and “conventional wisdom” and created a pair of first and last name fields. If gender is a text field then how much more so is the unique indicator of identity that is a name?

The primary driver for the split was so that email communications could start "Hey Joe" rather than "Hey Joe Porridge Oats McGyvarri-Billy-Spaulding". Interestingly, as it turns out, this is definitely the minority use case, and 95% of the time we actually put the fields back together to form a single string because we are displaying the name to someone other than the user. It would have been much easier to have a single name field and then extract the first "word" from the string for the rare case that we want to try and informally greet the user.
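Something like this minimal sketch (the field and function names are hypothetical, and it is obviously naive about names with unusual structure) would have covered the greeting case:

def informal_greeting(full_name):
    """Take the first whitespace-separated 'word' of a single name field,
    falling back to a generic greeting if there is nothing to split."""
    words = full_name.strip().split()
    return f"Hey {words[0]}" if words else "Hey there"

# informal_greeting("Joe Porridge Oats McGyvarri-Billy-Spaulding") -> "Hey Joe"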

My more general lesson is that wherever I (or we more generally as a business) have tried to pre-empt the structure of a data entity we have generally got it wrong; so far, however, we have not had to turn a free text field into a stricter structure.

Standard
Python, Web Applications

Deploying Python apps to Epio

I recently got my beta access to ep.io, the Python application deployment platform. I had the chance today to have a play around and try out some deployments, so I thought I would give my view on the experience. I've deployed Python apps to Heroku and Gondor before, so those services form my reference points here.

So firstly, there's a command-line client that you install via pip, and you effectively deploy to the platform via a client command, SSH keys and what looks like git on the server side. This is more like Gondor than Heroku (which is intimately linked to git). It means you have your choice of source control, and if you want to be a Python purist you never need to step outside of Python for anything you are doing.

Applications consist of essentially one configuration file that states where the WSGI application is and what the requirements file is. Compared to Gondor it is a very simple setup, but it did feel that it could be even simpler if it made convention-based assumptions, such as the requirements file being called requirements.txt.

Leveraging WSGI and configuration this way gives a very flexible platform, and I was able to get both Flask and Bottle to work (the former very quickly because it has documentation, the latter via trial and error that might warrant its own blog post). I didn't have time to try Django, but I felt pretty confident that I could get whatever framework I wanted working once I understood the basic setup.
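For context, the WSGI side of this is tiny. A minimal Bottle app that exposes a module-level WSGI callable for the platform's configuration to point at looks roughly like this (the module layout here is just an illustration, not ep.io's required structure):

# app.py - minimal Bottle application exposing a WSGI callable
import bottle

@bottle.route("/")
def index():
    return "Hello from the platform"

# The deployment configuration points at this module-level WSGI object.
application = bottle.default_app()

if __name__ == "__main__":
    # Local development server; in production the platform's own server
    # stack serves the `application` object instead.
    bottle.run(application, host="localhost", port=8080)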

Unlike Heroku, Epio provides a fixed framework for executing the apps. It seems you will be running behind NGINX and Gunicorn. Both are good choices and I certainly like them but if you want to play around with different servers like Tornado or CherryPy you may prefer Heroku’s more open deployment model. I did like the way that you can use the configuration file to have NGINX serve static content directly.

Epio naturally has less of an ecosystem than Heroku but has Solr, Postgres and Redis out of the box. All solid choices, covering the majority of what I would need. I was certainly grateful that I didn't have to grapple with remote database administration and could prototype apps with just Redis.

Deployment and logging have some rough edges. Being able to access logs directly from the application page was a win for me; however, when I was struggling to define the WSGI entrypoint correctly it seemed as if the application wasn't really being loaded until the first request came in. I would see an entry confirming a new deployment but then nothing until I hit the app. I think there should be some kind of sanity check of what you have uploaded to see whether it will even run.

Right now Epio is providing a Python-based cloud deployment platform with a sensible set of supplementary services and no strong opinion about the source control system you use. If this had been around at the start of the year it would have blown me away. However, there is now more competition, and therefore questions of price and ease of use will matter in terms of how compelling it is to use the service.

If you do Python web development I would definitely recommend you sign up for the beta and give it a go yourself, as it seems a very solid prototyping platform. If you are not a Ruby and Git fan then you may well love what is on offer here, because it is already very convenient, makes few demands on you and gets your web app public in minutes.

Standard
Web Applications, Work

Using SVG in the modern website

Using SVG when you are putting together a new website is a pretty sound decision: the format is over a decade old, well supported by browsers, and the ability to scale images accurately via CSS is pretty compelling when you are rapidly trying out different layouts and proportions.

Of course, until recently IE has been the bugbear, but IE9 actually has pretty decent SVG support. It is now worth treating SVG as the general case and IE8 as the exception, which can be switched to PNG via JavaScript. The first iteration of Wazoku Idea Spotlight used SVG exclusively and the second iteration will do a Modernizr-based switchout for IE8 but essentially still be SVG based.

Therefore I was pretty confused when I took a random look at the app in IE9. Instead of displaying the images or their alt text, there was just whitespace. Quickly opening the images on their own revealed that IE was quite happy to render them at full window size and that there was no issue with loading them.

After some confused Googling I found out that the issue was that the previous generation of SVGs were generated straight out of Adobe Illustrator, whereas this set is going through Inkscape, where I am tweaking the colour, size and so on. Inkscape does not by default let you specify a property called the viewBox. Instead this is only created if you export your file as an Optimized or Plain SVG. It is an outstanding feature request if you go looking through the Inkscape bug list, but it is a really obscure issue to track down (hence this blog post). The reason the images were appearing blank is that without a viewBox IE9 crops the image to the CSS dimensions rather than scaling it; Firefox and Chrome scale it as you would expect. Essentially I was seeing the top-left 32 pixels of an image that IE9 considered to be 640px square, overflow hidden.
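For anyone hitting the same thing, the attribute in question sits on the SVG root element; with it present the browser knows the drawing's intrinsic coordinate system and can scale to whatever size the CSS asks for. The dimensions below are purely an example, not taken from the Wazoku assets:

<!-- Illustrative only: a 640px-square drawing that will scale to any CSS size -->
<svg xmlns="http://www.w3.org/2000/svg" width="640" height="640" viewBox="0 0 640 640">
  <circle cx="320" cy="320" r="300" fill="#7a0" />
</svg>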

Having found the problem I then converted a test image to Optimized SVG; who doesn't love Optimized things, after all? Well, the answer to that is Chrome. Firefox (probably due to having the longest SVG heritage) did the right thing in both cases, and IE9 was fine with the Optimized version. Chrome, however, stretched the image out vertically, and via the developer tools it was possible to see that the dimensions reported for the image were completely wrong: a letterbox shape rather than a square.

In the end the thing that worked everywhere was Inkscape's Plain SVG format, which is something I am happy to live with. It would be nice to be able to set a viewBox from Inkscape's Document Properties though, and I will be keeping an eye out for it in future release notes.

Standard
Web Applications

Replicating data in Cloudant and Heroku

Heroku allows you to use CouchDB via the Cloudant cloud service, which is great, but compared to the documentation for the relational stores it is not clear how you are meant to deal with backups and importing data. I also couldn't find a way to use Futon on the Heroku instance (which comes from the Heroku account; you can't use your own Cloudant account with the plugin) or to share the database instance with my personal Cloudant account.

This post from Cloudant helps a lot. Essentially you can get your Heroku instance's URL, and the cool thing about Couch's painless replication is that once you have a Couch URL you can replicate that database to a local instance or even back into Cloudant.


# Find the CLOUDANT_URL config variable that the add-on attaches to your app
heroku config --long

# POST to the Cloudant instance's _replicate endpoint, copying the database
# at CLOUDANT_URL to whatever CouchDB URL you put in TARGET_URL
curl CLOUDANT_URL/_replicate -H 'Content-Type: application/json' -d '{"source" : "CLOUDANT_URL", "target" : "TARGET_URL"}'

You can edit the database locally and then replicate back to the Heroku instance by just swapping the source and target URLs in the curl command above.

That seems to pretty much be it. I’ve replicated my data out of Cloudant and then back into it, which feels bizarre but it’s all symmetrical with Couch and it’s a handy cloud-based backup mechanism.

Standard
Programming, Web Applications

Can you use NoSql?

I think the answer is yes. The reason is that traditionally relational datastores have ended up being the dumping ground for data. Everything has ended up there, and with the advent of new data storage technology there is a chance to rummage around the various piles of data and ask whether things are in the right home or not.

One thing I've been doing a lot recently is data-driving HTML form components. That's a lot easier when you are just reading the data out of documents and lists rather than out of tables. The first advantage is that you don't have to size your option text, for example. Variable-length text labels? No problem. The second is that you can move away from numeric values to text-based slug keys, or even use existing conventions like ISO language short codes.

You don’t have to use numbers with relational data of course but it tends to happen due to leaky ORM solutions that are orientated around the Long Primary Key.
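As a small illustration of the sort of thing I mean (the document and field names here are hypothetical), driving a select element straight from a document of slug-keyed options is about as simple as it gets:

# A document fetched from the store: slug keys and display labels, no numeric ids
LANGUAGE_OPTIONS = {
    "options": [
        {"key": "en", "label": "English"},
        {"key": "fr", "label": "French"},
        {"key": "pt-br", "label": "Brazilian Portuguese (labels can be any length)"},
    ]
}

def render_select(name, doc, selected=None):
    """Turn the options document into an HTML select element."""
    rows = []
    for option in doc["options"]:
        chosen = ' selected' if option["key"] == selected else ''
        rows.append(f'<option value="{option["key"]}"{chosen}>{option["label"]}</option>')
    return f'<select name="{name}">' + "".join(rows) + "</select>"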

Another area where you can probably take advantage of a NoSql store is the small bits of text that occur around your site but which should be maintained by business owners rather than the front-end team. Think of those straplines, boxed text and success stories. Maybe they are stored in CLOBs somewhere in the database, perhaps in a table called something cryptic like user_text. Let's liberate that data into a key-value store!

I find myself using a lot of Textile and Markdown text in my sites, and it is an almost trivial exercise to process and display it from a NoSql database. I would encourage you to give it a go; it's low risk, but it should illustrate some of the benefits of the new stores and suggest what other problems in your application some NoSql could solve.
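To show how little is involved, here is a minimal sketch along those lines (the key name is made up, and I am assuming the redis and markdown Python packages purely for illustration):

import redis
import markdown

store = redis.Redis()

# A business owner (via an admin screen) drops the copy straight into the key-value store.
store.set("strapline:homepage", "Share your **best** ideas with the whole company.")

def strapline_html(key):
    """Fetch the Markdown snippet for a page and render it to HTML."""
    raw = store.get(key)
    return markdown.markdown(raw.decode("utf-8")) if raw else ""

# strapline_html("strapline:homepage")
# -> '<p>Share your <strong>best</strong> ideas with the whole company.</p>'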

Standard
Python, Web Applications, Work

Declare for simplicity

I was struggling with an issue today on a template that was blowing up due to missing keys in a template data hash. At first I tried to write some conditional code in the template. That ended up being quite ugly, so then I tried to find a way to make the template conditional on whether the key was present.

Then it struck me that the issue was really the missing key. As the hash is prepared in Python code, the original implementation frugally avoided creating the entry if the underlying data wasn't present. While this is clever and minimal, it actually just pushes the complexity out of the presenter logic and into the template, where it is much worse because you can only interact with the presenter's abstraction of the original data.

The problem here is that dynamic languages sometimes allow you to be too clever. In a declared-structure system you have to declare all your data and provide sensible defaults. This makes writing the templating solution a lot easier, as you never have a missing-data problem to deal with (you still have headaches with duff defaults, but that is a different post).

So I went back to the presenter and declared an empty list under the key that had been causing me grief. Bingo! My failure went away and the default behaviour of not generating any content for the empty list kicked in, reducing my template back to the unconditional single line I had originally.
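In code terms the change is tiny. A rough before-and-after sketch (the key and data names are hypothetical, not the actual presenter in question):

def build_template_data(idea, comments=None):
    # Before: frugal, but pushes an "is the key there?" check into the template.
    # data = {"idea": idea}
    # if comments:
    #     data["comments"] = comments

    # After: declare every key with a sensible default, so the template can
    # always iterate over data["comments"] unconditionally.
    return {
        "idea": idea,
        "comments": comments or [],
    }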

Standard
Web Applications

The Browser or the App?

There is an interesting little issue occurring in web development at the moment and it is all being caused by the rise in mobile browsing.

The mobile device has always been a bit of a challenge (anyone remember WAP?) but until the iPhone it was an issue that was pretty irrelevant. Web browsing on a phone was so painful that no-one did it. Right now, if you check your logs, most sites are only going to have a tiny amount of iPhone traffic. However, if you live in a major city and you own a phone with a decent screen and browser, then you are probably aware of how quickly you start to expect access to the same level of information on the move as you have at home. Why should I know more about delayed trains in bed than on the platform?

So the technology is finally reaching the point where the consumer is starting to expect to be able to browse the web on mobile devices. In parallel there is the rise of the app. The prime advantage of the app over mobile web browsing is that the app can sensibly cache data locally and therefore provide a degree of offline-tolerant behaviour. You can also simplify some of the UI if your user interacts with your site rather than just consuming the content.

So what do you do? Do you invest in creating a mobile version of your site or instead create an app? Obviously if you can afford it you may want to pursue a web/mobile web/iPhone/Android strategy, i.e. do everything. For most companies, though, the practicalities lean towards trying to have a main website that more or less works with modern mobile browsers (mostly the iPhone). There are a lot of things that go into that, like page load, but in essence it is a "code and hope" strategy where you hammer down any nails that stick up after the event.

What's interesting about this strategy is that HTML5 has offline features that allow you to provide a degree of offline capability, as long as you are happy to ignore the non-Opera/Safari/Chrome users (and frankly, why not). This means that you can stick to pure web development and still have a reasonable mobile experience. Adding a JSON API to your site's content also opens you up to third parties developing apps for you.

So what kind of reasons should drive you to develop an app? The first is really the customised, rich user experience: an app should radically simplify accessing your content. For example, for a timetable site a mobile app can make it easier to enter and refine queries; defining search options on the mobile web tends to be a pain, and you often want to save and reuse search settings rather than re-enter them. Sites with user-generated content may also want a rich UI to encourage the user to keep supplying material.

An app should also always be trying to cache content when online and pre-empt the user's needs. For example, it makes sense to try and download timetable information and travel updates for the locations I search for frequently. If I open my timetable app and the first thing I see is whether there are delays on my route home and when the next bus or train is going to be, then I may not need to do anything else.

Apps also open up sophisticated location-service options; although location has been broadened to web browsers too, your ability to respond to location-based information is very limited on your home PC. A reviews website, for example, has a strong incentive to invest in an app so it can supply location-based reviews.

It is not clear at the moment whether it is more important to develop a mobile web or a mobile application capability. The emergence of capable, enjoyable web browsing on handhelds is an important development: there are many more mobiles than computers, and thinking about how your web content works on those devices is suddenly very relevant. Developing a mobile-capable site is a good defensive strategy, but for some businesses being able to enter the mobile app market early is going to be much more important, as it could potentially build them an audience greater than their current web users.

Standard