Working at the Guardian, you often end up having conversations with people about the challenges you face in scaling to meet the often spiky traffic you get in online media. One thing that comes up again and again is the idea that content, once published, is essentially static. Now there is a lot to be said for this, as digital journalism sticks pretty close to a lot of the conventions of print media; copy is often culled from the print version and follows the 24-hour media cycle quite strongly.
However, what is often surprising is the number of edits a piece of content receives, particularly if it is not a print feature article. The initial version of an article is often the mandatory information and a few paragraphs sufficient to get across the basic story. It then goes through a number of revisions that often happen while the article is still in draft. Often, but not always.
Once the article gets published online, though, it triggers a new wave of edits as language gets cleaned up and readers, editors and lawyers all descend on it. Editors now have a lot more tools to see how the audience is reacting to a piece of content and how it is playing in social media. Articles also get picked up externally, and that means making sure the article works as a landing page.
Naturally, stories often develop their own momentum, which requires you to switch from a single piece to a set of stories that approach different aspects of the overall reporting. You then need to link the different pieces of content together to form a logical package of content.
One thing that is interesting is looking at how many articles are changed after seven days. It is a surprising number, as new stories often create a need for historical context, and older stories often look dusty in the light of breaking events. We have also had strange things happen with social news, where aggregating sites pick up some story that was overlooked at the time.
All of this means that you cannot naively treat content as static. Instead you have an interesting decaching problem: it is true that content doesn’t change much, until it does start changing, and then the cached copy needs to reflect the changes reasonably rapidly if you want to be picked up by things like Google.
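To make the decaching problem a little more concrete, here is a minimal sketch of one way it could be approached. It is not a description of how the Guardian actually does it. It assumes two things: that an edit can explicitly evict the cached page, and that the time-to-live can be tied to how long an article has been quiet, so freshly edited pieces are re-rendered often and dormant ones are cached for longer. The names `ArticleCache`, `ttl_for` and `render`, and the 30 second and one hour bounds, are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedPage:
    body: str
    cached_at: float  # when this copy was rendered (seconds)
    ttl: float        # how long this copy may be served (seconds)


def ttl_for(last_modified: float, now: float,
            min_ttl: float = 30.0, max_ttl: float = 3600.0) -> float:
    """Hypothetical heuristic: the longer an article has gone without
    an edit, the longer we are willing to cache it, within bounds."""
    quiet_for = max(0.0, now - last_modified)
    return min(max_ttl, max(min_ttl, quiet_for / 10.0))


class ArticleCache:
    """In-memory page cache with edit-aware expiry (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._pages: Dict[str, CachedPage] = {}
        self._clock = clock

    def get(self, article_id: str, render: Callable[[], str],
            last_modified: float) -> str:
        now = self._clock()
        page = self._pages.get(article_id)
        if page is not None and now - page.cached_at < page.ttl:
            return page.body  # still fresh enough to serve
        # Missing or expired: re-render and pick a TTL based on how
        # recently the article was edited.
        body = render()
        self._pages[article_id] = CachedPage(
            body, now, ttl_for(last_modified, now))
        return body

    def invalidate(self, article_id: str) -> None:
        """Call this when an edit is saved so the change shows up at once."""
        self._pages.pop(article_id, None)
```

The point of combining the two is that explicit invalidation handles the edits you know about, while the short TTL on recently changed articles limits the damage when an invalidation gets lost. An article untouched for a week settles at the maximum TTL, but a single edit drops it straight back to the minimum.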