Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work, and it should really help guide what we choose to do.

However, I am not a product manager, a user-testing expert or a statistician (that last part is a lie: I’m a statistician who hasn’t done any stats for seventeen years). I am a dirty hacker programmer, and I use Optimizely in a way that probably makes my colleagues weep but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this kind of testing.

Note that you need to understand what you’re doing here; I am not recommending this if you are new to the product or to multivariate testing. You also need a good stream of traffic to work with. I have one, and this is working out for me. One piece of good practice I would keep is this: decide how you are going to judge the test before you start it, and don’t change your measure once you’ve started. If it is clear your initial metrics aren’t helpful, design a new test. The knowledge you’ve gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand the problem you are dealing with and how you can respond to it. If you have a question about what is happening in the test, feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than to speculate.

Changing a variation (no matter how tempting) is dangerous, though, as you’ll have to remember the differences and when you applied them. I prefer spawning new variations to changing an in-flight one. Of course, fixing bugs and unintended consequences is fine: you’re looking at the long-term rate, not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth, but I play around with traffic massively during a test. The great thing about Optimizely is that it takes care of the math, so feel free to change the allocation of traffic. If you have a runaway winner early on, don’t be afraid to feed the majority of traffic to it.
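To make "feeding traffic to a winner" concrete, here is a minimal sketch of weighted traffic allocation. It is not how Optimizely buckets visitors; it assumes integer weights per variation and hashes a hypothetical `visitor_id` so the same visitor keeps seeing the same variation. The function name `bucket` and the variation names are illustrative only.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def bucket(visitor_id: str, allocation: dict) -> str:
    """Assign a visitor to a variation in proportion to integer traffic weights.

    The same visitor_id always lands in the same variation for a given allocation.
    """
    names = list(allocation)
    cumulative = list(accumulate(allocation[name] for name in names))
    total = cumulative[-1] if cumulative else 0
    if total <= 0:
        raise ValueError("allocation needs at least one positive weight")
    # Hash the visitor id to a stable integer, then map it onto [0, total)
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # The first cumulative weight strictly greater than point picks the variation
    return names[bisect_right(cumulative, point)]


# Hypothetical allocation after spotting an early runaway winner
allocation = {"Original": 10, "Variation 1": 10, "Big green button": 80}
counts = Counter(bucket(f"visitor-{i}", allocation) for i in range(10_000))
print(counts)
```

Hashing keeps assignment sticky without storing state, but with this naive scheme changing the weights can move existing visitors between variations, which is one of the things a real testing tool has to manage for you.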

Make the test work for the whole audience

I don’t believe in this: make the test work for the easiest audience segment to reach. I frequently test only on modern browsers. If you find a trend then, shock horror, it often works for the whole audience. It’s about fast feedback, not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to make bigger changes to the pages under test.

Don’t change the underlying content

If you take your best-performing variation and apply it to the page, then the “Original” variation should trend towards that variation’s performance. If it doesn’t, you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests, until the Original variation performs better than any of the individual variations.

Once Original is the top-performing variation, you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues; you need to deal in big numbers. A/A testing can be helpful, but if you are working with five-digit numbers or double-digit percentages then don’t worry about the noise.
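To put rough numbers on "big numbers", the sketch below uses the textbook two-proportion z-test. This is not necessarily what Optimizely’s own statistics compute, and the visitor and conversion counts are hypothetical: 10,000 visitors per arm with a 5% baseline conversion rate.

```python
import math


def two_proportion_z(conversions_a: int, visitors_a: int,
                     conversions_b: int, visitors_b: int) -> float:
    """z statistic for the difference in conversion rate between B and A (pooled SE)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (rate_b - rate_a) / standard_error


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: 10,000 visitors per arm and a 5% baseline conversion rate
small_lift = two_proportion_z(500, 10_000, 550, 10_000)  # 10% relative lift
big_lift = two_proportion_z(500, 10_000, 600, 10_000)    # 20% relative lift
print(f"10% lift: z={small_lift:.2f}, p={two_sided_p_value(small_lift):.3f}")
print(f"20% lift: z={big_lift:.2f}, p={two_sided_p_value(big_lift):.3f}")
```

At these volumes a 10% relative lift stays inside the noise at the usual 5% level, while a 20% lift clears it comfortably. The same arithmetic explains A/A results: with two identical variations, roughly one test in twenty will still show p < 0.05 by chance.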

Tests have to look good

If your theory is accurate, it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals, get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off with all variations looking a bit crappy and then see whether they can be outperformed by an improved appearance. Often the answer is no; there is a law of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However, by trying better-looking variations in increments, you know exactly how much effort to invest.

Web Applications, Work

Guardian May 2013 Hackday

You can see the reportage in these two liveblogs: Day 1 and Day 2 (note the terrible naming conventions). The theme of the hackday was “growth”. For the most part I took the theme to mean growth hacking, and I did a lot of work along those lines that is difficult to talk about publicly.

However, my earlier lunchtime hacks had revealed that one of the Guardian’s fundamental problems is the volume of content it produces. This is not inherently a bad thing, but the key point is that there is vastly more content than can fit onto what are called “fronts” in the jargon. A front is something like the front page of the site or the Environment section. These fronts drive a lot of traffic to content, and for regular readers they are the essential navigation tool for the Guardian’s content.

I was therefore interested in how we might consider the dimension of time and use it to our advantage when presenting content. This aspect of my hackday work is more open, partly because I need a lot of help to understand it and partly because I’ve made an effort to use the public Content API rather than our internal content.

I called this work the “Time Trilogy” because it consists of three web apps that each use time as a way of accessing Guardian content.

The three apps are Guardian Word Count, TickTickTick and Guardian In Review. Word Count was the original and gives you a sense of the challenge of navigating the content. It is also pretty fun to watch during the day as the words tick up. Word Count spawned TickTickTick and In Review. TickTickTick is really a daily content explorer and was the first tool I needed to start sorting and exploring the breakdown of what we produce; at heart it is a tool for exploring the daily news cycle. In Review is slightly different: it takes the one hundred most popular pieces of content over the last seven days and renders them. Initially I wanted it to be a kind of automatically generated magazine, but looking at what people actually liked meant that I couldn’t make my initial idea work. People really like videos of meteors and Russian car crashes. What it is now is a way to explore material in the medium term, for content that has perhaps left the news cycle but is still relevant.
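As a rough idea of how a Word Count style app could sit on the public Content API, here is a minimal sketch that builds a search URL for one day and sums word counts from a decoded response page. The endpoint, the `from-date`/`to-date`/`show-fields`/`page-size`/`page`/`api-key` parameters and the `response.results[].fields.wordcount` shape are assumptions based on the public API as I understand it, not a description of the author’s code; fetching and paging through results is left out.

```python
from urllib.parse import urlencode

# Assumption: public Content API search endpoint and parameter names;
# check them against the current API documentation before relying on this.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"


def day_search_url(day: str, api_key: str, page: int = 1) -> str:
    """Build a search URL for everything published on one day, asking for word counts."""
    params = {
        "from-date": day,
        "to-date": day,
        "show-fields": "wordcount",
        "page-size": 50,
        "page": page,
        "api-key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def total_words(payload: dict) -> int:
    """Sum the wordcount field across one page of decoded search results."""
    total = 0
    for item in payload.get("response", {}).get("results", []):
        raw = str(item.get("fields", {}).get("wordcount", "0"))
        # Items without a usable word count contribute zero
        total += int(raw) if raw.isdigit() else 0
    return total


# Hypothetical one-page payload in the assumed response shape
sample_page = {"response": {"results": [
    {"fields": {"wordcount": "1200"}},
    {"fields": {"wordcount": "800"}},
    {"id": "item-without-word-count"},
]}}
print(day_search_url("2013-05-14", "YOUR-API-KEY"))
print(total_words(sample_page))
```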

None of the apps is really finished, and the way I work means I am very reliant on having working software to understand what I am doing and what is right or wrong about my approach. TickTickTick is much closer to being a complete product than In Review, and it is providing more insight into the nature of the content being produced. For example, there is a massive cluster of material between three and five minutes long.
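If "length" here means estimated reading time (the post does not say), a cluster like that falls out of a simple histogram over word counts. This is a minimal sketch under that assumption; the 200 words-per-minute reading speed and the sample word counts are hypothetical.

```python
import math
from collections import Counter

# Hypothetical reading speed; the post does not say how length was measured
WORDS_PER_MINUTE = 200


def reading_minutes(word_count: int, wpm: int = WORDS_PER_MINUTE) -> int:
    """Estimated reading time, rounded up to whole minutes (minimum one)."""
    return max(1, math.ceil(word_count / wpm))


def reading_time_histogram(word_counts) -> Counter:
    """Count pieces of content per estimated reading time in minutes."""
    return Counter(reading_minutes(count) for count in word_counts)


def share_between(histogram: Counter, low: int, high: int) -> float:
    """Fraction of pieces whose reading time is within [low, high] minutes."""
    total = sum(histogram.values())
    in_range = sum(n for minutes, n in histogram.items() if low <= minutes <= high)
    return in_range / total if total else 0.0


# Hypothetical word counts for one day's content
histogram = reading_time_histogram([650, 700, 900, 150])
print(sorted(histogram.items()))
print(share_between(histogram, 3, 5))
```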

I am going to continue to work on the apps because they feed back into my work, and ultimately these prototypes and toys tend to graduate into working components or theory on the main site itself. I may blog a bit more about them individually as I move them closer to something that genuinely creates value. I’m curious about feedback, but my ability to act on it is limited by my aims for the apps and, realistically, the time I have available.

I also wanted to talk a little about how I worked at this hackday, because I decided to reject advice and work solo rather than as part of a team (although I did a little backseat driving on the online magazines product, and I came up with the idea that actually won the hackday, which will hopefully be implemented and be awesome). Working alone means your creations are going to be quite rough, but it helps you cover a lot of ground: I ended up doing five hacks and working on a total of seven. Working with other people means communicating well, whereas solo you just need to express what you want very quickly.

My preferred tool for these kinds of hacks is Python on App Engine, which is what I use for my lunchtime hacks and for which I have a standard application template. With each new application, I move more of the common patterns into the template. To avoid having to faff around with testing, I use a loosely functional paradigm that I’ve carried over from Wazoku. It generally works quite well, but there are a lot of rules to doing it.

This time around I was doing a bit more frontend work than my day job requires because I was working solo. Again, my startup experience was useful because I was rediscovering a skillset rather than learning it. Hacks also mean choosing your platform for optimal output.

For that reason I targeted only Firefox and Chrome (Firefox was actually easier to develop for in terms of standards), and I made liberal use of client-side Less and CoffeeScript. I was impressed with how good the error handling was in both. An obscure bug can wipe out all the productivity gains of a higher-level language, but both worked great for me.

On top of that, I experimented with the new departmental standard of SMACSS (or at least my cherry-picking of it), and I made a lot of use of both Knockout and Bacon.js.

When I say I used SMACSS, essentially what I did was namespace my classes to produce simple selectors. This did get me out of a problem I had in In Review, so while it is truly the ugliest CSS standard, and I suspect in time we may come to hate its rejection of rich functionality, I concede that it is effective. Expect to see some of it applied to the main website sometime soon.

Knockout isn’t that popular in the department due to performance issues beyond a certain level of complexity, but for me it did a brilliant job of simply syncing the visual DOM to the data feeds. I was really happy with it. Other people were using AngularJS for more dynamic applications, but they also had a lot more code than I did, and again, when working solo, less is so much more.

Bacon.js was really interesting. A lot of my approach to JavaScript is functional and event-based, but so far I have wired up the events manually via jQuery. Bacon made it easier to create event sources with generic handlers, and I probably didn’t use 10% of its features. I’m curious to see what the rest of the department thinks of it, but for my hacks it has definitely earned a place.

It was nice to do something outside the run of normal work, and one thing that is quite cool about the hackday is that you can use it to tackle a technology that is entirely new to you without worrying about whether you succeed or fail.

Next time (May, I believe) I think I want to learn about browser plugins, as they are a way of producing better functionality for the Guardian without the hassle of having to make it work for the general population of browsers. Some people’s hacks this time around could have been released to the app or plugin stores, and we could have been getting valuable user feedback by now.

Software, Work

Generating corporate welfare through enterprise software

It is always good to have someone on the inside, and therefore service software companies often go to great lengths to woo potential champions within large organisations. That’s the way things are, but there is an interesting phenomenon that takes this too far, and I call it “corporate welfare”.

Companies often like to tout how configurable and adaptable their software is. By using just a few web screens, or maybe a set of configuration files, you can make the software do whatever you want. How convenient! Or rather, how convenient for the suppliers. How many of you have ever had a burning desire to tinker with your email system setup, your bug tracker’s workflow or the permissions of your project management software?

Probably no-one except the product champion who argued for the software to be introduced in the first place. In fact the champion’s role in the company is now predicated on their expertise with the existing solution. What incentive do they have to replace or review “their” section of infrastructure? Their salary is now based on how effective their relationship is with their supplier.

In fact, I don’t think it is uncommon for changes in personnel to precede changes of software provider. Someone has to take over the champion’s job of massaging the product and, lacking the champion’s massive personal commitment to it, finds the job cumbersome and undesirable, sparking the search for alternatives.

My argument would be that if you cannot primarily use a solution out of the box, then you are better off not using it. If you have a business process that requires a lot of gnarly configuration and bespoke software work, then the greater value is in simplifying the business process rather than recreating it in software.

In my view, complex or whitebox products are more about capturing customers than serving them, and that goes from SAP down to JIRA.

Web Applications, Work

The myth of “published” content

Working at the Guardian, you often end up having conversations with people about the challenges you face in scaling to meet the often spiky traffic you get in online media. One thing that comes up again and again is the idea that content, once published, is essentially static. Now there is a lot to be said for this, as digital journalism sticks pretty close to many of the conventions of print media; copy is often culled from the print version and follows the 24-hour media cycle quite strongly.

However, what is often surprising is the number of edits a piece of content receives, particularly if it is not a print feature article. The initial version of an article is often just the mandatory information and a few paragraphs sufficient to get across the basic story. It then goes through a number of revisions, which often happen while the article is still a draft. Often, but not always.

Once the article is published online, though, it triggers a new wave of edits as language gets cleaned up and readers, editors and lawyers all descend on it. Editors now have a lot more tools to see how the audience is reacting to a piece of content and how it is playing in social media. Articles also get picked up externally, which means making sure the article works as a landing page.

Naturally, stories often develop their own momentum, which requires you to switch from a single piece to a set of stories approaching different aspects of the overall reporting. You then need to link the different pieces of content together to form a logical package of content.

One interesting thing is to look at how many articles are changed after seven days. It is a surprising number, as new stories often create a need for historical context, and older stories can look dusty in the light of breaking events. We have also had strange things happen with social news, where aggregating sites pick up a story that was overlooked at the time.

All of this means that you cannot naively treat content as static. Instead you have an interesting cache-invalidation problem: it is true that most content doesn’t change much, until it does start changing, and then the cached version needs to reflect the changes reasonably rapidly if you want them to be picked up by things like Google.
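One way to handle content that is quiet until it suddenly isn’t is to make the cache lifetime depend on how long the piece has gone without an edit. This is the same intuition behind the heuristic freshness rule in the HTTP caching specification, which suggests caching for a fraction (typically 10%) of the time since Last-Modified. The sketch below is a hypothetical illustration, not how the Guardian caches content; the floor, ceiling and fraction are made-up tuning values, and it complements rather than replaces explicit purging on edit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tuning values, not taken from any real configuration
MIN_TTL = timedelta(minutes=1)   # floor while a piece is being actively edited
MAX_TTL = timedelta(hours=1)     # ceiling so late edits to old pieces still surface
FRACTION = 0.1                   # cache for 10% of the time since the last edit


def cache_ttl(last_modified: datetime, now: datetime) -> timedelta:
    """Cache lifetime that grows as content goes longer without an edit."""
    quiet_period = max(now - last_modified, timedelta(0))
    return max(MIN_TTL, min(quiet_period * FRACTION, MAX_TTL))


now = datetime(2013, 5, 20, 12, 0, tzinfo=timezone.utc)
for label, edited in [("just edited", now - timedelta(minutes=2)),
                      ("edited this morning", now - timedelta(hours=5)),
                      ("a week old", now - timedelta(days=7))]:
    ttl = cache_ttl(edited, now)
    print(f"{label}: Cache-Control: max-age={int(ttl.total_seconds())}")
```

The ceiling matters for the week-old case: without it, an old article that suddenly starts changing would stay stale in caches for hours.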


Work

The BBC “across”

The term “across” is absolutely endemic at the BBC and, because so many people in UK media pass through the BBC, it also crops up across the sector generally. Although I was initially scornful of the term, I have long since caved to the inevitable and use it as well.

Being “across” seems to have originated in the fact that the BBC have multiple media streams, and when a journalist talks about being “across” things they might well mean that they are producing pieces on a topic across multiple media, say television and radio. It might also mean that they are tracking a breaking story and watching what other media outlets are saying about it.

Outside this context, though, the word more or less means “understanding”. So when someone from the BBC is “across” something, it means they understand it; sometimes, if they are “across” it enough, they can also make decisions about it. When someone isn’t “across” something, they do not feel they understand it or they are unprepared to answer questions about it.

Ironically this meaning then seems to seep back into broadcast journalism and I have heard journalists on air saying that they are “across” developments such as the formation of the coalition.

At this point “across” feels kind of ubiquitous, apart from among people who have been raised in single-stream media, so I think it’s worth you being “across” it too.

Work

Success looks like success

One useful thing I have learnt by working in an early-stage startup is that success looks and feels like success. If you are doing something and it does not seem to be succeeding, then it isn’t. This might seem completely obvious, but it really is not.

There are many kinds of “not success” (or failure, to give it its true but cruel name). The worst is the “almost success”, where something works and delivers on its promise, but not very well. Almost success creates a dilemma: perhaps with a little iterating and more effort you could turn it into a real success.

When this kind of success creates ongoing liabilities in terms of customer expectations, the situation is even worse. If you try to cancel things or do a lean startup pivot, you are guaranteed to alienate your current customers with only the hope of gaining more of the true customer base you originally envisaged.

Weak success is a little better: it is not outright failure, but it is so unsuccessful that people completely understand if you want to knock it on the head.

In a large organisation, though, things are far more difficult. Both almost success and weak success are, ironically, more dangerous in a big and profitable organisation due to two factors: personal reward and hidden cross-subsidy.

If someone achieves any degree of success in an organisation, they expect to be rewarded for it. Perhaps justifiably, perhaps not. Either way, there are serious morale implications if, at the end of a period of exertion by any group or individual, you kill off the object of all their efforts, no matter how rational and correct that decision may be.

In general people are conflict averse so they prefer to reward almost success and move on to other activities that might be more successful.

However, a set of almost successful activities all have real costs, and the minute any of them fails to generate revenue sufficient to carry its costs, you are in the world of the hidden cross-subsidy, where all your almost successful projects and products start to drag down whatever parts of the business are profitable.

It is a tar pit that can be difficult to escape unless you have really good accounting to see where money is coming and going. It is only easy in a startup because you generally have only one product, so all profit and loss is easy to tally and attribute.

So everything in your business that is not a success is a potentially business-killing failure. For that reason, even though it is hard, you need to end almost success just as much as you need to end failure.

The question to ask is not “Is this successful?”; if you are asking that question then the answer is simply no. If you are successful then the question you ask is “How are we going to deal with all this success?”.

Programming, Software, Web Applications, Work

Names are like genders

One thing I slightly regret in the data modelling we did for users in Wazoku is that I bowed to marketing pressure and “conventional wisdom” and created a pair of first and last name fields. If gender is a text field, then how much more so is the unique indicator of identity that is a name?

The primary driver for the split was so that email communications could start “Hey Joe” rather than “Hey Joe Porridge Oats McGyvarri-Billy-Spaulding”. Interestingly, this turned out to be very much the minority use case: 95% of the time we actually put the fields back together into a single string because we are displaying the name to someone other than the user. It would have been much easier to have a single name field and then extract the first “word” from the string for the rare case where we want to greet the user informally.
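A minimal sketch of that alternative: keep one free-text name field and pull out the first whitespace-separated word only when an informal greeting is wanted. The function name `greeting_name` and the `fallback` parameter are hypothetical, and the heuristic knowingly gets some names wrong (for example where the given name is not written first), which is acceptable for a casual email greeting.

```python
def greeting_name(full_name: str, fallback: str = "there") -> str:
    """First whitespace-separated word of a single free-text name field."""
    words = full_name.split()
    return words[0] if words else fallback


print(f"Hey {greeting_name('Joe Porridge Oats McGyvarri-Billy-Spaulding')}")
print(f"Hey {greeting_name('   ')}")
```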

My more general lesson is that wherever I (or we, more generally, as a business) have tried to pre-empt the structure of a data entity, we have generally gotten it wrong; however, so far we have not had to turn a free-text field into a stricter structure.

Work

Google Apps and App Engine

If you use Google Apps to provide you with email then you should also really be thinking about enabling and using Google App Engine as well. Internal applications are much easier to deliver to the business as a whole and having a ready-made platform makes it easier to try out ideas that previously would have been impractical.

The first advantage is that App Engine applications bound to your Google Apps domain are easy for existing users to access (no additional login is required) while also giving you peace of mind that you are exposing virtually zero surface area for attack.

The second is that, for Python at least, it is easy to access a very full-featured environment with a minimum of code. Want to send emails, use task queues, access memcache or serve static content? It is all a YAML configuration line or an import away.

I love services like Heroku, but a lot of internal apps have relatively light usage and benefit from the batteries-included approach rather than from combining various plugins. That makes it easy to switch between different approaches and react to different demands.

Work

Is this really the manifesto we want?

Silicon Milkroundabout tried to produce a manifesto for why people should consider working at a startup. This is the outcome.

The first time I saw it, I was very disappointed. While I cannot knock its authenticity, it is a profoundly depressing document. Alongside the standard statements about passion and having the freedom to make what you should rather than what you are told to, there is much more about poverty, tiredness and scarcity.

If I read this, I would say that working for startups is a mug’s game. You’re far better off coming in during the expansion phase, when salaries are higher and the business case better proven.

The many references to tiredness and lack of sleep are also revealing. What I have discovered is that tremendous pressure is put on you to deliver product in a technology startup, and this should be resisted at all costs. Sustainable pace is more important in small organisations than in large ones. In a large organisation you can actually burn out a team to achieve a goal because you probably have access to the resources to replace them. In a small one, once you’ve wrecked a team (probably including yourself), you have no way of replacing them, and a death spiral will inevitably set in as decision-making becomes progressively worse. Remember that a startup should aim to deliver progress, not product. Don’t work with people who don’t understand this.

Money, frankly, seems to be the missing ingredient from this list of reasons. Maybe adding “because late-stage equity options are worthless” would ruin the overall tone. Many people, especially investors, are involved in startups because they offer potentially massive returns in a low-growth environment. Americans are much more open and brash (you might even say vulgar) about this, with talk of flipping and sale valuations of millions of dollars (often farcical, as in the case of Groupon, who merely had the bad luck to be caught before their IPO).

Even then, this reason is foolish, because if what you want is money then you should go to the City. The money is guaranteed, guaranteed in fact by the government, which not only underwrites it and bails it out but then charges off into Europe to protect it from legislation that might affect its lucrative tax haven and money “recycling” business. In contrast, being involved in “entrepreneurship” is a rather romantic and significantly more challenging way to achieve wealth.

I do work at a startup though and I was at Silicon Milkroundabout trying to encourage people to join me in doing this.

My personal motivation is that, for me, a startup is a business that is complete but small enough that you can actually see and understand all parts of it. The interesting thing is that organisational dysfunction is just as likely in a startup as in a larger firm. Often the problems are exactly the same, simply orders of magnitude less significant.

Being able to pull the curtain aside is fascinating. Working in the small also removes the mystique that gathers around things and people that generate large revenues. Once a certain number of livelihoods become involved in a particular process or product, you lose the ability to tinker with things or even to question why things are the way they are. In an environment with no money and no customers, any change is either positive or at least neutral.

Working in a startup for non-cynical reasons means creating something that is of profound personal interest. I really am interested in trying to remove friction from the process of turning ideas into reality. Wazoku is a product that I do believe in, and what I was saying to a lot of people at Silicon Milkroundabout was: try the product. If you are interested in solving the problem, and the solution in turn solves some of your problems, your work is satisfying at all levels. If there is not a satisfactory solution already in progress for your problem, then a startup is the only way you can initiate the process of moving to a more perfect world.

Working for a startup is a last resort; need should be part of your motivation; ignore idiots and their advice; make sure you get enough sleep.

Work

Silicon Milkroundabout Roundup

Interesting time at Silicon Milkroundabout this Sunday. There were roughly three levels of activity going on. First of all there was the element of developers goofing off with arcade machines and free stuff. Then there was the opportunity to network, first between the startups and secondly between the developers (although I am not sure how much mixing between different dev teams actually went on).

Finally there was the recruitment activity. Unlike the first event, this really was more of a milkround, with a younger, less experienced audience. The format did seem to be pitching for talent, which is interesting as I am not convinced that people are going to find the best role by going with the best sales pitch. There has to be a better way of understanding the culture of the firm you are potentially joining.

The different streams of activity make the event quite weird in its nature and purposes. It feels like there is a need for a kind of startup expo to allow startups to see and meet one another without the pretext of seeking to employ people. There is also a need for a kind of elite coder event on a quarterly basis that is maybe a little select, a bit like a mini-conference, that allows for networking and swapping of intelligence and gossip on what is really going on at various firms.
