Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work and should really help guide us to decide what we choose to do.

However I am not a product manager, user testing expert or statistician (that last part is a lie, I’m a statistician who hasn’t done any stats for seventeen years) I am a dirty hacker programmer and I use Optimizely in a way that probably makes my colleagues weep but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this testing.

Note that you need to understand what you’re doing here, I am not recommending this if you are new to the product or multi-variate testing. You also need a good stream of traffic to work on. I do, this is working out for me. One piece of good practice I would keep is: decide how you are going to judge the test before you start it and don’t change your measure once you’ve started. If it is clear your initial metrics aren’t helpful, design a new test. The knowledge you’ve gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand what the problem you are dealing with is and what responses you can take to the issues. If you have a question about what is happening in the test feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than speculate.

Changing a variation (no matter how tempting) is dangerous though as you’ll have to remember the differences and when you applied them. I prefer to spawn variations to changing an in-flight variation. Of course fixing bugs and unintentional consequences is fine. You’re looking at the long term rate not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth but I play around with traffic massively during the test. The great thing about Optimizely is that it takes care of the math so feel free to mix the allocation of traffic freely. If you have a run-away winner early on then don’t be afraid to feed the majority of traffic to it.

Make the test work for the whole audience

I don’t believe in this, make the test work for the easiest audience segment to access. I frequently only test on modern browsers. If you find a trend then shock, horror it often works for the whole audience. It’s about fast feedback not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to do bigger changes to the pages under test.

Don’t change the underlying content

If you take your best performing variation and apply it to the page then the “Original” variation should trend to the variation. If it doesn’t then you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests until the Original variation is performing better than the individual variations.

Once Original is top performing variation you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues, you need to deal in big numbers. A/A can be helpful but if you are working in five digit numbers or double-digit percentages then don’t worry about the noise.

Tests have to look good

If your theory is accurate it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals: get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off all variations looking a bit crappy and then seeing whether they can be outperformed by an improved appearance. Often the answer is no; there is a rule of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However by trying better looking variations in increments you know exactly how much effort to invest.

Standard
Web Applications, Work

Guardian May 2013 Hackday

You can see the reportage in these two liveblogs: Day 1 and Day 2 (note the terrible naming conventions). The theme of the hackday was “growth”. For the most part I took the theme to mean growth hacking and I did a lot of work along those lines which is difficult to talk publicly about.

However my prior lunchtime hacks had revealed to me that one of the fundamental problems the Guardian has is the volume of content it produces. This is not inherently a bad thing but the key thing to understand is that there is vastly more content than can fit onto what are called “fronts” in the jargon. A front is something like the front page of the site or the Environment section. These fronts produce a lot of traffic to content and for regular readers they are the essential navigation tool for the Guardian’s content.

Therefore I was interested in how we consider the dimension of time and perhaps use it to our advantage to help present content. This aspect of my hackday work is more open because actually I need a lot of help to understand to and because I’ve made some effort to try and use the public Content API rather than our internal content.

I called this work the “Time Trilogy” because it consists of three web apps that each use time as a way of accessing Guardian content.

The three apps are Guardian Word Count which was the original and gives you a sense of the challenge of navigating the content. It is also pretty fun to watch during the day and see the words tick up. So the Word Count spawned TickTickTick and Guardian In Review. TickTickTick is really a daily content explorer and was the first tool I needed to start sorting and exploring the breakdown of what we produce. It is a tool at its heart for exploring the daily news cycle. In Review is slightly different, it takes the one hundred most popular pieces of content over the last seven days and renders it. Initially I wanted it to be a kind of automatically generated magazine but actually looking at what people liked meant that I couldn’t make my initial idea work. People really like videos of meteors and Russian car crashes. What it is now is a way to explore material in the medium term, for content that perhaps has left the news cycle but is still relevant.

Neither app is really finished and the way I work is that I am very reliant on having working software to understand what I am doing and what is wrong or right about my approach. TickTickTick is much closer to being a complete product than In Review and it is providing more insight into the nature of the content being produced. For example there is a massive cluster of material between three and five minutes long.

I am going to continue to work on the apps because they help give me feedback into my work and ultimately these prototypes and toys tend to graduate into working components or theory on the main site itself. I may blog a bit more about them individually as I move them closer to something that genuinely creates value. I’m curious about feedback but acting on it is limited by my aims for the apps and realistically the time I have available.

I also wanted to talk a little bit about how I was working this hack day because I decided to reject advice and work solo rather than part of a team (although I did a little bit of backseat driving on the online magazines product and I did come up with the idea that actually won the hackday (and will hopefully be implemented and awesome)). Working alone does mean that your creations are going to be quite rough but it helps cover a lot of ground, I ended up doing five hacks and working on a total of seven. Working with other people means communicating well whereas solo you just need to express what you want very quickly.

My preferred tool for these kinds of hacks is Python on App Engine, which is what I use for my lunchtime hacks and for which I have a standard application template. With each new application that I do I can start to move the common patterns into the template. To avoid having to faff around with testing I use a loosely functional paradigm that I’ve carried over from Wazoku. It generally works quite well but there are a lot of rules to doing it.

This time around I was doing a bit more frontend work than my day job requires because I was working solo. Again having the startup experience was useful because I was more rediscovering a skillset than learning it. Hacks also means selecting your platform and choosing for optimal output.

For that reason I only targeted Firefox and Chrome (Firefox was actually easier to develop for in terms of standards) and I made liberal use of client-side Less and Coffeescript. I was impressed with how good the error-handling was in both. An obscure bug can wipe out all the productivity gains of a higher-order language but both worked great for me.

On top of that I tried experimenting with the new departmental standard of SMACSS (or at least my cherry-picking of it) and I made a lot of use of both Knockout and Bacon.js.

When I say I made use of SMACSS essentially what I did was namespace my classes to produce simple selectors. This did get me out of a problem I had in In Review so while it is truly the ugliest CSS standard and I suspect in time we may come to hate its rejection of rich functionality I concede that it is effective. Expect to see some of it applied to the main website sometime soon.

Knockout isn’t that popular in the department due to performance issues at a particular level of complexity but for me it did a brilliant job of simply syncing the visual DOM to the data feeds. I was really happy with it, other people were using AngularJS for more dynamic applications but they also had a lot more code than I did and again working solo less is so much more.

Bacon.js was really interesting. A lot of my approach to Javascript is functional and event-based but so far the events have been manually worked via jQuery. Bacon made it easier to create event sources with generic handlers and I probably didn’t use 10% of its full features. I’m curious to see what the rest of the department thinks of it but for my hacks it has definitely earned a place.

It was nice to do something outside the run of normal work and one thing that is quite cool about the hackday is that you can use it to tackle a technology that is entirely new to you and not have to worry about whether you succeed or fail.

Next time (May I believe) I think I want to learn about browser plugins as this is a way of producing better functionality for the Guardian without the hassle of having to make it work for the general population of browsers. Some people’s hacks this time around could have been released to the app/plugin stores and we could have been getting valuable user feedback by now.

Standard