Work

Learning to love the Capability Maturity Model

I had a job where the management were enamoured of the Capability Maturity Model (CMM) and all future planning had to be mapped onto the stages of the maturity model. I didn’t enjoy the exercise very much because, in addition to the five documented stages, there was generally a sixth: stagnation and decay, as the “continually improving” part of the Optimising stage was, in my experience, generally forgotten.

Instead budgets for ongoing maintenance and iteration were cut to the bone so that the greatest amount of money could be extracted from the customers paying for the product.

Some government departments I have had dealings with had a similar approach: they would budget capital investment for the initial development of software or services and then allocate nothing for their upkeep except fixed costs such as on-premise hosting for 20 years (because why would you want to do anything other than run your own racks?).

This meant that five years into this allegedly ongoing-cost-free paradise, services were breaking down, no budget was available to address security problems, none of the original development team were available to discuss the issues, and the bit rot of the codebase was making a rewrite the only feasible response, which undercut the entire budgetary argument for amortisation.

A helpful model misapplied

So generally I’ve not had a good experience with people who use the model, and that’s a shame, because recently I’ve been appreciating it more and more. If you bring an Agile mindset to the application of the CMM, seeing it as a way of describing the lifecycle of a digital product within a wider concept of cyclical renewal and a growing understanding of your problem space, then it is a very powerful tool.

In particular, some product delivery practices make assumptions about the underlying maturity of the business process. Let’s take one of the classics: the product owner or subject matter expert. Both Scrum and Domain-Driven Design assume that there is someone who understands how the business is meant to work and can explain it clearly, in a way that can be modelled or turned into clear requirements.

However this can only be true at Level 2 (Repeatable) at the earliest and generally the assumption of a lot of Agile delivery methods is that the business is at Level 4 (Managed). Any time a method asks for clear requirements or the ability to quantify the value returned through metrics you are in the later stages of the maturity model.

Lean Startup is one of the few that actually addresses the problems and uncertainty of a Level 1 (Initial) business. It focuses on learning and on trying to lay down foundations that are demonstrated to be consistent and repeatable. In the past I’ve heard a lot of argument about the failings of the Minimum Viable Product and the need for a Minimum Loveable, Minimum Marketable or some other more developed concept of product. Often the people who make these arguments seem confused about where they are in terms of business maturity.

The Loveable Product often tries to jump to Level 3 (Defined), enshrining a particular view of the business or process based on the initial results. Sometimes this works, but it is just as likely to get you into a dangerous cul-de-sac where the product is too tailored to a small initial audience and needs to be reworked if it is to meet the needs of the larger potential target audience.

John Cutler talks about making bets in product strategy, and this seems a much more accurate way to describe product delivery in the early maturity levels. Committing more effort without validation is a bigger bet; in an early-stage business you often can’t do that much validation, so if you want to manage risk it has to be through the size of the commitment you’re making.

Go-to-market phases are tough partly because they explicitly exist at these low levels of capability maturity: often you as an organisation and your customers are trying to put together a way of working with few historic touchpoints to reference. It’s natural that this situation is going to be a bit chaotic and ad hoc. That’s why techniques that focus on generating understanding and learning are so valuable at this stage.

The rewards of maturity

Even techniques like Key Performance Indicators are highly dependent on the underlying maturity. When people talk about the need to instrument a business process they often have an unspoken assumption that one already exists and just needs to be translated into a digital product strategy of some kind. That assumption can often be badly wrong, and it turns out the first task is actually traditional business analysis to standardise what should be happening, and only then to instrument it.

In small businesses in particular there is often no process other than the mental models of a few key staff members. The key task is to try to surface those mental models (which might be very successful and profitable; don’t think immature means not valuable) into external artefacts that are robust enough to go through continuous improvement processes.

A lot of businesses jump into Objectives and Key Results, and as an alignment tool that can be really powerful, but if you are not in that Level 4 (Managed) space then the Key Results often seem to boil down to activities completed rather than outcomes. In fairness, at Level 5 (Optimising) the two can often be the same: Intel’s original OKRs seem very prescriptive compared to what I’ve encountered in most businesses, but Intel had a level of insight into what was required to deliver their product that most businesses don’t.

If you do get to that Level 5 (Optimising) space then you can start to apply a lot of buzzy processes with great results. You can genuinely be data-driven, you can do multivariate testing, you can apply RICE, you can drive KPIs with confidence that small gains are sustainable and real.

Before you’re there, though, you need to look at how to split your efforts between maturing your processes and enabling consistency, rather than just doing digital product delivery.

Things that work across maturity stages

Some basic techniques work at every stage of maturity: continual improvement (particularly expressed through methods like total quality), basic business intelligence that quantifies what is happening without necessarily being able to analyse or compare it, and creating focus.

However, until you get to Level 2 (Repeatable), the value of most techniques based on value return or performance improvement is going to be almost impossible to assess. To some extent the value of a digital product at Level 1 (Initial) is to offer a formal definition of a process and subject it to analysis and revision. Expressing a process in code and seeing what doesn’t work in the real world is a modelling exercise in itself (but sadly a potentially expensive one).

Learning to love the model

The CMM is a valuable way of understanding a business, and used as a tool for understanding rather than cost-saving it can help you see whether certain agile techniques are going to work or not. It also helps you understand when you should be relying more on your understanding and expertise than on data.

But please see it as a circle rather than a purely linear progression. As soon as your technology or business context changes you may be experiencing a disruptive change that might mean rethinking your processes rather than patching and adapting your current ones. Make sure to reassess your maturity against your actual outputs.

And please always challenge people who argue that product or process maturity is an excuse to strip away the capacity to continually optimise because that simply isn’t a valid implementation of the model.

Standard
Programming

Enterprise programming 2023 edition

Back in the Noughties there were Enterprise Java Beans, Java Server Pages, Enterprise Edition clustered servers, Oracle databases and, shortly thereafter, the Spring framework with its dependency injection wiring. It was all complicated, expensive and, to be honest, not much fun to work with. One of the appeals of Ruby on Rails was that you could just get on and start writing a web application rather than staring at initialisation messages.

During this period I feel there was a big gap between the code that you wrote for a living and the code you wrote for fun. Even if you were writing on the JVM you might be fooling around with Jython or Groovy rather than full Enterprise Java Beans. After this period, and in particular once Spring was in everything, I feel the gap between hobby and work languages collapsed. Python, Ruby, Scala, Clojure: all of these languages were fun and were equally applicable to work and to small-scale home projects. Then, with Node gaining traction in the server space, the gap between the two worlds collapsed pretty dramatically. There was a spectrum that started with an inline script in an HTML page and ran through to a server-side API framework with pretty good performance characteristics.

Recently though I feel the pendulum has been swinging back towards a more enterprisey setup that doesn’t have a lot of appeal for small project work. It often feels that a software delivery team can’t even begin to create a web application without deploying on a Kubernetes cluster, with Go microservices orchestrated alongside a self-signing certificate system and log shipping with Prometheus and Grafana on top.

On the frontend we need an automatically fingerprinted, statically deployed React single-page app, ideally with some kind of complex state management system like sagas, or maybe everything written using time-travellable reactive streams.

Of course on top of that we’ll need a design system with every component described in Storybook, using either a modular class-based CSS system like Tailwind or a heavyweight styled component library based on Material Design. Bonus points for adding React Native into this, and a CI/CD system that ideally mixes a task server with a small but passionate community with a home-grown pipeline system. We should also probably use a generic build tool like Bazel.

And naturally our laptop of choice will be Apple’s macOS with a dependency on Xcode and Homebrew. We may use GitHub, but we’ll probably use a monorepo along with a tool to make it workable, like Lerna.

All of this isn’t much fun to work on unless you’re being paid for it and it is a lot of effort that only really pays off if you hit the growth jackpot. Most of the time this massive investment in complex development procedures and tooling simply throws grit into the gears of producing software.

I hope that soon the wheel turns again and a new generation of simplicity is discovered and adopted and that working on software commercially can be fun again.

Standard
Work

October 2023 month notes

I’ve been learning more about Postgres as I have been moving things from Dataset to Psycopg 3. It is kind of ridiculous what you can do with it when you strip away the homogenising translation layer of things like ORMs. Return a set of columns from your update? No problem. Upsert? Straightforward.

However, after writing an ON CONFLICT clause I received a message that no conflict was possible on the columns I was checking, and I discovered that I had failed to add a primary key to the table when I created it. It probably didn’t matter to the performance of the table, as it was a link table with indexes on each lookup column, but I loved that the query parsing was able to do that level of checking on my structure.
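The check makes sense once you see the shape of a native upsert. A sketch in plain SQL (the table and column names here are hypothetical): ON CONFLICT needs a unique index or primary key to use as its conflict target, which is exactly what my table was missing, and RETURNING hands back the resulting columns without a second query.

```sql
-- A link table; the PRIMARY KEY is what gives ON CONFLICT a valid
-- conflict target (without it, Postgres rejects the query at parse
-- time with "no unique or exclusion constraint matching").
CREATE TABLE recipe_tags (
    recipe_id integer NOT NULL,
    tag_id    integer NOT NULL,
    note      text,
    PRIMARY KEY (recipe_id, tag_id)
);

-- Native upsert: insert, or update the existing row on conflict,
-- and return the final column values in the same statement.
INSERT INTO recipe_tags (recipe_id, tag_id, note)
VALUES (1, 2, 'autumn')
ON CONFLICT (recipe_id, tag_id)
DO UPDATE SET note = EXCLUDED.note
RETURNING recipe_id, tag_id, note;
```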

Interestingly, I had a conflict clause in the ORM statement I was replacing and it had never had an issue, so presumably the ORM was doing an update-then-insert pattern in a transaction rather than using the native feature. For me this shows how native solutions are often better than emulation.

Most of the apps I’ve converted to direct use of queries are feeling more responsive now (including the one I use to draft these posts) but I’m not 100% certain whether this is because of the switch to lower-level SQL or because I’ve been fixing problems in the underlying relational model that were previously being hidden from me.

We’re going to need a faster skateboard

I have been thinking a lot about the Gold-plated Donkey Cart this month. When you challenge problems with solutions you often first have a struggle to get people to admit that there is a problem, and even when it is admitted, the first response is often to try to patch or amend the existing solution rather than to consider what the right response might be.

We have additive minds so this tendency to patch what is existing is natural but sometimes people aggressively defend the status quo, even when it is counter-productive to their overall success.

Weakly typed

I’ve had some interesting experiences with TypeScript this month, most notably an issue with a duplicated package which resulted in code that has been running in production for months but which has either not been correctly typed or has been behind the intended version by maybe four major versions. TypeScript is interesting amongst type-hinted languages in that it has typing files that are often supplied separately from the code and in some cases exist independently of it. My previous experience of Python typing, for example, was that the checker stopped at the boundaries of third parties and therefore only applied to the code you were writing yourself.
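A rough sketch of what that separation looks like in practice (the module and function names here are made up): a hand-written declaration file asserts an API shape, but nothing ties it to the JavaScript that actually ships, which is how code can drift several major versions away from its types.

```typescript
// greeter.d.ts: a hand-written declaration for a plain JavaScript
// package. The compiler trusts this file completely; if the runtime
// code no longer matches these signatures, nothing here catches it.
declare module "greeter" {
  export interface GreetOptions {
    loud?: boolean;
  }
  export function greet(name: string, options?: GreetOptions): string;
}
```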

I’m uncertain of the value of providing type files for JavaScript libraries, as the compile-time and runtime contexts seem totally different. I found a JavaScript dependency that had a completely broken unit test file, and on trying to correct it I found that the code couldn’t have the behaviour that the tests were trying to verify. Again I wondered how this code was working in production, and predictably it turned out that the executed code path never included the incorrectly specified behaviour. Dynamic code can be very resilient and at the same time a time bomb waiting to happen, no matter what your type declarations claim.

I think TypeScript code would be better off if it were clearer that any guarantees of correctness can only be provided for the code you have totally under your control and which is being compiled and checked by you.

Frozen in time

I’ve also been thinking a lot about a line from this talk by Killian Valkhof where he mentions that our knowledge of how to do things often gets frozen based on how we initially learnt to do them. Developers who learnt React for the frontend will be the future versions of the people who learnt to do frontend via jQuery. I’ve been looking at Web Components, which I thought were pretty terrible when they first came out but now look delightfully free of complex build chains and component models.

But more fundamentally it has made me think about whether, when I choose or reject things, I am doing so based on their inherent qualities in the present moment or based on the moment in time when I first learnt and exercised those skills. With CSS, for example, I’m relatively old-fashioned and have never been a fan of the CSS-in-JS idea. However, I think this approach, while maybe outside contemporary preferences, is sound. Sound CSS applies across any number of frontend component models and frameworks, and the work that goes into the CSS standards is excellent, whereas (ironically) the limitations of JavaScript frameworks in expressing CSS concepts mean that often only a frozen subset is usable.

I’ve never been entirely comfortable with Docker or Kubernetes, though, and generally prefer PaaS or “serverless” solutions. Is that because I enjoyed the Heroku developer experience and never really understood the advantages of containerisation as a result?

Technology is fashion, and therefore discernment is a critical quality for developers. For most developers, though, it is not judgement that they manifest but a toxic self-belief in the truth of whatever milieu they entered the industry in. As I slog through my third decade in the profession, doubt about my own opinions is something I feel strongly, and trying to frame my judgements in the evidence and reasoning available now seems a valuable technique.

Standard
Work

How I have been using knowledge graphs

Within a week of using Roam Research’s implementation of a knowledge graph, or Zettelkasten, I decided to sign up because there was something special in this way of organising information. My initial excitement was actually around cooking: the ability to organise recipes along multiple dimensions (a list of ingredients, the recipe author, the cuisine) meant you could both search and browse by the ingredients you had or the kind of food you wanted to eat.

Since then I’ve started to rely on it more for organising information for work purposes. Again the ability to have multiple dimensions to things is helpful. If you want to keep some notes about a library for handling fine-grained authorisation, you might want to come back to them via the topic of authorisation, the implementation language or the authorisation model used.

But is this massively different from a wiki? Well, a private wiki with a search function would probably do all this too. Personally, though, I never did set up something similar, despite experiments with things like TiddlyWiki. So I think there are some additional things that make the Zettelkasten actually work.

The two distinctive elements missing from the wiki setup are the outliner UI and the concept of daily notes. Of the two, daily notes is the simpler: by default these systems direct you to a diary page, giving you a simple context for all your notes to exist in. The emphasis is on getting things out of your head and into the system. If you want to cross-link or re-organise you can do so at your leisure, and the automatic back-referencing (showing you other pages that reference the content on the page you are viewing) makes it easy to be reminded of daily notes that you haven’t consciously remembered you want to re-organise. This takes a good practice and delivers a UI that makes it simple. Roam also creates an infinite page of daily notes that allows you to scroll back without navigating explicitly to another page. Again, nothing complicated, but a supportive UI feature that simplifies doing the right thing.

The outliner element is more interesting and a bit more nuanced. I already use (and continue to use) an outliner in the form of Workflowy. More specifically, I find it helpful for outlining talks and presentations, keeping meeting notes and documenting one-to-ones (where the action functionality is really helpful for differentiating items that need to be actioned from notes of the discussion): the kind of things where you want to keep a light record with a bit of hierarchical structure and a light audit trail on the entries. I do search Workflowy for references, but I tend to access it in a pretty linear way and rarely without a task-based intention.

Roam and Logseq work in exactly the same way; indeed many of the things I describe above are also use cases for those products. If I wanted to I could probably consolidate all my Workflowy usage into Roam, except for Roam’s terrible mobile web experience. However there is a slight difference, and that is due to the linking and wiki-like functionality. This means you can have a more open discovery journey within the knowledge graph. Creating it and reading it, I have found, are two different experiences. I think I add content in much the same way as in an outliner, but I don’t consume it the same way. I am often less task-orientated when reviewing my knowledge graph notes, and as they have grown in size I have had some serendipitous connection-making between notes, concepts and ideas.

What the outliner format does within the context of the knowledge graph is provide a light way of structuring content so that it doesn’t end up a massive wall of text in the way that a wiki page sometimes can. In fact it doesn’t really suit a plain narrative set of information that well and I use my own tool to manage that need and then link to the content in the knowledge graph if relevant.

In the past I have often found myself vaguely remembering something that a colleague mentioned, a link from a news aggregator site or a newsletter or a Github repo that seemed interesting. Rediscovering it can be very hard in Google if it is neither recent nor well-established, often I have ended up reviewing and searching my browser history in an almost archaeological attempt to find the relevant content. Dumping interesting things into the knowledge graph has made them more discoverable as individual items but also adds value to them as you gain the big picture understanding of how things fit together.

It is possible to achieve almost any outcome through any misuse of a given set of tools, but personal wikis, knowledge graphs and outliners all have strengths that work best when combined as much as possible into a single source of data, with dedicated UIs for specific, thoughtful task flows over the top. At the moment there’s not one tool that does it all, but the knowledge graph is the strongest data structure, even if the current tools lack the UI to bring out the best from it.

Standard
Software, Work

The problem with developer job titles

Job titles are hard. This exchange on Twitter prompted a few thoughts that I couldn’t quite fit into a few smart-arsed twits.

In Chris’s tweet he mentions that engineer is a co-opted title and that engineering is a discipline in its own right, one which most software groups don’t subscribe to because they aren’t really trying to do engineering. Not that this is a criticism, but there is a massive difference between building a bridge or a road and creating a new web service. For a start there is a lot more established practice, science and understanding in physical engineering, and more established and understood formal qualifications.

When I briefly worked in government helping create the Digital Careers framework people who were associated with Defence rightly objected to the confusion between software “engineering” and other engineers within professional frameworks. No-one is going to ask a developer to fix the combat damage on an airfield. I’ve previously joked that if we were honest we’d talk about “software overengineers” given that most developers struggle to find the simplest thing that works.

For the framework we settled on “developer” for people who wrote code and inconsistently used “engineering” for operational roles. I think on the basis that they created “infrastructure” where maybe the analogy makes a sort of sense.

I would also have gotten rid of “architect” if I’d had the chance, for exactly the same confusion, but that term was too deeply embedded and still is a badge of prestige within the industry. Even now in the commercial world I have experienced hires wanting to be involved in “architecture” (and sadly not wanting to help me remodel my ground floor).

In Chris’s tweet he asks what happened to the title “Programmer”. When I started in the industry this was indeed the coveted title, and I still like to think of myself this way even though it’s blatantly not true in the same way now.

However the issue with being a programmer is that jobs that literally involve just programming are few and far between. When I started in the industry the experienced developers were people who were at the tail-end of mainframe programming and a bit of what they were doing was still persuading machines to perform the tasks that were needed. The end was already in sight for pure programming jobs though. Some of my first professional programming work involved networking, a slightly dirty topic for the mainframe types.

Nowadays the emphasis is on understanding the domain space you are working on as well as the technical aspects of programming. I prefer the term “developer” (as others do) with the implication of being someone who develops systems of value via the medium of technology.

However that term also has its problems. When I worked at the Guardian I had a personal SEO battle with the Pune-based property development group for the search term “Guardian developers”. That battle seems to have been won now via sub-domain. This seems to be true more generally and now it is property developers who are having to use the prefix “property” on their job titles.

For a new profession not even past its first century, creating our professional lexicon is always going to be hard, but in borrowing titles so shamelessly we are always creating problems for ourselves.

Programmer is probably the closest, truest name for what most of us do at the core of our role. For web developers though, assembling the typical bricolage of libraries and tooling is often an exercise in minimal programming and maximum duct-taping. Perhaps it is fairest to say that we are “software assemblers”, except that might get confused with, you know, assemblers. Painful.

So in the end most of us are expected to bring capability in programming within teams that are creating technological systems of value. As long as programmers realise that programming is not the activity of value in itself then maybe we don’t need to worry so much about titles.

Standard
Work

Please stop showing me “the data”

This isn’t a reactionary rant against data-driven decision making and it isn’t about nostalgia for gut-driven benevolent dictators.

Instead it is an appeal for reason to play an equal part in decision making.

The seed of this post was planted by a keynote Ines Montani gave at EuroPython. At the time I was more interested in her central argument that paying customers are the most important metric a business can have.

But in part of the talk she talks about the cliche of “show me the data”, a phrase that I think originated at NASA where, in context, it makes a lot of sense but when transplanted to the world of small business quickly becomes expensive, slow and farcical.

In part of her talk Ines mentioned that when making decisions on how to run a small business there shouldn’t be a need to provide data for or against every decision. “Why can’t we use reason?” she asked.

The question had huge resonance for me. The emphasis on data-driven decisions in businesses has not led to improved data or statistical literacy. Instead it has led to the generation of fig-leaf numbers, impenetrable spreadsheets of data as obfuscation and irrelevant but voluminous data collection. I see little evidence that decision-making is better.

It has also exposed the idea that the problem is data collection. The more information we collect, the more it feels like any decision can be justified or any course of action advocated or vetoed. Interpretation, selection and analysis of the data is more important than ever, and this at its heart requires reasoning.

Reason is different from “common sense” in that it should produce self-consistent decision making that can be justified and interrogated. Reasoning is a process applied to instinct, insight, intuition, experience and knowledge.

So please don’t show me your data, explain your decision instead.


Standard
Work

Agile: are scrummasters the masters?

One of the fault lines in modern Agile development remains the purpose and application of process. For me the fundamental conflict between a developer and a “scrummaster” comes down to what the main purpose of that role is. Scrummasters often profess a servant manager role for themselves while actually enacting a traditional hierarchical master function.

The following is the acid test for me. The servant manager is one who takes the work I am doing and expresses it in a form that allows people outside the team to understand what I am doing and the progress I have made, and to make predictions about when my work will be complete.

The traditional manager instead tries to control my work so that it fits neatly into the reporting tools that they want to use. They don’t hesitate to interfere, manipulate and control to make their life easier with their own superiors.

Calling yourself a servant manager but then telling people how to structure their work is paying lip service to a popular slogan while continuing a strand of managerial behaviour that has been proven to fail for decades.

Standard
Work

Agile software development defers business issues

My colleague Michael Brunton-Spall makes an interesting mistake in his latest blog post:

much of our time as developers is being completely wasted writing software that someone has told us is important.  Agile Development is supposed to help with this, ensuring that we are more connected with the business owners and therefore only writing software that is important.

Most Agile methodologies actually don’t do what Michael says here. Every one I’ve encountered in the wild treats it as almost axiomatic that there exists someone who knows what the correct business decision is. That person is then given a title, “product owner” for example, and is usually assigned responsibility for three things: deciding what order work is to be done in, judging whether the work has been done correctly, and clarifying requirements until they can be reduced to a programming exercise.

That’s why it was liberating to come across Systems Thinking, which does try to take a holistic approach and says that any organisation is only really as good as its worst-performing element. That does not eliminate the process improvements in development that Agile can provide, but it does illustrate that a great development team doing the wrong thing is a worse outcome than a poor development team doing the right thing.

The invention of the always-correct product owner was a neat simplification of a complex problem, probably designed to avoid having multiple people telling a development team different requirements. Essentially, by assigning the right to direct the work of the development team to one person, the problem of detail- and analysis-orientated developers being blown off course by differing opinions was replaced by squabbling outside the team to persuade the decision maker. Instead of developer versus business, the problem was now business versus business.

Such a gross simplification has grave consequences as the “product owner” is now a massive point of failure and few software delivery teams can effectively isolate themselves from the effects of such a failure. I have heard the excuse “we’re working on the prioritised backlog” several times but I’ve never seen it protect a team from a collectivised failure to deliver what was really needed.

Most Agile methodologies essentially just punt and pray over the issue of business requirements and priorities, deferring the realities of the environment in the hope of tackling an engineering issue instead. Success, however, means doing what Michael suggests: dealing with the messy reality of a situation and providing an engineering solution that can cope with it.

Standard
Work

Breaking the two-week release cycle

I gave a lightning talk about some of the work I did last year at the Guardian to help break the website out of the two-week release cycle and make it possible to switch to a feature-release based process. It’s the first time I’ve given a public talk about it, although I have discussed it with friends and obviously within the Guardian as well, where we are still talking about how best to adopt this.

I definitely think that feature-releasing is the only viable basis for effective software delivery, whether you are doing continuous delivery or not.

In a short talk there’s a lot you have to leave out, but the questions in the pub afterwards were actually relatively straightforward. The only thing I felt I didn’t necessarily get across (despite saying it explicitly) was that this work was done on the big Enterprise Java monolith at the Guardian. We aren’t talking about microapps or our new mobile platform (although they too are released on a feature basis rather than on a cycle); we are talking about the application that is sometimes referred to as the “Monolith”. It was really about changing the world to make it better rather than avoiding difficulty and accepting the status quo.

Feature-releasing has real benefits for supporting and maintaining software. On top of this, if you want to achieve collective team effort then focussing on a feature is going to work better than doing a swath of work in a mini-waterfall “sprint”. The team stands a better chance of building up a release momentum and cadence, and from that building up stakeholder confidence and a reputation for responsive delivery.

Standard
Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work and should really help guide what we choose to do.

However I am not a product manager, user-testing expert or statistician (that last part is a lie: I’m a statistician who hasn’t done any stats for seventeen years). I am a dirty hacker programmer and I use Optimizely in a way that probably makes my colleagues weep, but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this testing.

Note that you need to understand what you’re doing here; I am not recommending this if you are new to the product or to multi-variate testing. You also need a good stream of traffic to work with. I do, and this is working out for me. One piece of good practice I would keep is: decide how you are going to judge the test before you start it, and don’t change your measure once you’ve started. If it is clear your initial metrics aren’t helpful, design a new test. The knowledge you’ve gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand what the problem you are dealing with is and what responses you can take to the issues. If you have a question about what is happening in the test feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than speculate.

Changing a variation (no matter how tempting) is dangerous though, as you’ll have to remember the differences and when you applied them. I prefer spawning a new variation to changing an in-flight one. Of course fixing bugs and unintentional consequences is fine. You’re looking at the long-term rate, not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth but I play around with traffic massively during the test. The great thing about Optimizely is that it takes care of the math so feel free to mix the allocation of traffic freely. If you have a run-away winner early on then don’t be afraid to feed the majority of traffic to it.
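The claim that you can reallocate traffic freely rests on the fact that each variation’s conversion rate is estimated from its own visitors only, so (assuming the underlying rates stay stable over the life of the test) the estimates converge on the true rates no matter how the split changes. A minimal simulation sketch, with made-up rates and traffic numbers, illustrates this:

```python
import random

random.seed(0)

# Hypothetical, stationary true conversion rates for two variations.
TRUE_RATES = {"original": 0.10, "variation_1": 0.12}

# Traffic split changes midway through the test: start 50/50,
# then feed 90% of traffic to the apparent winner.
PHASES = [
    (50_000, {"original": 0.5, "variation_1": 0.5}),
    (50_000, {"original": 0.1, "variation_1": 0.9}),
]

visits = {name: 0 for name in TRUE_RATES}
conversions = {name: 0 for name in TRUE_RATES}

for n_visitors, split in PHASES:
    names, weights = list(split), list(split.values())
    for _ in range(n_visitors):
        name = random.choices(names, weights=weights)[0]
        visits[name] += 1
        if random.random() < TRUE_RATES[name]:
            conversions[name] += 1

# Per-variation estimates stay close to the true rates despite the
# mid-test change in allocation.
for name in TRUE_RATES:
    rate = conversions[name] / visits[name]
    print(f"{name}: observed {rate:.3f} (true {TRUE_RATES[name]:.3f})")
```

The caveat is the stationarity assumption: if the baseline rate drifts while the split changes, the pooled numbers can mislead, which is one reason to keep an eye on the long-term trend rather than a single headline figure.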

Make the test work for the whole audience

I don’t believe in this: make the test work for the easiest audience segment to access. I frequently only test on modern browsers. If you find a trend then, shock horror, it often works for the whole audience too. It’s about fast feedback, not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to do bigger changes to the pages under test.

Don’t change the underlying content

If you take your best-performing variation and apply it to the page, then the “Original” variation should trend towards that variation’s results. If it doesn’t, you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests, until the Original variation is performing better than the individual variations.

Once Original is the top-performing variation you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues; you need to deal in big numbers. A/A testing can be helpful, but if you are working with five-digit visitor numbers or double-digit percentage effects then don’t worry about the noise.
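To put a rough number on “don’t worry about the noise”: under the usual binomial approximation, one standard error on a measured conversion rate p over n visitors is sqrt(p(1-p)/n). A quick sketch (the 10% baseline rate is an assumption for illustration) shows how small that gets once the traffic is in five digits:

```python
import math

def conversion_se(p: float, n: int) -> float:
    """One standard error on a conversion rate p measured over n visitors
    (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000, 100_000):
    se = conversion_se(0.10, n)
    print(f"n={n:>7}: rate 10.0% +/- {100 * se:.2f} points")
```

At 10,000 visitors the noise on a 10% rate is about plus or minus 0.3 percentage points, so a double-digit percentage effect towers over it, which is the author's point about big numbers.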

Tests have to look good

If your theory is accurate it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals: get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off all variations looking a bit crappy and then see whether they can be outperformed by an improved appearance. Often the answer is no; there is a rule of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However, by trying better-looking variations in increments you know exactly how much effort to invest.

Standard