
Will humans still create software?

Recently I attended the London CTOs Unconference: an event where senior technical leaders discuss various topics of the day.

There were several proposed sessions on AI and its impact on software delivery. I was part of a group discussing the evolution of AI-assisted development, looking at how this would change and, ultimately, what role people thought there would be for humans in software delivery (we put the impact of artificial general intelligence to one side to focus on what happens with the technologies we currently have).

The session was conducted under Chatham House-equivalent rules, so I’m just going to record some of the discussion and key points in this post.

Session notes

Currently we are seeing the automation of existing processes within the delivery lifecycle, but there are opportunities to rethink how we deliver software so that it makes better use of current generative AI tools and perhaps sets us up to take advantage of better options in future.

Rejecting a faster horse

Thinking about the delivery of the whole system, rather than just modules of code, configuration or infrastructure, allows us to set a bigger task than simply augmenting a human. We can start to take business requirements in natural language, generate product requirement documents from them, and then use formal methods to specify the behaviour of the system and verify that the software generative AI creates meets these requirements. Some members of the group had already been generating such systems and felt it was a more promising approach than automating different roles in the current delivery processes.
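As a rough, hypothetical illustration of what “specify, then verify” could look like, the sketch below uses property-based tests in Python with the hypothesis library: the behavioural specification is written against an interface before anything is generated, and whatever implementation the model produces is checked against it. The module generated_pricing and the function apply_discount are invented for the example, not tools discussed in the session.

```python
# A minimal sketch of a behavioural specification written before generation.
# Any AI-generated implementation must satisfy these properties.
# `generated_pricing` and `apply_discount` are hypothetical names.
from hypothesis import given, strategies as st

from generated_pricing import apply_discount  # produced by the generation step


@given(price=st.decimals(min_value=0, max_value=10_000, places=2),
       discount_pct=st.integers(min_value=0, max_value=100))
def test_discount_never_increases_price(price, discount_pct):
    # Invariant: applying any discount can never raise the price.
    assert apply_discount(price, discount_pct) <= price


@given(price=st.decimals(min_value=0, max_value=10_000, places=2))
def test_zero_discount_is_identity(price):
    # Invariant: a 0% discount leaves the price unchanged.
    assert apply_discount(price, 0) == price
```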

Although these methods have existed for a while they are not widely used, so it seems likely that senior technical leaders will need to reskill. Defining the technical outcomes they are looking for through more formal structures that work better with machines requires both knowledge and skill. Debugging is likely to move from the operation of code to the process of generation within the model, leading to an iterative cycle of refinement of both prompts and specifications. Here the recent move to expose more chain-of-thought information to users is helpful, allowing the user to refine their prompt when they can see flaws in the model’s reasoning process.
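To make that iterative cycle concrete, here is a hedged sketch of a generate-verify-refine loop, assuming a model call and a verification step are supplied by whatever tooling is in use; both callables here are stand-ins rather than real APIs.

```python
from typing import Callable, List


def refine(spec: str,
           generate_solution: Callable[[str], str],
           run_spec_checks: Callable[[str], List[str]],
           max_iterations: int = 5) -> str:
    """Generate from a specification, verify the result, and feed any
    failures back into the next prompt. Both callables are hypothetical
    stand-ins for real generation and verification tooling."""
    prompt = spec
    for _ in range(max_iterations):
        candidate = generate_solution(prompt)   # call out to the chosen model
        failures = run_spec_checks(candidate)   # formal / behavioural verification
        if not failures:
            return candidate                    # specification satisfied
        # Debugging moves to the generation step: append the observed
        # failures so the next attempt can correct them.
        prompt = spec + "\n\nThe previous attempt failed these checks:\n" + "\n".join(failures)
    raise RuntimeError("Specification not satisfied within the iteration budget")
```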

The fate of code

We discussed whether code would remain the main artefact of software production and didn’t come to a definite conclusion. The existing state of the codebase can be given as context to the generation process, potentially refined as an input in much the same way that retrieval-augmented generation works.
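As a sketch of what using the codebase as retrieval context might look like, the snippet below scores existing source files against a change request and includes the closest matches in the generation prompt. It uses naive keyword counting purely for illustration; a real system would more likely use embeddings, and nothing this concrete was presented in the session.

```python
from pathlib import Path


def build_codebase_context(repo_root: str, change_request: str, max_files: int = 3) -> str:
    """Illustrative retrieval step: pick the existing files most relevant to
    the change request and return them as context for the generation prompt."""
    terms = set(change_request.lower().split())
    scored = []
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        score = sum(text.lower().count(term) for term in terms)
        if score:
            scored.append((score, path, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return "\n\n".join(f"# File: {path}\n{text}" for _, path, text in scored[:max_files])
```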

However, if constructing a codebase is fast and cheap, the value of retaining code is not clear, particularly if requirements or specifications are changing and an alternative solution might therefore be better than the existing one.

People experimenting with whole-solution generation do see major changes between iterations, for example where the model selects different dependencies. For things like UIs this matters in terms of UX, but it may matter less for non-user-facing components: if there is a choice of database mappers, for example, perhaps we only care that performance is good and that SQL injection is not possible.

Specifications as artefacts

Specifications and requirements need to be versioned and change-controlled exactly as source code is today. We need to ensure that requirements are consistent and coherent, which formal methods should provide, but analysis and resolution of differing viewpoints as to the way the system works will remain an important technical skill.

Some participants felt that conflicting requirements would be inevitable and that it was unclear how the generation process would respond to them. Current models do not seem able to identify such conflicts and will most probably favour one requirement over the others. If the testing suite is independent, then behavioural tests may reveal the resulting inconsistencies.
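As a hypothetical example of how an independent behavioural suite might surface a conflict, consider two requirements that quietly contradict each other; whichever one the generated code favours, the other test fails. The module generated_billing and the function calculate_fee are invented for illustration.

```python
# Two requirements that quietly conflict, each encoded as an independent
# behavioural test. Whichever one the generated code favours, the other
# test should fail and surface the inconsistency.
# `generated_billing` and `calculate_fee` are hypothetical names.
from generated_billing import calculate_fee


def test_requirement_a_minimum_processing_fee():
    # Requirement A: every invoice carries at least a £5 processing fee.
    assert calculate_fee(amount=0) >= 5


def test_requirement_b_fee_waived_for_small_invoices():
    # Requirement B: invoices under £10 incur no fee at all.
    assert calculate_fee(amount=0) == 0
```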

It was seen as important to control your own foundation model rather than using external services. Being able to keep the model consistent across builds and retain a working version of it decouples you from vendor dependencies and should be considered part of the build and deployment infrastructure. Different models have different strengths (although some research contradicts this anecdotal observation). We didn’t discuss supplementation techniques, but we did talk about priming the code generation process with coding standards or guidelines, although this did not seem to be a technique currently in use.
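A minimal sketch of what treating the model as build infrastructure and priming generation with standards might look like, with all names and values invented for the example rather than drawn from any real toolchain:

```python
from pathlib import Path

# Illustrative build configuration: the model reference is pinned to a
# specific revision (rather than a floating "latest") so builds are
# reproducible, and coding standards are injected into every prompt.
BUILD_CONFIG = {
    "model": {
        "source": "models/internal-code-gen",  # weights retained alongside build infrastructure
        "revision": "2024-11-03",              # pinned revision, changed deliberately
    },
    "generation": {
        "temperature": 0.2,
        "standards_file": "docs/coding-standards.md",
    },
}


def build_generation_prompt(task: str, config: dict = BUILD_CONFIG) -> str:
    """Prime the generation request with the organisation's coding standards."""
    standards = Path(config["generation"]["standards_file"]).read_text()
    return f"Follow these coding standards exactly:\n{standards}\n\nTask:\n{task}"
```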

For some participants using generative AI was synonymous with choosing a vendor but this is risky as one doesn’t control the lifespan of such API-based interactions or how a given model might be presented by the vendor. Having the skills to manage your own model is important.

In public repositories it has been noted that the volume of code produced has risen but quality has fallen and that there is definitely a trade-off being made between productivity and quality.

This might be different in private codebases where different techniques are used to ensure the quality of the output. People in the session trying these techniques say they are getting better results than those observed in public reporting. Without any way of verifying this, though, people will just have to experiment for themselves and see if they can improve on the issues seen in publicly observed code.

When will this happen in banks?

We talked a little bit about the rate of adoption of these ideas. Conservative organisations are unlikely to move on this until there is plenty of information available publicly. However, if an automated whole-system creation process works, there are significant cost savings associated with it, and projects that would previously have been ruled out as too costly become more viable.

What do cheap, quick codebases imply?

We may be able to retire older systems with a lot of embedded knowledge in them far more quickly than if humans had to analyse, extract and re-implement that embedded knowledge. It may even be possible to recreate a mainframe’s functionality on modern software and hardware and make it cheaper than training people in COBOL.

If system generation is cheap enough, we could also ask the process to create implementations with different constraints and compare different approaches to the same problem, optimising for cost, efficiency or maintenance. We can write and throw away many times.
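As a hedged sketch of that idea, the loop below asks a generation process for one candidate per constraint and records benchmark results for each so they can be compared; the generate and benchmark callables are stand-ins for real tooling.

```python
from typing import Callable, Dict

# Constraints to optimise for; each yields its own candidate implementation.
CONSTRAINTS = ["minimise cloud cost", "minimise latency", "minimise code size"]


def compare_candidates(spec: str,
                       generate: Callable[[str], str],
                       benchmark: Callable[[str], Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Produce one candidate per constraint and record its measured trade-offs.
    Both callables are hypothetical stand-ins for real generation and
    benchmarking tooling."""
    results: Dict[str, Dict[str, float]] = {}
    for constraint in CONSTRAINTS:
        candidate = generate(f"{spec}\n\nOptimise primarily for: {constraint}")
        results[constraint] = benchmark(candidate)  # e.g. cost, p99 latency, test pass rate
    return results
```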

What about the humans?

The question of how humans are involved in the software delivery lifecycle, what they are doing and therefore what skills they need was unclear to us. However, no-one felt that humans would have no role in software development, only that the role was likely to require a different skill set to the one that makes people successful today.

It also seemed unlikely that a human would be “managing” a team of agents if the system of specification and constraints was adopted. Instead humans would be working at a higher level of abstraction with a suite of tools to deliver the implementation. Virtual junior developers seemed to belong to the faster horse school of thinking.

Wrapping up the session

The session lasted for pretty much the whole of the unconference and the topic often went broad, which meant there were many threads and ideas that were not fully resolved. It was clear that there are currently at least two streams of experimentation: supplementing existing human roles in the software delivery cycle with AI assistance, and reinventing the delivery process based on the possibilities offered by cheap large language models.

As we were looking to the future, we mostly discussed the second option, and this seems to be what people have in mind when they talk about not needing experienced software developers in future.

In some ways this is the technical architect’s dream, in that you can start to work with pure expressions of solutions to problems that are faithfully adhered to by the implementer. However, the solution designer now needs to understand how and why the solution generation process can go wrong and needs to verify the correctness and adherence of the final system. The non-deterministic nature of large language models is not going away, and therefore solution designers need to think carefully about their invariants to ensure consistency and correctness.

There was a bit of an undertow in our discussions about whether it is a positive that a good specification leads to almost a single possible solution, or whether we need to allow the AI to confound our expectations of the solution and create unexpected things that still meet our specifications.

The future could be a perfect worker realising the architect’s dream, or it could be more like a partnership where the human provides feedback on a range of potential solutions produced by a generation process, perhaps with automated benchmarking for both performance and cost.

It was a really interesting discussion and a snapshot of what is happening in what feels like an area of frenzied activity among early adopters. Later adopters can probably afford to give this area more time to mature, as long as they keep an eye on whether the cost of delivery is genuinely dropping in the early production projects.
