
Writing code without tests

This post is aimed at people who have mastered test-driven development, and ideally behaviour-driven development too, and who are familiar with XCheck testing. If you don’t have those basics down then trying to jump straight to some of these techniques is likely to backfire on you, as you will probably struggle to assess the risks correctly.

There is a reason TDD was invented: it represents the refinement of good testing practice and the philosophy of good software design. TDD is a relatively simple practice to describe but one that requires effort to implement. Writing code driven by tests is safer than straight-coding.

Writing untested code is a kind of mastery technique. It is high-risk and relies on the skills and knowledge of the programmer. I don’t think it is ever responsible unless the programmer is going to be the person supporting the result in production. Without that condition the programmer’s interests are not properly aligned with those of the consumers of their code.

So, with all those caveats in place, what if we want to create code faster because we don’t have to write tests?

Well, we have to understand where bugs come from, and we will have to write code that doesn’t allow those situations to arise.

There are two important principles to start with. First, if you can rely on tested library code, then you can rely on the underlying quality of that code and leverage it in your own application. Second, the code you don’t write will not have bugs.

Therefore we should be aiming to write the smallest amount of code possible and we should never try to code what others have coded for us.

The next point is about where bugs occur. I think we’re now at a consensus that most bugs occur in the way we change and maintain state. In both procedural and functional languages it is rare, for example, to get the order of steps wrong. Those kinds of problems tend to be misunderstandings of the domain (which get written into the test suite as well, so testing doesn’t catch them) rather than genuinely unexpected consequences of the programmer’s code. Object-orientated code is really hard to reason about from this point of view, as objects have no implied order of execution.

This is why quick scripts of fewer than 200 lines tend to do stable, sterling service for years, whereas larger applications are more tortured in their existence.

Therefore, whatever language we are coding in, we need to adopt the functional principle of operating only on our parameters and returning values that can be consumed by the caller.
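Here is a minimal sketch of what I mean in Python; the names and figures are invented for illustration:

```python
# Hypothetical example: the function touches nothing but its parameters
# and hands a new value back to the caller instead of mutating shared state.

def apply_discount(prices, discount):
    """Return a new list of discounted prices; the input list is untouched."""
    return [round(price * (1 - discount), 2) for price in prices]


basket = [10.00, 24.99, 5.50]
discounted = apply_discount(basket, 0.1)

print(basket)      # [10.0, 24.99, 5.5] -- the original state is unchanged
print(discounted)  # [9.0, 22.49, 4.95]
```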

Size matters, a lot. If the whole program fits into a single file and you can pretty much hold the whole thing in your head, then it is easy to reason about what the program is doing and to see flaws in its logic. A single complex line of code is better than many lines, and much better than many lines split across many files.

One way to bring down the size of code files is to be ruthless about separating concerns. For example, recently in my Python programming I have been assigning only one purpose to each module: this module renders reports, that one provides JSON endpoints.

Another technique is not to persist any state. This is actually surprisingly easy in web programming, since each request is a completely separate event and by default you can trade CPU time for isolation.
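As a sketch of the shape I mean, using only the standard library’s WSGI support (the handler and its behaviour are invented for illustration): everything the handler needs arrives with the request and nothing survives between calls.

```python
from wsgiref.simple_server import make_server
import json


def application(environ, start_response):
    """Each request is handled from scratch: no module-level caches,
    no sessions, no mutable globals -- just input in, response out."""
    name = environ.get("QUERY_STRING", "") or "world"
    body = json.dumps({"greeting": f"hello {name}"}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


if __name__ == "__main__":
    # Serve on localhost purely for demonstration purposes.
    make_server("localhost", 8000, application).serve_forever()
```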

If you are doing batch or server-side programming then it is worth considering using something like parallel to create many separate bubbles of execution rather than trying to write code yourself to distribute work.

Another aspect of state that causes issues is making global modifications, whether to a database or a filesystem. Try to defer all global changes to the final moment of a program and do all the manipulation in-memory instead. If you never change the world then you can run a program over and over again, refining what it does.
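A sketch of that shape, with hypothetical file and column names: all the work happens on in-memory structures and the single write to the filesystem is the last thing the program does, so you can re-run it safely while refining the middle.

```python
import csv
import json
from pathlib import Path


def load_rows(path):
    # Read-only access to the world: safe to repeat as often as you like.
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def summarise(rows):
    # Pure in-memory manipulation; tweak and re-run without consequences.
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + float(row["amount"])
    return totals


if __name__ == "__main__":
    summary = summarise(load_rows("sales.csv"))
    # The only change to the world, deferred to the final moment.
    Path("summary.json").write_text(json.dumps(summary, indent=2))
```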

Assertions are more powerful than logging when writing test-less code: it is better to kill a thread of execution than to let it do something you weren’t expecting. Logging is really about helping build your intuition about what a program does and how it works.

Assertions allow you to create strong pre- and post-conditions on the operation of the program. Essentially they allow you to guarantee the “happy path” execution of your code and avoid having to test all the negative situations that might occur.
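For example, a small illustrative function (not from any real codebase) guarded at both ends:

```python
def allocate_stock(order_quantity, stock_level):
    # Pre-conditions: refuse to run at all outside the happy path.
    assert order_quantity > 0, "order quantity must be positive"
    assert stock_level >= order_quantity, "cannot allocate more than is in stock"

    remaining = stock_level - order_quantity

    # Post-condition: if this ever fails the program dies here,
    # rather than quietly corrupting state downstream.
    assert remaining >= 0, "allocation produced negative stock"
    return remaining
```

One caveat worth remembering: Python strips assert statements when run with -O, so these guarantees only hold in unoptimised runs.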

Despite this, you always want to code for failure: use short-circuit logic to abort the code flow early and thereby simplify the context of the code in the rest of the function.
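In Python that usually means guard clauses at the top of the function, something like this invented sketch:

```python
def publish_report(report):
    # Bail out early on the failure cases so the rest of the function
    # only ever deals with a valid, complete report.
    if report is None:
        return None
    if not report.get("sections"):
        return None
    if report.get("status") != "approved":
        return None

    # Happy path: by this point every precondition has been dealt with.
    return "\n\n".join(section["text"] for section in report["sections"])
```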

Remember all the basic rules of cyclomatic complexity: don’t nest, avoid conditionals, and do try to express your loops as list comprehensions.
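As a small illustration, the same filtering written as a nested loop and as a comprehension (sample data invented):

```python
users = [
    {"email": "Alice@example.com", "active": True},
    {"email": "bob@example.com", "active": False},
]

# Nested loop with a conditional: more lines, more places to slip.
active_emails = []
for user in users:
    if user["active"]:
        active_emails.append(user["email"].lower())

# The same thing as a comprehension: one expression, one place to read it.
active_emails = [user["email"].lower() for user in users if user["active"]]
print(active_emails)  # ['alice@example.com']
```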

Don’t write generic code, ever. The more potential inputs a function has, the more you end up needing unit tests to verify the interactions. If something is meant to work on strings, don’t try to make it work on strings and integers. Your type-detection code ends up being a potential source of bugs that itself needs testing.
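A small illustration of the difference; both functions are invented for the example:

```python
# Generic version: the type detection is extra logic that now needs testing.
def normalise(value):
    if isinstance(value, int):
        value = str(value)
    return value.strip().lower()


# Single-purpose version: works on strings, and only strings. If an integer
# arrives, .strip() fails loudly at the call site instead of being papered over.
def normalise_name(name):
    return name.strip().lower()
```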

If you write in dynamic interpreted languages then you are going to have to do some manual testing, unless you can remember the names and orders of the functions exactly. Don’t forget to dive into the shell or REPL and play around with the code in isolation. If you can verify the behaviour of individual parts of your program without having to wire together multiple components, then you have the right level of granularity for your code.
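A throwaway session might look like this; the module, data file and outputs are all invented for illustration:

```python
>>> from report_tools import normalise_name, load_rows
>>> normalise_name("  Alice ")
'alice'
>>> rows = load_rows("sales.csv")
>>> len(rows)
42
>>> rows[0]["region"]
'north'
```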

Re-use code that is already working. Code re-use is generally best achieved by cutting and pasting files and then importing the functions you need. Don’t try to keep the copies synchronised: updating shared library code ultimately means you need to know whether the new library code works as you expect with your functionality, and that means you’ll need a test suite.

Don’t refactor your code, rewrite it. Refactoring requires unit tests. Don’t be afraid of things like myfunction2 (although once you have the new functionality you need to delete the old, unused stuff). Rewriting allows you to ditch all your assumptions about the code and attempt to express your new understanding of the problem and the requirements as simply as possible.

Don’t work with large numbers of people on the same code base. The more people trying to modify and change the code, the more you need tests to clarify your different intents for it. Again, try divide and conquer on the problem: rather than six people working on the same code, can you get three pairs collaborating on three smaller codebases?

Finally, don’t be afraid to write a test. Writing the right unit test to prove you can rely on a base piece of functionality means you then don’t have to write tests for all the pieces of code that use that underlying function. I like to try to write code without tests to maximise the flexibility of the code base when I’m tackling problems with unclear solutions. It is not an ideological thing to have no tests whatsoever; rather, when I am tempted to write a test I think “Could I do this in a way that is trivial and doesn’t require a test?”. Simplicity is the cornerstone of test-free code.


Metrics and craftsmanship

Ever since we’ve had access to increasingly comprehensive and easy-to-comprehend metrics, there has been a conflict between the artisan and craftsmanship elements of software development and the data-driven viewpoints.

Things like code quality are seen as being difficult to express in terms of user-affecting metrics. I suspect that is because most of the craft concerns of software development do not affect the overall value of a product. This is not to align myself fully with those driven by metrics.

There are lots of situations where two approaches result in the same metrics outcome. It is tempting to give in to the utilitarian argument that in this case what you should do is simply choose the lowest cost option.

That is too reductionist, though. While it may lead to an optimised, margin-generating product, it feels to me just as likely to create a spiral of compromise that jeopardises the ability to make further improvements.

It is here, at the point where the metrics are silent, that we are put to the test of making good decisions. Our routes forward are neutral from a data point of view, but good decisions will unlock better possibilities in the future. It is at this moment that our preferences for things like craft and aesthetics, and our understanding of things like cost and consequence, matter. Someone who understands how to achieve beauty and simplicity in software for the same cost as the compromise will achieve very different outcomes.

So we need to be metrics-first so that we know we are being honest with ourselves, but once we are doing things in a truthful environment, our experience and discretion can make all the difference.


Concurrency means performance, yes?

One thing I heard a lot at the Mostly Functional conference last week was that concurrency is required for performance on multicore processors. Since Moore’s Law ended it is certainly true that the old trick of not writing performant code but letting hardware advances pick up the slack has been flagging (although things like SSDs have still had their impact).

However equating concurrent code with performance is subtly wrong. If there was a direct relationship then we would have seen concurrent programming adopted swiftly by the games programmers. And yet there we still see an emphasis on ordered, predictable execution, cache structure and algorithmic efficiency.

Performance is one of those vague computing terms, like scale, that has many dimensions. Concurrency has no direct relation to performance as anyone who has managed to write a concurrent program with global resource contention can attest.

There are two relevant axes to considering performance and concurrency: throughput and capacity. Concurrency, through parallelism, allows you to greatly increase your utilisation of the available resources to provide a greater capacity for work.

However that work is not inherently performed faster and may actually result in lowered throughput due to the need to read data that is not in memory and the inability to predict the order of execution.

For things like webservices that are inherently stateless, concurrency often does massively increase performance, because the capacity to serve requests goes up and there is no need to coordinate work. If the webservice is accessing a shared store where essentially all of the key data is in memory, and what we need to do is read rather than mutate that data, then concurrency becomes even more desirable.

On the other hand, if what we want to do is process work as quickly as possible, i.e. maximise throughput, then concurrency can be a very poor choice.

If we cannot predict the order in which work will need to be executed, due to things like having to distribute work across threads and retry it after temporary errors, then we may have to create the entire context for the work repeatedly and load it into local memory.

In circumstances like these concurrency hurts performance. After all, the fastest processing is probably still pointer manipulation of a memory-mapped file, if you want to go really fast.

So concurrency does mean performance, and beating Moore’s Law, if you can be stateless and value volume of processing over unit throughput.
