Work

October 2023 month notes

I’ve been learning more about Postgres as I have been moving things from Dataset to Psycopg3. It is kind of ridiculous what you can do with it once you strip away the homogenising translation layer of things like ORMs. Return a set of columns from your update? No problem. Upsert? Straightforward.
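
As a concrete sketch of both of those (using psycopg3, with invented table and column names), each is a single statement:

    import psycopg

    # A minimal sketch, assuming a 'posts' table and a 'tags' link table
    with psycopg.connect("dbname=notes") as conn:
        # An UPDATE that hands back the columns you ask for
        row = conn.execute(
            "UPDATE posts SET title = %s WHERE id = %s RETURNING id, updated_at",
            ("October month notes", 42),
        ).fetchone()

        # A native upsert: insert, or update when the unique constraint fires
        conn.execute(
            """
            INSERT INTO tags (post_id, tag) VALUES (%s, %s)
            ON CONFLICT (post_id, tag) DO UPDATE SET tag = EXCLUDED.tag
            """,
            (42, "postgres"),
        )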

However, after adding an ON CONFLICT clause I received a message that no conflict was possible on the columns I was checking, and I discovered that I had failed to add a primary key to the table when I created it. It probably didn’t matter to the performance of the table, as it was a link table with indexes on each lookup column, but I loved that the query parsing was able to do that level of checking on my structure.
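
For anyone who hits the same error: ON CONFLICT needs a unique index or constraint covering exactly the columns in the conflict target, so the fix is a one-line schema change (a hypothetical reconstruction, reusing the link table from the sketch above):

    # Hypothetical reconstruction: declare the constraint the conflict
    # target relies on, after which ON CONFLICT (post_id, tag) is valid
    conn.execute("ALTER TABLE tags ADD PRIMARY KEY (post_id, tag)")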

Interestingly, the ORM statement I was replacing had a conflict clause too and it had never had an issue, so presumably it was doing an update-then-insert pattern in a transaction rather than using the native feature. For me this shows how native solutions are often better than emulation.

Most of the apps I’ve converted to direct use of queries are feeling more responsive now (including the one I use to draft these posts), but I’m not 100% certain whether this is because of the switch to lower-level SQL or because I’ve been fixing the problems in the underlying relational model that were previously being hidden from me.

We’re going to need a faster skateboard

I have been thinking a lot about the Gold-plated Donkey Cart this month. When you challenge problems with solutions you often first have to struggle to get people to admit that there is a problem, and even when it is admitted, the first response is often to try and patch or amend the existing solution rather than consider what the right response might be.

We have additive minds, so this tendency to patch what already exists is natural, but sometimes people aggressively defend the status quo even when it is counter-productive to their overall success.

Weakly typed

I’ve had some interesting experiences with TypeScript this month, most notably an issue with a duplicated package which resulted in code that has been running in production for months but which has either not been correctly typed or has been behind the intended version by maybe four major versions. TypeScript is interesting amongst type-hinted languages in that its typing files are often supplied separately from the code itself, and in some cases exist independently of it. My previous experience of Python typing, for example, stopped the checker at the boundaries of third-party packages and therefore only applied to the code you are writing yourself.

I’m uncertain of the value of providing type files for JavaScript libraries, as the compile-time and runtime contexts seem totally different. I found a JavaScript dependency that had a completely broken unit test file, and on trying to correct it I found that the code couldn’t have the behaviour that the tests were trying to verify. Again I wondered how this code was working in production, and predictably it turned out that the executed code path never included the incorrectly specified behaviour. Dynamic code can be very resilient and at the same time a time bomb waiting to happen, no matter what your type declarations claim.

I think TypeScript code would be better off if it were clearer that any guarantees of correctness can only be provided for the code you have totally under your control and which is being compiled and checked by you.

Frozen in time

I’ve also been thinking a lot about a line from this talk by Killian Valkhof where he mentions that our knowledge of how to do things often gets frozen at the point when we initially learnt to do them. Developers who learnt React for frontend will be the future equivalents of the people who learnt to do frontend via jQuery. I’ve been looking at Web Components, which I thought were pretty terrible when they first came out but which now look delightfully free of complex build chains and component models.

But more fundamentally it has made me think about whether, when I choose or reject things, I am doing so based on their inherent qualities in the present moment or based on the moment in time when I first learnt and exercised those skills. With CSS, for example, I’m relatively old-fashioned and have never been a fan of the CSS-in-JS idea. I think this approach, while maybe outside contemporary preferences, is sound: CSS applies across any number of frontend component models and frameworks, and the work that goes into the CSS standards is excellent, whereas (ironically) the limitations of JavaScript frameworks in expressing CSS concepts mean that often only a frozen subset is usable.

I’ve never been entirely comfortable with Docker or Kubernetes, though, and generally prefer PaaS or “serverless” solutions. Is that because I enjoyed the Heroku developer experience and as a result never really understood the advantages of containerisation?

Technology is fashion, and therefore discernment is a critical quality for developers. For most developers, though, it is not judgement that they manifest but a toxic self-belief in the truth of whatever milieu they entered the industry in. As I slog through my third decade in the profession I feel strong doubt about my own opinions, and trying to frame my judgements in the evidence and reasoning available now seems a valuable technique.

Writing code without tests

This post is aimed at people who have mastered test-driven development, and ideally also behaviour-driven development, and who are familiar with XCheck testing. If you don’t have these basics in place then trying to jump into some of these techniques is likely to backfire on you, as you will probably struggle to assess the risks correctly.

There is a reason TDD was invented: it represents the refinement of good testing practice and the philosophy of good software design. TDD is a relatively simple practice to describe that requires effort to implement. Writing code driven by tests is safer than straight-coding.

Writing untested code is a kind of mastery technique. It is high-risk and relies on the skills and knowledge of the programmer. I don’t think it is ever responsible if the programmer is not going to be the person supporting the result in production; without that condition the programmer’s interests are not properly aligned with those of the consumers of their code.

So with all those caveats in place what if we want to create code faster because we don’t have to write tests?

Well we have to understand where bugs come from and we will have to write code that doesn’t allow those situations to arise.

There are two important principles to start with. First, if you can rely on tested library code, then you can rely on the underlying quality of that code and leverage it in your own application. Secondly, the code you don’t write will not have bugs.

Therefore we should be aiming to write the smallest amount of code possible and we should never try to code what others have coded for us.

The next point is about where bugs occur. I think we’re now at a consensus that most bugs occur in the way we change and maintain state. In both procedural and functional languages it is rare, for example, to get a mistake in the order of steps that something must happen in. Those kinds of problems tend to be misunderstandings of the domain (which get written into test suites as well, so testing doesn’t help catch them) rather than genuinely unexpected consequences of the programmer’s code. Object-orientated code is really hard to reason about from this point of view, as objects don’t have an implied order of execution.

This is why quick scripts of less than 200 lines tend to do stable sterling service for years whereas larger applications are more tortured in their existence.

Therefore, whatever language we are coding in, we need to adopt the functional principle of operating only on our parameters and returning values that can be consumed by the caller.
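
In Python that principle looks something like this (a sketch, names invented):

    # Stateful style: reads and mutates shared state, hard to reason about
    totals = {}

    def record_sale_stateful(customer, amount):
        totals[customer] = totals.get(customer, 0) + amount

    # Functional style: operates only on its parameters, returns a value
    def record_sale(totals, customer, amount):
        updated = dict(totals)
        updated[customer] = updated.get(customer, 0) + amount
        return updated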

Size matters, a lot. If the whole program can fit into a single file and you can pretty much hold the whole thing in your head, then it will be easy to reason about what the program is doing and to see flaws in its logic. A single complex line of code is better than many lines, and is much better than many lines split across many files.

One way to bring down the size of code files is to be ruthless about concerns. For example, recently in my Python programming I have been assigning only one purpose to each module: this module renders reports, this one provides JSON endpoints.
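
A sketch of what that separation looks like in practice (module and function names invented):

    # reports.py -- this module only renders reports
    def render_monthly_report(sales):
        return "\n".join(f"{name}: {total}" for name, total in sorted(sales.items()))

    # api.py -- this module only provides JSON endpoints
    import json

    def sales_endpoint(sales):
        return json.dumps(sales)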

Another technique is to not persist any state. This is actually surprisingly easy in web programming, since each request is a completely separate event and by default you can trade CPU time for isolation.

If you are doing batch or server-side programming then it is worth considering using something like GNU parallel to create many separate bubbles of execution rather than trying to write code yourself to distribute work.

Another aspect of state that causes issues is making global modifications, whether to a database or a filesystem. Try to defer all global changes to the final moment of a program and do all the manipulation in memory instead. If you never change the world then you can run a program over and over again, refining what it does.
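
A sketch of the shape this gives a program (file names invented):

    import json

    def process(records):
        # All manipulation happens in memory; nothing outside is touched,
        # so this can be re-run endlessly while you refine it
        cleaned = [r for r in records if r.get("amount", 0) > 0]
        summary = {"count": len(cleaned),
                   "total": sum(r["amount"] for r in cleaned)}
        return cleaned, summary

    if __name__ == "__main__":
        with open("sales.json") as f:
            records = json.load(f)
        cleaned, summary = process(records)
        # The single global change is deferred to the final moment
        with open("report.json", "w") as f:
            json.dump({"records": cleaned, "summary": summary}, f)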

Assertions are more powerful than logging when writing test-less code: it is better to kill a thread of execution than to let it do something you weren’t expecting. Logging is really about helping build your intuition about what a program does and how it works.

Assertions allow you to create strong pre- and post-conditions on the operation of the program. Essentially they allow you to guarantee the “happy path” execution of your code and avoid having to test all the negative situations that might occur.
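
For example (a sketch, names invented):

    def apply_discount(price, discount):
        # Pre-conditions: kill execution rather than leave the happy path
        assert price > 0, f"price must be positive, got {price}"
        assert 0 <= discount <= 1, f"discount must be a fraction, got {discount}"

        discounted = price * (1 - discount)

        # Post-condition: guarantee the result before the caller sees it
        assert 0 <= discounted <= price
        return discounted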

Despite this, you always want to code for failure: use short-circuit logic to abort code flow early and thereby simplify the context of the code in the rest of the function.

Remember all the basic rules of cyclomatic complexity: don’t nest, don’t do conditionals, and do try to express your looping as list comprehensions.
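
Putting those last two points together (a sketch with an invented order structure):

    def invoice_lines(order):
        # Code for failure first: short-circuit and abort early
        if order is None:
            return []
        if not order.get("items"):
            return []

        # No nesting, no conditionals in the body: the loop is a comprehension
        return [f"{item['name']}: {item['qty'] * item['price']}"
                for item in order["items"]]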

Don’t write generic code, ever. The more potential inputs a function has, the more you end up needing unit tests to verify the interactions. If something is meant to work on strings, don’t try to make it work on strings and integers; the detection code ends up being a potential source of bugs that itself needs testing.
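
For instance (a sketch):

    def shout(text: str) -> str:
        # Works on strings only; there are no detection branches to test
        return text.upper() + "!"

    # versus the "generic" version, whose detection code is itself a bug risk
    def shout_anything(value):
        if isinstance(value, int):
            value = str(value)  # an extra path that now needs a test
        return value.upper() + "!"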

If you write dynamic interpreted languages then you are going to have to do some manual testing, unless you can remember the names and orders of the functions exactly. Don’t forget to dive into the shell or REPL and play around with the code in isolation. If you can verify the behaviour of individual parts of your program without having to wire together multiple components, then you have the right level of granularity for your code.
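
For example, poking at the hypothetical reports module from earlier in a Python REPL:

    >>> from reports import render_monthly_report
    >>> render_monthly_report({"globex": 45, "acme": 120})
    'acme: 120\nglobex: 45'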

Re-use code that is already working. Code re-use is generally best achieved by cutting and pasting files and then importing the functions you need. Don’t try to synchronise your code: updating shared library code ultimately means that you need to know whether the new library code works as you expect with your functionality, and that means you’ll need a test suite.

Don’t refactor your code, rewrite it. Refactoring requires unit tests. Don’t be afraid of things like myfunction2 (although once you have the new functionality you need to delete the old unused stuff). Rewriting allows you to ditch all your assumptions about the code and to express your new understanding of the problem and the requirements as simply as possible.

Don’t work with large numbers of people on the same code base. The more people trying to modify and change the code, the more you need tests to clarify your different intents for it. Again, try divide and conquer on the problem: rather than six people working on the same code, can you get three pairs collaborating on three smaller codebases?

Finally, don’t be afraid to write a test. Writing the right unit test to prove you can rely on a base piece of functionality means that you then don’t have to write tests for all the pieces of code that use that underlying function. I like to try to write code without tests to maximise the flexibility of the code base when I’m tackling problems with unclear solutions. It is not an ideological commitment to having no tests whatsoever; rather, when tempted to write a test I think “Could I do this in a way that is trivial and doesn’t require a test?”. Simplicity is the cornerstone of test-free code.
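
A sketch of what that one foundational test might look like (pytest style, reusing the hypothetical reports module):

    from reports import render_monthly_report

    def test_render_monthly_report_sorts_and_formats():
        sales = {"globex": 45, "acme": 120}
        assert render_monthly_report(sales) == "acme: 120\nglobex: 45"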

Metrics and craftsmanship

Ever since we’ve had access to increasingly comprehensive and easy-to-comprehend metrics, there has been a conflict between the artisan and craftsmanship elements of software development and the data-driven viewpoints.

Things like code quality are seen as being difficult to express in terms of user-affecting metrics. I suspect that is because most of the craft concerns of software development do not affect the overall value of a product. This is not to align myself fully with those driven by metrics.

There are lots of situations where two approaches result in the same metrics outcome. It is tempting to give in to the utilitarian argument that in this case what you should do is simply choose the lowest cost option.

That is too reductionist, though: while it may lead to an optimised margin-generating product, it feels to me just as likely to create a spiral of compromise that jeopardises the ability to make further improvements.

It is here, at the point where metrics are silent, that we are put to the test of making good decisions. Our routes forward are neutral from a data point of view, but good decisions will unlock better possibilities in the future. It is at this moment that our preferences for things like craft and aesthetics, and our understanding of things like cost and consequence, matter. Someone who understands how to achieve beauty and simplicity in software for the same cost as the compromise will achieve very different outcomes.

So we need to be metrics-first so we know we are being honest with ourselves, but once we are operating in a truthful environment, our experience and discretion can make all the difference.

Concurrency means performance, yes?

One thing I heard a lot at the Mostly Functional conference last week was that concurrency is required for performance on multicore processors. Since Moore’s Law ended it is certainly true that the old trick of not writing performant code and letting hardware advances pick up the slack has been flagging (although things like SSDs have still had their impact).

However, equating concurrent code with performance is subtly wrong. If there were a direct relationship then we would have seen concurrent programming adopted swiftly by games programmers. And yet there we still see an emphasis on ordered, predictable execution, cache structure and algorithmic efficiency.

Performance is one of those vague computing terms, like scale, that has many dimensions. Concurrency has no direct relation to performance as anyone who has managed to write a concurrent program with global resource contention can attest.

There are two relevant axes to considering performance and concurrency: throughput and capacity. Concurrency, through parallelism, allows you to greatly increase your utilisation of the available resources to provide a greater capacity for work.

However that work is not inherently performed faster and may actually result in lowered throughput due to the need to read data that is not in memory and the inability to predict the order of execution.

For things like webservices, which are inherently stateless, concurrency often does massively increase performance, because the capacity to serve requests goes up and there is no need to coordinate work. If the webservice is accessing a shared store where essentially all of the key data is in memory, and what we need to do is read rather than mutate that data, then concurrency becomes even more desirable.
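
A sketch of that favourable case (illustrative data and names; the shared store is read-only and in memory, so workers never need to coordinate):

    from concurrent.futures import ThreadPoolExecutor

    PRICES = {"apples": 3, "pears": 4}  # shared, read-only, in memory

    def handle_request(item):
        # Stateless: everything needed is a parameter or read-only data
        return {"item": item, "price": PRICES.get(item)}

    requests = ["apples", "pears", "apples"] * 100
    with ThreadPoolExecutor(max_workers=8) as pool:
        responses = list(pool.map(handle_request, requests))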

On the other hand, if what we want to do is process work as quickly as possible, i.e. maximise throughput, then concurrency can be a very poor choice.

If we cannot predict the order that work will need to be executed in, due to things like having to distribute work across threads and retry work after temporary errors, then we may have to create the entire context for the work repeatedly and load it into local memory.

In circumstances like these concurrency hurts performance. After all, the fastest processing is probably still pointer manipulation of a memory-mapped file, if you want to go really fast.

So concurrency means performance and beating Moore’s Law if you can be stateless and value volume of processing over unit throughput.
