
March 2024 month notes

Dependabot under the hood

I spent a lot more time this month than I was expecting with one of my favourite tools, Github’s Dependabot. It started when I noticed that some of our projects were not getting the security updates that others were. I know it is possible for updates to be suspended on projects that neglect them for too long (I should really archive some of my old projects) but checking the project settings confirmed that everything was set up correctly and there was nothing that needed enabling.

Digging in, I wondered how you are meant to view what Dependabot is doing. You might think it is implemented as an Action or something similar, but in fact you access the information through the Insights tab.

Once I found it, I discovered that the jobs had indeed been failing silently (I’m still not sure if there’s a way to get alerted about this) because we had upgraded our Node version to 20 but had the engine-strict option switched on. It turns out that Dependabot runs on its own images and those were still running Node 18. It may seem tempting to insist that your CI uses the same Node version as your production app, but in the case of CI actions there’s no need to be that strict: they are just performing repository management tasks that aren’t going to hit your build chain directly.

Some old dependencies also caused problems in trying to reconcile their target version, the package.json Node engine and the runtime Node version. Fortunately these just highlighted some dependency cruft and deprecated projects that we needed to cut out of the project.

It took a surprising amount of time to work through the emergent issues but it was gratifying to see the dependency bundles flowing again.

Rust

I started doing the Rustlings tutorial again after maybe a year in which I’d forgotten about it (having spent more time with Typescript recently). It is a brilliantly structured tutorial of bite-sized introductions to various Rust concepts. Rust isn’t that complicated as a language (apart from its memory management) but I’ve found that the need to have everything right before the code will compile means you tend to need dedicated time to learn it, and it is easy to hit some hard walls that can be discouraging.

Rustlings allows you to focus on just one concept and scaffolds all the rest of the code for you, so you’re not battling a general lack of understanding of the language structure and can concentrate on one thing like data structures or library code.

Replacing JSX

Whatever the merits of JSX, it introduces a lot of complexity and magic into your frontend tooling, and I’ve seen a lot of recommendations that it simply isn’t necessary now that tagged template literals are available. I came back to an old Preact project this month that I had built with Parcel. The installation had a load of associated security alerts, so on a whim I tried it with ViteJS, which mostly worked except for the JSX compilation.

Sensing a yak to shave, I started to look at adding in the required JSX plugin but then decided to see if I really needed it. The Preact website mentioned htm as an alternative that has no dependencies. It took me a few hours to understand and convert my code, and I can’t help but feel that eliminating a dependency like this is generally a good idea.

The weirdest thing about htm is how faithful it is to the JSX structure. I was expecting something a bit more, well, HTML-y, but props and components work pretty much exactly as they do in JSX.

Postgres news

A Postgres contributor discovered a backdoor in the xz compression library that targeted SSH and had required an extensive amount of social engineering to plant. If you read his analysis of how he found it then it seems improbable that it would ever have been caught. Some people have said this is a counterpoint to “many eyes make bugs shallow”, but the real problem seems to be how we should maintain mature open source projects that are essentially “done” and just need care and oversight rather than investment. Without wanting to centralise open source, it feels like foundations actually do a good job here by allowing these kinds of projects to be brought together and have consistent oversight and change management applied to them.

I read the announcement of pgroll, which claims to distil best practice for Postgres migrations regarding locks, interim compatibility and continuous deployment. That all sounds great, but the custom migration definition format made me feel I wanted to understand it a little better and, as above, who is going to maintain this if it is a single company’s tool?

Postgres was also compiled into WASM and made available as an in-memory database in the browser, which feels a bit crazy but is also awesome for things like testing. It is also a reminder of how Web Assembly opens up the horizons of what browsers can do.

Hamstack

Another year, another stack. I felt Hamstack was tongue in cheek, but the rediscovery of hypermedia does feel real. There’s always going to be a wedge of React developers, just like there will be Spring developers, Angular developers or devotees of anything else that had a hot moment at some point in tech history. However it feels like there is more space to explore web-native solutions now than there was in the late 2010s.

This article also introduced me to the delightful term “modulith”, which perfectly describes the pattern that I think most software teams should follow until they hit the problems that lead to other solution designs.


February 2024 month notes

Postgres

Cool thing of the month is pg-mem, a NodeJS in-memory database with a Postgres-compatible API. It makes it easy to create very complete integration or unit tests covering both statement testing and object definitions. So far, everything that has worked with pg-mem has also worked flawlessly against both Docker-ised Postgres instances and CloudSQL Postgres.

The library readme says that containers for testing are overkill and it has delivered on that claim for me. Highly recommended.

Less good has been my adventures in CloudSQL’s IAM world. A set of overlapping work requirements means that the conventional practice of using roles and superuser permissions is effectively impossible, so I’ve been diving deeper than I ever expected to go into the world of Postgres’s permission model.

My least favourite discovery this month has been that it is possible to grant a set of permissions to a set of users in a way that generates no errors (admittedly via a Terraform module; I need to check whether Postgres complains about this directly) but that is then denied by the permission system at query time.

The heart of the problem seems to be that the owner of the database objects defines the superset of permissions that can be accessed by other users, but you can happily grant other users permissions outside of that superset without error, with the failure only appearing when they try to use the permission.

The error thrown was reported on a table providing a foreign key constraint, so there were more than a few hours spent wondering why the user could read the other table but then get permission denied on it. The answer seems to be that the insert into the child table is what the user performs, but it is the validation of the foreign key against the constraining table that trips the permission check, so the violation is reported against that table.

I’m not sure any of this knowledge will ever be useful again because this setup is so atypical. I might try and write a DevTo article to provide something for a future me to Google but I’m not quite sure how to phrase it to match the query.

Eager initialisation

I learnt something very strange about the Javascript test data generation library FakerJS this month, but it is just a specific example of libraries that don’t make an effort to lazy load their functionality. I’ve come across this issue in Python, where it affected start times in on-demand code; in Java, where the assumption that initialisation is a one-time cost broke down once multiple deployments a day meant the price was never amortised; and now I’ve encountered it in Javascript.
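The usual fix is to defer the expensive work until it is first needed. A minimal sketch of the pattern in Python, assuming a hypothetical `heavy_fixtures` module that is slow to import:

```python
# Importing heavy_fixtures at module level would pay the full cost as
# soon as this file is imported, whether or not the fixtures get used.

def make_fixture():
    # Importing inside the function defers the cost to the first call;
    # later calls hit sys.modules and are effectively free.
    import heavy_fixtures  # hypothetical slow-to-load dependency

    return heavy_fixtures.build()
```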

My takeaway is that it is important to [set aggressive timeouts](https://nodejs.org/api/cli.html#--test-timeout) on your testing suite rather than take the default of no timeouts. This only surfaced because some fairly trivial tests using the Faker data couldn’t run in under a second, which seemed very odd behaviour.

Setting timeouts also helps surface broken asynchronous tests and makes it less tedious waiting for a test suite that would otherwise hang.


October 2023 month notes

I’ve been learning more about Postgres as I have been moving things from Dataset to Psycopg 3. It is kind of ridiculous what you can do with it once you strip away the homogenising translation layer of things like ORMs. Return a set of columns from your update? No problem. Upsert? Straightforward.

However, after adding an ON CONFLICT clause I received a message that no conflict was possible on the columns I was checking, and I discovered that I had failed to add a primary key to the table when I created it. It probably didn’t matter to the performance of the table, as it was a link table with indexes on each lookup column, but I loved that the query parsing was able to do that level of checking on my structure.
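For reference, the native upsert looks roughly like this as a minimal psycopg sketch; the link table and its columns are hypothetical, and ON CONFLICT needs a unique constraint (such as a primary key) covering the named columns:

```python
import psycopg

with psycopg.connect("dbname=example") as conn:
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO post_tag (post_id, tag_id)
            VALUES (%s, %s)
            ON CONFLICT (post_id, tag_id) DO NOTHING
            RETURNING post_id, tag_id
            """,
            (1, 2),
        )
        # RETURNING yields the row on insert; None means the pair
        # already existed and the conflict clause skipped the insert.
        row = cur.fetchone()
```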

Interestingly, I had a conflict clause in the previous ORM statement I was replacing and it had never had an issue, so presumably it was doing an update-then-insert pattern in a transaction rather than using the native feature. For me this shows how native solutions are often better than emulation.

Most of the apps I’ve converted to direct use of queries are feeling more responsive now (including the one I use to draft these posts), but I’m not 100% certain whether this is because of the switch to lower-level SQL or because I’ve been fixing the problems in the underlying relational model that were previously hidden from me.

We’re going to need a faster skateboard

I have been thinking a lot about the Gold-plated Donkey Cart this month. When you challenge problems with solutions you often first have a struggle to get people to admit that there is a problem, and even once it is admitted the first response is often to try to patch or amend the existing solution rather than consider what the right solution might be.

We have additive minds, so this tendency to patch what exists is natural, but sometimes people aggressively defend the status quo even when it is counter-productive to their overall success.

Weakly typed

I’ve had some interesting experiences with Typescript this month, most notably an issue with a duplicated package which revealed that code running in production for months has either not been correctly typed or has been typed against a version maybe four major versions behind the intended one. Typescript is interesting amongst type-hinted languages in that its typing files are often supplied separately from the code itself, and in some cases exist independently of it. My previous experience of Python typing, for example, stopped the checker at the boundaries of third parties and therefore only applied to the code you are writing yourself.

I’m uncertain of the value of providing type files for Javascript libraries as the compile-time and runtime contexts seem totally different. I found a Javascript dependency that had a completely broken unit test file, and on trying to correct it I found that the code couldn’t have the behaviour that the tests were trying to verify. Again I wondered how this code was working in production, and predictably it turned out that the executed code path never included the incorrectly specified behaviour. Dynamic code can be very resilient and at the same time a time bomb waiting to happen, no matter what your types claim.

I think Typescript code would be better off if it was clearer that any guarantees of correctness can only be provided for the code you have totally under your control and which is being compiled and checked by you.

Frozen in time

I’ve also been thinking a lot about a line from this talk by Kilian Valkhof where he mentions that our knowledge of how to do things often gets frozen at the point we first learnt to do them. Developers who learnt React for the frontend will be the future versions of the people who learnt to do frontend via jQuery. I’ve been looking at Web Components, which I thought were pretty terrible when they first came out but which now look delightfully free of complex build chains and component models.

More fundamentally, it has made me think about whether, when I choose or reject things, I am doing so based on their inherent qualities in the present moment or on the moment in time when I first learnt and exercised those skills. For CSS, for example, I’m relatively old-fashioned and have never been a fan of the CSS-in-JS idea. I think this approach, while maybe outside contemporary preferences, is sound: good CSS applies across any number of frontend component models and frameworks, and the work that goes into the CSS standards is excellent, whereas (ironically) the limitations of Javascript frameworks in expressing CSS concepts mean that often only a frozen subset is usable.

I’ve never been entirely comfortable with Docker or Kubernetes, though, and generally prefer PaaS or “serverless” solutions. Is that because I enjoyed the Heroku developer experience and never really understood the advantages of containerisation as a result?

Technology is fashion, and therefore discernment is a critical quality for developers. For most developers, though, it is not judgement that they manifest but a toxic self-belief in the truth of whatever milieu they entered the industry in. As I slog through my third decade in the profession I feel strong doubt about my own opinions, and trying to frame my judgements in the evidence and reasoning available now seems a valuable technique.


August 2023 month notes

I have been doing a GraphQL course that is driven by email. I can definitely see the joy of having autocompletion on the types and fields of the API interface. GraphQL seems to have been deployed way beyond its initial use case, and it will be interesting to see whether it is a golden hammer or genuinely works better than REST-based services outside its original role of abstracting backend services for the frontend. It is definitely a complete pain in the ass compared to HTTP/JSON for hobby projects, as having to ship a query executor and client is just way too much effort compared to REST, more so again if you are not building a Javascript app interface.

I quite enjoyed the course, and would recommend it, but it mostly covered creating queries, so I’ll probably need to implement my own service to understand how to bind data to the query language. I will also admit that while it is meant to be easy to do a little each day, I ended up falling behind and then going through half of it over a weekend.

Hashicorp’s decision to change the license on Terraform has caused a lot of anguish on my social feeds. The OpenTF group has already announced that they will be creating a fork and are also promising to have more maintainers than Hashicorp. To some extent the whole controversy seems like a parade of bastards and it is hard to pick anyone as being in the right, but it makes most sense to use the most open execution of the platform (see also Docker and Podman).

In the past I’ve used CloudFormation and Terraform. If I were just using AWS I would probably be feeling smug within the security of my vendor lock-in, but Terraform’s extensibility via its provider mechanism means you can control a lot of services via the same configuration language. My current work uses it inconsistently, which is probably the worst of all worlds, but for the most part it is the standard for configuring services and does have some automation around its application. Probably the biggest advantage of Terraform is to people switching clouds (like myself), as you don’t have to learn a completely new configuration process, just the differences in the providers and the format of the stanzas.

The discussion of the change made me wonder whether I should look at Pulumi again, as one of the least attractive things about Terraform is its bizarre status as not quite a programming language, not quite Go and not quite a declarative configuration. I also found out about Digger, which is attempting to avoid having two CI infrastructures for infrastructure changes. I’ve only ever seen Atlantis used for this so I’m curious to find out more (although it is such an enterprise-level thing that I’m not sure I’ll do much more than have an opinion for a while).

I also spent some time this month moving my hobby projects from Dataset to basic Psycopg. I’ve generally loved using Dataset as it hides away the details of persistence in favour of passing dictionaries around. However, it is a layer over SQLAlchemy, which is itself going through some major point revisions, so the library in its current form is stuck with older versions of both the data interaction layer and the driver itself. I had noticed that queries for one of my projects were running quite slowly, and comparing the query time directly against the database with the time through the interface, it was notable that some queries were taking seconds rather than microseconds.

The new version of Psycopg comes with a reasonably elegant set of query primitives that work via context managers, and it also allows results to be returned in a dictionary format that is very easy to combine with NamedTuples, which makes it straightforward to keep my repository code consistent with the existing application code while completely revamping the persistence layer. Currently I have replaced a lot of the inserts and selects, but the partial updates are proving a bit trickier as Dataset is a bit magical in the way it builds up the update code. I think my best option would be to create an SQL builder library or adapt something like PyPika, which I’ve used in another of my projects.
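A minimal sketch of the new style (the connection string, table and columns are hypothetical): the context managers take care of commit and clean-up, and the row factory returns dictionaries directly.

```python
import psycopg
from psycopg.rows import dict_row

# Connection and cursor are context managers: the connection commits
# on a clean exit and rolls back if an exception escapes the block.
with psycopg.connect("dbname=example", row_factory=dict_row) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title FROM posts WHERE id = %s",
            (42,),
        )
        post = cur.fetchone()  # e.g. {"id": 42, "title": "..."} or None
```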

One of the things that has surprised me in this effort is how rarely the official Python documentation appears in Google search results. Tutorial-style content farms have started to dominate the first page of results and you have to add a search term like “documentation” to surface it now. People have been complaining about Google’s losing battle with content farms, but this is the first personal evidence I have of it. Although I always add “MDN” to my Javascript and CSS searches, so maybe this is just the way of the world now: you have to know what the good sites are to find them…


London Django Meetup May 2023

Just one talk this time, and it was more of a discussion of the cool things you can do with Postgres JSON fields. These are indeed very cool! Everything I historically wanted to do with NoSQL is now present in a relational database without compromising on performance or functionality, which is an amazing achievement by the Postgres team.

The one thing I did learn is that all the coercion and encoding information is held in the Django model and query logic, which means you only have basic types in the column. I previously worked on a codebase that used SQLAlchemy and a custom encoder and decoder, which split custom types into a string field carrying the Python type hint (e.g. Decimal, UUID) alongside the underlying value. Compared to the Django implementation, which appears to just use strings, that is a leaky abstraction where the structure of the data is compromised by the type hint.

Using the Django approach would have been easier when using direct SQL on the database and would have followed the principle of least surprise.

The speaker was trying to make a case for performing aggregate calculations in the database via the Django ORM query language, which wasn’t entirely convincing. Perhaps it works if you have a small team, but the resulting query-language code was more complex than the underlying query and was quite tied to the Postgres implementation, so it felt that a view might have been a better approach unless you have very dynamic calculations that are only applied for a fixed timespan.
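To give a flavour of what this looks like, here is a sketch with a hypothetical Reading model that stores a temperature under a JSONField called data:

```python
from django.db.models import Avg, FloatField
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast

# Average a numeric value stored under a JSON key; the JSON text has
# to be cast to a float before the database will aggregate it.
Reading.objects.annotate(
    temperature=Cast(
        KeyTextTransform("temperature", "data"), FloatField()
    )
).aggregate(avg_temperature=Avg("temperature"))
```

The equivalent SQL is a one-line AVG over a cast JSON key, which illustrates the complexity trade-off mentioned above.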

It was based on an experience report, so it clearly worked for the implementing group, but it felt like the approach strongly coupled the database, the web framework and the query language.


PyPika

PyPika is a DSL for creating SQL queries. It works by generating SQL from definitions that you create inline, rather than by interpreting models, data classes or other structures.

It makes it easy to define minimal projections, and it is also straightforward to bind data as you simply provide the values into the query that is generated rather than binding values to a query definition.

Once you have the query object constructed you call str on the object to obtain the actual SQL which you then need to pass to some execution context.
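A minimal sketch of that workflow (the table and columns are hypothetical):

```python
from pypika import Query, Table

users = Table("users")
query = (
    Query.from_(users)
    .select(users.id, users.name)
    .where(users.active == True)  # PyPika overloads == to build criteria
)

sql = str(query)
# roughly: SELECT "id","name" FROM "users" WHERE "active"=true
# `sql` can now be handed to whatever execution context you use,
# e.g. a psycopg cursor: cur.execute(sql)
```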

This also means that it is easy to test what SQL you are generating (although if you are using a DSL like this you should really be trusting the library).

I have only had one real problem with the statements I have used to date, and that is around the Postgres `RETURNING` clause that allows a generated key to be returned to the caller of an INSERT.
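One workaround is simply to append the clause to the generated SQL by hand; this is a sketch rather than a PyPika feature (newer releases may handle `RETURNING` via their Postgres dialect):

```python
from pypika import Query, Table

users = Table("users")
insert = Query.into(users).columns(users.name).insert("Alice")

# Bolt the clause onto the generated SQL; PyPika builds the INSERT and
# the caller's cursor picks up the generated key from RETURNING.
sql = str(insert) + " RETURNING id"
# roughly: INSERT INTO "users" ("name") VALUES ('Alice') RETURNING id
```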

Apart from this, using PyPika has been better than using a Python template or writing raw SQL.
