Programming

Don’t use Postgres Enums for domain concepts

If you’ve ever read a piece of advice about using Postgres Enums you’ve probably read not to use them and to use a peer table with foreign key constraints instead. Yet if you’ve seen a Typescript codebase in the wild recently, chances are that this advice has been absolutely ignored and enums are all over the place.

I’m not really sure why this should be when even Typescript itself discourages the use of enums. I think it is a combination of a spurious sense of type safety and the appeal of a table with a simple column of constrained values that maps onto an object with a limited set of typed constants. My main theory is that values in a check constraint are difficult to map into a Typescript ORM.

But to be clear, there’s no ORM in any language that magically makes using enums painless. The issues stem from the Postgres implementation, which is often made worse by ORM magic trying to hide the problem behind complex migrations.

Domain ideas change, enums shouldn’t

First of all I want to be clear that my intent isn’t to complain about Postgres enums. They have completely valid use cases and if you want to describe units of measure or ordinal weekday mappings then there’s nothing particularly wrong with them or their design. Anything that is a fundamental immutable concept is probably a great fit for the current enums model.

My issue is with mapping domain values onto these enums. We all know that business concepts and operations are subject to change; these ideas are far more malleable than the speed of light or the freezing point of water. Therefore they should be easy to change when our understanding of the domain changes.

And this is where the recommendation to use foreign keys instead of enums comes in. Changing a row in a table is a lot easier than trying to migrate an enum. Changing a label, adding rows, removing rows: all of it becomes easier and follows existing patterns for managing relational data. You can also expose these relationships as a configuration layer without having to make changes to the schema definition code.
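As a rough sketch of the peer-table version (the table and status values here are invented, and I’m using Python with psycopg purely as a convenient way to run the SQL):

```python
import psycopg

# Hypothetical schema: statuses live in a peer table referenced by a foreign
# key, rather than in an enum type baked into the column definition.
with psycopg.connect("dbname=example") as conn:
    conn.execute("""
        CREATE TABLE order_status (
            id    text PRIMARY KEY,  -- e.g. 'pending', 'shipped'
            label text NOT NULL      -- display label, editable like any other row
        )
    """)
    conn.execute("""
        CREATE TABLE orders (
            id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
            status_id text NOT NULL REFERENCES order_status (id)
        )
    """)
    # Adding a new status is a plain INSERT, not a type migration
    conn.execute(
        "INSERT INTO order_status (id, label) VALUES (%s, %s)",
        ("on_hold", "On hold"),
    )
```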

Changing enums in Postgres

In Postgres you can add values to an enum (and, since Postgres 10, rename them), but if you want to remove a value you end up dropping the type entirely and recreating it.

While you’re recreating it you have to block all operations on anything that uses that enum; any concurrent mutation is going to be a nightmare to handle. For organisations that don’t allow this kind of migration this would be a dealbreaker already.

And then you have value consistency: what do you do with rows that hold the value you’re removing? Typically there is a bit of a dance where you have two enumerations, one representing the current valid values and another representing the future value set. You rewrite the values in the column and then swap over the types.
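Sketched out with invented names, that dance looks something like this:

```python
import psycopg

# Removing 'on_hold' from a hypothetical order_status enum used by an orders
# table: there is no DROP VALUE, so create a replacement type, rewrite the
# affected rows, swap the column over and rename. The ALTER COLUMN takes an
# exclusive lock on orders while it runs.
steps = [
    "CREATE TYPE order_status_new AS ENUM ('pending', 'shipped')",
    "UPDATE orders SET status = 'pending' WHERE status = 'on_hold'",
    """ALTER TABLE orders
         ALTER COLUMN status TYPE order_status_new
         USING status::text::order_status_new""",
    "DROP TYPE order_status",
    "ALTER TYPE order_status_new RENAME TO order_status",
]

with psycopg.connect("dbname=example") as conn:
    for statement in steps:
        conn.execute(statement)
```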

Overall the entire process feels crazy when you could just be editing parent rows like any other piece of relational data.

What about language enums?

I don’t feel that exercised about enums expressed in source code and used within a well-defined context. I’m not sure you want to try and persist them to the storage layer, but if you do then a well-defined process for doing so, such as a mapper or repository, can do the heavy lifting of checking whether the code values and the storage values are in sync.

If enums make your code easier to work with then that’s fine; let the interface deal with synchronisation and have a plan for how the code should behave if it is out of sync with its external collaborators.
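As a sketch of what that synchronisation check might look like (sketched in Python; the enum, table and column are hypothetical):

```python
from enum import Enum

import psycopg


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"


def check_status_sync(conn: psycopg.Connection) -> None:
    """Fail fast if the code's enum and the storage layer's values disagree."""
    stored = {row[0] for row in conn.execute("SELECT id FROM order_status")}
    coded = {member.value for member in OrderStatus}
    if stored != coded:
        raise RuntimeError(
            f"status values out of sync: db-only={stored - coded}, "
            f"code-only={coded - stored}"
        )
```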

However if you are using Typescript then do look at the advice on using constant objects versus enums.

But please don’t encode a malleable domain concept in an immutable data storage implementation.

London, Work

Will humans still create software?

Recently I attended the London CTOs Unconference: an event where senior technical leaders discuss various topics of the day.

There were several proposed sessions on AI and its impact on software delivery, and I was part of a group discussing the evolution of AI-assisted development: how it might change delivery and, ultimately, what role people thought humans would have in it (we put the impact of artificial general intelligence to one side to focus on what happens with the technologies we currently have).

The session was conducted under Chatham House-equivalent rules, so I’m just going to record some of the discussion and key points in this post.

Session notes

Currently we are seeing the automation of existing processes within the delivery lifecycle, but there are opportunities to rethink how we deliver software so that it makes better use of the current generative AI tools and perhaps sets us up to take advantage of better options in future.

Rejecting a faster horse

Thinking about the delivery of the whole system rather than just modules of code, configuration or infrastructure allows us to think about setting a bigger task than simply augmenting a human. We can start to take business requirements in natural language, generate product requirement documents from these and then use formal methods to specify the behaviour of the system and verify that the resulting software that generative AI creates meets these requirements. Some members of the group had already been generating such systems and felt it was a more promising approach than automating different roles in the current delivery processes.

Although these methods have existed for a while they are not widely used, so it seems likely that senior technical leaders will need to reskill. Defining the technical outcomes they are looking for through more formal structures that work better with machines requires both knowledge and skill. Debugging is likely to move from the operation of code to the process of generation within the model, leading to an iterative cycle of refinement of both prompts and specifications. Here the recent move to expose more chain-of-thought information to users is helpful, allowing a user to refine their prompt when they can see flaws in the reasoning process of the model.

The fate of code

We discussed whether code would remain the main artefact of software production and we didn’t come to a definite conclusion. The existing state of the codebase can be given as a context to the generation process, potentially refined as an input in the same way as retrieval augmented generation works.

However, if the construction of the codebase is fast and cheap then the value of retaining code is not clear, particularly if requirements or specifications are changing and an alternative solution might therefore be better than the existing one.

People experimenting with whole solution generation do see major changes between iterations; for example where the model selects different dependencies. For things like UIs this matters in terms of UX but maybe it doesn’t matter so much for non-user facing things. If there is a choice of database mappers for example perhaps we only care that the performance is good and that SQL injection is not possible.

Specifications as artefacts

Specifications and requirements need to be versioned and change-controlled exactly as source code is today. We need to ensure that requirements are consistent and coherent, which formal methods should provide, but analysis and resolution of differing viewpoints as to the way the system works will remain an important technical skill.

Some participants felt that conflicting requirements would be inevitable and that it is unclear how the code generation process would respond to this. Current models do not seem to be able to identify such conflicts and will most probably favour one of the requirements over the others. If the test suite is independent then behavioural tests may reveal the resulting inconsistencies.

It was seen as important to control your own foundation model rather than using external services. Being able to keep the model consistent across builds and retain a working version of it decouples you from vendor dependencies and should be considered part of the build and deployment infrastructure. Different models have different strengths (although some research contradicts this anecdotal observation). We didn’t discuss supplementation techniques, but we did talk about priming the code generation process with coding standards or guidelines, though this did not seem to be a technique currently in use.

For some participants using generative AI was synonymous with choosing a vendor but this is risky as one doesn’t control the lifespan of such API-based interactions or how a given model might be presented by the vendor. Having the skills to manage your own model is important.

In public repositories it has been noted that the volume of code produced has risen but quality has fallen and that there is definitely a trade-off being made between productivity and quality.

This might be different in private codebases where different techniques are used to ensure the quality of the output. People in the session trying these techniques say they are getting better results than what is observed in public reporting. Without any way of verifying this though people will just have to experiment for themselves and see if they can improve on the issues seen in the publicly observed code.

When will this happen in banks?

We talked a little bit about the rate of adoption of these ideas. Conservative organisations are unlikely to move on this until there is plenty of information available publicly. However if an automated whole system creation process works there are significant cost savings associated with it and projects that would previously have been ruled out as too costly become more viable.

What do cheap, quick codebases imply?

We may be able to retire older systems with a lot of embedded knowledge in them far more quickly than if humans had to analyse, extract and re-implement that embedded knowledge. It may even be possible to recreate a mainframe’s functionality on modern software and hardware and make it cheaper than training people in COBOL.

If system generation is cheap enough then we could also ask the process to create implementations with different constraints and compare different approaches to the same problem optimising for cost, efficiency or maintenance. We can write and throw away many times.

What about the humans?

The question of how humans are involved in the software delivery lifecycle, what they are doing and therefore what skills they need was unclear to us. However, no-one felt that humans would have no role in software development, just that the role is likely to require a different skill set to the one that makes people successful today.

It also seemed unlikely that a human would be “managing” a team of agents if the system of specification and constraints was adopted. Instead humans would be working at a higher level of abstraction with a suite of tools to deliver the implementation. Virtual junior developers seemed to belong to the faster horse school of thinking.

Wrapping up the session

The session lasted for pretty much the whole of the unconference and the topic often went broad which meant there were many threads and ideas that were not fully resolved. It was clear that there are currently at least two streams of experimentation: supplementing existing human roles in the software delivery cycle with AI assistance and reinventing the delivery process based on the possibilities offered by cheap large language models.

As we were looking to the future we mostly discussed the second option, and this seems to be what people have in mind when they talk about not needing experienced software developers in future.

In some ways this is the technical architect’s dream, in that you can start to work with pure expressions of solutions to problems that are faithfully adhered to by the implementer. However, the solution designer now needs to understand how and why the solution generation process can go wrong and needs to verify the correctness and adherence of the final system. The non-deterministic nature of large language models is not going away, and therefore solution designers need to think carefully about their invariants to ensure consistency and correctness.

There was a bit of an undertow in our discussions about whether it is a positive that a good specification leads to almost a single possible solution, or whether we need to allow the AI to confound our expectations of the solution and create unexpected things that still meet our specifications.

The future could be a perfect worker realising the architect’s dream or it could be more like a partnership where the human is providing feedback on a range of potential solutions provided by a generation process, perhaps with automated benchmarking for both performance and cost.

It was a really interesting discussion and a useful snapshot of what feels like an area of frenzied activity among early adopters. Later adopters can probably afford to give this area more time to mature, as long as they keep an eye on whether the cost of delivery is genuinely dropping in the early production projects.

Programming

Type stripping in Node

I tried the new type stripping feature, which is enabled by default in Node 23.6.0. The type information is simply stripped from the source file, and syntax that requires actual transformation is not supported by default yet. You’re essentially writing Typescript and running Javascript, which is the essence of the Typescript promise that there’s nothing in the language that isn’t Javascript (enums aside).

On the surface of it this might seem pointless but the genius of it is that all the source files are Typescript and all tooling works as normal. There’s just no complicated build or transformation step. I have some personal projects that were written as Javascript but which were using typing information via JSDoc. Type stripping is much less complicated and the type-checking and tooling integration is much richer.

I’m planning to try and convert my projects and see if I hit any issues but I feel that this should be the default way to use Typescript now and that it is probably worth looking to try and upgrade older projects. The gulf of functionality between Node 18 and 23 is massive now.

Month notes

November 2024 month notes

Rust tools

Rust seems to be becoming the de facto standard for tooling, regardless of the language being used at the domain level. This month I’ve talked to people from Deno, who build their CLI with it, and switched to the just command runner and the ruff code formatter, both written in Rust.

It’s an interesting trend, both in terms of language communities becoming more comfortable with tooling that isn’t written in their own language and in terms of why Rust has such a strong showing in this area.

Gitlab pipelines

I have been working a lot with Gitlab CI/CD this month, my first real exposure to it. Some aspects are similar to Github Actions: you’re writing shell script in YAML and debugging is hard.

Some of the choices in the Gitlab job environments seem to make things harder than they need to be. By default the job checks out the commit hash of the push that triggered the build in a detached (fetch) mode. Depending on the nature of the commit (in a merge request, to a branch, to the default (main) branch) you seem to get different sets of environment variables populated. Choose the wrong type and things just don’t work, hurrah!

I’ve started using yq as a tool for helping validate YAML files but I’m not sure if there is a better structural tool or linter for the specific Gitlab syntax.

Poetry

I’ve also been doing some work with Poetry. As everyone has said, the resolution and download process is quite slow and there doesn’t seem to be a huge community around it as a tool. Its partial integration with pyproject.toml makes it feel more standard than it actually is, with things under the Poetry key requiring a bit of fiddling to be accessible to other tools. Full integration with the standard is expected in v2.

Nothing I’ve seen so far is convincing me that it can really make it in its current form. The fragmentation between the pure Python tools seems to have taken its toll and each one (I’ve typically used pipenv) has problems that it struggles to solve.

RSS Feeds

One of the best pieces of advice I was given about the Fediverse was that you need to keep following people until your timeline fills up with interesting things. I’ve been trying to apply that advice to programmers: every time I read an interesting post I’m now trying to subscribe. Despite probably tripling the number of feeds I have subscribed to, my unread view is improved but still dominated by “tech journalism”. I guess real developers probably don’t post that frequently.

Lobsters has been really useful for highlighting some really good writers.

CSS

Things continue to be exciting in the CSS world with more and more new modules entering into mainstream distribution (although only having three browsers in the world is probably helping). I had a little play around with Nested Selectors and while I don’t do lots of pseudo-selectors it is 100% a nice syntax for them. In terms of scoping rules, these actually seem a bit complex but at least they are providing some modularity. I think I’m going to need to play more to get an opinion.

The Chrome developer relations team have posted their review of 2024.

Not only is CSS itself improving but Tailwind v4 is actually going to support (or improve support for) some of these new features, such as containers. And of course its underlying CSS tool is going to be Rust-powered, natch.

Month notes

October 2024 month notes

For small notes, links and thoughts see my Prose blog.

Web Components versus frameworks

Internet drama erupted over Web Components in what felt like a needless way. Out of what often felt like wasted effort there were some good insights. Lea Verou had a good overview of the situation, along with an excellent line about standards work being “product work on hard mode”.

Chris Ferdinandi had a good response talking about how web components and reactive frameworks can be used together in a way that emphasises their strengths.

One of my favourite takes on the situation was by Cory LaViska who pointed out that framework designers are perhaps not the best people to declare the future of the platform.

Web Components are a threat to the peaceful, proprietary way of life for frameworks that have amassed millions of users — the majority of web developers.

His call to iterate on the standard and try to have common parts to today’s competing implementations was echoed in Lea’s post.

The huge benefit of Web Components is interoperability: you write it once, it works forever, and you can use it with any framework (or none at all). It makes no sense to fragment efforts to reimplement e.g. tabs or a rating widget separately for each framework-specific silo, it is simply duplicated busywork.

The current Balkanisation of component frameworks is really annoying, and it is developers’ fear and tribalism that has allowed it to happen and sustained it.

Postgres generated UUIDs

In my work I’ve often seen UUIDs generated in the application layer and pushed into the database. I tried this in a hobby project this month and rapidly came to the conclusion that it is very tedious when you can just have the database handle it. In Postgres a generated UUID can simply be the column default, and I don’t think I’m going to do anything else in future if I have a choice about it.
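For reference, this is roughly all it takes (the table is hypothetical; gen_random_uuid() is built in from Postgres 13 onwards):

```python
import psycopg

with psycopg.connect("dbname=example") as conn:
    conn.execute("""
        CREATE TABLE widgets (
            id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
            name text NOT NULL
        )
    """)
    # No UUID generated in the application layer: insert and read the id back
    new_id = conn.execute(
        "INSERT INTO widgets (name) VALUES (%s) RETURNING id",
        ("demo",),
    ).fetchone()[0]
    print(new_id)
```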

Python 3.13

I’ve started converting my projects to the new Python version and it seems really fast and snappy even on projects that have a lazy container spin-up. I haven’t done any objective benchmarking but things just feel more responsive than 3.11.

I’m going to have a push to set this as the baseline for all my Python projects. For my Fly projects extracting out the Python version number as a Docker variable has meant migrating has been as simple as switching the version number so far.

For the local projects I’ve also been trying to use asdf for tool versioning more consistently and it has made upgrading easier where I’ve adopted it but it seems I have quite a few places where I still need to convert from either language specific tools or nothing.

uvx

uvx is part of the uv project; I started using it this month and it’s rapidly becoming my default way to run Python CLIs. The first thing I used it with was pg-cli, but I found myself using it to quickly run pytest over some scripting code I’d written as well as running ad-hoc formatters and tools. It’s quick and really handy.

There’s still a debate about whether the Python community should go all-in on uv, but looking at the messy situation in Node, where all manner of build and packaging tools can be used (despite the ubiquity of npm), the argument for having a single way to package and run things is strong.

Month notes, Work

March 2024 month notes

Dependabot under the hood

I spent a lot more time this month than I was expecting with one of my favourite tools, Github’s Dependabot. It started when I noticed that some projects were not getting security updates that others were. I know it is possible for updates to be suspended on projects that neglect them for too long (I should really archive some of my old projects) but checking the project settings confirmed that everything was set up correctly and there was nothing that needed enabling.

Digging in, I wondered how you are meant to view what Dependabot is doing. You might think it is implemented as an Action or something similar but in fact you access the information through the Insights tab.

Once I found it though I discovered that the jobs had indeed been failing silently (I’m still not sure if there’s a way to get alerted about this) because we had upgraded our Node version to 20 but had set the option engine-strict on. It turns out that Dependabot runs on its own images and those were running Node 18. It may seem tempting to insist that your CI uses the same version as your production app, but in the case of CI actions there’s no need to be that strict; after all, they are just performing repository-management actions that aren’t going to hit your build chain directly.

Some old dependencies also caused problems in trying to reconcile their target version, the package.json Node engine and the runtime Node version. Fortunately these just highlighted some dependency cruft and deprecated projects that we needed to cut out of the project.

It took a surprising amount of time to work through the emergent issues but it was gratifying to see the dependency bundles flowing again.

Rust

I started doing the Rustlings tutorial again after maybe a year in which I’d forgotten about it (having spent more time with Typescript recently). This is a brilliant structured tutorial of bite-sized introductions to various Rust concepts. Rust isn’t that complicated as a language (apart from its memory management) but I’ve found the need to have everything right for the code to compile means that you tend to need to devote dedicated time to learning it and it is easy to hit some hard walls that can be discouraging.

Rustlings allows you to focus on just one concept and scaffolds all the rest of the code for you, so you’re not battling a general lack of understanding of the language structure and can just focus on one thing like data structures or library code.

Replacing JSX

Whatever the merits of JSX, it introduces a lot of complexity and magic into your frontend tooling and I’ve seen a lot of recommendations that it simply isn’t necessary now that tagged template literals are available. I came back to an old Preact project this month that I had built with Parcel. The installation had a load of associated security alerts so on a whim I tried it with ViteJS, which mostly worked except for the JSX compilation.

Sensing a yak to shave I started to look at adding in the required JSX plugin but then decided to see if I really needed it. The Preact website mentioned htm as an alternative that had no dependencies. It took me a few hours to understand and convert my code and I can’t help but feel that eliminating a dependency like this is probably just generally a good idea.

The weirdest thing about htm is how faithful it is to the JSX structure. I was expecting something a bit more, well, HTML-ish, but props and components work pretty much exactly how they do in JSX.

Postgres news

A Postgres contributor found a backdoor targeting SSH that required an extensive amount of social engineering to achieve. If you read his analysis of how he discovered it then it seems improbable that it would have been discovered at all. Some people have said this is a counterpoint to “many eyes make bugs shallow” but the real problem seems to be how we should be maintaining mature open source projects that are essentially “done” and just need care and oversight rather than investment. Without wanting to centralise open source, it feels like foundations actually do a good job here by allowing these kinds of projects to be brought together and have consistent oversight and change management applied to them.

I read the announcement of pgroll which claims to distil best practice for Postgres migrations regarding locks, interim compatibility and continuous deployment. That all sounds great but the custom definition format made me feel that I wanted to understand it a little better and as above, who is going to maintain this if it is a single company’s tool?

Postgres was also compiled into WASM and made available as an in-memory database in the browser, which feels a bit crazy but is also awesome for things like testing. It is also a reminder of how Web Assembly opens up the horizons of what browsers can do.

Hamstack

Another year, another stack. I felt Hamstack was tongue in cheek but the rediscovery of hypermedia does feel real. There’s always going to be a wedge of React developers, just like there will be Spring developers, Angular developers or anything else that had a hot moment at some point in tech history. However it feels like there is more space to explore web-native solutions now than there was in the late 2010s.

This article also introduced me to the delightful term “modulith”, which perfectly describes the pattern that I think most software teams should follow until they hit the problems that lead to other solution designs.

Work

January 2024 month notes

Water CSS

I started giving this minimal element template a go after years of using various versions of Bootstrap. It is substantially lighter in terms of the components it offers with probably the navigation bar being the one component that I definitely miss. The basic forms and typography are proving fine for prototyping basic applications though.

Node test runner

Node now has a built-in test runner and testing framework. I’ve been eager to give it a go as I’ve heard that it is both fast and lightweight, avoiding the need to select and include libraries for testing, mocking and assertions. I got the chance to introduce it in a project that didn’t have any tests and I thought it was pretty good, although its default text output felt a little unusual and the alternative dot notation might be a bit more familiar.

It’s interesting to see that the basic unit of testing is the assertion, something it shares with Go. It also doesn’t support parameterised tests, which again is like Go, where the pattern is table-driven tests implemented with for loops, except that Go allows more control of the dynamic test case naming.

I’d previously moved to the Ava library and I’m not sure there is a good reason not to use the built-in alternative.

Flask blueprints

In my personal projects I’ve tended to use quite a few cut-and-paste modules, and over the years they tend to drift and get out of sync, so I’ve been making a conscious effort to learn about and start adopting Flask Blueprints. Ultimately I want to try and turn these into personal module dependencies that I can update once and use in all the projects. For the moment though it is interesting how the blueprint format is pushing me to do some things better, like logging (to understand what is happening in the blueprint), and to structure the different areas of the application so that they are quite close to Django apps. Various pieces of functionality are now associated with a URL prefix, which makes it a bit easier to create middleware that is registered as part of the Blueprint rather than relying on imports and decorators.
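A rough sketch of the shape I mean, with a made-up “reports” area:

```python
import logging

from flask import Blueprint, Flask, g

log = logging.getLogger(__name__)

# Hypothetical "reports" area of the application, Django-app style
reports = Blueprint("reports", __name__, url_prefix="/reports")


@reports.before_request
def load_area_context():
    # Runs only for routes in this blueprint: blueprint-scoped "middleware"
    g.area = "reports"
    log.info("handling a reports request")


@reports.get("/")
def index():
    return {"area": g.area}


def create_app() -> Flask:
    app = Flask(__name__)
    app.register_blueprint(reports)
    return app
```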

Web components

I’ve been making a bit of progress with learning about web components. I realised that I was trying to do too much initially, which is why they were proving complicated. Breaking things down a bit has helped, with an initial focus on event listeners within the component. I’m also not bringing in external libraries at the moment but have got as far as breaking things up into ESM modules (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), which has mostly worked out so far.

Programming

Enterprise programming 2023 edition

Back in the Noughties there were Enterprise Java Beans, Java Server Pages, Enterprise Edition clustered servers, Oracle databases and, shortly thereafter, the Spring framework with its dependency injection wiring. It was all complicated, expensive and, to be honest, not much fun to work with. One of the appeals of Ruby on Rails was that you could just get on and start writing a web application rather than staring at initialisation messages.

During this period I feel there was a big gap between the code that you wrote for a living and the code you wrote for fun. Even if you were writing on the JVM you might be fooling around with Jython or Groovy rather than a full Enterprise Java Bean. After this period, and in particular once Spring was in everything, I feel the gap between hobby and work languages collapsed. Python, Ruby, Scala, Clojure: all of these languages were fun and were equally applicable to work and small-scale home projects. Then, with Node gaining traction in the server space, the gap between the two worlds closed pretty dramatically. There was a spectrum that started with an inline script in an HTML page and ran through to a server-side API framework with pretty good performance characteristics.

Recently though I feel the pendulum has been swinging back towards a more enterprisey setup that doesn’t have a lot of appeal for small project work. It often feels that a software delivery team can’t even begin to create a web application without deploying on a Kubernetes cluster of Go microservices, orchestrated with a self-signing certificate system and a log shipping system, with Prometheus and Grafana on top.

On the frontend we need an automatically finger-printing statically deployed React single-page app, ideally with some kind of complex state management system like sagas or maybe everything written using time-travelable reactive streams.

Of course on top of that we’ll need a design system with every component described in Storybook, using either a modular class-based CSS system like Tailwind or otherwise a heavyweight styled component library based on Material design. Bonus points for adding React Native into this and a CI/CD setup that ideally mixes a task server with a small but passionate community and a home-grown pipeline system. We should also probably use a generic build tool like Bazel.

And naturally our laptop of choice will be Apple’s OSX with a dependency on XCode and Homebrew. We may use Github but we’ll probably use a monorepo along with a tool to make it workable like Lerna.

All of this isn’t much fun to work on unless you’re being paid for it and it is a lot of effort that only really pays off if you hit the growth jackpot. Most of the time this massive investment in complex development procedures and tooling simply throws grit into the gears of producing software.

I hope that soon the wheel turns again and a new generation of simplicity is discovered and adopted and that working on software commercially can be fun again.

Programming, Python

Transcribing podcasts with Google’s Speech to Text API

I don’t really listen to podcasts, even now when I have quite a long commute. I generally read faster than I can listen and prefer to read through transcripts than listen, even when the playback speed is increased. Some shows have transcripts and generally I skim read those when available to see if it would be worth listening to segments of the podcasts. But what about the podcasts without transcripts? Well Google has a handy Speech to Text API so why not turn the audio into a text file and then turn it into a HTML format I can read on the phone on the tube?

tldr; the API is pretty much the same one that generates the Youtube automatic subtitling and transcripts. It can just about create something that is understandable by a human but its transcription of vernacular voices is awful. If Youtube transcripts don’t work for you then this isn’t a route worth pursuing.

Streaming pods

I’m not very familiar with Google Cloud services; I used to do a lot of App Engine development but that way of working was phased out in favour of something a bit more enterprise friendly. I have the feeling that Google Cloud’s biggest consumers are data science and analysis teams, and the control systems intersect with Google Workspace, which probably makes administration easier in organisations but less so for individual developers.

So I set up a new project, enabled billing, associated the billing account with a service account, associated the service account with the project and wished I’d read the documentation to know what I should have been doing. And after all that I created a bucket to hold my target files in.

You can use the API to transcribe local audio files but only if they are less than 60 seconds long, so I needed to be using the long-running asynchronous invocation version of the API. I also should have realised that I needed to write the transcription to a bucket too. I ended up using the input file name with “.json” attached, but until I started doing that I didn’t realise that my transcription was failing to recognise my input.

Learning the API

One really nice feature Google Cloud has is the ability to run guided tutorials in your account via CloudShell. You get a step by step guide that can simply paste the relevant commands to your shell. Authorising the shell to access the various services was also easier than generating credentials locally for what I wanted to do.

Within 10 minutes I had processed my first piece of audio and had a basic Python file set up. However the test file was in quite an unusual format and the example used the synchronous version of the API.

I downloaded a copy of the Gettysburg address and switched the API version, then had my CloudShell script await the outcome of the transcription.

Can you transcribe MP3?

The documentation said yes (given a specific version) and while the client code accepted the encoding type, I never got MP3 to work, so instead I ended up using ffmpeg to create FLAC copies of my MP3 files. I might have been doing something wrong but I’m not clear what it was: the job was accepted but it returned an empty JSON object (this is where writing the output to a file is much more useful than trying to print an empty response).

FLAC worked fine and the transcript seemed pretty on the money and converting the files didn’t seem that much of a big deal. I could maybe do an automatic conversion later when the file hit the bucket if I needed to.
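Roughly, the working combination looked like the sketch below (the bucket and file names are made up, and the upload of the FLAC file to the bucket is left out):

```python
import subprocess

from google.cloud import speech

# Make a FLAC copy of the MP3; ffmpeg picks the output codec from the extension
subprocess.run(["ffmpeg", "-i", "episode-01.mp3", "episode-01.flac"], check=True)

# (FLAC file then uploaded to the bucket, omitted here)
client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri="gs://my-podcast-bucket/episode-01.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",  # switched to en-GB later for the British voices
)

# The long-running asynchronous form of the API, needed for audio over a minute
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)  # waiting here is what later hit the shell limit

for result in response.results:
    print(result.alternatives[0].transcript)
```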

However, after my initial small files, I found that waiting for the result of the API call meant hitting a timeout on the execution duration within the shell. I’ve hit something like this before when running scripts over Google Drive that copied directories. I didn’t have a smart solution then (I just skipped files that already existed and re-ran the jobs a lot) and I didn’t have one now.

Despite the interactive session timing out, the job completed fine and the file appeared in the storage bucket. Presumably it would have been easier here to run the script locally or on some kind of temporary VM. Or perhaps I should have been able to get the run identifier and just checked the job using that. Asynchronous execution of jobs in Google Cloud is another area where what you are meant to do is unclear to me, and working on this problem didn’t require me to resolve my confusion.

Real audio is bobbins

So armed with a script that had successfully rendered the Gettysburg address I switched the language code to British English, converted my first podcast file to FLAC and set the conversion running.

The output is pretty hilarious and while you can follow what was probably being said it feels like reading a phonetic version of Elizabethan English. I hadn’t listened to this particular episode (because I really don’t listen to podcasts, even when I’m experimenting on them) but I did know that the presenters are excessively Northern, and therefore when I read the text “we talk Bob” I realised that it probably meant “we are talking bobbins”. Other gems: “threw” had been rendered as “flu” and “loathsome” as “lord some”. Phonetically, if you know the accent you can get the sense of what was being talked about, and the more mundane the speech the better the transcription was. However it was in no way an easy read.

I realised that I was probably overly ambitious going from a US thespian performing a classic of political speechwriting to colloquial Northern and London voices. So next I chose a US episode, more or less the first thing I could get an MP3 download of (loads of the shows are actually shared on services that don’t allow you access to the raw material).

This was even worse because I lacked the cultural context but even if I had, I have no idea how to interpret “what I’m doing ceiling is yucky okay so are energy low-energy hi”.

The US transcript was even worse than the British one, partly I think because the show I had chosen has the presenters talking over one another or speaking back and forth very rapidly. One of them also seems to repeat himself when losing his train of thought or wanting to emphasise something.

My next thought was to try and find a NPR style podcast with a single professional presenter but at this point I was losing interest. The technology was driving what content I was considering rather than bringing the content I wanted to engage with to a different medium.

YouTube audio

If you’ve ever switched on automatic captioning in Youtube then you’ve actually seen this API in action: the text and timestamps in the JSON output are pretty much the same as what you see in both the text transcript and the in-video captioning. My experience is that the captioning is handy in conjunction with the audio, but if I were fully deaf I’m not sure I would understand much about what was going on in the video from the auto-generated captions.

Similarly here, the more you understand the podcast you want to transcribe the more legible the transcription is. For producing a readable text that would reasonably represent the content of the podcasts at a skim reading level the technology doesn’t work yet. The unnatural construction of the text means you have to quite actively read it and put together the meaning yourself.

I had a follow-up idea of using speech to text and then automated translation to be able to read podcasts in other languages but that is obviously a non-starter as the native language context is vital for understanding the transcript.

Overall then, a noble failure; given certain kinds of content you can actually create pretty good text transcriptions, but as a way of keeping tabs on informal, casual audio material, particularly with multiple participants, this doesn’t work.

Costs

I managed to blow through a whole £7 for this experiment, which actually seemed like a lot for two podcasts of less than an hour and a seven-minute piece of audio. In absolute terms though it is less than the proverbial avocado on toast.

Future exploration

Meeting transcription technology is meant to be pretty effective including identifying multiple participants. I haven’t personally used any and most of the services I looked at seemed aimed at business and enterprise use and didn’t seem very pay as you go. These however might be a more viable path as there is clearly a level of specialisation that is needed on top of the off-the-shelf solutions to get workable text.


Programming, Work

August 2023 month notes

I have been doing a GraphQL course that is driven by email. I can definitely see the joy of having autocompletion on the types and fields of the API interface. GraphQL seems to have been deployed way beyond its initial use case and it will be interesting to see if it’s a golden hammer or genuinely works better than REST-based services outside its original role as an abstraction layer for frontends. It is definitely a complete pain in the ass compared to HTTP/JSON for hobby projects, as having to ship a query executor and client is just way too much effort compared to REST, and more so again if you’re not building a Javascript app interface.

I quite enjoyed the course, and would recommend it, but it mostly covered creating queries so I’ll probably need to implement my own service to understand how to bind data to the query language. I will also admit that while it is meant to be quite easy to do each day I ended up falling behind and then going through half of it on the weekend.

Hashicorp’s decision to change the license on Terraform has caused a lot of anguish on my social feeds. The OpenTerraform group has already announced that they will be creating a fork and are also promising to have more maintainers than Hashicorp. To some extent the whole controversy seems like a parade of bastards and it is hard to choose anyone as being in the right but it makes most sense to use the most open execution of the platform (see also Docker and Podman).

In the past I’ve used CloudFormation and Terraform. If I was just using AWS I would probably be feeling smug with the security of my vendor lock-in, but Terraform’s extensibility via its provider mechanism meant you could control a lot of services via the same configuration language. My current work uses it inconsistently, which is probably the worst of all worlds, but for the most part it is the standard for configuring services and does have some automation around its application. Probably the biggest advantage of Terraform was to people switching clouds (like myself) as you don’t have to learn a completely new configuration process, just the differences with the provider and the format of the stanzas.

The discussion of the change made me wonder if I should look at Pulumi again, as one of the least attractive things about Terraform is its bizarre status as not quite a programming language, not quite Go and not quite a declarative configuration. I also found out about Digger, which is attempting to avoid having two CI infrastructures for infrastructure changes. I’ve only ever seen Atlantis used for this so I’m curious to find out more (although it is such an enterprise-level thing I’m not sure I’ll do much more than have an opinion for a while).

I also spent some time this month moving my hobby projects from Dataset to basic Psycopg. I’ve generally loved using Dataset as it hides away the details of persistence in favour of passing dictionaries around. However it is a layer over SQLAlchemy, which is itself going through some major point revisions, so the library in its current form is stuck with older versions of both the data interaction layer and the driver itself. I had noticed that for one of my projects queries were running quite slowly, and comparing the query time when run directly against the database with the time when going through the interface, it was notable that some queries were taking seconds rather than microseconds.

The new version of Psycopg comes with a reasonably elegant set of query primitives that work via context managers, and it also allows results to be returned in a dictionary format that is very easy to combine with NamedTuples. That makes it quite easy to keep my repository code consistent with the existing application code while completely revamping the persistence layer. Currently I have replaced a lot of the inserts and selects, but the partial updates are proving a bit trickier as Dataset is a bit magical in the way it builds up the update code. I think my best option would be to try and create an SQL builder library or adapt something like PyPika, which I’ve used in another of my projects.
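The pattern I’ve ended up with looks roughly like this sketch (the table and NamedTuple are illustrative rather than real project code):

```python
from typing import NamedTuple

import psycopg
from psycopg.rows import dict_row


class Note(NamedTuple):
    id: int
    title: str
    body: str


def fetch_note(conninfo: str, note_id: int) -> Note | None:
    # Context managers take care of the connection, transaction and cursor
    with psycopg.connect(conninfo, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, title, body FROM notes WHERE id = %s", (note_id,)
            )
            row = cur.fetchone()  # a dict, thanks to the row factory
    return Note(**row) if row else None
```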

One of the things that has surprised me in this effort is how often the official Python documentation does not appear in Google search results. Tutorial-style content farms have started to dominate the first page of results and you have to add a search term like “documentation” to surface it now. People have been complaining about Google’s losing battle with content farms but this is the first personal evidence I have of it. Although I always add “MDN” to my Javascript and CSS searches, so maybe this is just the way of the world now: you have to know what the good sites are to find them…
