Programming

What I learnt publishing an Eleventy site to Github Pages

I have an Eleventy site in a GitHub repo and I want to publish it. What could be more logical or easier than pushing it to GitHub Pages?

Well, the process was relatively easy but there were still enough gotchas to make it worth recording the lessons learned. First off, the Eleventy documentation for GitHub Pages is not great; I ended up using the starter template for Jekyll combined with the Vite documentation, but I think a few of my problems came from mashing up my sources.

First things first: you have to manually set your repo’s Pages setting to GitHub Actions for anything to happen. I thought the GitHub Actions workflow could somehow set this up via the configure-pages action, but it is a cart-before-horse situation.

I had quite a few obscure YAML parsing errors, and you don’t get any more detail back than “your file is wrong”. I found the action linter invaluable, though I also hit a case where the error was reported against the job name while the actual problem was further down from the reported line. Cutting and pasting segments into the linter eventually allowed me to track down the problematic statement and get a parsed file.

Permissions on a job are not additive to the base permissions but override them. I thought I was adding a permission at the job level but I was in fact resetting the others.

Having the permissions wrong resulted in the obscure error message “Unable to get ACTIONS_ID_TOKEN_REQUEST_URL env variable”. That variable holds the endpoint used to request an OIDC token, and it is unavailable because the id-token permission isn’t set for the execution context.
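For reference, a sketch of the shape that avoids both problems (the job and step contents are illustrative, not my exact workflow):

```yaml
# Workflow-level default
permissions:
  contents: read

jobs:
  deploy:
    # Job-level permissions REPLACE the workflow-level block above,
    # so list everything the job needs, not just the additions.
    permissions:
      pages: write      # deploy to GitHub Pages
      id-token: write   # allows requesting the OIDC token, which is
                        # what ACTIONS_ID_TOKEN_REQUEST_URL is for
    runs-on: ubuntu-latest
    steps:
      - uses: actions/deploy-pages@v4
```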

GitHub Pages will publish to <GitHub username>.github.io/<repo name>, which means that by default all the Eleventy-generated links will be wrong. You need to use the HTML Base plugin (confusingly, the name suggests an affinity with the HTML base element, but it is totally different).
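In Eleventy 2.x this is, as far as I can tell, the HTML Base plugin plus a pathPrefix; a sketch (the repo name is hypothetical):

```javascript
// .eleventy.js
const { EleventyHtmlBasePlugin } = require("@11ty/eleventy");

module.exports = function (eleventyConfig) {
  // Rewrites absolute URLs in the output HTML to include the prefix
  eleventyConfig.addPlugin(EleventyHtmlBasePlugin);

  // Can also be supplied on the command line: --pathprefix=/repo-name/
  return { pathPrefix: "/repo-name/" };
};
```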

Concurrency was interesting and people seem to do a lot of different things. My conclusion is that the deployment job should only ever have one instance and shouldn’t be cancelled. If you have a separate build job, that can be split by branch and should be cancelled if a new run is triggered.

Having different concurrency rules seems to be a big reason for splitting up the build and deployment activities; otherwise just having one end-to-end job seems easier to work with.
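Sketched as job-level concurrency groups (the group names are my own, not a convention):

```yaml
jobs:
  build:
    # A newer push to the same branch supersedes any in-flight build
    concurrency:
      group: pages-build-${{ github.ref }}
      cancel-in-progress: true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ...build and upload the site artifact here...

  deploy:
    needs: build
    # Only one deployment at a time, and never cancelled mid-deploy
    concurrency:
      group: pages-deploy
      cancel-in-progress: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/deploy-pages@v4
```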

I should probably have gone with a starter template (you get offered them when you switch on GitHub Actions for a repository); unless you’re using a specific tool, the static template seems best. From there you just need to replace the artifact upload action with a build-and-upload step.

Standard
Python

Django October 2024 Hackfest

This session was a little more informal than I thought it was going to be, but it wasn’t time wasted as it provided an incentive to switch some of my projects over to Python 3.13 (which has been a great idea so far, by the way).

As part of the suggested activities at the session I tried testing a Django template formatting tool called djade (pronounced just Jade) (introductory post). It worked and seemed pretty good to me, although I don’t really have any complicated projects to work with and had to use some templates off the internet for the testing.

I used uvx to run the formatter and felt that there was something strange going on in using a Rust tool to run another Rust tool, where the only Python elements were a PyPI listing and the fact that it formats Django templates.

The suggestions also included helping out on Narwhals, which I hadn’t heard of before but which aims to be a compatibility layer between different dataframe implementations. It seemed an interesting project but not one I have the right background to help with.

Standard
Month notes

August 2024 month notes

Co-pilot

Ever late to the party, I’ve finally been using AI-assisted coding on a work project. It’s been a really interesting experience: sometimes helpful and sometimes maddening.

Among the positives are that it was easy to get the LLM to translate between different number systems like rgb and hex or pixels, rems and Tailwind units.

It was pretty good at organising code according to simple rules like lexical sorting, but it was defeated by organising imports according to linting rules. This makes it a great tool for tidying crufty code that hasn’t been cared for in a while, and it has often been more powerful than pure AST-based refactoring.

At one point it correctly auto-populated stub airport code data into a test data structure, which felt like something I hadn’t seen in assistants before.

It also helped me write a bash script in a fraction of the time it would normally take. The interesting thing here was that I know a reasonable amount of bash but can never remember the proper bracketing and spacing. Although I tweaked every line that was produced, it was much quicker than Googling the correct syntax or running and repeating.
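The bracketing and spacing in question is the sort of thing below; a trivial, hypothetical example rather than the actual script:

```shell
# [[ ]] needs spaces inside the brackets and around the operator
name="world"
if [[ "$name" == "world" ]]; then
  echo "hello $name"
fi

# Arithmetic comparisons use (( )) instead, with C-style operators
count=3
if (( count > 2 )); then
  echo "count is big enough"
fi
```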

What wasn’t so great was that Copilot and IntelliSense suggestions aren’t really differentiated in the UI, so it was really unclear which completions were the result of reflection or inference from the code and which were based on probability. If you’re having a field name suggested then that should only come via reflection, in my view. All too often a completion resulted in an immediate check error because the field had a slightly different name or didn’t exist at all.

I’m almost at the point of switching off Copilot suggestions because they aren’t accurate enough right now.

Would I pay for this myself right now? No, I don’t think this iteration has the right UX or ability to understand the context of the code. However, there will be a price point in the future that is right for things like the script writing.

Atuin

I started a new job recently and probably the most useful tool I’ve used since starting is Atuin which gives you a searchable shell history. I’ll probably write up more about my new shell setup but I think being able to pull back commands quickly has made it massively easier to cope with a new workflow and associated commands and tools.

Form Data

This little web-standards built-in was the best thing to happen to my hobby coding this month. I can’t believe I’ve gone this long without ever having used it. You can pass it a DOM reference and access the contents of the form programmatically, or you can construct an instance and pass it along to a fetch call.

It’s incredibly useful and great for using in small frontends.

Reading list

Gotchas in using SQLite in production: https://blog.pecar.me/sqlite-prod

Practical SVG has been published for free on the internet after publisher A Book Apart stopped distributing its catalogue.

Let’s bring about the end of countless hand-rolled debounce functions: https://github.com/whatwg/dom/issues/1298

Python packaging tool uv had a major release this month. Simon Willison shared a number of interesting observations over at the Lobsters thread on the release. I’m still uncertain about the wisdom of trying to fund developer tooling with venture capital (I don’t believe the returns are there), but I did come round to people’s arguments that the tools could be brought into community stewardship if needed. Thinking of recent licensing forks, the argument seems persuasive.

I’m currently happily mimbling along with pipenv, but I need to update some hobby apps to Python 3.12/3.13 soon so I think I’m going to give uv a go and see what happens.

I also started a small posts blog this month so I’m probably going to post these items there in the future.

Standard
Blogging

September 2024 month notes

Unfortunately I’ve been ill with acute bronchitis this month, which meant not being able to attend a few events I’d been looking forward to, and not sleeping very well, which meant not much time to do anything that wasn’t essential.

I have some posts I’m working on around class-based CSS systems and working with datamapper solutions that allow you to work with SQL directly compared to ORM abstractions but they aren’t ready as yet.

I did attend the CTO Unconference, which was another good event but with fewer actionable ideas than previous sessions. There were a few conversations about good development processes and what core things a technical solution should provide. We didn’t really get into costs and focus, but my view was that people generally didn’t think in terms of cost-benefit and that there is a lot of zero-interest-rate legacy that we are now going to have to deal with in a very different financial era. Simplicity and a minimum of moving parts seem important.

I’m a big fan of developing in the open and I think the organisations I’ve worked for that adopted this practice were the best places I’ve worked at for communication and design. Ross Ferguson (a former colleague) is compiling a list of public organisations that use open roadmaps, which is a really interesting idea (if probably a bit scary for organisations that don’t have disclosure or transparency requirements).

GitLab does open support tickets (Background jobs failing, Background job results not being available), which was something brand new to me. Again it seems quite scary, but seeing it working gave me a tremendous amount of insight into the problems and what the GitLab folks were trying to do about them. That is something no amount of strongly worded emails to support addresses or account managers has ever gotten me.

Standard
Month notes

July 2024 month notes

Dockerising Python

Fly have changed their default application support to avoid buildpacks and provide a default Dockerfile when starting new projects. I’ve been meaning to upgrade my projects to Python 3.12 as well and when one of my buildpack projects stopped deploying I ended up spending some time on how to best package Python applications for a PaaS deployment.

I read the debate about which distribution to use as your base image (mostly around Alpine’s musl libc and binary wheels), but I haven’t personally encountered those problems and my image sizes are definitely smaller with Alpine.

Docker’s official documentation is a nightmare, with no two Dockerfiles being consistent in approach. This page has some commented example files under the manual tabs, but there doesn’t seem to be an easy way to generate a direct link to it, which seems, actually, typical of my experience with the documentation.

There also doesn’t seem to be a consistent view as to whether an application should use the system Python or a virtual environment within the container. The latter seems more logical to me and is what I was doing previously, but the default Fly configuration isn’t set up that way.
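What I ended up with is close to this sketch (file names and the entry point are mine, not Fly’s defaults):

```dockerfile
FROM python:3.12-alpine

# Install into a virtual environment rather than the system Python;
# putting it first on PATH means pip and python resolve to the venv
RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]
```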

Services

I have quite a few single-user hobby web projects and I’ve been wondering if they wouldn’t work a lot better with a local SQLite datastore, but it is actually often easier to use a cloud Postgres service than it is to have a secure read-write directory available to an app and to manage backups and so on yourself.

Turso is taking this idea one step further to try and solve the multi-tenancy issue by providing every client with a lightweight database.

I gave Proton Docs a whirl this month and they are pretty usable with the caveat that I haven’t tried sharing and collaboratively editing them yet. The one thing that is missing for me at the moment is keyboard shortcuts which seem pretty necessary when you’re typing.

I had previously tried de-Googling with Cryptpad, which is reasonable for spreadsheets but has a really clunky document interface compared to Google Docs, and which I ended up using more out of principle than because it was an equivalent product.

Reading list

It’s possible to get hung up on what good image description looks like, but this WAI guide to writing alt text for images is straightforward and breaks down the most common cases with examples.

Smolweb is a manifesto for a smaller, lighter web which aligns for me with the Sustainable Web initiatives. There are a few interesting ideas in the manifesto such as using a Content Security Policy to stop you from including content from other sites (such as CDNs).

Following up on this theme is the W3C’s Ethical Web Principles, which also felt very inspiring. Or maybe depressing that some of these things need to be formulated in a common set of principles.

I also found out about the hobby Spartan protocol this month, which seems like it would be a fun thing to implement and is closer to the original HTTP spec, which was reasonably easy for people to follow and implement.

Standard
Web Applications

The changing landscape of UK Energy

In the last year I’ve been building up a list of websites that help understand how electrical energy is produced in the UK and how it feeds into the grid. Building this understanding seems to be a vital requirement to understand the nature of the investment we need to make in the UK’s energy infrastructure and also massive potential that we are still failing to tap into.

But the other thing I’ve learned is that a lot of ideas that I grew up with around energy are probably no longer true. In particular the nature of solar energy, which while quiet and passive is steadily becoming a key part of the country’s energy infrastructure. This means that often there is more cheap renewable electricity in the middle of the day so it makes sense to run things like washing machines in the afternoon. This is a totally different paradigm from the one I grew up with where the cheapest costs were always at night when demand was lowest.

The demand curve is still real, but I think it now illustrates the problem of storage and release. If wind energy is available all through the night when demand is low, we need to be able to store it more effectively than we do now (if we store it at all, which is something I’m still trying to understand).

I’m really grateful to the creators of the following tools for their efforts in creating such helpful visualisations and utilities and for the creation of the underlying APIs that allow such projects to exist.

Standard
Software

OpenUK: What the fork do we do now?

I attended an excellent talk organised by OpenUK about open-source forks recently.

Dawn Foster gave the context about why forks happen, a few historical examples of forks overtaking their originating projects, why that sometimes doesn’t happen, and a few occasions when both the original project and the fork end up in different niches.

This was all given life by James Humphries’ talk about OpenTofu and the problems the consortium of former competitors have had getting their project off the ground. One immediate lesson learned was not to just take the head of the forked project but to look for a stable milestone to branch from, so you don’t immediately inherit someone else’s work in progress.

Another interesting observation was that commits made directly to main, without the context of a PR, were often harder to understand. Change control processes attract a lot of passion, but it was interesting that fast-moving projects with direct-to-main changes are harder for newcomers to understand and maintain.

One problem I found particularly interesting was the change in licensing on Terraform’s registry project, which meant the fork had to construct an alternative registry very quickly. They had advice from several other projects, including Homebrew, and were able to quickly bring up a registry that could be community-maintained and, with a contribution of network costs from Cloudflare, very low-cost.

Hashicorp clearly changed the terms to stop competitors using their registry, and maybe that’s valid, but it demonstrated that it is the ecosystem that makes forks difficult rather than the actual code of the tool.

The project removed the tool’s telemetry early on to help respect user privacy, but that then makes it harder to tell whether the fork is finding traction or not. Looking at the traffic into the registry is a proxy for the volume of usage. This balance between privacy and metrics is an interesting one in open projects.

Standard
Month notes

June 2024 month notes

Meetups

I went to the Django monthly meeting (which clashed with the first England football match and the Scala meetup) where my former colleague Leo Giordani talked about writing his own markup language Mau for which he’d even hand-rolled his own parser so that he could switch lexing modes between block and inline elements.

Browsers

The Ladybird browser got a non-profit organisation to support its development and the discussion about it reminded me that the servo project also exists.

In the past we’ve understood that it is important to have a choice of different implementations for browsers, so I think it is good to have these community-based browsers to complement the commercial and foundation-backed ones.

I also used Lynx for the first time in many years as I wanted to test a redirect issue with a site and it is still probably the easiest way to check if public facing sites are routing as they should.

Alternative search engines

I started giving Perplexity a go this month after seeing it recommended by Seth Godin. That was before the row with content creators kicked off in earnest. I’m going to let that settle out before continuing to explore it.

I was using it not for straight queries but instead to ask for alternatives to various products or methods. It successfully understood what I was talking about and did successfully offer alternatives along with some pros and cons (which, to be honest, felt quite close to the original material rather than being a synthesis). Queries that benefit from synthesis are definitely one area where LLM-based queries are better than conventional searching by topic.

I’ve also tried this on Gemini but the answers didn’t feel as good, as the referenced sources were not as helpful. I would have thought the Google offering would be better at this, but having said that, a lot of Google’s first-page search widgets and answer summaries are often not great either.

CSS Units

I learnt about the ex CSS unit this month as well as some interesting facts about how em is actually calculated. I might take up the article’s suggestion of using it for line-height in future.

The calculation of em seems to be the root cause of the problems behind this recommendation to use rem rather than ch for line width (I started using ch after reading Every Layout, but I don’t use a strict width for my own projects, judging for myself what feels appropriate).

The environmental impact of LLMs

Both Google and Microsoft (Register’s article, Guardian article) announced that they have massively increased their emissions as a result of increased usage and training of AI models.

The competition to demonstrate that a company has a leading model is intense and there is a lot of money being driven through venture capital and share prices that provides the incentive. This profligacy of energy doesn’t feel like a great use of resources though.

I’ve also read that Google has relied on buying offsets rather than switching to genuinely sustainable, fossil-fuel-free energy. Which, if true, is completely mad.

Reading list

I learnt this month that JavaScript has an Atomics API, which is quite intriguing as I think atomics are some of the easiest concurrency primitives to work with. The JavaScript version is quite specific and limited (it works only with integer TypedArray views over shared buffers) but it had completely passed me by.

I also really enjoyed reading through bits of this series on writing minimal Django projects which really helps explain how the framework works and how the bits hang together.

Standard
Month notes

May 2024 month notes

Updating CSS

My muscle memory for CSS is full of left and right, top and bottom. The newer -inline and -block logical properties use start and end qualifiers to avoid confusion with right-to-left languages. This month I made an effort to convert my older hobby code over to the new format to try and get the new names ingrained in my memory.
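The mapping I’ve been trying to ingrain, roughly (the selector is hypothetical):

```css
.card {
  /* The physical properties I grew up with */
  margin-left: 1rem;
  padding-top: 0.5rem;

  /* Logical equivalents: start/end follow the writing direction,
     so they flip automatically for right-to-left languages */
  margin-inline-start: 1rem;
  padding-block-start: 0.5rem;
}
```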

Another example of web development knowledge that now has to be unlearnt is that target="_blank" is now safe by default (browsers imply rel="noopener"). This was something that used to be drilled into web developers.

Learning with LLMs

I had my first positive experience using a LLM-based model to learn to code something this month. It was an interesting set of circumstances that led to it really working for me where it hadn’t before.

  • I didn’t know much about the topic, therefore I didn’t know how to formulate search queries that gave me good results
  • The official documentation was complete but poorly written and organised, exploring text can be the perfect task for an LLM
  • Information was scattered over several sites, including Medium. There wasn’t one article or site that really had a definitive answer so synthesising across several sources really helped. I wanted the text of the official documentation combined with the working code from a real person’s blog post.

I used a couple of different systems, but Codemate was the most helpful, followed by Google’s Gemini.

Previously I’d been searching for information that I already knew quite well, so instead of the answers offering much value over the hallucinated misses, the mistakes just irritated me. Summarising data from multiple sources is genuinely an LLM superpower, so this consolidation of several not-great sources was probably right in its sweet spot.

URL exploring and saving

I needed to build up some queries on a system’s API this month. I decided to give Slumber a go after trying some local Postman-style clones.

The tool is a TUI and uses a YAML file as its store and dynamically syncs the UI when the file is saved. There were a couple of issues; for example it would be helpful to be able to save the content of a response to file and if something is marked sensitive (like the bearer token) then I would prefer to see it masked in the UI.

Overall though I got what I needed done, and the system was a lot easier than most web-based GUI tools I’ve used, as the underlying storage and its relation to the interface is really clear.

Also a shout-out to chains: initially these seemed to be an example of making simple things complicated, but as I understood them more I found they are amazingly powerful for coordinating the setup for calls.

Community events

I went to the May Day Data Science event for the first time. It seemed the best talks were in the rooms with the least capacity, and there was a strict no-standing rule. Despite this I did pick up some useful bits and pieces, in particular around prompt design.

I also went to the Django Meetup held at the Kraken offices and was really struck by what a great engineering team they have built up there. Dave Seddon gave a great introduction to the “native library escape hatch” that exists in Python. This time showing how to bring in Rust code to help execution time.

I also went to the Python Meetup this month and spent a day in Milton Keynes at the Juxt 24 conference which had a lot of interesting talks and where I could have spent a lot more time at the afterparty.

Standard
Python

London Python meetup May 2024

The meetup was held at Microsoft’s Reactor offices near Paddington which have a great view down the canal towards Maida Vale. Attendees got an email with a QR code to get in through the gate which all felt very high-tech.

The first talk was not particularly Python related but was an introduction to vector databases. These are having a hot moment due to the way that machine learning categorisation maps easily into flat vectors that can then be stored and compared through vector stores.

These can then be used to complement LLMs through Retrieval Augmented Generation (RAG), which combines the LLM’s ability to synthesise and summarise content with more conventional search-index information.

It was fine as far as it went and helped demystify the way that RAG works, but this langchain tutorial is probably just as helpful for the practical application.

The second talk was about langchain, but was from a Microsoft employee demonstrating how to use Bing as an agent augmentation in the Azure-hosted environment. It was practical, but the agent clearly spun out of control in the demo, and while the output was in the right ballpark I think it illustrated the trickiness of getting these things to work reliably and to generate reliable output when the whole process is essentially random and different on each run.

It was a good shop window into the hosted langchain offering but could have done more to explore the agent definition.

The final talk was by Nathan Matthews CTO of Retrace Software. Retrace allows you to capture replay logs from production and then reproduce issues in other environments. Sadly there wasn’t a demo but it is due to be released as open source soon. The talk went through some of the approaches that had been taken to get to the release. Apparently there is a “goldilocks zone” for data capture that avoids excessive log size and performance overhead. This occurs at the library interface level with a proxy capture system for C integration (and presumably all native integration). Not only is lower level capture chatty but capturing events at a higher-level of abstraction makes the replay process more robust and easier to interact with.

The idea is that you can take the replay of an issue or event in production, replay it on a controlled environment with a debugger attached to try and find out the cause of the issue without ever having to go onto a production environment. Data masking for sensitive data is promised which then means that the replay logs can have different data handling rules applied to them.

Nathan pointed out that our current way of dealing with unusual and intermittent events in production is to invest heavily in observability (which often just means shipping a lot of low-level logging to a search system). The replay approach seems to promise a much simpler route to analysing and understanding unusual behaviour in environments with access controls.

It was interesting to hear about poking into the internals of the interpreter (and the OS) as it is not often that people get a chance to do it. However the issue of what level of developer access to production is the bigger problem to solve and it would be great to see some evidence of how this works in a real environment.

Standard