Software

Thoughts on the ethical use of LLMs

Large language models (LLMs) have felt fraught with issues. Let’s start with the environmental impact, which has been completely disastrous and has essentially led to Big Tech ditching all their Net Zero promises. Vast amounts of speculative money have led to truly insane amounts of energy being spent creating models that don’t have strongly differentiated capabilities. Between this and cryptocurrency, you would think no one had noticed that electricity is not free and that its use actually has consequences.

Then there is the question of the corpus: mass ingestion of content for training AIs, combined with obfuscation of the origin of that material, has resulted in a toxic situation for people who feel they have been taken advantage of and a dubious legal situation for people using the output of such models.

The inherent flaws of the models’ probabilistic nature (hallucination, non-determinism), combined with users’ flawed mental models of what is happening, are causing all kinds of strange fallout.

Finally there is the way that LLMs are being applied to problems, namely without any discretion or thought as to whether they have any relevance to the situation in hand. Again, that glut of money at a time when most businesses are being squeezed by interest rates means that what gets used is what funders are excited about, not what users need.

Now I’m not anti-AI or anti-LLM in principle. I think there are some strong use cases: summarisation, broad textual or structured document analysis, and light personalisation. All machine models have infinite patience for user interaction, and it seems humans prefer the tone of model-generated content to that created by humans (which raises the burning question: why?) (2025-01-16: this article on how cognitive biases feed into interpretations of chatbot interactions seems relevant, but it also includes an important reminder that the models are human-ranked and tuned before they are released, so I think it is natural that high agreeability would score well and unfriendly models would be binned). I think LLMs with access to vast amounts of information help put a floor under people’s understanding of problems and how to tackle things, which is why individual subscriptions have been more popular than institutional ones.

However, the foundation under these valid use cases needs to be sound, and currently it isn’t.

The new models by Pleias show that it is possible to do a better job on these problems, by providing clearer information about the provenance of the training material and the terms under which the team were allowed to use it. They have also been open about the carbon cost of training the model.

There still remain questions about the carbon cost of running the model, and some about what the researchers mean by generating additional material for their corpus, but this feels like the minimum that the bigger players should be offering.

The clarity over the training set should help alleviate the concerns people have about content being exploited without compensation or permission. Clear carbon figures mean we can start to compare the cost of creating new models and to measure the efficiency of such efforts. Such a consideration should perhaps be a factor in deciding whether a training process should be continued or not.

Privacy concerns can be alleviated by running models locally as well as insisting on greater clarity in the terms of service of the cloud providers (something I think Amazon moved closer towards with their Nova models).

I believe it is possible to address the genuine concerns people have about LLMs and to benefit from their use, but the problems need to be acknowledged and addressed in a way that the mad scramble for AI gold simply has not done so far.

Standard
Month notes

June 2024 month notes

Meetups

I went to the Django monthly meeting (which clashed with the first England football match and the Scala meetup) where my former colleague Leo Giordani talked about writing his own markup language Mau for which he’d even hand-rolled his own parser so that he could switch lexing modes between block and inline elements.

Browsers

The Ladybird browser got a non-profit organisation to support its development and the discussion about it reminded me that the Servo project also exists.

In the past we’ve understood that it is important to have a choice of different browser implementations to select from, so I think it is good to have these community-based browsers to complement the commercial and foundation-backed ones.

I also used Lynx for the first time in many years as I wanted to test a redirect issue with a site, and it is still probably the easiest way to check whether public-facing sites are routing as they should.
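
If you want to script the same kind of check, a minimal sketch along these lines also works. It assumes Node 18+ or Deno, where a global fetch is available and a manual-redirect response still exposes its status and Location header (browsers would hand back an opaque response instead); the URL is just a placeholder.

```typescript
// Walk a redirect chain by hand and print each hop.
async function traceRedirects(url: string, maxHops = 10): Promise<void> {
  let current = url;
  for (let hop = 0; hop < maxHops; hop++) {
    // redirect: "manual" stops fetch from following the redirect itself;
    // HEAD keeps it light, switch to GET if a server mishandles HEAD
    const response = await fetch(current, { method: "HEAD", redirect: "manual" });
    console.log(`${response.status} ${current}`);

    const location = response.headers.get("location");
    if (response.status < 300 || response.status >= 400 || !location) {
      return; // not a redirect: this is the final destination
    }
    // Location may be relative, so resolve it against the current URL
    current = new URL(location, current).toString();
  }
  console.warn("Gave up: too many redirects");
}

traceRedirects("https://example.com/old-path").catch(console.error);
```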

Alternative search engines

I started giving Perplexity a go this month after seeing it recommended by Seth Godin. That was before the row with content creators kicked off in earnest. I’m going to let that settle out before continuing to explore it.

I was using it not for straight queries but instead to ask for alternatives to various products or methods. It successfully understood what I was talking about and did successfully offer alternatives along with some pros and cons (which, to be honest, felt quite close to the original material rather than being a synthesis). Queries that benefit from synthesis are definitely one area where LLM-based queries are better than conventional searching by topic.

I’ve also tried this on Gemini but the answers didn’t feel as good because the referenced sources were not as helpful. I would have thought the Google offering would have been better at this, but having said that, a lot of the Google first-page search widgets and answer summaries are often not great either.

CSS Units

I learnt about the ex CSS unit this month as well as some interesting facts about how em is actually calculated. I might take up the article’s suggestion of using it for line-height in future.

The calculation of em seems to be the root cause of the problems leading to this recommendation to use rem rather than ch for line width (I’ve started using ch after reading Every Layout, but I don’t use a strict width for my own projects, judging for myself what feels appropriate).

The environmental impact of LLMs

Both Google and Microsoft (Register’s article, Guardian article) announced that they have massively increased their emissions as a result of increased usage and training of AI models.

The competition to demonstrate that a company has a leading model is intense and there is a lot of money being driven through venture capital and share prices that provides the incentive. This profligacy of energy doesn’t feel like a great use of resources though.

I’ve also read that Google has relied on buying offsets rather than switching to genuinely sustainable, fossil-fuel-free energy, which, if true, is completely mad.

Reading list

I learnt this month that JavaScript has an Atomics object, which is quite intriguing as I think atomics are some of the easiest concurrency primitives to work with. The JavaScript version is quite specific and limited (it works only on integer typed arrays, typically backed by a SharedArrayBuffer) but it had completely passed me by.
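
As a reminder to myself, here is a minimal sketch of what the API looks like. Atomics, SharedArrayBuffer and Int32Array are the real ECMAScript globals; the counter scenario is invented for illustration, and in a real program the calls would come from separate worker threads sharing the same buffer.

```typescript
const shared = new SharedArrayBuffer(4);    // four bytes: one 32-bit slot
const counter = new Int32Array(shared);     // integer typed-array view over it

Atomics.add(counter, 0, 1);                 // counter[0] += 1, atomically
Atomics.add(counter, 0, 1);
console.log(Atomics.load(counter, 0));      // -> 2

// compareExchange only writes when the current value matches the expected one
Atomics.compareExchange(counter, 0, 2, 10); // slot currently holds 2, so it becomes 10
console.log(Atomics.load(counter, 0));      // -> 10
```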

I also really enjoyed reading through bits of this series on writing minimal Django projects which really helps explain how the framework works and how the bits hang together.

Standard
Python

London Python meetup May 2024

The meetup was held at Microsoft’s Reactor offices near Paddington which have a great view down the canal towards Maida Vale. Attendees got an email with a QR code to get in through the gate which all felt very high-tech.

The first talk was not particularly Python related but was an introduction to vector databases. These are having a hot moment due to the way that machine learning categorisation maps easily into flat vectors that can then be stored and compared through vector stores.

These can then be used to complement LLMs through Retrieval Augmented Generation (RAG), which combines the LLM’s ability to synthesise and summarise content with more conventional search index information.

It was fine as far as it went and helped demystify the way that RAG works, but this langchain tutorial is probably just as helpful for the practical application.
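
To make the mechanics concrete, here is a rough sketch of the retrieval half (in TypeScript here, though the shape is the same in Python). The Chunk type, the top-k cut-off and the prompt wording are all invented for illustration; in practice the vectors would come from an embedding model and live in a proper vector store rather than an in-memory array.

```typescript
type Chunk = { text: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank the stored chunks against the query embedding and keep the best few
function retrieve(queryVector: number[], chunks: Chunk[], topK = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(queryVector, y.vector) - cosineSimilarity(queryVector, x.vector))
    .slice(0, topK);
}

// Paste the retrieved passages into the prompt so the model's synthesis is
// grounded in indexed content rather than whatever it happens to remember
function buildPrompt(question: string, context: Chunk[]): string {
  const passages = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the passages below.\n\n${passages}\n\nQuestion: ${question}`;
}
```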

The second talk was also about langchain, but was from a Microsoft employee who was demonstrating how to use Bing as an agent augmentation in the Azure-hosted environment. It was practical, but the agent clearly spun out of control in the demo, and while the output was in the right ballpark I think it illustrated the trickiness of getting these things to work reliably and to generate reliable output when the whole process is essentially random and different on each run.

It was a good shop window into the hosted langchain offering but could have done more to explore the agent definition.

The final talk was by Nathan Matthews, CTO of Retrace Software. Retrace allows you to capture replay logs from production and then reproduce issues in other environments. Sadly there wasn’t a demo, but it is due to be released as open source soon. The talk went through some of the approaches that had been taken to get to the release. Apparently there is a “goldilocks zone” for data capture that avoids excessive log size and performance overhead. This occurs at the library interface level, with a proxy capture system for C integration (and presumably all native integration). Not only is lower-level capture chatty, but capturing events at a higher level of abstraction makes the replay process more robust and easier to interact with.

The idea is that you can take the replay of an issue or event in production, replay it in a controlled environment with a debugger attached, and try to find the cause of the issue without ever having to go onto a production environment. Data masking for sensitive data is promised, which then means that the replay logs can have different data handling rules applied to them.
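
This is not how Retrace itself works (the talk described capture much closer to the interpreter), but as a toy illustration of the record-at-a-library-boundary idea: wrap a client object in a proxy that logs every call and its result, then replay that log against a stub so the same code can run elsewhere without the real dependency. The Call shape and function names below are invented for the sketch.

```typescript
type Call = { method: string; args: unknown[]; result: unknown };

// Wrap a real client so every method call is captured into the log
function recordingProxy<T extends object>(target: T, log: Call[]): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== "function") return value;
      return (...args: unknown[]) => {
        const result = value.apply(obj, args);
        log.push({ method: String(prop), args, result }); // capture the interaction
        return result;
      };
    },
  });
}

// Answer the same sequence of calls from the log instead of the real dependency
function replayStub<T extends object>(log: Call[]): T {
  let cursor = 0;
  return new Proxy({}, {
    get: (_obj, prop) => (..._args: unknown[]) => {
      const entry = log[cursor++];
      if (!entry || entry.method !== String(prop)) {
        throw new Error(`Replay diverged at call ${cursor}: ${String(prop)}`);
      }
      return entry.result;
    },
  }) as T;
}
```

The promised masking of sensitive data would presumably slot in where the recorder pushes each entry, before the log ever leaves production.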

Nathan pointed out that our current way of dealing with unusual and intermittent events in production is to invest heavily in observability (which often just means shipping a lot of low-level logging to a search system). The replay approach seems to promise a much simpler way of analysing and understanding unusual behaviour in environments with access controls.

It was interesting to hear about poking into the internals of the interpreter (and the OS) as it is not often that people get a chance to do it. However, the question of what level of access developers should have to production is the bigger problem to solve, and it would be great to see some evidence of how this works in a real environment.

Standard