London Python meetup May 2024

The meetup was held at Microsoft’s Reactor offices near Paddington, which have a great view down the canal towards Maida Vale. Attendees got an email with a QR code to get in through the gate, which all felt very high-tech.

The first talk was not particularly Python related but was an introduction to vector databases. These are having a hot moment because machine learning models map content into flat vectors (embeddings) that can then be stored and compared efficiently in vector stores.

These can then be used to complement LLMs through Retrieval Augmented Generation (RAG), which combines the LLM’s ability to synthesise and summarise content with more conventional search index information.

It was fine as far as it went and helped demystify how RAG works, but this langchain tutorial is probably just as helpful for the practical application.
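As a rough sketch of what that looks like in practice (mine, not the speaker’s): embed the documents and the query, retrieve the nearest documents from the store, and paste them into the prompt. The embed function below is a toy stand-in for a real embedding model, and the final prompt would normally be sent to an LLM.

```python
from math import sqrt

def embed(text: str, dims: int = 64) -> list[float]:
    # Toy stand-in embedding: hash words into a fixed-size vector.
    # A real system would call an embedding model here instead.
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[hash(word) % dims] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# The "vector store": documents indexed alongside their embeddings.
documents = [
    "Vector databases store embeddings and answer nearest-neighbour queries.",
    "RAG pastes retrieved documents into the prompt sent to an LLM.",
    "Retrace captures replay logs from production systems.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real pipeline would send this prompt to an LLM;
    # here we just return it to show the shape.
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What do vector databases do?"))
```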

The second talk was also about langchain, given by a Microsoft employee demonstrating how to use Bing as an agent augmentation in the Azure-hosted environment. It was practical, but the agent clearly spun out of control in the demo, and while the output was in the right ballpark, I think it illustrated how tricky it is to get these things to work reliably and produce consistent output when the whole process is essentially random and different on each run.

It was a good shop window for the hosted langchain offering, but it could have done more to explore how the agent was defined.

The final talk was by Nathan Matthews, CTO of Retrace Software. Retrace allows you to capture replay logs from production and then reproduce issues in other environments. Sadly there wasn’t a demo, but it is due to be released as open source soon. The talk went through some of the approaches that had been taken to get to the release. Apparently there is a “goldilocks zone” for data capture that avoids excessive log size and performance overhead. This occurs at the library interface level, with a proxy capture system for C integration (and presumably all native integration). Not only is lower-level capture chatty, but capturing events at a higher level of abstraction makes the replay process more robust and easier to interact with.
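None of this is Retrace’s actual implementation, but the library-interface idea can be sketched as a proxy that wraps a library boundary and writes each call and its result to a log. All the class and field names here are made up for illustration.

```python
import time

class RecordingProxy:
    """Wraps an object at a library boundary and appends every call to a log."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        real = getattr(self._target, name)

        def wrapper(*args, **kwargs):
            result = real(*args, **kwargs)
            # Capture at the interface level: method name, arguments, result.
            self._log.append({"call": name, "args": list(args), "result": result})
            return result

        return wrapper


class Clock:
    """Stands in for some native library the application depends on."""

    def now(self):
        return time.time()


log = []
clock = RecordingProxy(Clock(), log)
print(clock.now())   # behaves as normal, but the call is captured
print(log)           # [{'call': 'now', 'args': [], 'result': ...}]
```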

The idea is that you can take the replay of an issue or event in production, replay it in a controlled environment with a debugger attached, and try to find the cause of the issue without ever having to go onto a production environment. Data masking for sensitive data is promised, which means the replay logs can have different data-handling rules applied to them.
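Continuing the made-up sketch above: in the controlled environment a replay proxy serves recorded results instead of calling the real library, and a masking pass scrubs sensitive fields before the log ever leaves production. The field names are hypothetical.

```python
SENSITIVE_FIELDS = {"email", "card_number"}  # hypothetical sensitive fields


def mask(log):
    """Redact sensitive values from recorded call arguments before export."""
    masked = []
    for entry in log:
        args = [
            {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in a.items()}
            if isinstance(a, dict) else a
            for a in entry["args"]
        ]
        masked.append({**entry, "args": args})
    return masked


class ReplayProxy:
    """Serves recorded results in order instead of calling the real library."""

    def __init__(self, log):
        self._entries = list(log)

    def __getattr__(self, name):
        def wrapper(*args, **kwargs):
            entry = self._entries.pop(0)
            assert entry["call"] == name, f"replay diverged at {name}"
            return entry["result"]

        return wrapper


# A log captured in production (same shape as the recording sketch above).
log = [{"call": "lookup_user", "args": [{"email": "ada@example.com"}], "result": 42}]
safe_log = mask(log)

replayed = ReplayProxy(safe_log)
print(replayed.lookup_user({"email": "***"}))  # returns the recorded result: 42
```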

Nathan pointed out that our current way of dealing with unusual and intermittent events in production is to invest heavily in observability (which often just means shipping a lot of low-level logging to a search system). The replay approach seems to promise a much simpler way of analysing and understanding unusual behaviour in environments with access controls.

It was interesting to hear about poking into the internals of the interpreter (and the OS), as it is not often that people get a chance to do so. However, the question of what level of developer access to production is acceptable is the bigger problem to solve, and it would be great to see some evidence of how this works in a real environment.