Programming

London Django Meetup May 2023

Just one talk this time, and it was more of a discussion of the cool things you can do with Postgres JSON fields. These are indeed very cool! Everything I historically wanted to do with NoSQL is now available in a relational database without compromising on performance or functionality, which is an amazing achievement by the Postgres team.

The one thing I did learn is that all the coercion and encoding information is held in the Django model and query logic, which means only basic types are stored in the column itself. I previously worked on a codebase that used SQLAlchemy with a custom encoder and decoder, which split custom types into a string field carrying the Python type hint (e.g. Decimal, UUID) alongside the underlying value. Compared with the Django implementation, which appears to just use strings, this is a leaky abstraction where the structure of the data is compromised by the type hint.
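
For illustration, a minimal sketch of that type-tagging encoder/decoder idea (hypothetical names, not that codebase's actual implementation):

import decimal
import json
import uuid

class TaggedEncoder(json.JSONEncoder):
    # Custom types become {"__type__": hint, "value": string} pairs,
    # which is exactly the structure that leaks into the stored JSON.
    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            return {"__type__": "Decimal", "value": str(obj)}
        if isinstance(obj, uuid.UUID):
            return {"__type__": "UUID", "value": str(obj)}
        return super().default(obj)

def tagged_decode(pairs):
    # Reverse the tagging when loading the JSON back.
    d = dict(pairs)
    if d.get("__type__") == "Decimal":
        return decimal.Decimal(d["value"])
    if d.get("__type__") == "UUID":
        return uuid.UUID(d["value"])
    return d

stored = json.dumps({"price": decimal.Decimal("9.99")}, cls=TaggedEncoder)
json.loads(stored, object_pairs_hook=tagged_decode)  # {"price": Decimal("9.99")}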

The Django approach would have been easier to work with when running direct SQL against the database, and would have followed the principle of least surprise.

The speaker tried to make a case for performing aggregate calculations in the database but via the Django ORM query language, which wasn’t entirely convincing. Perhaps it makes sense for a small team, but the resulting ORM code was more complex than the underlying query and quite tied to the Postgres implementation, so it felt like a view would have been a better approach unless you have very dynamic calculations that are only applied for a fixed timespan.
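
For a flavour of what this looks like, an aggregate over a JSON field via the ORM might read like the sketch below (the Reading model and its payload field are hypothetical); the equivalent SQL is arguably easier to read.

from django.db.models import Avg, DecimalField
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast

# Average a numeric value held inside a JSONField called "payload".
Reading.objects.annotate(
    temperature=Cast(
        KeyTextTransform("temperature", "payload"),
        DecimalField(max_digits=6, decimal_places=2),
    )
).aggregate(avg_temperature=Avg("temperature"))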

It was based on an experience report, so it clearly worked for the implementing group, but it felt like the approach strongly coupled the database, the web framework and the query language.

Python

London Django Meetup April 2023

I’m not sure whether I’ve ever been to this Meetup before but it is definitely the first since 2020. It was hosted by Kraken Energy in their offices, which have a plywood-style auditorium with a nice AV setup for presentations, and pizza and drinks (soft and hard) for attendees.

There were two talks: one on carbon estimates for websites built using Django and Wagtail; the other about import load times when loading a Django app into a shell (or more generally expensive behaviour in Python module imports).

Sustainable or low-impact computing is a topic that is slowly gaining traction in the wider development community, and for the web there are some immediate quick wins to be had in the form of content negotiation on image formats, lazy loading and caching.

One key takeaway from the talk is that the end-user side is where most savings are possible. Using large-scale cloud hosting means you are already benefiting from energy efficiencies, so things like the power required for a mobile phone screen matter, because the impact of inefficient choices in content delivery is multiplied by the size of your audience.

There was a mention in passing that if a web application could be split into a Functions as a Service (FaaS) deployable then, for things like Django that have admin paths and end-user paths, you could scale routes independently and save on overprovisioning. If this could be done automatically in the deployment build it would be seamless from the developer’s point of view. I think you can do this via configuration in the Serverless framework. It seems an interesting avenue for making deployments more efficient, but at a cost in complexity for the framework builders.

There was quite an interesting research opportunity mentioned in the talk around serverless-style databases. For sites with intermittent or cyclical usage, avoiding an “always on” database represents a potentially big saving in cost and carbon. There was mention of the service neon.tech, which seems to have a free personal tier that might be perfect for hobby sites where usage is very infrequent and a spin-up delay would be acceptable.

The import time talk was interesting; it focused on the developer experience of the Django shell boot time (although, to be honest, a Python shell for any major framework has the same issue). There were some practical tips on avoiding libraries that do too much during the import phase, but really the issue of Python code doing expensive eager work at import time has been a live one for a long time.
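
For diagnosis, CPython’s -X importtime flag (available since Python 3.7) prints a per-module import cost tree, for example via python -X importtime -c "import django". The most common mitigation is to defer an expensive import into the function that actually needs it, a minimal sketch (pandas stands in here for any slow-to-import library):

def load_frame(path):
    # Deferred import: pandas is only loaded when this function is first
    # called, not when the enclosing module is imported.
    import pandas
    return pandas.read_csv(path)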

I attended a talk about cold starts with Python AWS Lambdas in 2019 that essentially boiled down to much the same issues (something addressed, but not very well, in this AWS documentation on imports). Little seems to have improved since, and assumptions about whether a process is going to be short- or long-lived ultimately come down to the library implementer; the web/data science split in Python means that code runs in very different contexts, making it hard to share libraries across these two use cases.

The core language implementation is getting faster, but consensus on good practice for import-time behaviour does not seem to be a conversation that is happening between the major library maintainers.

The performance enhancements for core Python actually linked the two talks because getting existing code onto more efficient runtimes helps reduce compute demands across all usage.

Blogging, Python

PyCon UK 2022

This was the first in-person PyCon since the start of the pandemic. It had slightly changed format as well, now being mostly single-track (plus workshops) and not having a teacher/youth day. Overall I found the changes welcome. I’m generally a big fan of single track conferences because they simplify the choices and help concentrate the conversation amongst the attendees.

A lot of talks were by developer advocates but some of the most interesting talks came from the core language maintainers talking about the challenges in balancing the needs of different parts of the community, while the least interesting were from the company sponsors who generally didn’t seem to have refined their content for a general audience.

A simple lightning talk about the need to sanitise the input of the builtin int function illuminated the challenge of reconciling web needs with data science needs. The more general point about whether unlimited-precision number functions are necessary was interesting, as was the question of when to adopt new syntax and keywords, raised by the addition of Exception Groups to the language.
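
For context, this is the limit that shipped in Python 3.11 (and was backported to patch releases of earlier versions) to guard against denial-of-service attacks using huge numeric strings:

import sys

sys.set_int_max_str_digits(4300)  # 4300 digits is the default limit
int("9" * 10_000)  # raises ValueError because the input exceeds the limit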

There were a few talks about how the process of maintaining a language and community actually works: how conferences get listed on the Python website, how performance benchmarks are built and used, and the desire to have a single place to centre conversation about the language.

There were talks about how to interface with various technologies from Python (Kafka, Telegram) and the inevitable talk about improving the performance of your data science code using NumPy. There were also quite a few talks about Hypothesis, which probably still isn’t used as much as it should be in the community. The library now includes a test generator that can examine code and suggest a test suite for it, which is quite a cool feature and would certainly be a boon for those who dislike test-first approaches (whether it has the same quality benefits is another question).
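
For anyone who hasn’t tried it, a property-based test is only a few lines; the generator mentioned above is exposed as the hypothesis write command, which prints suggested test scaffolding for a named function.

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # A property that must hold for every generated list of integers.
    once = sorted(xs)
    assert sorted(once) == once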

The other talk that had a big impact on me was the introduction to using PyScript in static websites. Python is going to start targeting WebAssembly as a platform, which should help standardise the various projects trying to bring a fully functional interpreter to the browser and provide a place to pool efforts on improvements.

PyScript aims to allow Python to script DOM elements and be an alternative scripting language in the browser. That’s fun but not massively compelling in my view (having used many compile-to-Javascript languages in the past, I think it’s easier to just write Javascript, with an honourable exception for Typescript). Being able to run existing code and models in the browser without change is what maximises the value and potential of your existing Python codebases.

Cardiff remains a lovely venue for the conference and I think it is worth taking time out from the tracks to enjoy a bit of the city and the nearby Bute Park.

Links

  • Pyjamas, a 24 hour online conference
  • Pyrogram, a Python library for interacting with Telegram
  • HPy, an updated API for writing C extensions in Python
  • Discuss Python, the general Python community forum
  • Trustme, a testing library that allows local certificates as testing fixtures
  • PyScript, Python in the browser via Web Assembly
  • PyPerformance, benchmarks for testing Python implementations
Programming

Keeping the batteries included

Python is well known for being a “batteries included” language, meaning it comes with a rich variety of modules that work right out of the box. However, this recent post about the regular expression library shows the problems that can come from shipping such a wide variety of code as part of the language core. Maintaining a wide-ranging codebase is challenging, as is keeping it aligned to a language’s release cadence.

The problems suggest that the language should really focus on just the core tools and syntax, with everything else on a different release cycle. However, by itself that core isn’t very useful for pragmatic coding purposes. Curating code and offering a sensible selection of libraries is a challenge. Some languages, like Clojure and Elm, have a controlled ecosystem of libraries adjacent to a small language core, but the line between curation and gatekeeping is a fine one, and it feels like only languages with a large community can do it effectively.

Perhaps the answer is to keep a basic implementation of core functionality in the core but use the language documentation to suggest alternative libraries. This moves the problem to the documentation team, but hopefully that is a simpler arena to maintain and curate.

Programming

Recommendation: use explicit type names

I recently attended an excellent talk at Lambdale about teaching children to code (not just Scratch but actual computer languages). One piece of feedback was that the children found the use of names in type declarations and their corresponding implementations in languages such as Haskell confusing, because names are frequently reused despite the two things being completely different categories of thing.

The suggestion was that instead of f[A] -> B it was simpler to say f[AType] -> BType.

Since I am in the process of introducing Python type checking at work, this immediately sparked my interest, and when I returned to work I ran a quick survey with the team. It revealed that they too preferred an explicit type name with the suffix Type over the package qualifier I had been using for types.

Instead of kitchen_types.Kettle the preference was for KettleType.

The corresponding function type annotations would therefore read approximately:

def boil(water: WaterType, kettle: KettleType) -> WaterType
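
Fleshed out slightly, assuming the domain types are NewType aliases (a sketch, not our production code):

from typing import NewType

WaterType = NewType("WaterType", str)
KettleType = NewType("KettleType", str)

def boil(water: WaterType, kettle: KettleType) -> WaterType:
    # The Type suffix makes it obvious these are type names, not values.
    return water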

So that’s the format we’re going to be adopting.

Programming

PyPika

PyPika is a DSL for creating SQL queries. It works by generating SQL from definitions that you create inline rather than by interpreting models or data classes or structures.

It makes it easy to define minimal projections, and it is also straightforward to bind data, as you simply provide the values into the generated query rather than binding values to a query definition.

Once you have constructed the query object you call str on it to obtain the actual SQL, which you then need to pass to some execution context.
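
A minimal sketch of that flow, assuming a simple users table (the exact SQL string may vary between PyPika versions):

from pypika import Query, Table

users = Table("users")
query = Query.from_(users).select(users.id, users.name).where(users.age >= 18)

sql = str(query)
# sql is now something like: SELECT "id","name" FROM "users" WHERE "age">=18
# ...which you pass to whatever execution context you have, e.g. a DB-API cursor.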

This also means that it is easy to test what SQL you are generating (although using a DSL like this means you should really be trusting the library).

I have only had one real problem with the statements I have used to date, and that is around the Postgres RETURNING clause, which allows a generated key to be returned to the caller of an INSERT.

Apart from this, using PyPika has been better than using a Python template or writing raw SQL.

Programming, Python

403 Forbidden errors with Flask and Zappa

One thing that tripped me up when creating applications with Zappa was an error I encountered with form posting that seems to have also caught out several other developers.

The tl;dr is that if you are getting a 403 Forbidden error but your application works locally, you probably have a URL error due to the stage segment that Zappa adds to the URL of the deployed application. You need to make sure you are using url_for and not trying to write an absolute path.

The stage segment

Zappa’s URL structure is surprisingly complicated because it allows you to have different versions of the code deployed under different aliases, such as dev, staging and production.

When running locally your code doesn’t have the stage prefix, so it is natural to use a bare path, something like flask.redirect('/') for example.

If you’re using the standard form sequence of GET – POST – Redirect then everything works fine locally, but remotely the raw redirect takes you outside the deployed URL space, and instead of a 404 error (which might tip you off to the real problem more quickly) you get a 403 Forbidden.

If you bind a DNS name to a particular stage (e.g. app-dev.myapp.com) then the bare path will work again because the stage is hidden behind the CloudFront origin binding.

Always use url_for

The only safe way to handle URLs is to delegate all the path management and prefixing to Zappa. Fortunately Flask’s built-in url_for function, in conjunction with the Zappa wrapper, can take care of all the grunt work for you. As long as all your URLs (both in the templates and the handlers) use url_for, the resulting URLs will work locally, on the API Gateway stages, and when you bind a DNS name to the stage.
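
A minimal Flask sketch of the pattern, with hypothetical handler names:

from flask import Flask, redirect, url_for

app = Flask(__name__)

@app.route("/", methods=["GET"])
def index():
    return "form page"

@app.route("/submit", methods=["POST"])
def submit():
    # url_for honours the stage prefix (e.g. /dev) that Zappa sets up,
    # where a hard-coded redirect("/") would escape the deployed URL space.
    return redirect(url_for("index"))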

If this is already your development habit then great, this post is irrelevant to you. But as I’ve mostly been using Heroku and App Engine for my hobby projects, I’d found myself in the habit of writing URLs as strings, as you do when you write the route bindings.

Then when the error occurred I checked the URL against my code, saw that they matched, and got confused about the error because mentally I’d glossed over the stage.

Programming

Using Google App Engine with Pyenv

I recently started using PyEnv to control my Python installations and make it easier to try to move more of my code to Python 3.

Google App Engine though is unapologetically Python 2.7. Google wants people to move away from the platform in favour of Google Compute custom environments and therefore has little incentive to upgrade the App Engine SDK and environments to support Python 3.

When I set my default Python to be Python 3 with PyEnv I found that, despite setting a local version of Python 2.7, my App Engine instance was failing to run with an execfile is not defined exception.

The App Engine Python scripts use #!/usr/bin/env python to invoke the interpreter, and for some reason PyEnv doesn’t seem to override the global setting for this, despite the version being correct when you check it in your shell.

After a lot of frustration and googling for an answer I didn’t find anything elegant. Instead I found this Stack Overflow answer which helpfully explained that you can use #!/usr/bin/env python2 to invoke a specific interpreter version.

Manually changing the shebang line in appcfg.py and dev_appserver.py solved the problem for me and got me running locally.

Obviously this is a pain if I upgrade, and I feel it would be better for Google to change the scripts since they have no plan to move to Python 3.

Programming

Python: Preferring Named Tuples over Classes

One of the views that I decided to take in my recent Python teaching is that named tuples and functions are preferable to class-based data structures.

Python's object-orientated (OO) support is slightly strange anyway, since it was retrofitted onto the original language, and most programmers find things like the self reference confusing compared to OO idioms in languages like Ruby or Java.

On top of this, Python's dynamic nature means that objects are actually "open" (i.e. they can take new attributes at runtime) and have few strong encapsulation guarantees. Most of this will surprise OO programmers, who would expect the type to be binding.

Named tuples, on the other hand, are immutable: their values cannot be changed, and they cannot be expanded or reduced by adding or removing attributes. Their behaviour is much better defined, while retaining syntax-sugar access to the attributes themselves.
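
A quick illustration of those guarantees, using a hypothetical Kettle type:

from collections import namedtuple

Kettle = namedtuple("Kettle", ["brand", "volume_litres"])

kettle = Kettle(brand="Acme", volume_litres=1.7)
kettle.volume_litres                 # syntax-sugar attribute access
# kettle.volume_litres = 2.0        # would raise AttributeError: can't set attribute
kettle._replace(volume_litres=2.0)   # "changes" produce a new tuple instead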

Functions that operate on tuples and return tuples have some nice properties in terms of working with the code. Firstly, you know that there are no sequencing issues: a function that takes a tuple as an argument cannot change it, so any other function is free to consume it again as an argument.

In addition you know that you are free to consume the tuple value generated by a function. As the value cannot be changed it is safe to pass it around the codebase.

I think the question should be: where are classes appropriate in ways that tuples are not?

The most common valid use of classes and inheritance is to provide a structure in a library where you expect other programmers to supply appropriate behaviour: using classes you can simply allow the relevant methods to be implemented in the inheriting implementation. A number of Python web frameworks use this Template pattern to allow the behaviour of handlers to be defined, as in the sketch below.
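
Here is a generic sketch of that pattern with hypothetical names, rather than any specific framework's API:

class Handler:
    # The template method fixes the overall flow...
    def handle(self, request):
        self.authorise(request)
        return self.render(request)

    def authorise(self, request):
        pass  # default behaviour, may be overridden

    def render(self, request):
        raise NotImplementedError  # supplied by the inheriting implementation

class HelloHandler(Handler):
    # ...while the subclass supplies the specific behaviour.
    def render(self, request):
        return "Hello, {}".format(request)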

Even then this is not the definitive solution. Frameworks such as Flask use decorators instead, which fits the functional approach.

So in general I think it is simpler and easier to maintain programs that consist of functions taking and generating immutable data structures like tuples. Python's object-orientation features should be considered advanced techniques and used only when necessary.

Gadgets, Programming, Python

Creating the Guardian’s Glassware

For the last two months on and off I’ve been developing the Guardian’s Glassware in conjunction with my colleague Lindsey Dew.

Dealing with secret pre-alpha hardware has at times been interesting, but the actual process of writing services using the Mirror API is pretty straightforward.

Glass applications are divided into native, using the Glass SDK, and service-based Glassware using Mirror. Mirror is built on web-friendly technologies such as HTTP, JSON and OAuth2. It also follows the Google patterns for APIs so things like authentication, discovery and the client libraries are all as you would expect if you’ve used a modern Google API before.

For our project, which was focussed on trying to create a sensible, useful newsfeed for Glass, we went with Mirror. If you want to do things like geolocation or picture and video upload then you’ll want to go native.

For various reasons we had a very narrow initial window for development. Essentially we had to start and finish in May. Our prototyping was done with a sample app from Google (you can use Mirror without an actual device), the Mirror playground and a lot of imagination.

When we actually got our Glass devices it took about a week to get my head around what the use case was. I had been thinking of it as a very lightweight mobile phone, but it is much more pervasive, with lots of light contact points. I realised that we could be more aggressive about pushing information out and could go for larger sets of stories (although that was dialled back a bit in the final app to emphasise editorially curated content).

Given the tight, fixed deadline for an unknown product, the rest of the application was built from known elements. We used a lot of the standard Glass card templates, and we used Python on Google App Engine to simplify the integration service, not least because we had been building a number of apps on that same stack. The application has a few concerns:

  • performing Google Authentication flow
  • polling the Guardian’s Content API and our internal Notification platform
  • writing content to Mirror
  • handling webhook callbacks from Mirror
  • storing a user’s saved stories

We use the Content API all the time, but normally we are rendering it into widgets or pages; here we are just transforming JSON into JSON.

The cards are actually rendered by our application, and the rendered content is packaged into the JSON along with a text representation. However, rendering according to the public Glass stylesheet and rendering on the actual device differed, so checking the actual output was important.
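
For a sense of the shape of the integration, a timeline insert looks roughly like the following (authorised_http comes from the OAuth flow; rendered_card and text_fallback stand in for our rendered output):

from apiclient.discovery import build

service = build("mirror", "v1", http=authorised_http)
service.timeline().insert(body={
    "html": rendered_card,   # the card as rendered by our application
    "text": text_fallback,   # the plain-text representation
}).execute()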

The webhooks are probably best handled using deferred tasks, so that you hand off the processing quickly and limit the concern to just processing the webhook’s payload.
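
On App Engine that hand-off is a one-liner with the deferred library (process_notification is a hypothetical worker function):

from google.appengine.ext import deferred

def process_notification(payload):
    pass  # parse and act on the Mirror notification payload

def handle_webhook(payload):
    # Enqueue the work and return immediately so the webhook responds fast.
    deferred.defer(process_notification, payload)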

For the most part the application is a mix of stock Google API code and some cron tasks that read from one web API and write to another.

Keeping the core simple meant it was possible to iterate on things like the content mix and user interactions. The need to verify everything on the device served as a limiting factor.

Glass is a super divisive technology; people get very agitated when they see you wearing it. No one seems to have an indifferent opinion about it.

Google have done a number of really interesting things with Glass that are worth considering from a technology point of view even if you feel concerned about privacy and privilege.

Firstly, the miniaturisation is amazing. The Glass hardware is about the size of a highlighter and packs in a camera, memory, voice synthesis, wifi and bluetooth. The screen is amazingly vivid, and the device records and plays video well. It has a web browser that seems really capable of standard HTML rendering.

The vocal recognition and command menus are really interesting and you feel a little bit space age when you fire off a Google query and get the information you’re looking for read back to you in seconds.

Developing with the Mirror API is really interesting because it solves the Android fragmentation issue: my application talks to Mirror, not to the native device. If Google want to change the firmware, wire protocol or security, they can do so without worrying about how it will affect the apps. If they do have to make breaking changes then they can use the standard web API versioning they already use.

Unlike most Guardian projects this one was embargoed before the UK launch, but it is great to see it out in the open. Glass might not be the ultimate answer to wearable tech, just as the brick phones didn’t directly point to the iPhone. Glass is a bold device, and making the Guardian’s journalism available on a new platform has been an interesting test of our development processes and an interesting challenge to the idea of what web-capable devices are (just as the Pixel exposed some flaky thinking about what a touch device is).

What will be interesting from here is how journalists will use Glass. Our project didn’t touch on using Glass to share content from the scene, but Glass has powerful capabilities to capture pictures and video hands-free and deliver them back to desk editors. There are already a few trials planned on less stressful feature pieces, and it will be interesting to see whether people find the interface intuitive and more convenient than firing up their phone.
