Month notes, Work

March 2024 month notes

Dependabot under the hood

I spent a lot more time this month than I was expecting with one of my favourite tools, GitHub’s Dependabot. It started when I noticed that some of the projects were not getting security updates that others were. I know it is possible for updates to be suspended on projects that neglect their updates for too long (I should really archive some of my old projects) but checking the project settings confirmed that everything was set up correctly and there was nothing that needed enabling.

Digging in, I wondered how you are meant to view what Dependabot is doing. You might think it is implemented as an Action or something similar but in fact you access the information through the Insights tab.

Once I found it though I discovered that the jobs had indeed been failing silently (I’m still not sure if there’s a way to get alerted about this) because we had upgraded our Node version to 20 but had set the option engine-strict on. It turns out that Dependabot runs on its own images and those were running Node 18. It may seem tempting to insist that your CI uses the same version as your production app but in the case of CI actions there’s no need to be that strict, after all they are just performing actions in your repository management that aren’t going to hit your build chain directly.

Some old dependencies also caused problems in trying to reconcile their target version, the package.json Node engine and the runtime Node version. Fortunately these just highlighted some dependency cruft and deprecated projects that we needed to cut out of the project.

It took a surprising amount of time to work through the emergent issues but it was gratifying to see the dependency bundles flowing again.

Rust

I started doing the Rustlings tutorial again after maybe a year in which I’d forgotten about it (having spent more time with Typescript recently). This is a brilliant structured tutorial of bite-sized introductions to various Rust concepts. Rust isn’t that complicated as a language (apart from its memory management) but I’ve found the need to have everything right for the code to compile means that you tend to need to devote dedicated time to learning it and it is easy to hit some hard walls that can be discouraging.

Rustlings allows you to focus on just one concept and scaffolds all the rest of the code for you, so you’re not battling a general lack of understanding of the language structure and can just focus on one thing like data structures or library code.

Replacing JSX

Whatever the merits of JSX it introduces a lot of complexity and magic into your frontend tooling and I’ve seen a lot of recommendations that it simply isn’t necessary with the availability of tagged string literals. I came back to an old Preact project this month that I had built with Parcel. The installation had a load of associated security alerts so on a whim I tried it with ViteJS which mostly worked except for the JSX compilation.

Sensing a yak to shave I started to look at adding in the required JSX plugin but then decided to see if I really needed it. The Preact website mentioned htm as an alternative that had no dependencies. It took me a few hours to understand and convert my code and I can’t help but feel that eliminating a dependency like this is probably just generally a good idea.

The weirdest thing about htm is how faithful it is to the JSX structure. I was expecting something a bit more, well, HTML-ly but props and components pretty much work exactly how they do in JSX.

Postgres news

A Postgres contributor found a backdoor into SSH that required an extensive amount of social engineering to achieve. If you read his analysis of how he discovered it then it seems improbable that it would have been discovered. Some people have said this is a counterpoint to “many eyes make bugs shallow” but the real problem seems to be how we should be maintaining mature open source projects that are essentially “done” and just need care and oversight rather than investment. Without wanting to centralise open source it feels like foundations actually do a good job here by allowing these kinds of projects to be brought together and have consistent oversight and change management applied to them.

I read the announcement of pgroll which claims to distil best practice for Postgres migrations regarding locks, interim compatibility and continuous deployment. That all sounds great but the custom definition format made me feel that I wanted to understand it a little better and as above, who is going to maintain this if it is a single company’s tool?

Postgres was also compiled into WASM and made available as an in-memory database in the browser, which feels a bit crazy but is also awesome for things like testing. It is also a reminder of how WebAssembly opens up the horizons of what browsers can do.

Hamstack

Another year, another stack. I felt Hamstack was tongue in cheek but the rediscovery of hypermedia does feel real. There’s always going to be a wedge of React developers, just like there will be Spring developers, Angular developers or anything else that had a hot moment at some point in tech history. However it feels like there is more space to explore web native solutions now than there was in the late 2010s.

This article also introduced me to the delightful term “modulith” which perfectly describes the pattern that I think most software teams should follow until they hit the problems that lead to other solution designs.

Work

January 2024 month notes

Water CSS

I started giving this minimal element template a go after years of using various versions of Bootstrap. It is substantially lighter in terms of the components it offers with probably the navigation bar being the one component that I definitely miss. The basic forms and typography are proving fine for prototyping basic applications though.

Node test runner

Node now has a default test runner and testing framework. I’ve been eager to give it a go as I’ve heard that it is both fast and lightweight, avoiding the need to select and include libraries for testing, mocking and assertions. I got the chance to introduce it in a project that didn’t have any tests and I thought it was pretty good, although its default text output felt a little unusual and the alternative dot notation might be a bit more familiar.

It’s interesting to see that the basic unit of testing is the assertion, something it shares with Go. It also doesn’t support parameterised tests, which again is like Go, which has a pattern of table-driven tests implemented with for loops, except that Go allows more control of the dynamic test case naming.
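
The table-driven pattern is easy to show in miniature. The sketch below uses Python’s built-in unittest rather than Node or Go, purely to illustrate the shape: a list of cases, a plain for loop, and a named sub-test per case. The `slugify` function, the cases and `run_table` are all made up for the example.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lower-case and hyphenate a title."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class SlugifyTest(unittest.TestCase):
    # Each row is (case name, input, expected output).
    CASES = [
        ("plain words", "Month notes", "month-notes"),
        ("punctuation", "Rust, again!", "rust-again"),
        ("already a slug", "web-components", "web-components"),
    ]

    def test_table(self):
        for name, title, expected in self.CASES:
            # subTest names each row so a failure reports which case broke.
            with self.subTest(name):
                self.assertEqual(slugify(title), expected)


def run_table() -> unittest.TestResult:
    """Run the table and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a new case is then a one-line change to the table rather than a new test function.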

I’d previously moved to the Ava library and I’m not sure there is a good reason not to use the built-in alternative.

Flask blueprints

In my personal projects I’ve tended to use quite a few cut and paste modules, and over the years they drift and get out of sync, so I’ve been making a conscious effort to learn about and start adopting Flask Blueprints. Ultimately I want to turn these into personal module dependencies that I can update once and use in all the projects. For the moment though it is interesting how the blueprint format is pushing me to do some things better, like logging (to understand what is happening in the blueprint). It is also pushing me to structure the different areas of the application so that they are quite close to Django apps: various pieces of functionality are now associated with a URL prefix, which makes it a bit easier to create middleware that is registered as part of the Blueprint rather than relying on imports and decorators.

Web components

I’ve been making a bit of progress with learning about web components. I realised that I was trying to do too much initially which is why they were proving complicated. Breaking things down a bit has helped with an initial focus on event listeners within the component. I’m also not bringing in external libraries at the moment but have got as far as breaking things up into [ESM modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules) which has mostly worked out so far.

Programming

Enterprise programming 2023 edition

Back in the Noughties there were Enterprise Java Beans, Java Server Pages, Enterprise Edition clustered servers, Oracle databases and shortly thereafter the Spring framework with its dependency injection wiring. It was all complicated, expensive and to be honest not much fun to work with. One of the appeals of Ruby on Rails was that you could just get on and start writing a web application rather than staring at initialisation messages.

During this period I feel there was a big gap between the code that you wrote for a living and the code you wrote for fun. Even if you were writing on the JVM you might be fooling around with Jython or Groovy rather than a full Enterprise Java Bean. After this period, and in particular post-Spring in everything, I feel the gap between hobby and work languages collapsed. Python, Ruby, Scala, Clojure, all of these languages were fun and were equally applicable to work and small-scale home projects. Then with Node gaining traction in the server space the gap between the two worlds collapsed pretty dramatically. There was a spectrum that started with an inline script in an HTML page and ran through to a server-side API framework with pretty good performance characteristics.

Recently though I feel the pendulum has been swinging back towards a more enterprisey setup that doesn’t have a lot of appeal for small project work. It often feels that a software delivery team can’t even begin to create a web application without deploying on a Kubernetes cluster with Go microservices being orchestrated with a self-signing certificate system and a log shipping system with Prometheus and Grafana on top.

On the frontend we need an automatically finger-printing statically deployed React single-page app, ideally with some kind of complex state management system like sagas or maybe everything written using time-travelable reactive streams.

Of course on top of that we’ll need a design system with every component described in Storybook and using a modular class-based CSS system like Tailwind or otherwise a heavyweight styled component library based on Material design. Bonus points for adding React Native into this and a CI/CD system that ideally mixes a task server with a small but passionate community with a home-grown pipeline system. We should also probably use a generic build tool like Bazel.

And naturally our laptop of choice will be Apple’s OSX with a dependency on XCode and Homebrew. We may use Github but we’ll probably use a monorepo along with a tool to make it workable like Lerna.

All of this isn’t much fun to work on unless you’re being paid for it and it is a lot of effort that only really pays off if you hit the growth jackpot. Most of the time this massive investment in complex development procedures and tooling simply throws grit into the gears of producing software.

I hope that soon the wheel turns again and a new generation of simplicity is discovered and adopted and that working on software commercially can be fun again.

Programming, Python

Transcribing podcasts with Google’s Speech to Text API

I don’t really listen to podcasts, even now when I have quite a long commute. I generally read faster than I can listen and prefer to read through transcripts than listen, even when the playback speed is increased. Some shows have transcripts and generally I skim read those when available to see if it would be worth listening to segments of the podcasts. But what about the podcasts without transcripts? Well Google has a handy Speech to Text API so why not turn the audio into a text file and then turn it into an HTML format I can read on the phone on the tube?

tl;dr: the API is pretty much the same one that generates the YouTube automatic subtitling and transcripts. It can just about create something that is understandable to a human but its transcription of vernacular voices is awful. If YouTube transcripts don’t work for you then this isn’t a route worth pursuing.

Streaming pods

I’m not very familiar with Google Cloud Services. I used to do a lot of App Engine development but that way of working was phased out in favour of something a bit more enterprise friendly. I have the feeling that Google Cloud’s biggest consumers are data science and analysis teams and the control systems intersect with Google Workspace which probably makes administration easier in organisations but less so for individual developers.

So I set up a new project, enabled billing, associated the billing account with a service account, associated the service account with the project and wished I’d read the documentation to know what I should have been doing. And after all that I created a bucket to hold my target files in.

You can use the API to transcribe local audio files but only if they are less than 60 seconds long. I needed to be using the long running asynchronous invocation version of the API. I also should have realised that I needed to write the transcription to a bucket too. I ended up using the input file name with “.json” attached, but until I started doing that I didn’t realise that my transcription was failing to recognise my input.
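
Once a JSON result is sitting in the bucket, turning it into something readable on a phone is a small step. This is a minimal sketch only: it assumes the output file has the shape `results` → `alternatives` → `transcript`, and the function name `transcript_to_html` is made up for the example.

```python
import html
import json


def transcript_to_html(raw_json: str, title: str) -> str:
    """Turn a transcription result into a minimal HTML page.

    Assumes (not confirmed by the post) that the JSON looks like
    {"results": [{"alternatives": [{"transcript": "..."}]}]}.
    """
    data = json.loads(raw_json)
    paragraphs = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Take the first alternative listed for each chunk of audio.
        text = alternatives[0].get("transcript", "").strip()
        if text:
            paragraphs.append(f"<p>{html.escape(text)}</p>")
    if not paragraphs:
        # An empty result is worth surfacing rather than rendering a blank page.
        raise ValueError("no transcript text found in the result")
    body = "\n".join(paragraphs)
    return f"<!doctype html>\n<title>{html.escape(title)}</title>\n{body}\n"
```

Raising on an empty result is deliberate: a job that is accepted but returns nothing is otherwise easy to miss.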

Learning the API

One really nice feature Google Cloud has is the ability to run guided tutorials in your account via CloudShell. You get a step by step guide that can simply paste the relevant commands to your shell. Authorising the shell to access the various services was also easier than generating credentials locally for what I wanted to do.

Within 10 minutes I had processed my first piece of audio and had a basic Python file set up. However the test file was in quite an unusual format and the example was the synchronous version of the API.

I downloaded a copy of the Gettysburg address and switched the API version but then had my CloudShell script await the outcome of the transcription.

Can you transcribe MP3?

The documentation said yes (given a specific version) and while the client code accepted the encoding type, I never got MP3 to work and instead I ended up using ffmpeg to create FLAC copies of my MP3 files. I might have been doing something wrong but I’m not clear what it was: the job was accepted but it was returning an empty JSON object (this is where creating files for the output is much more useful than trying to print an empty response).

FLAC worked fine and the transcript seemed pretty on the money and converting the files didn’t seem that much of a big deal. I could maybe do an automatic conversion later when the file hit the bucket if I needed to.
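
The conversion step can be scripted in a few lines. A minimal sketch, assuming ffmpeg is installed and on the PATH; the function names are made up and the exact command and flags I used at the time are not recorded here.

```python
import subprocess
from pathlib import Path


def flac_command(mp3_path: str) -> list[str]:
    """Build the ffmpeg command that writes a FLAC copy next to the MP3."""
    source = Path(mp3_path)
    target = source.with_suffix(".flac")
    # ffmpeg picks the FLAC encoder from the output file extension.
    return ["ffmpeg", "-i", str(source), str(target)]


def convert(mp3_path: str) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    command = flac_command(mp3_path)
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Keeping the command construction separate from the call that runs it makes it possible to check what would be executed without needing ffmpeg available.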

However after my initial small files I found that waiting for the result of the API call resulted in hitting a timeout on the execution duration within the shell. I’ve hit something like this before when running scripts over Google Drive that copied directories. I didn’t have a smart solution then (I just skipped files that already existed and re-ran the jobs a lot) and I didn’t have one now.

Despite the interactive session timing out the job completed fine and the file appeared in the storage bucket. Presumably this would have been where it would have been easier to be running the script locally or on some kind of temporary VM. Or perhaps I should have been able to get the run identifier and just have checked the job using that. The whole asynchronous execution of jobs in Google Cloud is another area where what you are meant to do is unclear to me and working on this problem didn’t require me to resolve my confusion.

Real audio is bobbins

So armed with a script that had successfully rendered the Gettysburg address I switched the language code to British English, converted my first podcast file to FLAC and set the conversion running.

The output is pretty hilarious and while you can follow what was probably being said it feels like reading a phonetic version of Elizabethan English. I hadn’t listened to this particular episode (because I really don’t listen to podcasts, even when I’m experimenting on them) but I did know that the presenters are excessively Northern and therefore when I read the text “we talk Bob” I realised that it probably meant “we are talking bobbins”. Other gems: “threw” had been rendered as “flu” and “loathsome” as “lord some”. Phonetically, if you know the accent you can get the sense of what was being talked about, and the more mundane the speech the better the transcription was. However it was in no way an easy read.

I realised that I was probably overly ambitious going from a US thespian performing a classic of political speechwriting to colloquial Northern and London voices. So next I chose a US episode, more or less the first thing I could get an MP3 download of (loads of the shows are actually shared on services that don’t allow you access to the raw material).

This was even worse because I lacked the cultural context but even if I had, I have no idea how to interpret “what I’m doing ceiling is yucky okay so are energy low-energy hi”.

The US transcript was even worse than the British one, partly I think because the show I had chosen seems to have the presenters talking over one another or speaking back and forth very rapidly. One of them also seems to repeat himself when losing his chain of thought or wanting to emphasise something.

My next thought was to try and find an NPR-style podcast with a single professional presenter but at this point I was losing interest. The technology was driving what content I was considering rather than bringing the content I wanted to engage with to a different medium.

YouTube audio

If you’ve ever switched on automatic captioning in YouTube then you’ve actually seen this API in action: the text and timestamps in the JSON output are pretty much the same as what you see in both the text transcript and the in-video captioning. My experience is that the captioning is handy in conjunction with the audio but if I was fully deaf I’m not sure I would understand much about what was going on in the video from the auto-generated captions.

Similarly here, the more you understand the podcast you want to transcribe the more legible the transcription is. For producing a readable text that would reasonably represent the content of the podcasts at a skim reading level the technology doesn’t work yet. The unnatural construction of the text means you have to quite actively read it and put together the meaning yourself.

I had a follow-up idea of using speech to text and then automated translation to be able to read podcasts in other languages but that is obviously a non-starter as the native language context is vital for understanding the transcript.

Overall then a noble failure; given certain kinds of content you can actually create pretty good text transcriptions but as a way of keeping tabs on informal, casual audio material, particularly with multiple participants this doesn’t work.

Costs

I managed to blow through a whole £7 for this experiment which actually seemed like a lot for two podcasts of less than an hour and a seven minute piece of audio. In absolute terms though it is less than the proverbial avocado on toast.

Future exploration

Meeting transcription technology is meant to be pretty effective including identifying multiple participants. I haven’t personally used any and most of the services I looked at seemed aimed at business and enterprise use and didn’t seem very pay as you go. These however might be a more viable path as there is clearly a level of specialisation that is needed on top of the off-the-shelf solutions to get workable text.

Programming, Work

August 2023 month notes

I have been doing a GraphQL course that is driven by email. I can definitely see the joy of having autocompletion on the types and fields of the API interface. GraphQL seems to have been deployed way beyond its initial use case and it will be interesting to see if it’s a golden hammer or genuinely works better than REST-based services outside the abstraction to frontend service. It is definitely a complete pain in the ass compared to HTTP/JSON for hobby projects as having to ship a query executor and client is just way too much effort compared to REST and more again against maybe not doing a Javascript app interface.

I quite enjoyed the course, and would recommend it, but it mostly covered creating queries so I’ll probably need to implement my own service to understand how to bind data to the query language. I will also admit that while it is meant to be quite easy to do each day I ended up falling behind and then going through half of it on the weekend.

Hashicorp’s decision to change the license on Terraform has caused a lot of anguish on my social feeds. The OpenTerraform group has already announced that they will be creating a fork and are also promising to have more maintainers than Hashicorp. To some extent the whole controversy seems like a parade of bastards and it is hard to choose anyone as being in the right but it makes most sense to use the most open execution of the platform (see also Docker and Podman).

In the past I’ve used CloudFormation and Terraform. If I was just using AWS I would probably be feeling smug with the security of my vendor lock-in but Terraform’s extensibility via its provider mechanisms meant you could control a lot of services via the same configuration language. My current work uses it inconsistently which is probably the worst of all worlds but for the most part it is the standard for configuring services and does have some automation around its application. Probably the biggest advantage of Terraform was to people switching clouds (like myself) as you don’t have to learn a completely new configuration process, just the differences with the provider and the format of the stanzas.

The discussion of the change made me wonder if I should look at Pulumi again as one of the least attractive things about Terraform is its bizarre status as not quite a programming language, not quite Go and not quite a declarative configuration. I also found out about Digger which is attempting to avoid having two CI infrastructures for infrastructure changes. I’ve only ever seen Atlantis used for this so I’m curious to find out more (although it is such an enterprise level thing I’m not sure I’ll do much more than have an opinion for a while).

I also spent some time this month moving my hobby projects from Dataset to using basic Psycopg. I’ve generally loved using Dataset as it hides away the details of persistence in favour of passing dictionaries around. However it is a layer over SQLAlchemy which is itself going through some major point revisions so the library in its current form is stuck with older versions of both the data interaction layer and the driver itself. I had noticed that for one of my projects queries were running quite slowly and comparing the query time direct into the database compared to that arriving through the interface it was notable that some queries were taking seconds rather than microseconds.

The new version of Psycopg comes with a reasonably elegant set of query primitives that work via context managers. It also allows results to be returned in a dictionary format that is very easy to combine with NamedTuples, which makes it quite easy to keep my repository code consistent with the existing application code while completely revamping the persistence layer. Currently I have replaced a lot of the inserts and selects but the partial updates are proving a bit trickier as Dataset is a bit magical in the way it builds up the update code. I think my best option would be to try and create an SQL builder library or adapt something like PyPika which I’ve used in another of my projects.
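
To give a sense of what the partial update problem involves, here is a minimal sketch of a builder that takes a dictionary of changed fields and produces an UPDATE touching only those columns. It is not what Dataset does internally and not finished code from my projects: `build_partial_update` and the table and column names are made up for illustration, and it uses the `%s` placeholder style that Psycopg accepts.

```python
from typing import Any, Mapping


def build_partial_update(
    table: str, key_column: str, key_value: Any, changes: Mapping[str, Any]
) -> tuple[str, list[Any]]:
    """Build an UPDATE that only touches the columns present in `changes`.

    Returns the SQL text with %s placeholders plus the parameter list, so
    values are passed to the driver separately and never interpolated.
    """
    if not changes:
        raise ValueError("nothing to update")
    for name in (table, key_column, *changes):
        # Identifiers cannot be sent as parameters, so only plain names pass.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    assignments = ", ".join(f"{column} = %s" for column in changes)
    sql = f"UPDATE {table} SET {assignments} WHERE {key_column} = %s"
    return sql, [*changes.values(), key_value]
```

The returned pair can be handed straight to a cursor, for example `cursor.execute(sql, params)`. The identifier check is deliberately crude; a real version would more likely lean on the driver’s own SQL composition helpers.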

One of the things that has surprised me in this effort is how much the official Python documentation does not appear in Google search results. Tutorial style content farms have started to dominate the first page of search results and you have to add a search term like “documentation” to surface it now. People have been complaining about Google’s losing battle with content farms but this is the first personal evidence I have of it. Although I always add “MDN” to my Javascript and CSS searches so maybe this is just the way of the world now, you have to know what the good sites are to find them…

Programming

Version management with asdf

I typically use languages that are unmanageable without being able to version the language release you are dealing with (Python and Javascript). I have also been historically bad at keeping up to date with releases and therefore ending up with code that sometimes doesn’t run at all (Rust and Scala).

asdf is a version manager to rule them all. It provides a common set of commands to manage language dependencies (and the installation of different language versions) but has a plugin interface that different languages can use to bring in language specific concerns.

As a user you just need to learn one set of commands to manage all languages; implementations can build on a stable core system and simply focus on their requirements. Everyone is a winner.

On top of that, instead of having multiple hidden files for multi-language projects (usually Javascript and some other language) you now have one file with all the language definitions in.

The only complication I’ve found is retraining myself to the new command set and remembering which commands work on asdf itself (things like updating the tool itself, setting specific versions in different scopes and managing the language plugins themselves) and which work on the plugins (installing new versions). The plugins also have no requirement to be consistent amongst themselves so in some you can specify “lts” as a target for example or “latest”. Others require the full three digit semantic version. These conventions seem to have come from the tools the plugins are replacing.

Overall though I think retraining myself to learn a single tool is probably going to be easier than having an increasing number of per language systems.

Programming

PR Reviews: not the messiah, not a naughty boy

On Tech Twitter there seems to be a lot of chat recently about how the Github Pull Request (PR) review process is “broken” or underserves internal teams (an example) that have strong collaboration practices such as pairing or mob coding.

Another objection raised is the idea of internal gatekeeping within development groups, I’m not sure I fully follow the argument but I think it runs along the lines that the PR review process allows powerful, influential members of the group to enforce their views over the others.

This is definitely a problem but frankly writing AST tools linked to the “merge to master” checks is probably a more controlling tool than spending your time policing PRs.

With open source projects often all the action happens at the pull request because the contributors often have no other interaction. Proponents are right to point out that if this works for open source projects do you really need to put effort into other practices upstream of the PR? Opponents are also right to point out that adopting the work practice of an entirely different context into your salaried, work context is crazy. They are both right: individual organisations need to make deliberate decisions, applying the challenges of both sides to the way they are working.

I’m not sure how much of this is a fightback by pairing advocates (that’s how I interpret this thread). There is a general feeling that pairing as a practice has declined.

In my work the practice is optional. As a manager I’ve always known that in terms of output delivery (which before people object, may be important particularly if you’re dealing with runway to meet salary) pairing is best when you’re increasing the average rather than facilitating the experienced.

I think even with pairing you’d want to do a code review step. Pairs are maybe better at avoiding weird approaches to solutions than an individual but they aren’t magical and if you are worried about dominant views and personalities pairing definitely doesn’t help solve that.

So I’d like to stick up for a few virtues of the Pull Request Review without making the argument that Pull Requests are all you need in your delivery process.

As an administrator of policy I often get asked about Software Development LifeCycles (SDLC) as a required part of good software governance, and it is handy to have a well documented, automated review process before code goes into production. It ticks a box at minimum disruption and there isn’t an alternative world where you’re just streaming changes into production on the basis of automated testing and production observability anyway.

As a maintainer of a codebase I rely on PRs a lot as documentation. Typically in support you handle issues on much more software than you have personally created. Pairing or mob programming isn’t going to work in terms of allowing me to support a wide ranging codebase. Instead it’s going to create demand for Level 3 support.

Well-structured PRs often allow me to understand how errors or behaviour in production relate to changes in code and record the intent of the people making the change. It makes it easier to see situations that were unanticipated or people’s conception of the software in use and how that varies from actual use.

PR review is also a chance for people outside the core developers to see what is happening and learn and contribute outside of their day to day work. Many eyes is not perfect but it is a real thing and people who praise teaching or mentoring as a way to improve technique and knowledge should be able to see that answering review questions (in a suitable form of genuine inquiry) is part of the same idea.

PRs also form a handy trigger for automated review and processes. Simple things like spelling checks allow a wider range of people to contribute to codebases fearlessly. Sure you don’t need a PR to use these tools but in my experience they seem more effective with the use of a stopping point for consideration and review of the work done to date.

Like a lot of things in the fashion-orientated, pendulum swinging world of software development good things are abandoned when no longer novel and are exaggerated to the point that they become harmful. It’s not a world known for thoughtful reflection and consideration. But saying that pull request reviews undermine trust and cohesion in teams or are a formulaic practice without underlying benefit seems unhelpfully controversial and doctrinal.

Programming

PyPika

PyPika is a DSL for creating SQL queries. It works by generating SQL from definitions that you create inline rather than by interpreting models or data classes or structures.

It makes it easy to define minimum projections and is also straight-forward to bind data to as you simply provide the values into the query that is generated rather than binding values to a query definition.

Once you have the query object constructed you call str on the object to obtain the actual SQL which you then need to pass to some execution context.

This also means that it is easy to test what SQL you are generating (although really using a DSL like this means you should really be trusting the library).

I have only had one real problem with the statements I have used to date and that is around the Postgres `RETURNING` statement that allows a generated key to be returned to the caller of an INSERT.

Apart from this, using PyPika has been better than using a Python template or writing raw SQL.

Programming, Software, Web Applications, Work

Prettier in anger

I’ve generally found linting to be a pretty horrible experience and Javascript/ES haven’t been any exception to the rule. One thing on which I do agree with the Prettier project is that historically linters have tried to perform two tasks to mixed success: formatting code to conventions and performing static analysis.

Really only the latter is useful and the former is mostly wasted cycles except for dealing with language beginners and eccentrics.

Recently at work we adopted Prettier to avoid having to deal with things like line-lengths and space-based tab sizes. Running Prettier over the codebase left us with terrible-looking cramped two-space tabbed code but at least it was consistent.

However, having started to live with Prettier, I’ve been getting less satisfied with the way it works and prettier-ignore comments have been creeping into my code.

The biggest problem I have is that Prettier has managed its own specific type of scope creep out of the formatting space. It rewrites way too much code based on line-length limits and weird things like precedence rules in boolean expressions. So, for example, if you have a list with only one entry and you want to place that entry on a separate line to make it clear where you intend developers to extend the list, Prettier will put the whole thing on a single line if it fits.

If you bracket a logical expression to help humans parse the meaning of the statement, but the precedence rules mean that the brackets are superfluous, then Prettier removes them.

High-level code is primarily written for humans. I understand that the code is then transformed to make it run efficiently and that all kinds of layers of indirection are stripped out at that point. Prettier isn’t a compiler though; it’s a formatter with ideas beyond its station.

Prettier has also benefited from the Facebook/React hype cycle so we, like others I suspect, are using it before it’s really ready. It hides behind the brand of being “opinionated” to avoid giving control over some of its behaviour to the user.

This makes using Prettier a kind of take-it-or-leave-it proposition. I’m personally in a leave-it place but I don’t feel strongly enough to make an argument to remove it from the work codebase. For me, currently, telling Prettier to ignore code, while an inaccurate expression of what I want it to do, is fine for now while another generation of JavaScript tooling is produced.

Programming, Python

403 Forbidden errors with Flask and Zappa

One thing that tripped me up when creating applications with Zappa was an error I encountered with form posting that seems to have also caught out several other developers.

The tldr is that if you are getting a 403 Forbidden error but your application is working locally then you probably have a URL error due to the stage segment that Zappa adds to the URL of the deployed application. You need to make sure you are using url_for and not trying to write an absolute path.

The stage segment

Zappa’s URL structure is surprisingly complicated because it allows you to have different versions of the code deployed under different aliases such as dev, staging and production.

When running locally your code doesn’t have the stage prefix so it is natural to use a bare path, something like flask.redirect('/') for example.

If you’re using the standard form sequence of GET – POST – Redirect then everything works fine locally, and remotely it works right up until the raw redirect. At that point, instead of getting a 404 error (which might tip you off to the real problem more quickly), you get a 403 Forbidden because the bare path is outside the deployed URL space.

If you bind a DNS name to a particular stage (e.g. app-dev.myapp.com) then the bare path will work again because the stage is hidden behind the CloudFront origin binding.

Always use url_for

The only safe way to handle URLs is for you to delegate all the path management and prefixing to Zappa. Fortunately Flask’s built-in url_for function, in conjunction with the Zappa wrapper, can take care of all the grunt work for you. As long as all your URLs (both in the templates and the handlers) use url_for then the resulting URLs will work locally, on the API Gateway stages and if you bind a DNS name to the stage.
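A small sketch of the difference, under one assumption: that the stage prefix reaches Flask as the WSGI script root. Here that is simulated with the base_url argument of Flask’s test request context rather than a real Zappa deployment, and the hostnames, the dev stage and the build_url helper are all made up for illustration.

```python
from flask import Flask, redirect, url_for

app = Flask(__name__)


@app.route("/")
def index():
    return "home"


@app.route("/submit", methods=["POST"])
def submit():
    # Safe: the path is generated, so any prefix is included automatically.
    # The fragile version would be redirect("/").
    return redirect(url_for("index"))


def build_url(endpoint, base_url):
    """Hypothetical helper: build a URL as if the app were served at base_url."""
    with app.test_request_context("/", base_url=base_url):
        return url_for(endpoint)


# Running locally there is no prefix, so the bare path and url_for agree.
local_url = build_url("index", "http://localhost:5000")

# Simulated stage deployment: the "/dev" segment is treated as the script root.
staged_url = build_url("index", "https://api.example.test/dev")

print(local_url)
print(staged_url)
```

In the simulated stage case a hard-coded "/" would point outside the prefix, while the url_for result carries the stage segment with it.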

If this is already your development habit then great, this post is irrelevant to you but as I’ve mostly been using Heroku and App Engine for my hobby projects I’d found myself to be in the habit of writing the URLs as strings, as you do when you write the route bindings.

Then when the error occurred I was checking the URL against my code, seeing that they matched and then getting confused about the error because mentally I’d glossed over the stage.
