Programming

Migrating Neo4J Python apps on Heroku

Okay, this is quite specialised, but for the four or five of you who will have the same problem I wanted to save you some time and suffering.

So Neo Technologies have been incubating a plugin for their excellent graph database on Heroku for a while. Until now the plugin was only available in beta, but now anyone can have it. This is excellent news and I would recommend it as a way of getting started with graph-based web programming. However, if you were in the beta program then you now need to migrate from the beta plugin to the new one.

The instructions that went out to the beta program implied that this was simply a case of dumping a backup zip, switching out plugins and then uploading your zip. Well, the good news is that exporting and importing the zips works exactly as advertised; the bad news is that the two plugins are quite different in terms of the environment they expose. The beta plugin had an extensive list of variables that exposed the various parts of your hosted environment. The new one just exposes the variable NEO4J_URL, a url for the server with an embedded username and password (of the usual http://username:password@host:port form).

Now the new variable does encode all the information that the original manifest did, but in a very compact way, and your library is going to have to work quite hard to correctly construct the base urls and requests required to access the REST API. I’m not sure which libraries do handle it (I presume the Java ones do) but neither of the Python ones does.

I’m going to describe what you need to do for neo4j-rest-client, which is the one I use in my apps, but it will probably be similar for py2neo, which you might prefer if you want to use a lot of Cypher.

So the simplest way to explain the solution is code:

import os
import urlparse

from neo4jrestclient.client import GraphDatabase

def extract_credentials(url):
	# Heroku embeds the username and password in NEO4J_URL itself
	parsed_url = urlparse.urlparse(url)

	if parsed_url.username and parsed_url.password:
		return (parsed_url.username, parsed_url.password)

	return None

# Fall back to a local server when the Heroku variable is not visible
GRAPH_URL = os.environ.get('NEO4J_URL', "http://localhost:7474") + "/db/data/"

credentials = extract_credentials(GRAPH_URL)

if credentials:
	db = GraphDatabase(GRAPH_URL, username=credentials[0], password=credentials[1])
else:
	db = GraphDatabase(GRAPH_URL)

So the neo4j-rest-client library supports username and password credentials but doesn’t parse them out of the url itself. Fortunately urlparse makes this pretty trivial. The conditional pieces of the code deal with the situation where we are running locally: essentially, if we can’t see the Heroku environment variables we want to fall back to the local case (most Heroku stuff works this way).

A more frustrating issue is the difference between the url of the server and the root resource of the REST API. Naturally these are not the same, but few libraries handle being given the wrong url gracefully. Since the host url does respond successfully you usually get some failure about parsing or unpacking the root document. Submitting a patch to detect whether the url ends in db/data would seem to be the logical solution.
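
In the meantime a small helper of this shape (my own illustrative sketch, not part of the library) will normalise whichever form of the url you are handed:

def ensure_rest_root(url):
	# The REST root lives under /db/data/ on the server; append it
	# only when the url we were given does not already point there.
	url = url.rstrip('/')
	if not url.endswith('/db/data'):
		url += '/db/data'
	return url + '/'

# ensure_rest_root("http://localhost:7474") == "http://localhost:7474/db/data/"
# urls that already end in /db/data/ pass through unchanged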

So this code should bootstrap a working wrapper around the REST interface and your app should work again.

Except that there seems to be another issue, in the registering and deregistering of the plugin manifests. What I have observed is that heroku config lists the beta environment variables and not the new values. So even if you do all this the library still gets 404 errors on the root document (because it is looking for the Neo4J environment that has been deallocated).
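
You can check whether you are affected from inside the dyno. This snippet is purely illustrative (the beta variable names aren’t listed here); it just looks for anything NEO4J-prefixed other than the new variable:

import os

stale = [name for name in os.environ
	if name.startswith("NEO4J") and name != "NEO4J_URL"]

print("NEO4J_URL present:", "NEO4J_URL" in os.environ)
print("possible stale beta variables:", stale)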

So the best way to migrate your app in my view is:

  • go to your current app and download a database backup
  • create a new app with a temporary name (or something like my-app-2)
  • carry out your code changes as described above
  • load your new code into your instance
  • upload your backup into the new instance
  • if the new app is working, rename your old app (to something like my-app-old)
  • name your new app whatever your old app was called

This seems easier and less hassle than migrating in place. Once the beta plugin is turned off you should be able to delete the old app.

This process has allowed me to migrate my two demo apps successfully (pending bug reports): Crumbly Castle and Flow Demo.

Standard
Web Applications, Work

The myth of “published” content

Working at the Guardian you often end up having conversations with people about the challenges of scaling to meet the often spiky traffic you get in online media. One thing that comes up again and again is the idea that content, once published, is essentially static. Now there is a lot to be said for this, as digital journalism sticks pretty close to a lot of the conventions of print media; copy is often culled from the print version and follows the 24-hour media cycle quite strongly.

However, what is often surprising is the number of edits a piece of content receives, particularly if it is not a print feature article. The initial version of an article is often just the mandatory information and a few paragraphs, sufficient to get across the basic story. It then goes through a number of revisions, which often happen while the article is a draft. Often, but not always.

Once the article is published online, though, it triggers a new wave of edits as language gets cleaned up and readers, editors and lawyers all descend on it. Editors now have a lot more tools to gauge the audience’s reaction to a piece of content and see how it is playing in social media. You also have articles picked up externally, which means making sure the article works as a landing page.

Naturally, stories often develop their own momentum, requiring you to switch from a single piece to a set of stories approaching different aspects of the overall reporting. You then need to link the different pieces of content together to form a logical package of content.

One interesting exercise is looking at how many articles are changed after seven days. It is a surprising number, as new stories often create a need for historical context, and historical stories can look dusty in the light of breaking events. We have also had strange things happen with social news, where aggregating sites pick up some story that was overlooked at the time.

All of this means that you cannot naively treat content as static. In fact you have an interesting decaching problem: it is true that content doesn’t change much, until it starts changing, and then it needs to reflect the changes reasonably rapidly if you want to be picked up by things like Google.
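
To make that concrete, here is a toy sketch of such a cache policy; the thresholds are invented for illustration, the point being that freshly edited content gets a short TTL so changes propagate, while long-stable content can be cached aggressively:

from datetime import datetime, timedelta

def cache_ttl(last_modified, now=None):
	# Recently edited content needs to be re-fetched quickly so that
	# edits propagate; stable content can be cached for a long time.
	now = now or datetime.utcnow()
	age = now - last_modified
	if age < timedelta(hours=1):
		return timedelta(seconds=60)
	if age < timedelta(days=7):
		return timedelta(minutes=15)
	return timedelta(hours=6)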

Standard
Work

The BBC “across”

The term “across” is absolutely endemic at the BBC, and because so many people in UK media pass through the BBC it also crops up across the sector generally. Although I was initially scornful of it as a term, I have long since caved to the inevitable and use it as well.

Being “across” seems to have originated in the fact that the BBC have multiple media streams and when a journalist talks about being “across” things they might well mean that they are producing pieces on a topic across multiple media, say television and radio. It might also mean that they are tracking a breaking story and are watching other media outlets for what they are saying about a story.

Outside this context though the word more or less means “understanding”. So when someone from the BBC is “across” something it means they understand it, sometimes if they are “across” it enough they can also make decisions about it. When someone isn’t “across” something then they do not feel they understand it or they are unprepared to answer questions about it.

Ironically this meaning then seems to seep back into broadcast journalism and I have heard journalists on air saying that they are “across” developments such as the formation of the coalition.

At this point “across” feels kind of ubiquitous, except for people who were raised in single-stream media, so I think it’s worth you being “across” it too.

Standard
Scala

What’s the problem with Scala’s “lazy” vals?

Scala has a value modifier called “lazy” and I have a bit of a problem with it. Let’s start with the name: lazy indicates that the value is defined not when it is declared but when it is first read. Really it isn’t lazy at all; it is actually a memoisation wrapper around the value.

Now that’s a pretty handy thing and there are definitely some use cases for it, in particular where the calculation of the value is truly invariant and expensive. Some people suggest it can be used to cache the results of service or database lookups, but that is clearly madness unless the query really does return an invariant.

However I think there is a very subtle problem with the misapplication of lazy: it introduces state into an otherwise pure function, namely whether the value has been calculated yet and what the value is once it has been. To me this state reduces the utility of a pure function for reuse, as the consumer has to be very aware of when the value is going to be evaluated and what it is going to be over time.
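
To see that state explicitly, here is a rough Python analogue of what a lazy val wrapper does; this is a sketch of the idea, not Scala’s actual implementation:

class LazyVal(object):
	# The wrapper carries two pieces of hidden state: whether the
	# value has been computed yet, and the memoised result once it has.
	def __init__(self, compute):
		self._compute = compute
		self._evaluated = False
		self._value = None

	def get(self):
		if not self._evaluated:
			self._value = self._compute()
			self._evaluated = True
		return self._value

The value you observe therefore depends on when the wrapper was first read, which is exactly the temporal coupling described above.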

It’s rare that lazy over-application goes wrong, because I mostly see it in short-lived instances. However, that very lifespan suggests that there is no real value in lazy evaluation over eager declaration. After all, if the function never gets called then it doesn’t cost anything, and true constants should live in the companion object. In long-lived objects, though, I think there is serious potential for lazy to go wrong, and therefore it should be applied sparingly, where a positive impact can be proved.

Standard
Clojure

London Clojure Maze solver dojo

Last month we had another team code competition, this time centred around writing code that tries to solve a maze. Clojure seems quite apt for creating these kinds of challenges as it has a lot of support for dynamic code evaluation, and the functional paradigm makes writing callbacks a lot easier.

Just like the Battleships dojo, it was interesting that the random strategy formed a good local maximum. However, one revelation (that the maze wasn’t cyclic) later, left-wall hugging was kicking everyone’s ass. That left dead-end elimination as the only possible way to produce a faster solver. Which our team failed to do, sadly. Right idea, wrong turning table.

We also got bogged down in a Clojure issue which has come up a few times at the dojo. I’ll summarise it here: should you be using Clojure 1.4? Your library syntax and server compatibility depend on the answer, and there is no good error message that is going to tell you that the language syntax has changed.

The competitive dojo is an interesting environment where only the best work process and most pragmatic code can thrive. It is an interesting critique of hammock-style development, as the result of all that thinking and beard-stroking had better be orders of magnitude better than the obvious answer.

We also got to see a good example of beard-stroking abstraction this month with Chris Ford’s introduction to the theory of music and its abstractions in a general-purpose computing language. An amazing talk that combined education with a remarkable abstraction over music itself.

Standard
Clojure

Clojure: does a map contain all the specified keys?

An interesting problem came up this week during some Clojure batch work. How do you cleanly say that a map contains a set of given keys? The context is that during batch processing you only want to perform some operations if earlier operations have populated the right data set.

There’s probably some neat trick or built-in function, but this is what I’ve come up with. I quite like the mapping of count; it would look even better if I didn’t have to apply the equals, but it’s not bad.

(defn has-keys? [m keys]
  (apply = (map count [keys (select-keys m keys)])))

(def map-data {:a 1 :b 3})

(has-keys? map-data [:a :c]) ; false

(has-keys? map-data [:a :b]) ; true
Standard
Work

Success looks like success

One useful thing I have learnt by working in an early-stage startup is that success looks and feels like success. If you are doing something and it does not seem to be succeeding then it isn’t. This might seem completely obvious, but it really is not.

There are many kinds of “not success” (or failure, to give it its true but cruel name). The worst is “almost success”, where something works and delivers on its promise, but not very well. Almost success creates a dilemma: perhaps with a little iterating and more effort you could turn it into a real success.

When this kind of success creates ongoing liabilities in terms of customer expectations, the situation is even worse. If you try to cancel things or do a lean-startup pivot then you are guaranteed to alienate your current customers, with only the hope of gaining more of the true customer base you originally envisaged.

Weak success is a little better: it is not outright failure, but it is so unsuccessful that people completely understand if you want to knock it on the head.

In a large organisation though things are far more difficult. Both almost success and weak success are ironically more dangerous in a big and profitable organisation due to two factors: personal reward and hidden cross-subsidy.

If someone achieves any degree of success in an organisation they expect to be rewarded for it. Perhaps justifiably, perhaps not. Either way there are serious morale implications if, at the end of a period of exertion by any group or individual, you kill off the object of all their efforts, no matter how rational and correct that decision may be.

In general people are conflict averse so they prefer to reward almost success and move on to other activities that might be more successful.

However, a set of almost-successful activities all have real costs, and the minute any of them fails to generate revenue sufficient to carry its costs you are in the world of the hidden cross-subsidy, where all your almost-successful projects and products start to drag down every aspect of the business that is profitable.

It is a tar pit that can be difficult to escape unless you have really good accounting to see where money is coming from and going to. It is only easy in a startup because you generally have just one product, so all profit and loss is easy to tally and attribute.

So everything in your business that is not a success is a potentially business-killing failure. And for that reason, even though it is hard, you need to end almost success just as much as you need to end failure.

The question to ask is not “Is this successful?”; if you are asking that question then the answer is simply no. If you are successful then the question you ask is “How are we going to deal with all this success?”.

Standard
Clojure

EuroClojure Day 2

Okay, so this post may be happening a little later than Friday, but in my defence there were some excellent conversations to go with the after-conference drinks.

Day 2 featured two talks by Rich Hickey. I had already seen some of the Datomic stuff from QCon and the web, so I found the material on the new reducers library more engaging. I had never thought of map as having an implicit ordering promise.

Meikel Brandmeyer gave a historical review of lazy seqs which was really helpful for understanding laziness (something I have a bit of a problem with). One of the real highlights, though, was Chris Ford’s talk about canon music. It started with a good gag about sheet music being a DSL for driving the finite state machine otherwise known as a musician. The really amazing thing, however, was Chris’s abstraction of the score and his subsequent transformations of that abstract score, ending up with variations on the base canon he had chosen. Really amazing. Chris’s talk shouldn’t have been a lightning talk; that is about the only quibble I had with the programme.

Sam Newman also had an excellent closing line in his lightning talk on Riemann: if people want Clojure to be adopted widely then the secret is to create great things with Clojure.

Standard
Java

EuroClojure 2012 Day 1

So there were definitely two big themes in the talks on the first day of the conference.

The first was how to use event-based systems to create flexible aggregate data models. All the speakers seem to have settled on a reduce or foldLeft approach for creating the aggregate, and two models have been put forward so far: CQRS and a kind of aggregate query bus. Really, though, accepting event data and allowing querying of and access to the aggregated views seem to be responsibilities of the same system.
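
As a toy illustration of the reduce-over-events idea (in Python, with invented event shapes):

from functools import reduce

def apply_event(view, event):
	# Fold a single event into the aggregate view
	kind, amount = event
	if kind == "deposit":
		return {"balance": view["balance"] + amount}
	if kind == "withdraw":
		return {"balance": view["balance"] - amount}
	return view

events = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]
view = reduce(apply_event, events, {"balance": 0})  # {"balance": 75}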

The other was creating query systems using logical predicates. There were no fewer than three generic query systems put forward: core.logic for low-level, flexible implementations that can unify either data or results, and two general query libraries, one from Datomic and the other from Cascalog.

Standard
Web Applications

The web is a graph

Last week I gave a talk on how I have been creating web applications that very lightly wrap an underlying graph to provide not just content for a page but also the workflow and state of the user’s current interaction with the application.

As part of the talk I have created two demo apps that are available on Heroku. Crumbly Castle is inspired by Dark/Demon Souls and allows you to explore a castle that is populated by the ghosts of everyone who has ever played it. The other offers a questionnaire system that generates characters in the style of the Elder Scrolls or Fallout games. The code for the applications is on Github so you can fork it and deploy it for yourself. Both use the hosted Neo4J addon for Heroku, which provides hassle-free hosting but is currently only available to beta program members.

You can obviously use both on your local machine.

Both of the demos are metaphors for more serious kinds of enterprise applications but I think it is often easier to produce prototypes or demos that are based on immediately engaging concepts. It certainly helps to have something that the audience can play with during the talk!

So briefly I just wanted to summarise the points I try to make during the talk and explain why you might want to look at using a graph as your web application store. My major point is that web application development is usually page-centric: when you hit a page, the controller tends to examine the whole state of the application to find out why you came there. Are you logged in? Were you trying to look at something? Is there a session associated with you?

I posit that we should instead be looking at the journeys between the pages as the interesting things. Given where you are in the journey graph, where can you go next? Essentially I am taking the same logic that a state machine or rule engine uses and instead expressing it as relationships in a graph.

The most common trick the applications use is to assign a fixed url to a user session, identifying a node in the graph. With each transition I change the relationships that node has to other data, based on the user’s actions, and then simply send a redirect back to the fixed url, which then renders a different result based on the current state of the node.
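
Here is a sketch of that trick using neo4j-rest-client; the relationship type, url scheme and function shape are all hypothetical:

from neo4jrestclient.client import GraphDatabase

db = GraphDatabase("http://localhost:7474/db/data/")

def advance(session_node_id, action_name):
	# Look up the session node and follow the outgoing relationship
	# whose type matches the action the user chose
	node = db.nodes.get(session_node_id)
	for rel in node.relationships.outgoing():
		if rel.type == action_name:
			# Move the session's "current position" in the graph
			node.relationships.create("CURRENTLY_AT", rel.end)
			break
	# The caller then redirects to the session's fixed url; the next
	# request re-renders the page from the node's new state
	return "/session/%s" % session_node_id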

This means that the web application becomes very simple to write and the controller simply has to select the template and the related nodes that are needed to generate links and actions.

I think it is a really interesting approach and a natural fit for simplifying a lot of session-state-heavy apps.

Standard