Migrating Neo4J Python apps on Heroku

Okay this is quite specialised by the four or five of you who will have the same problem I wanted to save you some time and suffering.

So Neo Technologies have been incubating a plugin of their excellent graph database on Heroku for a while. So far the plugin was only available in beta but now anyone can have it. This is excellent news and I would recommend it as a way of getting starting with graph-based web programming. However if you were in the beta program then you now need to migrate from the beta plugin to the new one.

The instructions that went out to the beta program implied that this was simply a case of dumping a backup zip, switching out plugins and then uploading your zip. Well the good news is that exporting and importing the zips works exactly as advertised but the bad news is that the two plugins are quite different in terms of the environment they expose. The beta plugin had an extensive list of variables that exposed the various parts of your hosted environment. The new one just exposes the variable NEO4J_URL which is a url to the server and contains an embedded username and password.

Now the new variable does actually encode all the information that the original manifest did but in a very limited way and your library is going to have to work quite hard to correctly construct the base urls and requests required to access the REST API. I’m not sure which libraries do do it (I presume the Java ones) but neither of the Python ones do.

I’m going to describe what you need to do for the neo4j-rest-client which is the one I use in my apps but it will probably be similar for py2neo which is what you might want to use if you want to use a lot of Cypher.

So the simplest way to explain the solution is code.

import urlparse
from neo4jrestclient.client import GraphDatabase

def extract_credentials(url):
	parsed_url = urlparse.urlparse(url)

	if parsed_url.username and parsed_url.password:
		return (parsed_url.username, parsed_url.password)

	return None

GRAPH_URL = os.environ.get('NEO4J_URL', "http://localhost:7474") + "/db/data/"

credentials = extract_credentials(GRAPH_URL)

if credentials:
	db = GraphDatabase(GRAPH_URL, username = credentials[0], password = credentials[1])
	db = GraphDatabase(GRAPH_URL)

So the neo4j-rest-client library supports username and password credentials but doesn’t parse them out of the url itself. Fortunately urlparse makes this pretty trivial. The conditional pieces of the code deal with the situation where we are running locally, essentially if we can’t see the Heroku environment variables we want to fallback to the local case (most Heroku stuff works this way).

A more frustrating issue is the difference between the url of the server and the root resource for the REST API. Naturally these are not the same but few libraries handle been given the wrong url that gracefully. Since the host URL does return successfully you usually get some failure about parsing or unpacking the root document. Submitting a patch to detect where the url ends in db/data or not would seem to be the logical solution.

So this code should boot a successful wrapper around the REST interface and your app should work again.

Except, that there seems to be another issue in the registering and deregistering of the plugin manifests. What I have observed is that heroku config lists the beta environment variables and not the new values. So even if you do this the library still gets 404 errors on the root document (because it is looking for the Neo4J environment that has been deallocated).

So the best way to migrate your app in my view is:

  • go to your current app and download a database backup
  • create a new app with a temporary name (or something like my-app-2)
  • carry out your code changes as described above
  • load your new code into your instance
  • upload your backup into the new instance
  • if the app is working rename your old app (to something like my-app-old)
  • name your new app whatever your old app was called

This seems easier and less hassle than migrating in place. Once the beta plugin is turned off you should be able to delete the old app.

This process has allowed me to migrate my two demo apps successfully (pending bug reports): Crumbly Castle and Flow Demo.

Web Applications

The web is a graph

Last week I gave a talk on how I have been creating web applications that very lightly wrap an underlying graph to provide not just content for a page but also the workflow and state of the user’s current interaction with the application.

As part of the talk I have created two demo apps that are available on Heroku. Crumbly Castle is inspired by Dark/Demon Souls and allows you to explore a castle that is populated by the ghosts of everyone who has ever played it.  The other offers a questionnaire system that generates characters in the style of the Elder Scroll or Fallout games. The code for the applications is on Github so you can fork it and deploy it for yourself. Both use the hosted Neo4J addon for Heroku which provides hassle-free hosting but is currently only available to beta program members.

You can obviously use both on your local machine.

Both of the demos are metaphors for more serious kinds of enterprise applications but I think it is often easier to produce prototypes or demos that are based on immediately engaging concepts. It certainly helps to have something that the audience can play with during the talk!

So briefly I just wanted to summarise the points I try to make during the talk and explain why you might want to look at using a graph as your web application store. So my major point is that web application development is usually page-centric, when you hit a page the controller tends to examine the whole state of the application to find out why you came to the page. Are you logged in? Were you trying to look at something? Is there a session associated with you?

I posit that we should instead be looking at the journeys between the pages as being the interesting things. Given where you are in the journey graph where can you go next? Essentially I am taking the same logic as a state machine or rule engine uses and instead expressing it as a relationships in a graph.

The most common trick the applications use is to assign a fixed url to a user session that identifies a node in the graph. Then with each transition I change the relationships the node has to other data based on the user’s actions and then simply send a redirect back to the fixed url which will then render a different result based on the current state of the node.

This means that the web application becomes very simple to write and the controller simply has to select the template and the related nodes that are needed to generate links and actions.

I think it is a really interesting approach that is a really natural fit for simplifying a lot of session-state heavy apps.


Thoughts on Neo4J REST server 1.3

So the new Neo4J GPL release 1.3 (with added REST) is the release that kind of moves Neo4J out of its niche and into the wider arena. There are less barriers to experimenting with it in production situations and now it is no longer necessary to embed it you don’t have any language barriers, it can just be another piece of infrastructure.

Last week I used the new release to prototype an application and these are my observations. The big thing is that the REST server is easy to use out of the box and the API is well-thought out and practical and it looks like there’s some nice wrappers already for various languages (I used this one for Python).

However compared to other web-based stores there are still a few wrinkles. The first is that it would be nice to clearly be able to mark nodes that you consider “roots” in the graph. It’s almost the first thing you want to do and there should be a facility to support it in the API.

Indexing is still cronky, I think there is a reasonable expectation that Neo4J should be able to have a dynamic index that indexes the properties of nodes as they change without having to explicitly add the indexable fields to the index. I don’t mind if I need to do some explicit work to switch on this mode but having to manage it all explicitly is a huge pain. Particularly because you don’t have any root node support so you are almost inevitably going to want to search before moving over to relationship following.

It would be nice to have list and map properties on nodes, at the moment we’re getting around that by using JSON in the property string but again it feels like a value add that’s necessary if you want to use Neo4J for a big chunk of your data processing.

This is a personal peeve but I do think it would be nice to be able to override the id of a node. One usecase I can definitely think of is when you’re translating a document database into a graph form and you want to cross-reference the two (particularly if you’re using the richer data on the document rather than tranferring it into the node).

One other feature that is a bit strange (but I might be doing this wrong) is that the server seems to correspond to just one database. It would seem pretty easy to incorporate a database-style data-partition name into the REST scheme and allow one server to serve many different databases.

Now that the GPL version is available I am also looking forward to Neo4J entering the Linux package distributions in the same way that CouchDB has become a ubiquitous deployment choice.

So in short Neo4J has left it’s niche but it still hasn’t fully left the ghetto but I have high hopes for the next release.


Playing around with Neo4J and Groovy

After hearing about Neo4J at Ruby Manor I decided to have a play around with the graph database but for me playing doesn’t mean creating a whole Java project anymore. Since using Python/Ruby/Scala I want to be doing it in an interactive session.

I had quite a few issues getting Neo4J to run but the summary is that for convenience you want all three jars from the distribution in your classpath and you pass a String representing a directory path to the EmbeddedNeo constructor. Once you do have it running make sure to shutdown the database on an exception otherwise you will have to shutdown the JVM (i.e. close the whole Groovy Console session) to unlock the underlying file resources.

Okay so once you have the right jars you can now start playing around with Neo4J. I immediately felt that the library is actually quite heavy with a lot of ceremony to get things done. Some of the feedback I have been hearing from Neo Technology and the Neo4J list is that Neo4J is more of a low-level infrastructure component that is meant to be wrapped up in higher-level APIs.

Working with Groovy it should be possible to cut that ceremony down a bit and put a nicer front-end on things. The first thing I’ve tried is using closures to execute code in Neo database and transaction contexts.

If you have the three jars in your .groovy/lib directory you should be able to run this script from the Groovy Console and have it create a node in your directory. It will be the same node each time but I have some ideas for using builders for both nodes and traversers (which allow you to search the graphs) and I am going to work on (and post) them later.