One thing that has been really good about using Google Sites is the search feature. Initially I didn’t use it very much but as the amount of material I have been entering has grown it has become increasingly handy. The search feature has made up for the lack of a entry tree widgit. Once you’ve entered something it is easier to just search and jump to it.
Chris O’Dowd, Soho
Outside the Toucan bar, drinking Guiness and having a few fags, trying to nail his Irishman stereotype I guess. Tall and somewhat goofy, battering on about projects and what people are doing.
Trying to avoid shouting his catchphrase at him is torture.
Simon Amstell, Soho
Older in real life but still looking and talking like the quintessential posh sixth-former.
Java 6 brought low by zip files
I was on the verge of taking my work laptop in for a severe servicing with a cattleprod when I googled this bug and found my ire was poorly directed at my innocent Dell. Deleting all the install zip files I had downloaded from my Desktop found me back and running in reasonable time.
What a crazy bug! It is kind of incredible that it hasn’t been fixed yet.
Dean Lennox Kelly, Soho
Born in the same year as I. Expensive rough. Chatting on a cell phone and spinning around to break eye contact at the intersection.
Sun recruits Jython leads
I know this isn’t exactly breaking news but it is further evidence that Sun is aiming to turn the JVM in a language agnostic platform. It’s also good news for the Jython project which has suffered a long period of hibernation and which has fallen far behind CPython in terms of its compatibility.
I like Python as a language and its clarity is great for one time scripts (which never really are). I would really like it to be a full member of the JVM-compatibles.
Lenny Henry, Soho Square
Tall, square, in a suit so sharp you could slit your wrists on it. Talking about doubt and walking north.
Depressing times for Java programmers
Joshua Bloch of Effective Java and Collections fame has given a powerful and in many ways depressing talk. He has probably nailed the coffin shut on BGGA with this almost clinical dissection of the flaws of grafting Closures into Java. However he has also indicated how Java is effectively at the end of the line. Sun, for perfectly good reasons, wants to maintain backwards compatibility and not throw the kitchen sink into the language. The JCP does not have to stick with that approach but this talk does effectively say that there is a limit to what can be done while keeping the language recognisably Java.
The mention of Scala is significant because a lot of the things that people want in Java are available there and other features are in JRuby. Trying to create one language with universal appeal is going to be impossible. The reason I feel depressed about all of this is because I have working with Java for years now. I really had to fight to switch to 100% Java and now its time to move on again!
It would be easier to do that if there was something that was obviously better but currently all of the candidates for a potential successor have lacked that Eureka factor where you see something that is going to make programming easier and better and your working life a whole lot more fun. So far only Scala has really come close.
CouchDB: Pros and Cons
I have been looking at a lot of the different databases that are working outside the traditional SQL style RDBMS (and I want to point out that having a true RDBMS might actually be something very different to what we have now). I have spent a lot of time with CouchDB compared to the others because it seems to occupy a special niche in data storage. It also builds in a REST interface right off the bat which effectively decouples the storage mechanism from the delivery and makes the whole system very language neutral.
CouchDB represents the first serious contact I have had with JSON and in deciding whether CouchDB is right for your project you really need to understand what JSON does and doesn’t offer. The first important thing is that JSON offers a kind of compromise from the heavy data definition in SQL or XML Schema and totally typeless data of something like flat files or Amazon SimpleDB. However to achieve that simplicity there are two consequences. Firstly data integrity is palmed off onto client applications and you will need to check data going in and coming out. I don’t personally believe that there is ever an application that doesn’t have some explicit or implicit data integrity. Second JSON documents can be very rich and complex but they cannot have detailed field structures, a telephone number is simply going to decompose into two numbers and its hard to get much finer grain description of the data. As a rule of thumb if you would normally create a well-formed XML document for the data or you would define a SQL data definition that would simply declare all the types of the column using all the defaults and never specifying a NOT NULL column then you probably have a good match.
JSON also favours dynamic languages over static types like Java. You need to decide if you are going to be able to take advantage of the full set of features in your chosen client language. XML might remain a better choice for static languages as you describe document instances declaratively.
Other good features about CouchDB include the fact that its incubating for Apache and will be a good addition to the projects there. It also leverages existing Javascript skills which are probably more common than people who know XPath, XQuery or even XML Schema and Relax NG. It has excellent support for incremental data and also fits well with highly irregular data like CMS page content. It also has an excellent UI that makes it easy to interact with the data. Finally it seems to have a good solution for scalability without doing anything too esoteric.
The negative features are tricky because we obviously have an alpha project here that is probably closer to the start of the public testing phase than the end. Some of these cons may well be addressed before the first release candidate. The first obvious omission for a server product is that there is no security built into the server. Just supporting optional HTTP Authentication would be enough to make it practical to start running some experimental servers for something more than sandbox exercises. One thing that is a major difference to the feature set in the standard pseudo-RDBMS is that there is no way of interacting with sets of data. There is a method for changing multiple documents in one pass but what you really want to be able to do is apply a set of changes to documents identified by a query, the equivalent of SQL’s UPDATE.
Related to this is the fact that data in CouchDB is currently heavily based on silos. If I have a set of data referring to authors and a set of data relating to books then I currently need to duplicate data in both. One of the problems that the developers are going to face in evolving CouchDB is how to address this without introducing a solution so complex that you ask why you are not using an RDBMS? Similarly I notice in the roadmap there is an item about data validation. If you start introducing data validation and rules for validating data then before long there is going to be a question as to why you don’t simply use one of the existing document systems as all the current simplicity will have gone.
One thing that definitely needs improvement is error logging and reporting. Often the only error feedback you have is a Javascript popup that says “undefined” and a log message that tells you that the Erlang process terminated. There needs to be some more human-readable issue logging that points you towards what is going wrong.
CouchDB: Querying data
CouchDB allows you to pass a map function to a special view URL to query the data in an ad-hoc way. Views can also be stored as JSON documents with a convention URL (_design on the server, accessed as _view by the client). These can then be obtained via a HTTP request.My functional and Javascript programming are weak but this is what I understand of writing queries in CouchDB. Let’s take an example of a set of library cards, each card represents a book but the amount of information I have on each book varies.
The basic find all function is this:
function(book) {
map(null, book)
}
This defines an anonymous function that takes one parameter, the target document, in this case a book, and returns an array of values. What is in the value list is controlled by the second parameter, in this query I return the entire document. The first parameter controls the sorting or ordering. So I wanted to return the title of all the books in my database then I would use:
function(book) {
map(book.name, book.name)
}
Sorting them by ISBN would go like this:
function(book) {
map(book.isbn, book.name)
}
One important thing to note is that if an object doesn’t have a value it doesn’t respond to the function and will not be included. So if I created some of my entries with a value title instead of name anything with a title and not a name will not be in the query. However if I use a non-existent entry as an ordering criteria the value will count as null and be sorted.
Because I can include any valid Javascript in my function I can actually put a lot of complexity into my queries. For example:
function(book) {
if(book.isbn != null) { map(book.name, {"Name": book.name, "ISBN": book.isbn})
} else { map(book.name, book.name) }
}
So I suspect this will either make you cheer or puke. What this function does is return a JSON object containing the Name and ISBN of the book if they are known or just the Book name as a String otherwise. Unlike SQL the heading of my query is almost completely arbitrary as long as the value on the right of my map function translates to a valid JSON object.
Now at work there are often a lot of debates as to whether things are “rigid” or “structured” or whether they are “flexible” or “formless”. It is a bit like the old meat and poison adage. CouchDB allows a client to construct an almost arbitrarily rich response to a query with almost no restriction on how the data that should be included in that response. In some cases this is going to allow you to easily interact with very complex unstructured data in some cases it is going to be an invitation to create a sprawling dataset with no value. There is no inherent right or wrong choice here but for a particular problem being solved there is probably going to be a wrong and right choice. SQL is powerful because of the restrictions and rules it builds into its grammar. Using Javascript is powerful because it relaxes those restrictions. Programmers and IT folks in general often fall into using the laxest possible implementation for reason of “flexibility” but then either have to impose order themselves or lose the power of the more restrictive choice.
So putting that into a concrete example, if a write a view with SQL I am going to have to follow a set of rules to get the data I want (for example my heading is going to have to be a set of tuples of equal size), using an arbitrary script and JSON means I am going to be able to get exactly the data I want in the form I want it. However since that return structure is customised to my query I might possibly be reducing my reuse by being over-specific or by building too much logic into my view code.
That’s quite a diversion just so I can say it’s horses for courses, so let’s wrap up this quick look at CouchDB views. All of CouchDB’s views are effectively JSON objects that are passed to a separate view server. This is a separate process that interacts with the main server via STDOUT and STDIN pipes. By default this is the view server that is built from the Spidermonkey library (it is called couchjs). However you can write a view parser for any language and plug it into CouchDB by creating an executable and mapping it to a MIME type in the couch.ini file. The view server essentially parses and readies the query function that is associated with the view and is then sent every document in the database as a JSON string. The view server picks up the results of reading every document and sends that back to the query request.
It is a pretty simple system and it works will for the relatively flat documents I have been trying with it. However I suspect that in a project with multiple developers some ground-rules for writing consistent query code would be a must.