Software, Work

What’s the difference between Simple and Stupid in interview code?

Kent Beck (I believe) said: “Do the simplest thing that works, not the stupidest”. What does that mean in the context of showcase code (i.e. code you have written as part of some kind of application process)?

Well firstly think of what the point of showcase code is. What is the reviewer looking for as part of the application process? They are looking for evidence of a train of thought or method of problem solving that chimes with the kind of thought processes that they think are effective. Ideally these would be the “best” processes but you have to accept that a lot of processes are about finding people who agree with what an organisation thinks is best practice rather than respecting originality and clever solutions. You should never try to tailor your showcase code to an organisation because it is too hard to double guess what someone is looking for. You can only express yourself as clearly as you can and hope that it fits well with what is being sought.

A good piece of showcase code will appropriately abstract the problem being set, illustrating the way the candidate finds solutions to problems. It will neither be too elaborate, abstracting too much of the problem, nor too literal, coding “between the lines” of the problem being set.

Let’s take an example. If the problem is to model a set of automobile configurations in an object then the problem set may explain that a car is sold with seat configurations of 2, 4 or 6. In my view a stupid solution would be to have a list of just 2, 4, 6. Clearly the reviewer is going to want some flexibility here, the solution should be able to deal with a new configuration of 5.

An obvious solution is to just model the number of seats as an integer value. You can encapsulate the data by providing an API that allows you query how many seats a configuration has and provide a way of setting that value in an immutable way (once a configuration is set then the number of seats it has is unlikely to change in the lifetime of the object, that’s simplicity not stupidity). This solution just exposes that a configuration has a number of seats which is an integer, which is logical. It is an entirely sensible solution that keeps it simple.

How might a nervous candidate overcomplicate this situation? Well the number of seats really is immutable unless the problem set says otherwise, you don’t need to supply a mutator. Similarly the number of seats is never going to be a Real number.

What about modelling the number of seats as a list of Seat objects? Well yes that works because the encapsulation should hide the implementation and the number of seats is now just the number of Seats in the Seat list. Domain-driven design fans might even say this is a better solution that an integer because it is a closer description of the domain. However I think that simplicity would tend to demand that until we have something else to model about a Seat (say, whether it has a seatbelt) then it is hard to justify having a class whose only purpose is to be counted in a list. For all the difference it makes I could put Turtle objects into the Seat list. In terms of showcase code using this technique actually gives the reviewer a bit of a headache because instead of just ticking the box and moving on the reviewer now has to ask whether the new level of abstraction introduced is valid or not, is it a good idea to have a class with no functionality or data? What if actually the seat configurations are better represented by enumerated constants? The effort of creating the Seat class is just wasted and has to be replaced.

This solution at least stays within the bounds of an existing paradigm. If a candidate starts to abstract wildly then reviewer is going to give up in frustration. If the candidate starts abstracting the model to the point where it could model a motorcycle or truck just as well as car then they have just failed the process. Unless the problem says something about having a motorcycle then you don’t want to see Vehicle With An Engine type classes. After all if you start to generalise then where do you stop? Sure you can model a motorcycle and a truck but why not a plane? Or a rocket?

Standard
Java, Python, Scripting

Sun recruits Jython leads

I know this isn’t exactly breaking news but it is further evidence that Sun is aiming to turn the JVM in a language agnostic platform. It’s also good news for the Jython project which has suffered a long period of hibernation and which has fallen far behind CPython in terms of its compatibility.

I like Python as a language and its clarity is great for one time scripts (which never really are). I would really like it to be a full member of the JVM-compatibles.

Standard
Java, Programming

Depressing times for Java programmers

Joshua Bloch of Effective Java and Collections fame has given a powerful and in many ways depressing talk. He has probably nailed the coffin shut on BGGA with this almost clinical dissection of the flaws of grafting Closures into Java. However he has also indicated how Java is effectively at the end of the line. Sun, for perfectly good reasons, wants to maintain backwards compatibility and not throw the kitchen sink into the language. The JCP does not have to stick with that approach but this talk does effectively say that there is a limit to what can be done while keeping the language recognisably Java.

The mention of Scala is significant because a lot of the things that people want in Java are available there and other features are in JRuby. Trying to create one language with universal appeal is going to be impossible. The reason I feel depressed about all of this is because I have working with Java for years now. I really had to fight to switch to 100% Java and now its time to move on again!

It would be easier to do that if there was something that was obviously better but currently all of the candidates for a potential successor have lacked that Eureka factor where you see something that is going to make programming easier  and better and your working life a whole lot more fun. So far only Scala has really come close.

Standard
Programming

CouchDB: Pros and Cons

I have been looking at a lot of the different databases that are working outside the traditional SQL style RDBMS (and I want to point out that having a true RDBMS might actually be something very different to what we have now). I have spent a lot of time with CouchDB compared to the others because it seems to occupy a special niche in data storage. It also builds in a REST interface right off the bat which effectively decouples the storage mechanism from the delivery and makes the whole system very language neutral.

CouchDB represents the first serious contact I have had with JSON and in deciding whether CouchDB is right for your project you really need to understand what JSON does and doesn’t offer. The first important thing is that JSON offers a kind of compromise from the heavy data definition in SQL or XML Schema and totally typeless data of something like flat files or Amazon SimpleDB. However to achieve that simplicity there are two consequences. Firstly data integrity is palmed off onto client applications and you will need to check data going in and coming out. I don’t personally believe that there is ever an application that doesn’t have some explicit or implicit data integrity. Second JSON documents can be very rich and complex but they cannot have detailed field structures, a telephone number is simply going to decompose into two numbers and its hard to get much finer grain description of the data. As a rule of thumb if you would normally create a well-formed XML document for the data or you would define a SQL data definition that would simply declare all the types of the column using all the defaults and never specifying a NOT NULL column then you probably have a good match.

JSON also favours dynamic languages over static types like Java. You need to decide if you are going to be able to take advantage of the full set of features in your chosen client language. XML might remain a better choice for static languages as you describe document instances declaratively.

Other good features about CouchDB include the fact that its incubating for Apache and will be a good addition to the projects there. It also leverages existing Javascript skills which are probably more common than people who know XPath, XQuery or even XML Schema and Relax NG. It has excellent support for incremental data and also fits well with highly irregular data like CMS page content. It also has an excellent UI that makes it easy to interact with the data. Finally it seems to have a good solution for scalability without doing anything too esoteric.

The negative features are tricky because we obviously have an alpha project here that is probably closer to the start of the public testing phase than the end. Some of these cons may well be addressed before the first release candidate. The first obvious omission for a server product is that there is no security built into the server. Just supporting optional HTTP Authentication would be enough to make it practical to start running some experimental servers for something more than sandbox exercises. One thing that is a major difference to the feature set in the standard pseudo-RDBMS is that there is no way of interacting with sets of data. There is a method for changing multiple documents in one pass but what you really want to be able to do is apply a set of changes to documents identified by a query, the equivalent of SQL’s UPDATE.

Related to this is the fact that data in CouchDB is currently heavily based on silos. If I have a set of data referring to authors and a set of data relating to books then I currently need to duplicate data in both. One of the problems that the developers are going to face in evolving CouchDB is how to address this without introducing a solution so complex that you ask why you are not using an RDBMS? Similarly I notice in the roadmap there is an item about data validation. If you start introducing data validation and rules for validating data then before long there is going to be a question as to why you don’t simply use one of the existing document systems as all the current simplicity will have gone.

One thing that definitely needs improvement is error logging and reporting. Often the only error feedback you have is a Javascript popup that says “undefined” and a log message that tells you that the Erlang process terminated. There needs to be some more human-readable issue logging that points you towards what is going wrong.

Standard