Programming

Google Chrome wants to sit at the table

I fixed a weird CSS bug last week which I think is an interesting illustration of the difficulties of implementing a standard. One of the ways that I layout our website is by using CSS Tables to provide right-sidebars and content splits. One of our components consists of an image in a left-column, content and then a button bar in a right-column.

The whole strip of content was marked as display: table-row and the columns were display: table-cell.

On Firefox this was fine and the full width of the page was used with the correct proportions for the content blocks. On Chrome however the width of the whole component was around 75% of the whole page width (with the correct proportions). After a lot of examination it appeared that Chrome genuinely felt the page was narrower at that point.

It took a few weeks to get the necessary mindshift to fix it. The problem was the interpretation of how display: table-* works when you don’t have the full hierarchy. My interpretation is that the browser fills in the implied parent elements and that worked correctly in Firefox. In Chrome however I needed to change my table-row to table for the full width of the page to be used. The implied table wasn’t the full width of the page, while the explicit one was with the implied table-row occupying the correct full width.

This worked fine on Firefox as well so the problem was solved! Always have a top-element that defines display: table.

Programming

I need feedback not features

“How about LDAP? Or a configurable workflow?” opines the new salesman. “I think it would really help the product if it had help videos” chirrups the new intern.

All interesting ideas I’m sure and some of them might be relevant in the future, but to be honest I didn’t really need a lover here I just needed a friend.

A lot of people seem to confuse feedback with the opportunity to point out the depressingly long list of things that an application doesn’t have, doesn’t have completely or just doesn’t do. As a developer it often feels that there is a never-ending queue of people ready to point out that the kettle washes clothes really badly.

Offering new features is actually a substitute for genuine feedback, which for me is discussing what the application does do on its own merits. In particular one thing that people never seem to be good at is to point out how existing features can be leveraged in multiple contexts or in different ways to maximise their benefit. Charles Vasey, a boardgame designer is amazing at doing this kind of refinement and a big inspiration here. Instead I seem to be left to sweat the codebase on my own while people boot more and more stuff into the backlog (blithely ignoring their own self-defeating behaviour).

A natural instinct (and one I’ve been succumbing to recently) is just to ignore everything everyone says unless it is a bug report. One particularly strong point in favour of doing this is that since I have a backlist that goes into October why not wait until that is clear and then gather feedback? I too already have a massive list of features I wish I could add but have discounted on their lack of value, surely I would be better off implementing them first instead of soliciting feedback on something that is incomplete.

However realistically, retreating into the Ivory Tower is rarely the best option. But how can we move from features to feedback? I think perhaps the first step is to call out the behaviour and just park each new feature onto a Post-It and then re-iterate the need for feedback. The other idea is make people explicitly say what they like and then why they like it. Forcing to people to focus on the positive and make them express their response in terms of what they want to see more of rather than what they would like to see.

Programming, Python

Django and JSON stores: a match in heaven

My current project is using CouchDB as its store and Django to implement the web frontend. When you have a JSON store such as CouchDB then Python is a natural complement due to its brilliant parsing of JSON into native data structures and its powerful dictionary data type that makes it really easy to work with maps.

In a previous project using Python and Mongo we used Presentation objects to provide domain logic on top of the raw data maps but this time around I wanted to try and cut out a layer and just work with maps as much as possible (perhaps the influence of the Clojure programming I’ve been doing on the side).

However this still leaves two problems that need addressing. Firstly Django templates generally handle dictionaries like a dream allowing to address them with the standard dot syntax. However both Mongo and Couch use leading underscores to indicate “special” variables and this clashes with the Python convention of having a leading underscore indicate a private member of the class. The most immediate time you encounter this is when you want to use the id of a document in a url and the naive doc._id does not work.

The other problem is the issue of legitimate domain logic. In Wazoku we want to use people’s names if they have supplied them and fallback to their email if they haven’t supplied their name.

The answer to both of these problems (without resorting to an intermediary object) is Django’s filters. The necessary logic can be written in a few lines of Python that simply examines the dictionary and does the necessary transformation to derive the id or user’s name. This is much lighter than the corresponding Presentation solution.

Programming

Thoughts on Neo4J REST server 1.3

So the new Neo4J GPL release 1.3 (with added REST) is the release that kind of moves Neo4J out of its niche and into the wider arena. There are less barriers to experimenting with it in production situations and now it is no longer necessary to embed it you don’t have any language barriers, it can just be another piece of infrastructure.

Last week I used the new release to prototype an application and these are my observations. The big thing is that the REST server is easy to use out of the box and the API is well-thought out and practical and it looks like there’s some nice wrappers already for various languages (I used this one for Python).

However compared to other web-based stores there are still a few wrinkles. The first is that it would be nice to clearly be able to mark nodes that you consider “roots” in the graph. It’s almost the first thing you want to do and there should be a facility to support it in the API.

Indexing is still cronky, I think there is a reasonable expectation that Neo4J should be able to have a dynamic index that indexes the properties of nodes as they change without having to explicitly add the indexable fields to the index. I don’t mind if I need to do some explicit work to switch on this mode but having to manage it all explicitly is a huge pain. Particularly because you don’t have any root node support so you are almost inevitably going to want to search before moving over to relationship following.

It would be nice to have list and map properties on nodes, at the moment we’re getting around that by using JSON in the property string but again it feels like a value add that’s necessary if you want to use Neo4J for a big chunk of your data processing.

This is a personal peeve but I do think it would be nice to be able to override the id of a node. One usecase I can definitely think of is when you’re translating a document database into a graph form and you want to cross-reference the two (particularly if you’re using the richer data on the document rather than tranferring it into the node).

One other feature that is a bit strange (but I might be doing this wrong) is that the server seems to correspond to just one database. It would seem pretty easy to incorporate a database-style data-partition name into the REST scheme and allow one server to serve many different databases.

Now that the GPL version is available I am also looking forward to Neo4J entering the Linux package distributions in the same way that CouchDB has become a ubiquitous deployment choice.

So in short Neo4J has left it’s niche but it still hasn’t fully left the ghetto but I have high hopes for the next release.

Programming

Many cores, many threads?

There are roughly two approaches to solving the multicore problem: better ways of managing threads and making multiple processes easier to invoke and manage. There are a few pros and cons, threads for example are going to easier to work with if you are managing shared data or you have expensive resource creation costs that you would prefer to confine to a single initialisation event rather than going through a version of it for every job you want to create.

However if you can make your job a parallel process, avoid the need for co-ordination and have a lightweight spin-up then there is actually an interesting question as to whether you should favour processes rather than threads to boost throughput.

Recently I have been working a lot with various REST-like webservices and these have some interesting idempotent and horizontally scalable properties. Essentially you can partition the data and at that point you can execute the HTTP requests in parallel.

I’ve used two techniques for doing this, the most general is the use of GNU Parallel which is an awesome tool (which I would recommend you incorporate into your general shell use) combined with Curl which is also awesome. The other is using Python’s multiprocessing library, which was very helpful when I actually wanted to use a datasource like MongoDB to generate my initial dataset. Even here I could make a system call to Curl if I didn’t want to use Python’s Httplib.

So why use processes to do parallel work rather than threads? The first answer is very UNIX-specific in that processes actually get a tremendous amount of OS support for managing and running, compared to tools for analysing thread activity and usage.

The second is conceptual, the creation of a process to do a task is involves a simple lifecycle and the separation between the resources used by processes is absolute compared to the case with threads. Once a process has gone you know it has completed either successfully or unsuccessfully and upon completion that process no longer affects any other running process.

The third is practical, I think it is easier to construct pipelines of work that divide into parallel streams through a pipeline approach rather than trying to put that code into a thread manager.

So if the conditions apply take a good look at the multi-process approach because it might be a lot easier to implement.

Programming

Intuitive versus Reasoning Programmers

During the last year I’ve been helping run a monthly series of dojos for the London Clojure User Group. In the course of it I have had the chance to watch a lot of people grapple with functional programming. As a result of this and also looking at the way a lot of my colleagues at ThoughtWorks work I think programmers can roughly be divided into two groups: Intuitive and Reasoning.

To characterise both of them a little, I think that Intuitive programmers tend to use domain language a lot, rely heavily on tests and TDD, often find it difficult to articulate what they are doing in their work, they like small source code files because they like to see everything a file does at a glance, they prefer outside-in problem solving and like the opportunity to go back and revise their work.

Reasoning programmers like to discuss a problem before coding and explore edge-cases and what-ifs, they like to work in the REPL or have in-editor code evaluation, they quickly move from a domain to an abstraction and then work in that abstraction, they don’t mind a thousand line source file as long as it is logically structured (that’s what the search function is there for), they prefer “bottom-up” coding where they distil their abstraction to its essence and having implemented that create the required behaviour by composing their abstractions.

Both sets of programmers can be great at their work but often if they are unaware of the characteristics of how they like to work then there can be massive amounts of tension as the two wrestle back and forth. The Intuitive developers feel angsty when the Reasoning programmers start hacking out code rather than refactoring, Reasoning developers get frustrated at the aimlessness and game playing of the Intuitive’s attempts to generate the minimum amount to make their tests pass.

Reasoning programmers probably feel that in terms of communicating they have the superior style as they are able to advance arguments and logical constructs that can be interrogated. Those articulated models though can feel arid and irrelevant to Intuitive programmers, what does some obscure mathematical formula have to do with trying to make sure the frog doesn’t get run over when it crosses the road?

Intuitive programmers seem to be better at switching contexts and adapting to change as they can quickly see the “outlines” of any problem and their techniques are about honing that initial perception into a functional solution. By contrast a Reasoning program is wary of uncertainty and is unhappy drawing unjustified analogies between different situations.

In FP terms Reasoning programmers have their behaviour emerge as a logical consequence of the operation of lower level abstractions; their code looks like algebra and domain data is passed in to the top of their processing chain. Intuitive programmers on the other hand, fix behaviour in their tests and fill their code with domain language that aims to match the natural language of the organisation, the depth of their function calls is usually smaller and they are swifter to bind calculations to intermediate variables.

I think I am an Intuitive programmer (and therefore I worry that I am miscasting Reasoning programmers through my lack of understanding) and in my work most of my colleagues are as well, it is in the nature of consultancy to have to adapt to constantly changing domains, code bases and expectations. We do have a few Reasoning programmers though who do exactly the same work but do it via thinking deeply through problems and drawing logical inferences.

If there are issues between the two groups then I think it occurs when there are no concessions between the two; common flashpoints being testing, writing defensive/guard code and giving time to discuss problems and problem solving strategies. The two also agree on a lot of things for very different reasons: for example both like to rewrite code, refactoring is a formalised practice for Intuitive developers whereas Reasoning developers often want to apply new insight or learned ideas (“This could all just be monads!”).

An important point is that neither approach is “right” both types of programmer arrive at the same results if their experience, ability and other factors are equal. They are purely styles of working.

Programming, Python, Ruby

Truly open classes

Here’s an interesting observation, I needed to write a little script to automate some number calculating for me. I was wondering whether to do it in Ruby or Python. I’m doing a lot of Python at the moment so I felt I ought to give Ruby a little go. Share some of the love.

However the solution I had in mind really didn’t work with Ruby because while Ruby has open classes it has a comparatively fixed idea of attributes. In Python you can set attributes very freely on any object so I have got in the habit of creating something and then enhancing by applying a function. Example? Okay.

def make_captain(actor): actor.rank = "Captain" return actor

class Person: pass

captain = make_captain(Person())

So this little trick doesn’t work, or rather is much more difficult to do in Ruby as Ruby, at is dynamic heart, is a language that believes in object-orientation and that classes should encapsulate rather than being little collections of data. You can use instance_variable_get/set but it lacks the elegance of the Python syntax.

In Ruby it would be easier to define the attributes in the class using the existing metaprogramming constructs and then have a class method to generate the content (effectively encapsulating my script logic).

Now this isn’t a straight “Ruby sux, no Python sux more” post. Between Scala, Clojure and Python I have been doing a lot more in a functional style that depreciates objects as anything more than value carriers. The Ruby vision of a class would give me something with a stronger sense of purpose and encapsulation, something that is hard to benefit from in a script for a particular purpose.

What is going to be interesting this year is trying to identify when the value of a piece of code is in the structure of it’s data-definition (i.e. objects) versus its process (functions). Having had a think about it I should perhaps rewrite my script to use some OO modelling because it may answer similar requirements down the line. However from a strict Lean/Waste point of view I should have gone with the Python solution as Ruby was imposing a restriction on me while providing benefits that I was unlikely to realise.

Programming, Work

The beauty of small things

I am very interested in the idea of “constellation architecture” and microapps as new model for both web and enterprise architecture. It feels to me like it a genuinely new way of looking at things that can deliver real benefit.

It is also not a new way of doing things, it is really just an extension of the UNIX tools idea and taking ideas like service-orientated architecture and some of the patterns of domain-driven design and taking them to their logical extreme conclusion.

If I take ls and I pipe it through grep, you wouldn’t find that particularly exciting or noteworthy. However creating a web application or service that does just one thing and then creating applications by aggregating the output of those many small components does some novel and slightly adventurous to some.

SOA failed before it began and the DDD silos of vertical responsibility seem poorly understood in practice. Both have good aspects though. However both saw their unit of composition as being something much larger than a single function. An SOA architecture for payments for example tended to include a variety of payment functions rather than just offering one service, authorising a payment for example.

There is a current trend to look at a webpage as being composed of widgets, whether they be written as server-side components or as client operated components. I think this is wrong and we need to see a page as being composed of the output of many different webapps.

Logging in a web-application whose only responsibility is to authenticate users, the most popular pages are delivered by an application whose responsibility to determine which pages are popular.

This applications should be as small as we can make them and still function. Ideally they should be a few lines of domain code linking together libraries and frameworks. They should have acceptance/behaviour tests to guarantee their external functionality and that’s about it.

It seems to me that the only way we are going to get good large-scale functionality is by aggregating useful, small segment small functionality. Building large functional stacks takes a lot of time and doesn’t deliver value exponentially to the effort of its creation.

Programming, Scala

Mockito and Scala

Scala is a language that really cares about types and mocking is the art of passing one type off as another so it is not that surprising that the two don’t get along famously. It is also a bit off probably that we are using Java mocking frameworks with Scala code which means you sometimes need to know too much about how Scala creates its bytecode.

Here are two recent problems: the “ongoing stubbing” issue and optional parameters with defaults (which can generally be problematic anyway as they change the conventions of Scala function calling).

Ongoing stubbing is an error that appears when you want to mock untyped (i.e. Java 1.4) code. You can recognise it by the hateful “?” characters that appear in the error messages. Our example was wanting to mock the request parameters of Servlet 2.4. Now we all know that the request parameters (like everything else in a HTTP request) are Strings. But in Servlet 2.4 they are “?” or untyped. Servlet 2.5 is typed and the first thing I would say about an ongoing stubbing issue is to see if there is Java 1.5 compatible version of whatever it is you are trying to mock. If it is your own code, FFS, add generics now!

If it is a library that you have no control over (like Servlet) then I have some bad news, I don’t know of any way to get around this issue, Scala knows that the underlying code is unknown so even if you specify Strings in your mock code it won’t let it compile and if you don’t specify a type your code still won’t compile. In the end I created a Stub sub-class of HttpServletRequest that returned String types (which is exactly what happens at runtime, thank you very much Scala compiler).

Okay so optional parameters in mocked code? So I love named parameters and default values because I think they are probably 100% (no, perhaps 175%) better at communicating a valid operating state for the code than dependency injection. However when you want to mock and set an expectation on a Scala function that uses a default value you will often get an error message saying that the mock was not called or was not called with the correct parameters.

This is because when the Scala code is compiled you are effectively calling a chain of methods and your mock needs to set matchers for every possible argument not the ones that your Scala code is actually calling. The simplest solution is to use the any() matcher on any default argument you will not be explicitly setting. Remember that this means the verification must consist entirely of matchers, e.g. eq() and so on.

What to do when you want to verify that a default parameter was called with an explicit value? I think you do it based on the order of the parameters in the Scala declaration but I haven’t done it myself and I’m waiting for that requirement to become a necessary thing for me to know.

Java, Programming

Python as a post-Java language

I’m a UNIX-based developer and since 2000 I have been working mainly with Java and then JVM languages. When Java 7 slipped I made no real secret of the fact that Java was in a lot of trouble. The post-Oracle world though looks even worse with a lack of clarity of what in the core ecosystem is free, open source and liability free.

Clojure and to a less extent Scala are great steps forward so I don’t feel the burning need for a Java 7/8 whatever. However a moribund or tainted JVM is a major problem and so I’m now thinking about what the post-Java escape route looks like. On the web front it is pretty obvious, Python and Ruby are great languages with great frameworks for developing web-based application. For the server-side heavy lifting it is a lot less clear, people are talking about Google Go but that does feel quite low-level, I’m not sure I’m ready to go back to pointer wrangling even with memory-management. It feels like something you’d build a tool out of not an application. Mono feels like more of the same problems of wrestling with big companies with vested interests, if you are going to do that then why not try and sort out the OpenJDK?

As the title of the post suggests the language I am most inclined towards right now is Python. It is a really concise but clear language that on UNIX systems comes with an amazingly comprehensive set of libraries and which has a virtual environment and dependency management that is on a par with RVM and gem.

The single issue that comes up is performance, what I have been finding that for 80% of the work I am doing performance is okay and I’m producing a fraction of the code I would normally have to create. For that last 20% maybe I am going to have to look at something like Go or (god forbid) go back to C but I would much prefer to see a Clojure or Scala that could run on top of something like LLVM. I also have some hope of smarter people than me making progress on a JIT for Python that might take 20% down to a figure where performance would matter so much to me I wouldn’t mind sweating to make it happen.

Echo One

Sequentially arranged sentences composed of words (and punctuation)

Category Archives: Programming

Google Chrome wants to sit at the table

I need feedback not features

Django and JSON stores: a match in heaven

Thoughts on Neo4J REST server 1.3

Many cores, many threads?

Intuitive versus Reasoning Programmers

Truly open classes

The beauty of small things

Mockito and Scala

Python as a post-Java language