Programming

Refactoring abuse and strongly typed compilers

“Refactoring” is one of the most abused terms in programming. It has a formal meaning, but in general use it tends to mean rewriting or restructuring code (or, as I like to refer to it: changing stuff). One interesting new use of refactoring I heard recently was to describe extracting common code. Creating a new codebase is, if anything, the opposite of refactoring.

So refactoring tends to mean developers changing things they have already written. Real refactoring is of course done to code under test, so I was interested in a Stuart Halloway quote describing compilation as the weakest form of unit testing. Scala is used a lot at the Guardian, and it has a more powerful type system and compiler than Java, which means that if you play along with the type system you actually get a lot of that weak unit testing. In fact, structuring your code to maximise the compiler’s guarantees and adding the various assertion methods so that you fail fast at runtime are two of the things that help increase your productivity with Scala.

If you’ve seen the Coursera Scala videos you can watch Martin Odersky doing some of this “weak refactoring” in his example code, where he simplifies chained collection operations by moving functionality into, or creating it on, his types.

Of course, just like regular refactoring, there have to be a few rules to this. Firstly, weak refactoring absolutely requires explicit type declarations on your functions. Essentially, in a weak refactor what you are doing is changing the body of a function while retaining its parameters and return type. If the code still compiles after you have changed it, you are probably good.
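
To make that concrete, here is a minimal sketch (the function and its bodies are hypothetical): the explicitly declared signature stays fixed while the body changes, and the compiler checks the new body against it.

    // Before: explicit parameter and return types pin down the contract.
    def averageScore(scores: List[Int]): Double =
      scores.sum.toDouble / scores.size

    // After a weak refactor: same parameters and return type, new body.
    // If this still compiles against every call site, you are probably good.
    def averageScoreRefactored(scores: List[Int]): Double =
      scores.foldLeft(0)(_ + _).toDouble / scores.size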

However, the other critical thing is how much behavioural variation the return type permits. A return type of Option, for example, is probably a bad candidate for weak refactoring, as it is probably critical whether your changed code still returns Some or None for a given set of parameters. Only conventional refactoring can determine whether that is true.
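
A hypothetical illustration of why: both bodies below satisfy the same signature, so a weak refactor looks fine to the compiler, yet they disagree about which value comes back.

    // Returns the first matching element.
    def find(xs: List[Int], p: Int => Boolean): Option[Int] =
      xs.find(p)

    // Identical signature, still compiles -- but returns the *last*
    // match, a behaviour change the type system cannot catch.
    def findRefactored(xs: List[Int], p: Int => Boolean): Option[Int] =
      xs.filter(p).lastOption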

Programming

Our tools are doing us a disservice

Do you like using IntelliJ or a similar IDE that allows you to navigate your codebase easily and restructure freely? Do you like the fact that your code has a huge test suite that allows you to make changes with confidence?

These things seem like good things. Why would anyone have a problem with them?

Recently though, at conferences and in discussions at work, it has started to seem to me and others that powerful tools have a dark and dangerous side to them. The more powerful the tools at your disposal, the longer you can work on a codebase without facing up to the issues you have.

A powerful IDE allows you to have insanely complex projects with hundreds, possibly thousands, of files in them. I’m not sure that the Java love of abstraction across multiple classes would have happened if you had had to navigate the resulting package structure with Vi.

Rather than working to simplify your codebase you can continue to add in each special case and niche requirement; every one can have a home, with a Strategy pattern here and a class hierarchy there. Our test suite grows and grows to make sure that each overlapping requirement can be added safely, without consideration of its worth. We are perhaps proud that 60 to 80% of our codebase is test code that, in itself, adds no value to our business.

Our rich dependency managers encourage us to add in libraries or, even worse, to extract and share code across multiple projects. Until, of course, we start to burn in a transitive dependency hell of our own making.

We all love powerful tools and feature-rich languages, but the more powerful our tools are, the more they should help us find the simplicity in what we do and ensure that we deliver measurable value sooner, rather than just providing a longer noose.

Java

Hibernate temporary tables

I was surprised to discover that Hibernate makes use of temporary tables when performing deletes. The creation of these tables was triggering an alert in a database monitor for an application: the monitor logged the creation of tables with the prefix HT_ followed by a name that mirrored an entity’s table name.

Initially, I must admit, I was thrown a long way off track and started looking through the schema creation scripts and the codebase for an explanation, even checking HT as initials against the developers’ names. Naturally we tried deleting the tables, and soon enough they were recreated.

As with a lot of modern programming bugs, the answer came through some frustrated Googling and the realisation that HT might stand for Hibernate Table. With that insight in place I was then able to find an excellent explanation of the situation, and I gained another self-righteous data point for my dislike of ORM.

The temporary table is required for two reasons: firstly because ORMs do not actually do a proper mapping between the domain concepts of a relational store and objects, and secondly because Hibernate wants to create a verisimilitude of a class hierarchy. When that hierarchy is spread across several tables, a bulk delete has to touch all of them, so Hibernate first stages the ids of the affected rows in the HT_ temporary table.

I don’t think there is a better solution than what Hibernate is offering here but I do think it is the wrong problem being solved in a clever way.

Scala

What’s the problem with Scala’s “lazy” vals?

Scala has a value modifier called “lazy” and I have a bit of a problem with it. Let’s start with the name: lazy indicates that the value is not computed when it is declared but when it is first read. Really it isn’t lazy at all; it is actually a memoisation wrapper around the value.
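
A minimal sketch of that wrapper in action (the expensive calculation is a hypothetical stand-in):

    var computations = 0
    def expensive(): Int = { computations += 1; 42 } // stands in for real work

    class Holder {
      lazy val value: Int = expensive() // evaluated on first read, then cached
    }

    val h = new Holder // computations == 0: declaring evaluates nothing
    h.value            // computations == 1: computed on first read
    h.value            // computations == 1 still: the memoised value is returned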

Now that’s a pretty handy thing and there are definitely some use cases for it, in particular where the calculation of the value is expensive and truly invariant. Some people suggest that it can be used to cache the results of service or database lookups, but that is clearly madness unless the query really does return an invariant result.

However, I think there is a very subtle problem with the misapplication of lazy: it introduces state into an otherwise pure expression, namely whether the value has been calculated yet and what it was when it was calculated. To me this state actually reduces the utility of a pure function in terms of reuse, as the consumer has to be very aware of when the expression is going to be evaluated and what its value is going to be over time.
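
A hypothetical sketch of that hidden state: the value you observe depends on when the first read happens, and it is then frozen forever.

    class Config(source: () => String) {
      lazy val endpoint: String = source() // hidden state: has this been read yet?
    }

    var current = "http://old.example.com"
    val cfg = new Config(() => current)
    current = "http://new.example.com"
    cfg.endpoint // "http://new.example.com": the first read happened after the change
    current = "http://newer.example.com"
    cfg.endpoint // still "http://new.example.com": memoised forever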

It’s rare for the over-application of lazy to go wrong, because I mostly see it in short-lived instances. However, that very lifespan suggests there is no real value in lazy evaluation over eager declaration: if a function never gets called then it doesn’t cost anything, and true constants should be in the companion object. In long-lived objects, though, I think there is serious potential for lazy to go wrong, and therefore it should be applied sparingly, where a positive impact can be proved.

Programming

Keep the focus on the read

One of the interesting things about Wazoku’s Startup Challenge app is that a lot of the functionality is created via “out of the box” CouchDB features. In fact it is often where we haven’t leant heavily on the features of our store and frameworks that we have issues.

One of the things we decided to do in the app relatively late in the day was to provide little encouragements saying how many more votes an entry needed to reach the next place in the ladder. As this was a late feature we didn’t really think through where it should sit. We already had code that re-ranks entries when their vote ordering changes, so when an entry was being re-ranked it also acquired its target to beat at the same time.

With a store like CouchDB you are really aiming to read freely and minimise writes. That means denormalisation, and also strategies for generating related and derived data at the point where you change the parent data.

So this placement made sense from that point of view. It was only later that I began to realise we had chosen the wrong point to read. With hindsight it is actually only necessary to calculate the target entry when someone looks at the entry. Views of entries are distributed unequally, and the vote totals already exist as a CouchDB view, so we can do a key lookup to find all the entries with more votes than the current one whenever it is needed.
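
As a hedged sketch of that read-time calculation (the Entry type and the pre-fetched sequence are hypothetical stand-ins for a key lookup against the CouchDB votes view):

    case class Entry(id: String, votes: Int)

    // Only computed when a reader actually asks for it.
    def targetToBeat(entry: Entry, rankedByVotes: Seq[Entry]): Option[Entry] =
      rankedByVotes
        .filter(_.votes > entry.votes) // every entry ranked above this one
        .sortBy(_.votes)               // nearest vote total first
        .headOption                    // the next rung on the ladder, if any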

If we wanted to cache that result to avoid needless recalculation we would be better off storing the information in a front-side cache like Memcached or Redis, but in practice key reads in CouchDB are pretty damn fast and low-load.

So we thought we were saving ourselves problems by denormalising derived data, but in fact we were creating a lot more work at a point where it was uncertain that the additional data would ever be consumed.

Sometimes it can be hard to pick the right point to read!


Programming

Intuitive versus Reasoning Programmers

During the last year I’ve been helping run a monthly series of dojos for the London Clojure User Group. In the course of it I have had the chance to watch a lot of people grapple with functional programming. As a result of this and also looking at the way a lot of my colleagues at ThoughtWorks work I think programmers can roughly be divided into two groups: Intuitive and Reasoning.

To characterise both of them a little: I think Intuitive programmers tend to use domain language a lot; they rely heavily on tests and TDD; they often find it difficult to articulate what they are doing in their work; they like small source files because they want to see everything a file does at a glance; and they prefer outside-in problem solving and like the opportunity to go back and revise their work.

Reasoning programmers like to discuss a problem before coding and to explore edge cases and what-ifs; they like to work in the REPL or have in-editor code evaluation; they quickly move from a domain to an abstraction and then work in that abstraction; they don’t mind a thousand-line source file as long as it is logically structured (that’s what the search function is there for); and they prefer “bottom-up” coding, distilling their abstraction to its essence and, having implemented that, creating the required behaviour by composing their abstractions.

Both sets of programmers can be great at their work, but if they are unaware of the characteristics of how they each like to work there can be massive amounts of tension as the two wrestle back and forth. Intuitive developers feel angsty when Reasoning programmers start hacking out code rather than refactoring; Reasoning developers get frustrated at the aimlessness and game-playing of the Intuitives’ attempts to write the minimum amount of code that makes their tests pass.

Reasoning programmers probably feel that, in terms of communicating, they have the superior style, as they are able to advance arguments and logical constructs that can be interrogated. Those articulated models, though, can feel arid and irrelevant to Intuitive programmers: what does some obscure mathematical formula have to do with trying to make sure the frog doesn’t get run over when it crosses the road?

Intuitive programmers seem to be better at switching contexts and adapting to change, as they can quickly see the “outlines” of any problem, and their techniques are about honing that initial perception into a functional solution. By contrast a Reasoning programmer is wary of uncertainty and is unhappy drawing unjustified analogies between different situations.

In FP terms, Reasoning programmers have their behaviour emerge as a logical consequence of the operation of lower-level abstractions; their code looks like algebra and domain data is passed in at the top of their processing chain. Intuitive programmers, on the other hand, fix behaviour in their tests and fill their code with domain language that aims to match the natural language of the organisation; the depth of their function calls is usually smaller and they are swifter to bind calculations to intermediate variables.

I think I am an Intuitive programmer (and therefore I worry that I am miscasting Reasoning programmers through my lack of understanding), and in my work most of my colleagues are as well; it is in the nature of consultancy to have to adapt to constantly changing domains, codebases and expectations. We do have a few Reasoning programmers, though, who do exactly the same work but do it by thinking deeply through problems and drawing logical inferences.

If there are issues between the two groups, they occur when neither makes concessions to the other; common flashpoints are testing, writing defensive/guard code, and giving time to discussing problems and problem-solving strategies. The two also agree on a lot of things for very different reasons: for example, both like to rewrite code, but refactoring is a formalised practice for Intuitive developers, whereas Reasoning developers often want to apply new insight or newly learned ideas (“This could all just be monads!”).

An important point is that neither approach is “right”: both types of programmer arrive at the same results if their experience, ability and other factors are equal. They are purely styles of working.

Programming, Work

The beauty of small things

I am very interested in the idea of “constellation architecture” and microapps as a new model for both web and enterprise architecture. It feels to me like a genuinely new way of looking at things that can deliver real benefit.

It is also, in another sense, not a new way of doing things at all: it is really just an extension of the UNIX tools idea, taking concepts like service-orientated architecture and some of the patterns of domain-driven design to their logical extreme.

If I take ls and pipe it through grep, you wouldn’t find that particularly exciting or noteworthy. However, creating a web application or service that does just one thing, and then building applications by aggregating the output of many such small components, seems novel and slightly adventurous to some.

SOA failed before it began, and the DDD silos of vertical responsibility seem poorly understood in practice. Both have good aspects, though. But both saw their unit of composition as something much larger than a single function: an SOA payments service, for example, tended to include a whole variety of payment functions rather than offering just one, such as authorising a payment.

There is a current trend to look at a webpage as being composed of widgets, whether they are written as server-side components or client-side ones. I think this is wrong: we need to see a page as being composed of the output of many different webapps.

Logging in is handled by a web application whose only responsibility is to authenticate users; the most popular pages are delivered by an application whose only responsibility is to determine which pages are popular.
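
A minimal sketch of the idea, with hypothetical apps in Scala: the page is nothing more than the aggregated output of single-purpose applications.

    case class Request(userId: String)

    trait MicroApp { def render(request: Request): String }

    // Each app does exactly one thing; the bodies here are stubbed.
    object LoginStatus extends MicroApp {
      def render(r: Request) = s"<p>Logged in as ${r.userId}</p>"
    }
    object PopularPages extends MicroApp {
      def render(r: Request) = "<ul><li>Most read story</li></ul>"
    }

    // A page is composed from the output of many small apps.
    def page(r: Request, apps: Seq[MicroApp]): String =
      apps.map(_.render(r)).mkString("\n")

    page(Request("alice"), Seq(LoginStatus, PopularPages))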

These applications should be as small as we can make them and still function. Ideally they should be a few lines of domain code linking together libraries and frameworks. They should have acceptance/behaviour tests to guarantee their external functionality, and that’s about it.

It seems to me that the only way we are going to get good large-scale functionality is by aggregating small, useful pieces of functionality. Building large functional stacks takes a lot of time and doesn’t deliver value in proportion to the effort of its creation.

Programming, Scala

Mockito and Scala

Scala is a language that really cares about types, and mocking is the art of passing one type off as another, so it is not that surprising that the two don’t get along famously. It probably doesn’t help that we are using Java mocking frameworks with Scala code, which means you sometimes need to know too much about how Scala generates its bytecode.

Here are two recent problems: the “ongoing stubbing” issue and optional parameters with defaults (which can generally be problematic anyway as they change the conventions of Scala function calling).

Ongoing stubbing is an error that appears when you want to mock untyped (i.e. Java 1.4) code. You can recognise it by the hateful “?” characters that appear in the error messages. Our example was wanting to mock the request parameters of Servlet 2.4. Now, we all know that request parameters (like everything else in an HTTP request) are Strings, but in Servlet 2.4 they are “?”, or untyped. Servlet 2.5 is typed, and the first thing I would say about an ongoing stubbing issue is: see if there is a Java 1.5-compatible version of whatever it is you are trying to mock. If it is your own code, FFS, add generics now!

If it is a library you have no control over (like Servlet) then I have some bad news: I don’t know of any way to get around this issue. Scala knows that the underlying types are unknown, so even if you specify Strings in your mock code it won’t compile, and if you don’t specify a type your code still won’t compile. In the end I created a stub subclass of HttpServletRequest that returned String types (which is exactly what happens at runtime, thank you very much, Scala compiler).

Okay, so optional parameters in mocked code. I love named parameters and default values because I think they are probably 100% (no, perhaps 175%) better at communicating a valid operating state for the code than dependency injection. However, when you want to mock a Scala function that uses a default value and set an expectation on it, you will often get an error message saying that the mock was not called, or was not called with the correct parameters.

This is because when the Scala code is compiled you are effectively calling a chain of methods, and your mock needs matchers set for every possible argument, not just the ones your Scala code is explicitly passing. The simplest solution is to use the any() matcher for any default argument you will not be explicitly setting. Remember that this means the verification must then consist entirely of matchers, e.g. eq() and so on.
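
Here is a hedged sketch of that rule (the Notifier trait is hypothetical, and the imports assume a recent Mockito where the matchers live in ArgumentMatchers; older versions used org.mockito.Matchers):

    import org.mockito.Mockito.{mock, verify}
    import org.mockito.ArgumentMatchers.{any, eq => eqTo}

    trait Notifier {
      def send(message: String, retries: Int = 3): Unit
    }

    val notifier = mock(classOf[Notifier])

    // The code under test uses the short form. Scala compiles this to
    // send("hello", notifier.send$default$2()) -- and on a mock that
    // synthetic default getter returns 0, not 3.
    notifier.send("hello")

    // So verify(notifier).send("hello", 3) fails. Match the defaulted
    // argument loosely instead; once one matcher is used, every
    // argument needs a matcher:
    verify(notifier).send(eqTo("hello"), any[Int]())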

What about when you want to verify that a default parameter was called with an explicit value? I think you do it based on the order of the parameters in the Scala declaration, but I haven’t done it myself and I’m waiting for that requirement to become a necessary thing for me to know.

Java, Programming

Python as a post-Java language

I’m a UNIX-based developer and since 2000 I have been working mainly with Java and then other JVM languages. When Java 7 slipped I made no real secret of the fact that Java was in a lot of trouble. The post-Oracle world, though, looks even worse, with a lack of clarity over what in the core ecosystem is free, open source and liability-free.

Clojure and, to a lesser extent, Scala are great steps forward, so I don’t feel a burning need for a Java 7/8/whatever. However, a moribund or tainted JVM is a major problem, and so I’m now thinking about what the post-Java escape route looks like. On the web front it is pretty obvious: Python and Ruby are great languages with great frameworks for developing web-based applications. For the server-side heavy lifting it is a lot less clear. People are talking about Google Go, but that does feel quite low-level; I’m not sure I’m ready to go back to pointer wrangling, even with memory management, and it feels like something you’d build a tool out of, not an application. Mono feels like more of the same problem of wrestling with big companies with vested interests; if you are going to do that, why not try to sort out the OpenJDK?

As the title of the post suggests, the language I am most inclined towards right now is Python. It is a really concise but clear language that on UNIX systems comes with an amazingly comprehensive set of libraries, and it has virtual environments and dependency management on a par with RVM and gem.

The single issue that comes up is performance. What I have been finding is that for 80% of the work I am doing performance is okay, and I’m producing a fraction of the code I would normally have to create. For the remaining 20% maybe I am going to have to look at something like Go or (god forbid) go back to C, but I would much prefer to see a Clojure or Scala that could run on top of something like LLVM. I also have some hope that people smarter than me will make progress on a JIT for Python that could shrink that 20% to a figure small enough that, when performance mattered that much, I wouldn’t mind sweating to make it happen.

Programming

The Helper Anti-Pattern

You have a class X, and you have a class called XHelper. XHelper contains methods that make it easy to use X.

The problem I have with this anti-pattern is that XHelper does nothing of value. If the methods are truly related to X then they should actually be class methods of X. And if you need “helper” methods to use the API of X, the chances are that what is really required is a refactoring of X to incorporate the enhancements of XHelper invisibly. You shouldn’t need a helper to use an API.
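
As a hypothetical before-and-after sketch of that refactoring:

    // Before: the helper owns behaviour that belongs to the class.
    case class Money(pence: Int)

    object MoneyHelper {
      def format(m: Money): String = f"£${m.pence / 100.0}%.2f"
    }

    // After: the behaviour lives on the type itself and the helper disappears.
    case class Price(pence: Int) {
      def formatted: String = f"£${pence / 100.0}%.2f"
    }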

Take Rails page helpers. A helper that constructs the content of a page contains functionality that would be better marshalled in the controller, prior to view rendering. If multiple controllers perform the same action, extract it into a service that the controllers can invoke on the requests and delegates they are co-ordinating.

What if the Helper class actually extracts common functionality from classes X and Y, and is called FooHelper because it helps perform Foo in X and Y?

Well, here we are onto something: we have some common functionality, which is good, and the name of the class reflects its purpose. The same question arises, though: could FooHelper’s methods actually reside in Foo? If Foo is purely a function or method call then perhaps all the functionality relating to Foo should be encapsulated in a Foo class that presents the foo method.

Alternatively, perhaps there is a better name than “Helper”? As examples, I tend to call collections of class methods that transform instances of one class into instances of another “Transformers”, and methods that create database connection instances could be called “Providers”. If you cannot make the helper a private part of the class or classes it is nominally a Helper to, then there is usually a better name for the class lurking around somewhere.
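
For instance, a hypothetical “Transformer” in this naming style, doing exactly what its name says:

    case class Customer(name: String, email: String)
    case class MailRecipient(address: String, displayName: String)

    // Named for what it does -- turning one type into another --
    // rather than vaguely "helping".
    object CustomerTransformer {
      def toRecipient(c: Customer): MailRecipient =
        MailRecipient(c.email, c.name)
    }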
