Scale Summit 2015: Testing in production session

One of the most interesting sessions I went to at Scale Summit 2015 was one about testing in production. It was not that well attended compared to the other sessions so I don't know if there was implied agreement with the topic.

One of the questions was why it is important to test in production. For me the biggest thing is that you can only really get realistically distributed traffic from genuine traffic. Most load-testing or replay strategies fail for me at the first hurdle by only creating load from a few points of presence, usually in the big Amazon availability zones. You also have to be careful that traffic is routed outside of Amazon's internal data connections if you want to get realistic numbers. Dealing with load from a few different locations with large data pipelines between them is very different from distributed clients on the public network.

Replay strategies allow for "realistic" traffic patterns and behaviours but one of the more interesting ideas discussed was to generate fake load during off-peak periods. This is generated alongside the genuine user traffic. The fake load exercises key revenue generating pathways with some procedural randomisation. Injecting this additional fake load allows capacity planning and scaling strategies to be tested to a known excess capacity.

Doing testing in production means being responsible so we talked about how to identify fake test traffic (HTML headers with verification seemed sensible) so that you can do things like circuit-break that traffic and also segment it in reporting.

During the conversation I realised that the Guardian's practice of asking native app users to join the beta programme was also an example of testing in production. Most users who enter the scheme don't leave so you are creating a large segment of users who are validating releases and features ahead of the wider user base.

In the past we've also used the Facebook trick of duplicating user requests into multiple systems to make sure that systems that are being developed can deal with production load. If you don't like doing that client-side you can do it server-side by using a simple proxy that queues up work with a variety of systems but offloads everything that isn't the user's genuine request. Essentially you throw away the additional responses but the services will still do the work.

We also talked about the concept of having advanced healthchecks that report on the status of things like the availability of dependencies. I've used this technique before but interestingly I've made the machines go into failure mode if their mandatory dependencies aren't available where as other people were simply dashboarding the failures (and presumably alerting on them).

At the end of the session I was pretty convinced that testing in production is not only sensible but that actually there are a number of weaknesses in pre-production testing approaches. The key one being that you should assume that pre-production testing represents the best case scenario. You are testing your assumed scenario in a controlled environment.

There is also a big overlap between good monitoring and production testing. You have to have the first before you can reasonably do the second. The monitoring needs to be freely accessible to everyone as well. There's no good reason to hide monitoring away in an operations group and developers and non-technical team members need to be able to see and understand what is actually happening in production if they are to have the same conversation.


Writing code without tests

This post is aimed at people who have mastered test-driven development and ideally also behaviour-driven development and who are familiar with XCheck testing. If you don’t have good basic steps then trying to jump onto some of these techniques are likely to backfire on you as you will probably struggle to assess the risks correctly.

There is a reason TDD was invented, it represents the refinement of good testing practice and the philosophy of good software design. TDD is a relatively simple practice to describe that requires effort to implement. Writing code driven by tests is safer that straight-coding.

Writing untested code is a kind of mastery technique. It is high-risk and relies on the skills and the knowledge of the programmer. I don’t think it is ever responsible if the programmer is not going to be the person supporting the result in production. Without this condition then the programmer’s interests are not properly aligned with the consumers of their code.

So with all those caveats in place what if we want to create code faster because we don’t have to write tests?

Well we have to understand where bugs come from and we will have to write code that doesn’t allow those situations to arise.

There are two important principles to start with. If you can rely on tested library code, then you can rely on the underlying quality of the tested code and leverage it in your own application. Secondly the code you don’t write will not have bugs.

Therefore we should be aiming to write the smallest amount of code possible and we should never try to code what others have coded for us.

The next point is about where bugs occur. I think we’re now at a consensus point that most bugs occur in the way we change and maintain state. In both procedural and functional languages it is rare to get a mistake in the order of steps that something must happen in for example. These kind of problems tend to be misunderstandings of the domain (that get written into test suites as well so testing doesn’t help catch them) rather than genuinely unexpected consequences of the programmer’s code. Object-orientated code is really hard to reason about from this point of view as objects don’t have an implied order of execution.

This is why quick scripts of less than 200 lines tend to do stable sterling service for years whereas larger applications are more tortured in their existence.

Therefore whatever language we are coding in we need to adopt the functional principle of operating only on our parameters and returning values that can be consumed by the caller.

Size matters, a lot, if whole program can fit into a single file and you can pretty much hold the whole thing in your head then it will be easy to reason about what the program is doing and see flaws in the logic of the program. A single complex line of code is better than many lines and is much better than many lines split across many files.

One way to bring down the size of code files is to be ruthless about concerns. For example recently in my Python programming I have been assigning only one purpose to each module: this module renders reports, this one provides JSON endpoints.

Another technique is to not persist any state, this is actually surprisingly easy in web programming since each request is completely separate event and by default you can trade CPU time for isolation.

If you are doing batch or server-side programming then it is worth considering using something like parallel to create many separate bubbles of execution rather than trying to write code yourself to distribute work.

Another aspect of state that causes issues are making global modifications, whether it be to a database or a filesystem. Try and defer all global changes to the final moment of a program and do all the manipulation in-memory instead. If you never change the world then you can run a program over and over again refining what it does.

Assertions are more powerful than logging in writing test-less code, it is better to kill a thread of execution rather than let it do something you weren’t expecting. Logging is really about helping build your intuition about what a program does and how it works.

Assertions allow you to create strong pre and post-conditions on the operation of the program. Essentially they allow you to guarantee the “happy path” execution of your code and avoid having to test all the negative situations that might occur.

Despite this you always want to code for failure, use short-circuit logic to abort code flow early and therefore simplify the context of the code in the rest of the function.

Remember all the basic rules of cyclomatic complexity, don’t nest, don’t do conditionals, do try and express your looping as list comprehensions.

Don’t write generic code, ever. The more potential inputs a function has, the more you end up needing unit-tests to verify the interactions. If something is meant to work on strings don’t try to make it work on strings and integers. Your detection code ends up being a potential source of bugs that needs testing.

If you write dynamic interpreted languages then you are going to have do some manual testing, unless you can remember the names and orders of the functions exactly. Don’t forget to dive into the shell or REPL and play around with the code in isolation. If you can verify the behaviour of individual parts of your program without having to wire together multiple components then you have the right level of granularity for your code.

Re-use code that is already working. Code re-use is generally best achieved by cut and pasting files and then importing the functions you need. Don’t try and synchronise your code, updating some library code ultimately means that you are going to know whether the new library code works as you expect with your functionality and that means you’ll need a test suite.

Don’t refactor your code, rewrite it. Refactoring requires unit tests. Don’t be afraid of things like myfunction2 (although once you have the new functionality you need to delete the old unused stuff). Re-writing allows you to ditch all your assumptions about the code and attempt to express your new understanding of the problem and the requirements as simply as possible.

Don’t work with large numbers of people on the same code base. The more people trying to modify and change the code the more you need tests to try and clarify your different intents for the code base. Again, try divide and conquer on the problem, rather than six people working on the same code can you get three sets of two people collaborating on three smaller codebases.

Finally don’t be afraid to write a test. Writing the right unit-test to prove you can rely on a base piece of functionality means that you then don’t have to write tests for all the pieces of code that use that underlying function. I like to try and write code without tests to maximise the flexibility of the code base when I’m tackling problems with unclear solutions. It is not an ideological thing to have no tests whatsoever, it is rather that when tempted to write a test I think “Could I do this in a way that is trivial and doesn’t require a test?”. Simplicity is the cornerstone of test-free code.

Software, Work

Up-front quality

There has been a great exchange on the London Clojurians mailing list recently talking about the impact of a good REPL on development cycles. The conversation kicks into high-gear with this post from Malcolm Sparks although it is worth reading it from the start (membership might be required I can’t remember). In his post Malcolm talks about the cost of up-front quality. This, broadly speaking, is the cost of the testing required to put a feature live, it is essentially a way of looking at the cost that automated testing adds to the development process. As Malcolm says later: “I’m a strong proponent of testing, but only when testing has the effect of driving down the cost of change.”.

Once upon a time we had to fight to introduce unit-testing and automated integration builds and tests. Now it is a kind of given that this is a good thing, rather like a pendulum, the issue is going too far in the opposite direction. If you’ve ever had to scrap more than one feature because it failed to perform then the up-front quality cost is something you consider as closely as the cost of up-front design and production failure.

Now the London Clojurians list is at that perfect time in its lifespan where it is full of engaged and knowledgeable technologists so Steve Freeman drops into the thread and sensibly points out that Malcolm is also guilty of excess by valuing feature mutability to the point of wanting to be able to change a feature in-flight in production, something that is cool but is probably in excess of any actual requirements. Steve adds that there are other benefits to automated testing, particularly unit testing, beyond guaranteeing quality.

However Steve mentions the Forward approach, which I also subscribe to, of creating very small codebases. So then Paul Ingles gets involved and posts the best description I’ve read of how you can use solution structure, monitoring and restrained codebases to avoid dealing with a lot of the issues of software complexity. It’s hard to boil the argument down because the post deserves reading in full. I would try and summarise it as the external contact points of a service are what matters and if you fulfil the contract of the service you can write a replacement in any technology or stack and put the replacement alongside the original service.

One the powerful aspects of this approach is that is generalises the “throw one away” rule and allows you to say that the current codebase can be discarded whenever your knowledge of the domain or your available tools change sufficiently to make it possible to write an improved version of the service.

Steve then points out some of the other rules that make this work, being able to track and ideally change consumers as well. Its an argument for always using keys on API services, even internal ones, to help see what is calling your service. Something that is moving towards being a standard at the Guardian.

So to summarise, a little thread of pure gold and the kind of thing that can only happen when the right people have the time to talk and share experiences. And when it comes to testing, ask whether your tests are making it cheaper to change the software when the real functionality is discovered in production.

Programming, Work

Optimizely testing like a hacker

At work we use Optimizely and I am a fan of the product; I think it has had a massive impact on the way we work and should really help guide us to decide what we choose to do.

However I am not a product manager, user testing expert or statistician (that last part is a lie, I’m a statistician who hasn’t done any stats for seventeen years) I am a dirty hacker programmer and I use Optimizely in a way that probably makes my colleagues weep but which I think actually makes it more valuable as a product. I want to talk about breaking some of the common rules that people put up around this testing.

Note that you need to understand what you’re doing here, I am not recommending this if you are new to the product or multi-variate testing. You also need a good stream of traffic to work on. I do, this is working out for me. One piece of good practice I would keep is: decide how you are going to judge the test before you start it and don’t change your measure once you’ve started. If it is clear your initial metrics aren’t helpful, design a new test. The knowledge you’ve gained is valuable for formulating the right measures.

Don’t change the test once you’ve started it

Only once the test has started can you understand what the problem you are dealing with is and what responses you can take to the issues. If you have a question about what is happening in the test feel free to create a new variation (always with a good name!) and throw it into the mix. I sometimes start with one variation and end the test with nine. It’s better to test immediately than speculate.

Changing a variation (no matter how tempting) is dangerous though as you’ll have to remember the differences and when you applied them. I prefer to spawn variations to changing an in-flight variation. Of course fixing bugs and unintentional consequences is fine. You’re looking at the long term rate not the initial performance.

Don’t change the traffic

I’m not sure this is a general shibboleth but I play around with traffic massively during the test. The great thing about Optimizely is that it takes care of the math so feel free to mix the allocation of traffic freely. If you have a run-away winner early on then don’t be afraid to feed the majority of traffic to it.

Make the test work for the whole audience

I don’t believe in this, make the test work for the easiest audience segment to access. I frequently only test on modern browsers. If you find a trend then shock, horror it often works for the whole audience. It’s about fast feedback not universal truth.

The biggest advantage is that you can use CORS-compliant browsers to do bigger changes to the pages under test.

Don’t change the underlying content

If you take your best performing variation and apply it to the page then the “Original” variation should trend to the variation. If it doesn’t then you know something is up with your measuring. I actually think it is really helpful to make a succession of changes to the base content, based on the tests until the Original variation is performing better than the individual variations.

Once Original is top performing variation you can stop testing the page.

A/A testing has problems

So what? Optimizely has a few issues, you need to deal in big numbers. A/A can be helpful but if you are working in five digit numbers or double-digit percentages then don’t worry about the noise.

Tests have to look good

If your theory is accurate it absolutely does not have to look good. If you are worried that your hypothesis is not working because of the visuals: get over yourself and admit that the idea was weak and you need to rethink it.

I like to start off all variations looking a bit crappy and then seeing whether they can be outperformed by an improved appearance. Often the answer is no; there is a rule of diminishing returns on the appearance of a variation. Things get over-designed on the web all the time. However by trying better looking variations in increments you know exactly how much effort to invest.

Web Applications

Good magic, bad magic

Philip Potter pinged me his post on Sinatra magic during the week. Mark Needham’s comment and code on solving the mocking problem is good advice to the problem as posed.

At Wazoku where we use the often equally magical Bottle framework we don’t use top-down TDD but instead outside-in functional tests (with no funky runners as we don’t need CI). This solves the whole magic issue by shifting the attention to what the public interactions of the application are. This is one of the massive benefits of using a microapp HTTP/JSON/REST-like architecture. I could flip the API from Bottle to Django or Compojure or Sinatra and my test suite can keep on rocking and telling me whether the behaviour my consumers are relying on is correct.

The major thing I felt when reading through Philip’s post was the massive amount of effort that was going into testing relatively simple behaviour. This is a bit of anti-pattern with Agile developers (or perhaps it is part of the mastery thing where rote “correct” behaviour is modified by experience and judgement). One of the massive advantages of using something like Sinatra is that you can get a whole web app with rich behaviour into less than 200 lines. If you then create thousands of lines of test code and battle with the magic for hours on end you’ve completely destroyed your productivity.

If you have a code base that you expect to be large and highly contested by a large development team you need good, layered testing and to use frameworks that support that. If you have an app that is small and when its done it is done then there is no need to agonise as to whether it was done “right”.

The idea that top-down TDD is the only correct way to write software is corrosive. When faced with a generally poorly skilled and educated workforce it is good to have rules. I have imposed a certain style of TDD on a group myself because it gives a good framework for work and achieves very consistent output.

However with skilled people on small scale projects you can kill yourself by imposing arbitrary rules. I love Sinatra and while I might be equivocal about magic I think it is ridiculous to moan about it if you are using something as unicorn-packed as Ruby. For example Philip was trying to use RSpec mocks and stubs to do his TDD. The result is kind of saying that you’re disappointed that your “good” magic for testing didn’t work with the “bad” magic of a DSL for web applications. Even if your RSpec code passed its tests you still haven’t said anything particularly deep about the production behaviour of your application as your unit testing environment was severely compromised by the manipulations of your mocking framework.

So my rule of thumb is: if its simple, do it; if it was simple, functionally test it; if it was never really simple then test-drive it with suitable tools.

Programming, Python

How does the patch decorator in Mock work?

I tend to use Mock more as a stubbing library rather than for mocking. The patch decorator is pretty handy in terms of this as it takes care of all the resetting once your stubbed test has run making it easy to have a test where a dependency returns an empty list, followed by a single-entry list and so on.

However I often forget how exactly it works so I’ve decided to write up my latest remembering of how to do this (via John Hartley’s help and reminders) so I have something to look up next time I forget.

The first thing is that the patch decorator takes a string that represents the fully qualified name of the stub/mock you want to create. In a Django app for example that means you should include the app name at the root. The name also reflects the local name of an imported item. Something I commonly do wrong is to bind to the absolute name, say ‘random.choice’ rather than ‘myapp.mymodule.random.choice’. If you are in the situation where your stub is correct when you call it directly but never happens when you run the code under test I am pretty sure that naming will be at the root of your problems 95% of the time.

For each string argument you have in patch you also need to define a parameter to the test function, this will contain the actual Mock object and is what you use to actually stub the value to what you want it to be for the test. Use names that make sense here, stub_db, fake_file_reader not just mock1, mock2 and so on.

With these relatively few reminders in place you should now be in a position to stub simply with Mock!

Software, Work

The Joy of Acceptance Testing: Is my bug fixed yet?

Here’s a question that should be a blast from the past: “Is my bug fixed yet?”.

I don’t know, is your acceptance test for the bug passing yet?

Acceptance tests are often sold as being the way that stakeholders know that their signed-off feature is not going to regress during iterative development. That’s true, they probably do that. What they also do though is dramatically improve communication between developers and testers. Instead of having to faf around with bug tracking, commit comment buggery and build artifact change lists you can have your test runner of choice tell you the current status of all the bugs as well as the features.

Running acceptance tests is one example where keeping the build green is not mandatory. This creates a need for a slightly more complicated build result visualisation. I like to see a simple bar split into green and red with the number of failing tests. There may be a good day or two when that bar is totally green but in general your test team should be way ahead of the developers so the red bar represents the technical deficit you are running at any given moment.

If it starts to grow you have a prompt to decide whether you need to change your priorities or developer quality bar. Asking whether a bug has been fixed or when the fix will be delivered are the wrong questions. For me the right questions are: should we fix it and how important is it?

If we are going to fix a bug we should have an acceptance test for it and its importance indicates what time frame that test should pass in.

Groovy, Java, Programming, Scripting, Software

Working with Groovy Tests

For my new project Xapper I decided to try and write my tests in Groovy. Previously I had used Groovy scripts to generate data for Java tests but I was curious as to whether it would be easier to write the entire test in Groovy instead of Java.

Overall the experience was a qualified “yes”. When I was initially working with the tests it was possible to invoke them within Eclipse via the GUnit Runner. Trying again with the more recent 1.5.7 plugin, the runner now seems to be the JUnit4 one and it says that it sees no tests, to paraphrase a famous admiral. Without being able to use the runner I ended up running the entire suite via Gant, which was less than ideal, because there is a certain amount of spin-up time compared to using something like RSpec’s spec runner.

I would really like all the major IDEs to get smarter about mixing different languages in the same project. At the moment I think Eclipse is the closest to getting this to work. NetBeans and Intellij have good stories around this but it seems to me to be a real pain to get it working in practice. I want to be able to use Groovy and Java in the same project without having Groovy classes be one of the “final products”. NetBeans use of pre-canned Ant files to build projects is a real pain here.

Despite the pain of running them though I think writing the tests in Groovy is a fantastic idea. It really brought it home to me, how much ceremony there is in conventional Java unit test writing. I felt like my tests improved when I could forget about the type of a result and just assert things about the result.

Since I tend to do TDD it was great to have the test run without having to satisfy the compiler’s demand that methods and classes be there. Instead I was able to work in a Ruby style of backfilling code to satisfy the runtime errors. Now some may regard this a ridiculous technique but it really does allow you to provide a minimum of code to meet the requirement and it does give you the sense that you are making progress as one error after another is squashed.

So why use Groovy rather than JRuby and RSpec (the world’s most enjoyable specification framework)? Well the answer lies in the fact that Groovy is really meant to work with Java. Ruby is a great language and JRuby is a great implementation but Groovy does a better job of dealing with things like annotations and making the most of your existing test libraries.

You also don’t have the same issue of context-switching between different languages. Both Groovy and Scala are similar enough to Java that you can work with them and Java without losing your flow. In Ruby, even simple things like puts versus println can throw you off. Groovy was created to do exactly this kind of job.

If the IDE integration can be sorted out then I don’t see any reason why we should write tests in Java anymore.

Java, Programming

Three Little Mockers

At my last client I got the unusual chance to try three Java mocking frameworks within the same project. The project had started to use EasyMock as the project team felt that Mockito didn’t really have any decent documentation (it does but it’s in the Mockito codebase, not on the Google Code wiki). However as a ThoughtWorks team was working next door there was a strong push to use that.

My personal preference is still for JMock so I also selfishly pushed that into the project to round out the selection. With all three there; we were ready for a Mock Off!

The first distinctive thing is that EasyMock and Mockito can mock concrete classes not just interfaces. The second thing that all three have very different methods of constructing and verifying mocks. EasyMock is record-replay, JMock constructs expectations that are automatically verified by a JUnit runner when the test stops executing, Mockito uses a TestSpy where you can verify what happened to the mock whenever you want.

The record-replay paradigm lead to EasyMock being discarded early on. It has two kinds of problems as far as I was concerned. Firstly you have the feeling of “inside out” where the test is copying the internals of the class under test. Secondly I was confused several times as to what state my mock was in and having to switch the mocks to replay mode felt noisy, particularly where you were setting multiple collaborators in the test (do you switch them to replay once their recording is done or do you record all mocks then switch all of the them to replay?).

Mockito’s easy mocking of concrete classes made it a valuable addition to the toolkit and it does deliver the promised noise free setup and verificiation. However its use of global state was frustrating, particularly in that you can create a compiling use of the API that fails at runtime. If you call verify without a following method then you get an error about illegal use of the framework. Although this is meant to happen in the test that follows the test with the illegal construction, in practice this is  hideous pain when the test suite is a non-trivial size. It also meant that a test was appearing to pass when actually nothing was being verified and the error appeared in another pair’s test (the one that implemented the next test) making them think that something was wrong with their code.

Mockito also uses a lot of static imports (which I do have a weaknesses for) but it also means that you have to remember the entry points into the framework.

JMock by comparision to Mockito feels quite verbose, the price for having all that IDE support and a discoverable API is that you have to have a Mockery or two in all your classes and you are defining all your Expectations for the Mockery. There’s no denying that even with the IDE autocomplete you’re doing a lot more typing with JMock. From a lazy programmer point of view you are going to go with Mockito every time.

And that is pretty much what happened in the project. Which I think is a shame. Now in explaining this I am going to go into a bit of ThoughtWorks testing geekery so if you are lazy feel free to go off and use Mockito because the pain I’m talking about will happen in the future not now.

I feel that Mockito is a framework for Test Driven Development and JMock is a framework for Test Driven Design. A lot of times you want the former: tight-deadline work and brownfield work. You want to verify the work you are doing but often design is getting pushed to the sidelines or you actually lack the ability to change the design as you might want. Mockito is great for that.

However Mockito doesn’t let the code speak to you. It takes care of so much of the detail that you often lose feedback on your code. One thing that making you type out your Expectations in JMock does is make you think, really think, about the way your code is collaborating. I think you are much more likely to design your code with JMock because it offers lots of opportunities to refactor. For example if two services are constantly being mocked together then maybe they contain services that are so closely related they should be unified, logically or physically. You don’t see that opportunity for change with Mockito.

By using both JMock and Mockito I was able to see that Mockito did tend to make us a bit lazy and our classes tended to grow fatter in terms of functionality and collaborators because there was no penalty for doing so. I was also concerned that sometimes Mockito’s ability to mock concrete classes meant that sometimes we mocked objects that should have been real or stub implementations.

Mockito is a powerful framework and I suspect that Java developers will take to it as soon as they encounter it. However for genuine Test Driven Design I think Mockito suffocates the code and requires developers with a lot more discipline and design-nous, almost certainly not the people that wanted an easy to use framework in the first place.

Java, Programming, Python, Scripting, Web Applications, Work

JMeter or The Grinder, so which one is better, like?

Or for the benefit of Google: Apache JMeter versus The Grinder. Fight!

A while ago I got paid to put these two tools head to head and I think it’s been long enough that the people who put up the money will have got the benefit now.

I had used The Grinder in a previous job where it had done excellent service finding out what level of peak load our site could handle. It was convenient in that you script it in Jython because Jython was our chosen scripting language.

JMeter on the other hand was a whole new thing to me. It is a kind of graphical programming interface where you drop controls into a tree structure and the framework generates threads that run through the tree generating interactions with the target application via things like HttpClient.

The scripting versus component structure is a pretty major difference between the two and is probably one of the key things that is going to help you choose between two products that are both, to be honest, pretty mature, capable pieces of software.

If you don’t feel comfortable with Python or Jython then Grinder is likely to be out. JMeter is much more friendly to non-programming testers. However this decision isn’t as clear as you might think, JMeter actually includes Javascript functions, access to BeanShell and its own expression language. As we created more complex traffic simulations we found ourselves having extremely complex expressions that often mixed a scripting language with an expression evaluation. In some ways The Grinder is more honest in telling you up front that you are going to have to program some of this stuff yourself.

Another big differentiator is in reporting, if you need to generate reports then JMeter is a much better choice as it has components that collect and collate information and you can dump data to XML for XSLT transformation into pretty much any output format you want. It is also possible to run JMeter headless once you have the test scripts so it is possible to encorporate it into a continuous integration process. Debugging slightly flaky applications is much easier with JMeter because you can easily capture a lot of result information and then just delete the data gathering component once the test flow is working.

A lot of the functionality of both products overlaps: they can both be used a browser proxy and be used to capture scripts as a user “drives” around the site. These captures are likely to form the basis of your test plans unless you have a very clear URL scheme with little session state.

Both use thread pools to generate load and both tend to get blocked on your application before they max out their local machine (although both are capable of using 90% of memory and CPU). JMeter has a few nice options around randomly ramping up the thread activation (recreating a spike in usage more closely). The Grinder tends to give more bang for the buck as its runtime is very simple and minimal.

Both are also capable of using distributed agents and collating the data from these agents back to the coordinator.

In many ways there isn’t a “bad” choice to be made between them. So what might sway your choice one way or another? Well JMeter is an Apache project and that might make a difference for some people as you have all those good things like project governance and a good chance the project is going to continue to be developed and enhanced. JMeter was also the only project to have a test suite last time I checked out both code bases.

The Grinder does have one unbeatable quality in my opinion and that’s flexibility. When you look at the features listed on the websites you might think that JMeter does all this cool stuff with HTTP, SOAP and POP3 and the like and that The Grinder is comparatively light in the feature set.

The opposite is actually true, as The Grinder website points out the The Grinder can test anything with a Java API. There really is no limit to what you can make it do or how you can execute your test plan. In fact most of the things I’ve complained about with The Grinder are fixable from within the test scripts  (what I am really complaining about is that some things like capturing HTML output per run is something that should be available as part of the standard package). On the other hand I felt that creating a new JMeter component was actually quite tricky as you have to deal with both the GUI aspects and the actual functionality you are trying to create in your test component.

If you really want to have total control of your concurrent volume testing then The Grinder is probably the best product for you.

Perhaps the last topic for consideration is The Grinder’s use of Jython. Python is a great language and you get a lot of power for some very concise code. Just as some people are going to find it off-putting others are actually going to find it very compelling.

Okay so yet another weak “horses for courses” post on the InterWeb. Perhaps the easiest summary to end on is that your test teams will probably take a shine to JMeter while your developer teams who are also building scale tests are probably going to like The Grinder; unless they dislike Python or learning a new language.