Randomised Testing: Finding the needles in the haystack

You have some haystacks; you're fairly sure you've found most of the needles inside them. You've checked the areas where you usually find needles, using tried-and-tested techniques. But you can't be sure - you can never be sure.

Some haystacks are smaller or sparser than the others, and they're not being used for anything too important. The checks you've made on these haystacks are probably enough to confirm that the really dangerous needles have been removed, and any remaining needles should be easy enough to take care of.

But what about that veritable mountain of a haystack over there? You've made it as small and sparse as possible, but some jobs just call for huge, dense lumps of hay. No matter how many times you check it with your trusty toolset of proven techniques, you know there's no possible way you could have found all of the needles in a haystack of that magnitude - truly, that colossus of desiccated grass is nigh on infinite.

At which point the metaphor starts to break down a bit (unfortunately for this example, everyday reality doesn't allow for eternal haystacks), but hopefully it's helped to illustrate the problem with testing code that has a massive number of possible, salient tests, i.e. an intimidatingly large test space.

We have all encountered haystacks, erm, code which cannot be properly exercised by unit and integration testing alone: consider that concurrency issue which keeps popping up but never appears in the logs, or the password validator upon which that one irrepressible tester keeps invoking dark arts to find the most innocuous of invalid-input bugs. No amount of standard testing will help you defend against these types of problems, which is where randomised testing enters the fray. Think of it as the loyal and tireless farmhand who will happily spend all day and night looking through your haystacks one hay(?) at a time to find those dastardly, elusive needles.

Haystacks are great and all, but what is a randomised test?

It's not a new type of test, but a new way of testing. You can randomise unit tests, integration tests, whatever - as long as you're randomising some part of a test then it's a randomised test. You can randomise anything, from implementations to environment variables, separately or all at once. Any change in a given test element's state should potentially lead to a new branch of logic being executed (e.g. randomising foo when it's never used is not a good use of randomisation). Additionally, we want our tests to be deterministic, which means we control the parameters of the randomisations and implement the tests in such a way that the conditions for any given run are reproducible - in practice, by generating every random value from a single seed which we record.
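To make that concrete, here's a minimal sketch of a reproducible randomised test using nothing but JUnit 4 and java.util.Random (no particular framework assumed); the trick is that every random value flows from one logged seed:

```java
import java.util.Random;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ReproducibleRandomisedTest {

    @Test
    public void reversingTwiceIsANoOp() {
        // Pick a seed and log it: if the test fails, hard-code the printed
        // seed here and the exact same "random" inputs are regenerated.
        long seed = System.nanoTime();
        System.out.println("Running with seed: " + seed);
        Random random = new Random(seed);

        // Randomise the input's length and contents within controlled bounds.
        int length = random.nextInt(100);
        StringBuilder input = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            input.append((char) ('a' + random.nextInt(26)));
        }
        String original = input.toString();

        // The property under test: reversing a string twice is a no-op.
        String reversed = new StringBuilder(original).reverse().toString();
        String roundTripped = new StringBuilder(reversed).reverse().toString();
        assertEquals(original, roundTripped);
    }
}
```

Dedicated frameworks handle the seed plumbing for you, but the principle is the same: never draw from an unseeded random source.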

As long as randomising an element leads us into a new part of the test space (i.e. a new execution branch), it's worth pursuing. We should use any pre-existing knowledge we have of the code to dedicate our time to the more useful randomisations. After writing all these tests we need to run them as often and as many times as possible, because each run covers a bit more of the test space. The faster our tests are, the more we can run them, so we should aim to keep them as speedy as possible without sacrificing thoroughness.

You probably recognise a lot of those principles as most of them also apply to every other type of testing. Which isn't surprising because randomised tests actually aren't that different to the tests we're all used to writing - they're just randomised.

Randomised testing involves randomising!

The defining element of these tests is, of course, the randomisation of some element or elements which act as signals to the code under test. "But which elements would I randomise, crazy haystack man?" you awkwardly ask, to which I astutely reply:

The obvious one is input values: generally, parameters passed into the code under test (and potentially any values they themselves might hold), by reference or otherwise. How we randomise input values depends on their type. Some brief examples, with a code sketch after the list:

  • For numerical parameters we can randomise by picking a value from a range.
  • Strings are a bit more complex as they usually need to adhere to a set of rules (e.g. the token rules when testing a tokenizer), but custom randomisers can be set up for them without too much fuss.
  • Collections can have their ordering or their implementations randomised.
  • etc.
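As a sketch of what those randomisers might look like (hand-rolled Java helpers; real frameworks ship polished equivalents), note that each one takes the seeded Random so the whole test stays reproducible:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class InputRandomisers {

    // Numerical parameter: pick a value from a [min, max) range.
    static int randomIntBetween(Random random, int min, int max) {
        return min + random.nextInt(max - min);
    }

    // String parameter: build a string that adheres to a simple rule set,
    // here "lowercase alphanumeric, 1 to maxLength characters long".
    static String randomAlphanumeric(Random random, int maxLength) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
        int length = 1 + random.nextInt(maxLength);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Collection parameter: randomise the order of the elements.
    static <T> List<T> shuffled(Random random, List<T> items) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, random); // seeded shuffle, so reproducible
        return copy;
    }
}
```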

A good heuristic is that anything about a parameter which can change state is a candidate for randomisation.

We can also randomise the way the tests themselves are run. The most common technique is to randomise the order of the test methods to ensure there are no strange interactions - this can be especially useful when dealing with concurrent code. Another technique is to randomise how the code under test is instantiated (or whatever the "bring code to life" terminology is in your chosen language); for instance, in Java, we could randomly choose from a range of concrete implementations every time we test a particular interface, as sketched below.
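Here's a rough sketch of that last idea: a seeded helper that hands each test run a different concrete List implementation (method-order shuffling is best left to your test runner or framework):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListImplementationPicker {

    // Each run exercises a randomly chosen List implementation, so
    // implementation-specific assumptions in the code under test
    // eventually get flushed out.
    static List<String> randomList(Random random) {
        switch (random.nextInt(3)) {
            case 0:  return new ArrayList<>();
            case 1:  return new LinkedList<>();
            default: return new CopyOnWriteArrayList<>();
        }
    }
}
```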

Even the very fabric of the testing environment itself isn't safe from randomisation. Environmental factors such as timezones, library versions, and data sources can be randomised. True heroes (read: extremely thorough testers) can go so far as to randomise the platform, OS, or distro that tests take place under if the code absolutely has to work no matter what planet it's being run on.
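A small sketch of the timezone example (JUnit 4 and the standard library only): run each test under a randomly chosen default timezone, then put things back afterwards:

```java
import java.util.Random;
import java.util.TimeZone;
import org.junit.After;
import org.junit.Before;

public class RandomTimezoneTestBase {

    private TimeZone originalZone;

    @Before
    public void randomiseTimezone() {
        originalZone = TimeZone.getDefault();
        long seed = System.nanoTime();
        System.out.println("Timezone seed: " + seed);
        Random random = new Random(seed);

        // Every run of a test extending this base class happens under a
        // different (but reproducible) default timezone.
        String[] zones = TimeZone.getAvailableIDs();
        TimeZone.setDefault(TimeZone.getTimeZone(zones[random.nextInt(zones.length)]));
    }

    @After
    public void restoreTimezone() {
        TimeZone.setDefault(originalZone);
    }
}
```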

Of course, as we're trying to cover as much of the test space as possible (or at least the parts of the test space which particularly interest us), it makes sense to randomise as many elements as possible, time/hardware/sanity permitting, in our eternal quest to eradicate all of the bugs. Most randomised testing frameworks will do a lot of this out of the box (randomised test order is usually a given) and provide tools to set up the more fiddly bits yourself, which can significantly reduce the overhead of the more complicated randomisations.

The general takeaway from this section should be: to fully cover the test space for a given piece of code, if it's not nailed down, randomise it. If it is nailed down, pull the nails out, and if it doesn't break, randomise it as well.

When to randomise your way to a better life

If it makes sense to randomise something and you have the time and tools, then there is no reason not to. More randomisation means more haystack coverage and, eventually, fewer live bugs. But back in the real world we have sprint commitments, bugs, clients and all sorts of nonsense which take up the time we could be using to write randomised tests. Given that we probably can't apply randomised testing to everything, I've identified four areas which can particularly benefit from it:

Security-sensitive code

Take a moment to think of a part of your codebase which, if successfully attacked, could mean the end of the company you work for (if you identify as working for a company; if not, feel free to substitute an equally dire scenario). That part of the codebase, and any other parts which fall into the same bucket, are damn fine candidates for randomised testing. These are the sorts of areas where the smallest of vulnerabilities can result in hundreds of people losing their livelihoods, so they need to be exercised as thoroughly as possible. Randomised testing is ideal here: its main use in this scenario is to help us defend against as many catastrophic inputs and states as possible, no matter how unlikely they may be.

Critical code

This is similar to the above in that it can kill companies overnight. Critical code is what overly professional types would probably call "business critical", so called because the business is dead in the water if it stops working. A lot of software is sensibly encapsulated so that one thing falling over can't destroy the business, but what about that monolithic legacy class your billing depends on? If that stops working, the proverbial midden could impact the windmill quite spectacularly. Again, randomised testing is useful here because of its ability to thoroughly exercise the target code.

Code which is hard to thoroughly test

A fairly intuitive example of this is a password validation service: given a string containing several types of character, possibly up to 100 symbols long, the service must apply a number of rules to verify that the password is valid. Even if we only have 2 or 3 rules, that's still a huge number of possible inputs to test. Thankfully, we can use randomised testing to check our code is bug free across lots of different input ranges. Most randomised testing frameworks even come with a variety of built-in random value generators (e.g. ASCII string generators, alphanumeric string generators, etc.), so testing this sort of code shouldn't be much harder than plugging the generated values into existing tests and letting them rip.
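As a sketch (the validator and its rules here are entirely hypothetical), we can hammer a validator with thousands of random printable-ASCII strings, deliberately straying outside the valid length range to hit the boundaries:

```java
import java.util.Random;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class PasswordValidatorRandomisedTest {

    // Hypothetical implementation under test: valid passwords are
    // 8 to 100 characters long and contain at least one digit.
    static boolean isValid(String password) {
        return password.length() >= 8
                && password.length() <= 100
                && password.chars().anyMatch(Character::isDigit);
    }

    @Test
    public void validatorAgreesWithTheRules() {
        long seed = System.nanoTime();
        System.out.println("Password seed: " + seed);
        Random random = new Random(seed);

        for (int run = 0; run < 10_000; run++) {
            // Random printable-ASCII string, length 0 to 119, so we probe
            // well beyond both ends of the valid 8-100 range.
            int length = random.nextInt(120);
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append((char) (' ' + random.nextInt(95)));
            }
            String candidate = sb.toString();

            // The oracle restates the rules independently of the
            // implementation (trivially identical here, but in a real test
            // it would be written without peeking at the production code).
            boolean expected = candidate.length() >= 8
                    && candidate.length() <= 100
                    && candidate.chars().anyMatch(Character::isDigit);
            assertEquals("input: " + candidate, expected, isValid(candidate));
        }
    }
}
```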

Concurrency

That thing which all the cool, experienced devs do. It has its own host of exciting problems which often result in bugs that only occur once in many runs, and more often than not leave no evidence of their passing other than the destruction they have wrought. As such they are almost impossible to reproduce. Yay concurrency! Luckily we can set up reproducible randomised tests which will not only catch the blighters after a number of runs (disclaimer: the number of runs could be enormous) but also, via the logged seed, tell us how to reproduce them.
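A sketch of the idea (a deliberately broken counter, plain JUnit 4 and threads): the seed reproduces the workload shape - thread counts and iteration counts - though the OS scheduler still has the final say on interleavings, which is exactly why many runs are needed:

```java
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CounterConcurrencyTest {

    // Deliberately unsynchronised, so increments can be lost under contention.
    static class Counter {
        int value;
        void increment() { value++; }
    }

    @Test
    public void incrementsAreNotLost() throws InterruptedException {
        long seed = System.nanoTime();
        System.out.println("Concurrency seed: " + seed);
        Random random = new Random(seed);

        for (int run = 0; run < 100; run++) {
            // Randomise thread count and workload each run to vary the
            // interleavings the scheduler can produce.
            int threads = 2 + random.nextInt(7);
            int perThread = 100 + random.nextInt(1000);

            Counter counter = new Counter();
            CountDownLatch start = new CountDownLatch(1);
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> {
                    try {
                        start.await(); // line all threads up for maximum contention
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < perThread; i++) {
                        counter.increment();
                    }
                });
                workers[t].start();
            }
            start.countDown();
            for (Thread worker : workers) {
                worker.join();
            }
            assertEquals(threads * perThread, counter.value);
        }
    }
}
```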

This set of heuristics should provide a good starting point for when, where and why to implement randomised testing, but the observant will have noticed that I have managed to completely sidestep any explanation of how to go about randomising tests. Mainly because I have yet to find a way of linking haystacks into the explanation. In my next posts I'll detail how we have implemented randomised testing at Brandwatch using a combination of Carrot Search Labs' randomized-testing framework, Maven, JUnit4 categories and Jenkins.