Filina Consulting, Inc

Randomizing Test Data is a Bad Idea

March 8th, 2019

Any form of randomization in automated tests is a bad idea, yet everyone does it. It leads to tests that fail unpredictably, while giving a false sense of confidence every time they pass. Randomizing is basically hoping that, with enough builds, we may eventually stumble on a bug that existed all along.

Here is an example of a data randomizer:

public function testMyMethod()
{
    $ipAddress = $this->faker->ipv4; // randomizer
    $subject = new Subject();
    $subject->myMethod($ipAddress);
}

What is a test anyway?

One of the qualities of an automated test is reliability: it should reliably point out a regression. What if you introduced a regression in your system, but the current randomized dataset failed to expose it? Your test has become unreliable and has therefore lost most of its value. Worse, you won't even see it, since the test didn't fail and you're not aware of the regression.

Relying on the randomizer means pushing code that may or may not be correct. "It's possible that mortgage payments will be charged incorrectly for millions of people, but at least we might discover this on the 10,000th build".

Problems in the wild

I’ve seen this go sideways in a really bad way on a project. Literally every single build failed with half a dozen errors related to randomizers and unrelated to the code being changed. The team couldn't stay on top of them, so they ended up accepting broken pull requests after eyeballing the output for “known” failures. Not only was this tedious, but from time to time a real bug found its way into production even though the test suite had actually caught it: its failure was simply dismissed as one of the “known” random ones.

On another project, the randomizer method did catch many bugs, but not before they caused errors in production for an extended period. At the time each change was pushed to production, the test happened to generate a passing dataset. Those bugs should (and could) have been caught earlier.

Finding the unknown

So here's the situation. You're afraid that a hardcoded dataset will only expose known bugs, and you want to find the unknown ones. This is why most people randomize datasets. But that's jumping to a solution without thinking it through. What are we really trying to do here?

We're trying to find the unknown bugs. How will the system respond if we throw the entire text of Moby Dick at it as the username? That sounds reasonable. But why do we do it inside our automated tests? Preventing regressions and hunting for bugs in our system are two different activities.

The process should instead be:

  1. Throw random datasets at the system until something fails.
  2. Write a test with the dataset that caused a failure.
  3. Fix the system so that this test passes.
  4. Congratulations! You just reliably found and fixed a bug that might have taken two years' worth of builds to find.
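The steps above can be sketched as a standalone fuzz script. Everything in this sketch is hypothetical: myMethod() stands in for the real code under test, and it has a bug planted on purpose (it rejects addresses ending in .0) so the loop has something to catch.

```php
<?php
// Step 1: throw random datasets at the system until something fails.
// myMethod() is a hypothetical stand-in for the real code under test,
// with a bug planted on purpose so the fuzz loop can find it.
function myMethod(string $ipAddress): bool
{
    $octets = explode('.', $ipAddress);
    return (int) $octets[3] !== 0; // planted bug: ".0" addresses are rejected
}

mt_srand(42); // fixed seed so a failing run can be replayed exactly

$failures = [];
for ($i = 0; $i < 100000 && count($failures) < 10; $i++) {
    $ip = sprintf('%d.%d.%d.%d',
        mt_rand(0, 255), mt_rand(0, 255), mt_rand(0, 255), mt_rand(0, 255));
    if (myMethod($ip) === false) {
        $failures[] = $ip; // Step 2: pin this dataset in a regression test
    }
}

// Each collected value becomes a hardcoded case in the build suite.
foreach (array_unique($failures) as $ip) {
    echo $ip, "\n";
}
```

Note the fixed seed: when the script does find a failure, you can re-run it and get the exact same datasets, instead of chasing a value that may never come up again.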

How many iterations should you execute when fuzzing? As many as your computer can run in the amount of time you're willing to wait. For example, if you're willing to let it run for 5 minutes while you're having a coffee break, set it to however many millions (billions?) of iterations that represents.

You only need to do this once to generate the list of failures. After the initial run, you shouldn't have to run it again until you touch the underlying code.

Do I need to write two tests?

No. Most testing frameworks let you tag tests so that you can exclude the fuzz tests from your build. You'll just need to write wrapper methods with separate tags. Here's a simple example in PHP, using PHPUnit's @group annotation:

const REPEAT_MY_METHOD = 1000000;

/**
 * @group regression
 */
public function testMyMethod()
{
    $this->doTestMyMethod('127.0.0.1');
}

/**
 * @group fuzzing
 */
public function testMyMethodFuzzing()
{
    $this->faker = \Faker\Factory::create(); // the factory wires up the providers; a bare Generator has none
    for ($i = 0; $i < self::REPEAT_MY_METHOD; $i++) {
        $ipAddress = $this->faker->ipv4; // randomizer
        $this->doTestMyMethod($ipAddress);
    }
}

private function doTestMyMethod($ipAddress)
{
    $subject = new Subject();
    $subject->myMethod($ipAddress);
    // ...assert on the outcome here
}

In the above example, the actual test is executed inside doTestMyMethod. Any data that you may want to randomize, such as $ipAddress, will be provided as an argument.

You then write two methods for two purposes. One is tagged regression and is part of your build suite; it uses hardcoded values. The second is tagged fuzzing and only runs when needed; it repeats the test any number of times, say 1,000,000, with randomized data.
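On the command line, the split plays out with PHPUnit's group options: the build runs everything except the fuzzing group, and the fuzz pass is launched by hand.

```shell
# Regular build: deterministic regression tests only.
phpunit --exclude-group fuzzing

# On demand, before or after touching the underlying code:
phpunit --group fuzzing
```

Your CI configuration would use the first command; the second stays a local, manual step.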

Here's more info: https://en.wikipedia.org/wiki/Fuzzing

Happy bug hunting!

Edits since publication:

  • 2019-03-08: Added reference to fuzzing on Wikipedia.
  • 2019-03-08: Added an example at the top. Added the "Problems in the wild" section.


Phone: +1 514-918-7866 | E-mail: me@afilina.com