Mutation Testing: Quis Custodiet Ipsos Custodes
When test-driven development (TDD) was becoming a “thing,” it was easy to see how the test cases validated the functionality of the production code, but the obvious question was how the validity of the test cases could be ensured. Would TDD lead to a hall of mirrors, with endless suites of test cases validating the suites before them?
A common answer in those days was to relate TDD (or unit testing in general) to double-entry bookkeeping: Every transaction is recorded in two accounts, one as a debit and the other as a credit in the same amount. If everything balances at the end of the day, we’re good. Similarly, the test cases guard the production code, and the production code guards the test cases.
That was a lot better than the things we used to do before we learned about TDD, but there was still a problem: People aren’t perfect at creating test suites that cover every contingency. Even if people were perfect, the complexity of some software systems makes it impractical or economically unwise even to try to specify every contingency in a test case.
What else can we do? One option is property-based testing (PBT). I’ve looked into PBT before and shared my experiences here. Your conclusions may differ, but I found PBT tools to be pretty complicated to use, to the extent that the effort and time involved weren’t worth the benefits obtained, except under some circumstances. Besides that, PBT doesn’t solve the problem of incomplete test suites.
Another option is mutation testing. With mutation testing, the application code is modified in specific ways to introduce errors, and the existing test suite (example-based, property-based, or whatever) is exercised against the modified application. Modified blocks of application code are called mutants. If the existing test suite catches the error, the mutant is said to be killed. If the test suite fails to notice the error, the mutant is said to survive.
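To make the killed/survived distinction concrete, here is a minimal hand-rolled sketch. It is not how PIT works internally and it is not code from Java Poker; the class MutationDemo, the beats methods, and the two tiny "suites" are all hypothetical names invented for illustration. The mutant swaps > for >=, which is the sort of change a conditional-boundary mutation makes.

```java
public class MutationDemo {
    // Hypothetical production logic: a rank wins only if it is strictly higher.
    static boolean beatsOriginal(int rank, int other) {
        return rank > other;
    }

    // Hypothetical mutant of the method above: '>' has been changed to '>='.
    static boolean beatsMutant(int rank, int other) {
        return rank >= other;
    }

    // Lets the same checks run against either implementation.
    interface Beats {
        boolean test(int rank, int other);
    }

    // Weak suite: only compares clearly different ranks.
    static boolean weakSuitePasses(Beats impl) {
        return impl.test(10, 2) && !impl.test(2, 10);
    }

    // Stronger suite: adds the boundary case of equal ranks.
    static boolean strongSuitePasses(Beats impl) {
        return weakSuitePasses(impl) && !impl.test(7, 7);
    }

    // A mutant is killed when the suite fails against it, and survives otherwise.
    static String verdict(boolean suitePassed) {
        return suitePassed ? "survived" : "killed";
    }

    public static void main(String[] args) {
        System.out.println("weak suite:   mutant "
                + verdict(weakSuitePasses(MutationDemo::beatsMutant)));
        System.out.println("strong suite: mutant "
                + verdict(strongSuitePasses(MutationDemo::beatsMutant)));
    }
}
```

Both suites pass against the original method. The weak suite also passes against the mutant, so the mutant survives; the equal-ranks check in the stronger suite fails against the mutant and kills it. A surviving mutant is a hint that a test like that boundary check may be missing.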
Mutation testing verifies that your test suite is really telling you what you think (or hope) it’s telling you about the application code. Based on the results, you can decide to add more test cases or ignore some of the surviving mutants, depending on what makes sense in context.
Mutation testing tools are available for most mainstream languages. Most of them are not very mature just yet. One fairly mature example is PIT, a mutation testing tool for Java. Let’s give it a try.
I happen to have a Java application that I created specifically for the purpose of playing around with testing tools, called Java Poker. With all due modesty, I must say I did a creditable job of writing imperfect code. Let’s pretend the imperfections were intentional, for the sake of my fragile ego.
Intentional or not, the code base wouldn’t be of much use for learning about testing tools if everything came out 100% right all the time. PIT should discover quite a few opportunities for improvement in the tests.
Following the Quickstart instructions on the PIT site, I added a dependency to the Maven pom and ran the goal:
mvn org.pitest:pitest-maven:mutationCoverage
This was straight out of the box with all defaults, no customization, no configuration, no preparation of the target code base, and no modification to the existing test suite.
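For reference, the pom entry for PIT is typically a build plugin declaration along the following lines. Treat this as a sketch rather than the exact entry used for Java Poker: the version value is a placeholder, and a project using JUnit 5 may need an additional test-framework plugin that is not shown here.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <!-- Placeholder: pin a specific PIT release in a real project -->
      <version>LATEST</version>
    </plugin>
  </plugins>
</build>
```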
PIT applied various types of mutations to the code and executed the existing test suite. It generated an HTML report of the results:
The report shows line coverage, which isn’t related to the mutations, and mutation coverage, which tells us how well the existing test suite detected the defects that were injected via mutation. We can see in this case that a number of mutants survived, indicating room for improvement in the test suite.
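As a rough sketch of the arithmetic behind a mutation coverage figure, the snippet below computes the percentage of generated mutants that were killed. The outcome list is made up for illustration and does not come from the Java Poker report, the class and enum names are hypothetical, and the three outcomes are a simplification of the statuses a real tool reports. NO_COVERAGE stands for a mutant in code that no test executes.

```java
import java.util.List;

public class MutationCoverageSketch {
    // Simplified outcomes; real tools report more statuses than these three.
    enum Outcome { KILLED, SURVIVED, NO_COVERAGE }

    // Percentage of all generated mutants that the test suite detected.
    static double mutationCoverage(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) {
            throw new IllegalArgumentException("no mutants were generated");
        }
        long killed = outcomes.stream()
                .filter(o -> o == Outcome.KILLED)
                .count();
        return 100.0 * killed / outcomes.size();
    }

    public static void main(String[] args) {
        // Hypothetical outcomes, not taken from any real report.
        List<Outcome> outcomes = List.of(
                Outcome.KILLED, Outcome.KILLED, Outcome.KILLED,
                Outcome.SURVIVED, Outcome.NO_COVERAGE);
        System.out.printf("mutation coverage: %.0f%%%n", mutationCoverage(outcomes));
    }
}
```

With three of five hypothetical mutants killed, the sketch reports 60%. Mutants in uncovered code count against the score just as survivors do, which is why line coverage and mutation coverage are worth reading side by side.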
We can drill down through multiple levels of reports to see the details. In this example from class FiveCardStudGame, PIT found “no coverage” in several places. The results look like this:
Here’s an example from class AbstractCard that shows how PIT reports a mutant that survives:
The tool provides enough information about how it mutated the code and what happened that you can add or modify test cases to cover the situation.
Mutation testing tools for other languages provide similar information. There’s no way to guarantee code is perfect, but mutation testing appears to be a useful technique to add to your toolkit to gain higher confidence in the code. I found this particular mutation testing tool to be very easy to use, and to provide good value for the effort.