When to Stop Testing
In a recent article, Simon Knight shared his ideas about how to decide when to stop testing software. It struck me that some of the activities he mentions represent testing, while others represent checking: validating known (or expected) behaviors of the system under test. In addition, some of his heuristics for deciding when to stop testing rest on underlying assumptions that may be questionable. For lack of a better word, let’s test that observation.
Testing or checking?
Conventionally, people use the word “testing” (with respect to software) to describe two distinct types of activity. The distinction has been clarified by the software testing luminaries James Bach and Michael Bolton. Here’s how Bach describes testing and checking:
Testing is the process of evaluating a product by learning about it through exploration and experimentation. Which includes:
- Questioning
- Study
- Modeling
- Observation
- Inference
- And more…
A test is an instance of testing.
Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
A check is an instance of checking.
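To make the distinction concrete, here’s a minimal sketch of a check in Python. The shipping-fee rule is hypothetical; the point is that the evaluation is purely algorithmic:

```python
# A check: an algorithmic decision rule applied to a specific
# observation of the product. The "product" here is a hypothetical
# stand-in function so the example is self-contained.

def shipping_fee(order_total: float) -> float:
    """Stand-in for real product code."""
    return 0.0 if order_total >= 50 else 4.99

def check_free_shipping_threshold() -> bool:
    """Decision rule: orders of $50 or more ship free."""
    observation = shipping_fee(50.0)  # a specific observation
    return observation == 0.0         # pass/fail, no human judgment

if __name__ == "__main__":
    print("PASS" if check_free_shipping_threshold() else "FAIL")
```

A machine can run that rule a million times; deciding whether $50 is the right threshold, or what should happen with a negative total, is where testing begins.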
Even those of us who are aware of the distinction are guilty of using the word testing loosely. The word is deeply embedded in the jargon of the IT field, product names, and so on. Whatever you call the two activities, the important thing is to keep in mind the practical differences.
In his article, Knight describes a web page to be tested. It contains various radio buttons, checkboxes, and other widgets. He asks how many tests (he means checks, really) would be needed to validate the various combinations of settings. Knight suggests creating a truth table, which is fine. There are ways to reduce the number of checks without compromising quality, such as the method Bolton describes in an article on pairwise testing. At LeadingAgile, we have a software testing exercise for job candidates that calls for a similar approach, too. So far, so good, as long as we remember the difference between testing and checking.
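To illustrate how much a pairwise approach can shrink a truth table, here’s a rough sketch in Python. The form widgets are hypothetical, and the greedy cover below is a simplification of what dedicated pairwise tools do:

```python
from itertools import combinations, product

# Hypothetical form widgets and their possible settings.
params = {
    "account_type": ["personal", "business"],
    "newsletter":   [True, False],
    "payment":      ["card", "paypal", "invoice"],
    "currency":     ["USD", "EUR", "GBP"],
}

full = list(product(*params.values()))  # the exhaustive truth table
print("exhaustive checks:", len(full))  # 2 * 2 * 3 * 3 = 36

# Every pair of values that a pairwise suite must cover at least once.
values = list(params.values())
uncovered = {
    frozenset([(i, a), (j, b)])
    for i, j in combinations(range(len(values)), 2)
    for a in values[i]
    for b in values[j]
}

def pairs_of(row):
    """All value pairs exercised by one row of the table."""
    return {frozenset([(i, row[i]), (j, row[j])])
            for i, j in combinations(range(len(row)), 2)}

# Greedy cover: keep taking the row that covers the most
# still-uncovered pairs. Not optimal, but enough to show how far
# below 36 a pairwise suite lands.
suite = []
while uncovered:
    best = max(full, key=lambda row: len(pairs_of(row) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

print("pairwise checks:", len(suite))  # typically 9-11 rows
```

Every pair of settings still appears together in at least one check; only the full combinations are pruned. That is the sense in which the reduction doesn’t compromise quality.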
But Knight goes on to say he asks questions to clarify the design intent of the software. This activity is clearly not checking. It’s the creative value-add work of testing. It can include questioning assumptions, validating design ideas, discovering unanticipated behaviors of the code, exploring edge cases, surfacing overlooked needs, and much more.
It’s possible I react too strongly to words, but it seems to me Knight conflates the two types of activity under the term “testing.” The distinction matters because the two activities have very different purposes and are carried out in very different ways.
Contemporary software development and delivery practices call for the automation of checking, leaving testers with more time to perform value-add testing work that can’t be automated. If we don’t distinguish the two types of activity, we’ll have a hard time identifying things that can be automated.
Questionable assumptions
Under “stopping heuristics,” Knight mentions a few things that suggest an old-school mindset about software testing.
One of them is “I’m not finding bugs anymore.” This reflects the old-school mentality that the purpose of testing is to find bugs. It seemed odd to me, as he had already noted that “specifications” shouldn’t be the beginning and end of testing. If “bug” means “not to specification,” then this stopping heuristic seems inconsistent with his earlier statements.
Another one is “The software isn’t ready for testing yet.” This reflects the old-school mentality that testing occurs only after coding. Software is “ready” for testing from the moment it enters a person’s mind as a possible product. We can validate assumptions, test market demand, determine financial feasibility, and so on. From that point forward, “testing” in some sense of the word drives everything we do to realize the product, from Specification by Example to Test-Driven Development.
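Test-Driven Development makes that ordering literal: a check exists before the product code does. A minimal, hypothetical illustration in Python:

```python
# TDD in miniature: the check is written first and fails until the
# product code exists. The username rule below is hypothetical.
def test_username_must_be_at_least_three_chars():
    assert not is_valid_username("ab")
    assert is_valid_username("abc")

# The product code is written afterward, to make the check pass.
def is_valid_username(name: str) -> bool:
    return len(name) >= 3

if __name__ == "__main__":
    test_username_must_be_at_least_three_chars()
    print("PASS")
```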
Stopping heuristics as process improvement indicators
Two of the stopping heuristics Knight mentions can be seen as red flags pointing to opportunities to improve the delivery process: “I’m out of time,” and “My manager said I should stop now.”
Organizations that depend on manual checking at a large testing scope (e.g., end-to-end) tend to run out of time for regression testing in every release cycle. That is usually an indication that more of their checking should be automated: nearly all regression testing falls into the category of checking and can be automated.
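As a sketch of what that automation can look like, here’s a small regression check written with pytest; the discount function is a hypothetical stand-in for real product code:

```python
# regression_checks.py -- known behaviors locked in as algorithmic
# pass/fail rules, runnable in every build.
import pytest

def apply_discount(total: float, code: str) -> float:
    """Hypothetical stand-in for real product code."""
    return total * 0.9 if code == "SAVE10" else total

@pytest.mark.parametrize("total, code, expected", [
    (100.0, "SAVE10", 90.0),   # known behavior, now a regression check
    (100.0, "BOGUS", 100.0),   # invalid codes leave the total alone
    (0.0,   "SAVE10", 0.0),    # boundary: nothing to discount
])
def test_discount_regression(total, code, expected):
    assert apply_discount(total, code) == pytest.approx(expected)
```

A suite like this runs in the build pipeline on every commit, so regression checking no longer competes with the release-cycle clock.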
The manager declaring that testing must stop is typically a variation on the same theme: the manager usually calls a halt because time is running out and staff have to turn their attention to other work.