A Theory of Program Testing - A Grand Challenge for Computer
Science?
The elimination of errors in computer systems is a major activity in
any development project. One of the most popular methods of finding
errors is testing. Testing absorbs a significant proportion of the
budget of any project involving programmable digital systems. Yet,
thus far, most of the work on testing within software has been
empirical and pragmatically motivated. By contrast, the situation in
hardware testing is more fully developed. However, as the complexity
of systems increases, along with our reliance on such systems, the
results of conventional testing programmes provide less and less
justification for that reliance. Industry standards in avionics and
other high-integrity systems demand that systems be virtually
error-free, but empirical data demonstrates that most systems with
programmable elements are riddled with errors.
Other engineering disciplines depend on theoretical models to
"amplify" the significance of test results: complex mechanical
systems require relatively few tests to predict reliability and the
absence of some classes of error. The development of a theory of
program testing would help us understand whether such "amplification"
is possible for software, and could provide the means to design
systems that are testable with reasonable resources.
One could argue that the absence of errors in software should be
established by other means, for example by program proof. This
proposal is not an argument against such techniques but rather an
argument for a synthesis of all approaches to program verification.
We will always need testing because:
- Program proofs take idealised programs as their subject. Testing
is empirical and can be used to check that the translation from our
idealised understanding to a running system is reasonably error-free.
- It seems to be hard to scale program proofs to very large
systems. Perhaps we will always be forced to use testing for
sufficiently large systems.
- We can make errors in proofs.
- Most high-integrity systems require diversity of evidence that a
system is fit-for-purpose. Some combination of proof and testing may
provide that diversity.
- Pragmatically, it seems hard to get practising programmers to
adopt program proof. The success of approaches like model-checking
seems to stem from their "black-boxing" of proof and the use of
model-checking as a form of "symbolic debugging".
There is a small and growing body of work on program testing in the
literature. But it is quite fragmentary and lacks coherence. We argue
that by taking on testing as a "grand challenge" we could: help
synthesise the approaches of diverse areas of work, raise
intellectually challenging questions, and perhaps make a significant
contribution to the solution of a practical problem.
Potential Areas of Work
There is already some work in each of these areas, but it is poorly
integrated, and a synthesis of approaches would benefit the field.
- test case generation from formal specifications.
- test case generation for wider classes of system
(e.g. distributed systems).
- self-testing systems.
- self-correcting systems.
- language design for testable systems.
- the relationship between proof and testing.
- measures of the test complexity of systems.
- test oracle generation.
- the control of retesting after repairing errors.
- the significance of tests.
- the development of tools to support these approaches.
- case studies and industrial application.
Existing and Potential Approaches to Testing
These are a few approaches to the testing problem. Taken
individually, each could lead to good insights into testing, but a
synergy of approaches could provide the basis for a more
comprehensive theory.
- Complexity: Work by Blum and others on self-testing, and by Lipton
on random sampling, provides some direction but is primarily aimed at
numerical problems. The connection between learning and testing
seems close; for example, there has been recent work on adapting
VC-dimension to give a measure of testing complexity for programs.
(A sketch of a Blum-style self-test appears after this list.)
- Type Systems/Abstract Interpretation: Work from compilation and
type systems directed towards the optimisation of code provides good
information for directing the selection of test cases. Features like
polymorphism and linearity in type systems help to characterise the
program under inspection more tightly. (A sketch of how polymorphism
can cut down the test space appears after this list.)
- Program Specification: There is already a body of work on the
generation of test cases from axiomatic and model-based program
specifications, but this work is still in need of significant
development. (A sketch of specification-driven test generation
appears after this list.)
- Semantics: There has been very little work on the semantics of
testing; one might consider developing semantic structures that
facilitate testing. Another approach might come from proof theory,
where program proofs provide significant information on the structure
of the program. Using proofs as the basis for testing complexity
measures looks like an interesting avenue of enquiry.
- Concurrency: Testing has featured in the theory of concurrency in
the area of testing-based equivalences. These may have some relevance
for the testing of distributed systems, an area in which
practitioners face many problems. (A sketch of may/must testing
appears after this list.)
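
To make the "Complexity" item concrete, the following is a minimal
sketch, in Python, of Blum-style self-testing and self-correction for
a program claimed to compute a linear function f(x) = A*x mod P. The
modulus P, the multiplier A, and the deliberately faulty
implementation are illustrative assumptions, not part of this
proposal; the linearity check f(x+y) = f(x) + f(y) and the
random-offset correction are the standard ingredients.

import random

P = 1000003          # prime modulus for the example
A = 42               # intended multiplier: f(x) = A*x mod P

def faulty_times_a(x):
    # Hypothetical faulty implementation: wrong on roughly 1% of inputs.
    if x % 97 == 0:
        return (A * x + 1) % P
    return (A * x) % P

def self_test(f, trials=200):
    # BLR-style linearity test: check f(x+y) = f(x) + f(y) (mod P) at
    # random points.  A program that is wrong on a noticeable fraction
    # of inputs fails some trial with high probability.
    for _ in range(trials):
        x, y = random.randrange(P), random.randrange(P)
        if f((x + y) % P) != (f(x) + f(y)) % P:
            return False
    return True

def self_correct(f, x, votes=15):
    # Random self-correction: evaluate f(x) via random offsets and take
    # a majority vote; correct with high probability when f is right on
    # most inputs.
    results = [(f((x + r) % P) - f(r)) % P
               for r in (random.randrange(P) for _ in range(votes))]
    return max(set(results), key=results.count)

if __name__ == "__main__":
    print("self-test passed:", self_test(faulty_times_a))   # almost surely False
    x = 97 * 5                                               # a faulty input
    print("raw:", faulty_times_a(x),
          "corrected:", self_correct(faulty_times_a, x))

The point is that the self-test rejects the faulty program without
knowing A, using only the linearity the specification promises, and
the self-corrector then recovers correct answers at individual points
from a mostly-correct program.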
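
For the "Type Systems/Abstract Interpretation" item, this sketch
illustrates one way polymorphism tightens the characterisation of a
program and so reduces the number of tests needed. It assumes the
function under test really has the parametric type List[a] -> List[a]
(Python cannot enforce this, so the assumption is stated rather than
checked): such a function can only select and rearrange elements of
its input, so a single run on a list of distinct sentinel values
determines its behaviour on every list of that length. All helper
names are hypothetical.

def characterise(f, n):
    # Run f once on the sentinel list [0, 1, ..., n-1]; the resulting
    # index pattern determines f on every list of length n, provided f
    # really is parametric in its element type.
    return f(list(range(n)))

def predicted(f, xs):
    # Predict f(xs) from the recorded index pattern for len(xs).
    return [xs[i] for i in characterise(f, len(xs))]

def my_reverse(xs):
    # Function under test, claimed to have type List[a] -> List[a].
    return xs[::-1]

if __name__ == "__main__":
    xs = ["alpha", "beta", "gamma", "delta"]
    # One structural test per list length, rather than one per element type:
    assert my_reverse(xs) == predicted(my_reverse, xs)
    print("index pattern for length 4:", characterise(my_reverse, 4))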
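
For the "Program Specification" item, here is a small sketch of
generating test cases from a model-based specification, with the
postcondition acting as the oracle. The "withdraw" operation and its
deliberately buggy implementation are invented for illustration; the
pattern (generate inputs satisfying the precondition, check the
postcondition on the result) is the general one.

import random

def withdraw(balance, amount):
    # Implementation under test, with a deliberate off-by-one bug:
    # the comparison should be <=, not <.
    if amount < balance:
        return balance - amount
    return balance

def pre(balance, amount):
    # Precondition from the specification: 0 <= amount <= balance.
    return 0 <= amount <= balance

def post(balance, amount, result):
    # Postcondition from the specification: result = balance - amount.
    return result == balance - amount

def generate_case():
    # Draw inputs until the precondition holds, deliberately including
    # the boundary case amount == balance.
    while True:
        balance = random.randint(0, 100)
        amount = random.choice([0, balance, random.randint(0, 100)])
        if pre(balance, amount):
            return balance, amount

def run_tests(n=500):
    failures = []
    for _ in range(n):
        b, a = generate_case()
        r = withdraw(b, a)
        if not post(b, a, r):
            failures.append((b, a, r))
    return failures

if __name__ == "__main__":
    print("failing cases found:", len(run_tests()))

Because the generator includes the boundary case amount == balance,
the bug is very likely to be exposed within a few hundred cases; the
specification supplies both the test data (via the precondition) and
the oracle (via the postcondition).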
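
Finally, for the "Concurrency" item, a small sketch of testing-based
equivalence in the style of De Nicola and Hennessy. Processes and
tests are finite labelled transition systems given as Python
dictionaries, and the test signals success with the distinguished
action "w". The two processes below, a.(b + c) and a.b + a.c, have
the same traces but are distinguished by must-testing; this is the
standard example. The encoding and names are illustrative.

# Finite labelled transition systems: state -> list of (action, next state).
P = {"p0": [("a", "p1")],
     "p1": [("b", "p2"), ("c", "p3")],
     "p2": [], "p3": []}                              # a.(b + c)
Q = {"q0": [("a", "q1"), ("a", "q2")],
     "q1": [("b", "q3")], "q2": [("c", "q4")],
     "q3": [], "q4": []}                              # a.b + a.c
T = {"t0": [("a", "t1")], "t1": [("b", "t2")],
     "t2": [("w", "t3")], "t3": []}                   # test a.b.w ("w" = success)

def success(test, t):
    return any(act == "w" for act, _ in test[t])

def moves(proc, test, p, t):
    # Joint transitions: process and test synchronise on a common action.
    return [(p2, t2) for a1, p2 in proc[p] for a2, t2 in test[t]
            if a1 == a2 and a1 != "w"]

def may_pass(proc, test, p, t, seen=None):
    # Some computation of the composition reaches success.
    seen = set() if seen is None else seen
    if (p, t) in seen:
        return False
    seen.add((p, t))
    return success(test, t) or any(may_pass(proc, test, p2, t2, seen)
                                   for p2, t2 in moves(proc, test, p, t))

def must_pass(proc, test, p, t):
    # Every maximal computation reaches success (finite acyclic systems only).
    if success(test, t):
        return True
    ms = moves(proc, test, p, t)
    return bool(ms) and all(must_pass(proc, test, p2, t2) for p2, t2 in ms)

if __name__ == "__main__":
    print(may_pass(P, T, "p0", "t0"), must_pass(P, T, "p0", "t0"))   # True True
    print(may_pass(Q, T, "q0", "t0"), must_pass(Q, T, "q0", "t0"))   # True False

The two processes have the same traces, yet the second may refuse the
test: after its first "a" it can commit to the branch offering only
"c", and the composition deadlocks without success. This is exactly
the kind of distinction a theory of testing for distributed systems
would need to capture.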
LFCS