A Theory of Program Testing - A Grand Challenge for Computer
Science?
The elimination of errors in computer systems is a major activity in
any development project. One of the most popular methods of finding
errors is testing. Testing absorbs a significant proportion of the
budget of any project involving programmable digital systems. Yet,
thus far, most of the work on testing within software has been
empirical and pragmatically motivated. By contrast, the situation in
hardware testing is more fully developed. However, as the complexity
of systems increases, along with our reliance on such systems, the
results of conventional testing programmes provide less and less
justification for that reliance. Industry standards in avionics and
other high-integrity systems demand that systems be virtually
error-free, but empirical data demonstrates that most systems with
programmable elements are riddled with errors.
Other engineering disciplines depend on theoretical models to
"amplify" the significance of test results: complex mechanical
systems require relatively few tests to predict reliability and the
absence of some classes of error. The development of a theory of
program testing would help us understand whether such "amplification"
is possible for software, and could provide the means to design
systems that are testable with reasonable resources.
One could argue that the absence of errors in software should be
established by other means, for example by program proof. This
proposal is not an argument against such techniques but rather an
argument for a synthesis of all approaches to program verification.
We will always need testing because:
- Program proofs take idealised programs as their subject. Testing
is empirical and can be used to check that the translation from our
idealised understanding to a running system is reasonably error-free.
- It seems to be hard to scale program proofs to very large
systems. Perhaps we will always be forced to use testing for
sufficiently large systems.
- We can make errors in proofs.
- Most high-integrity systems require diversity of evidence that a
system is fit-for-purpose. Some combination of proof and testing may
provide that diversity.
- Pragmatically, it seems hard to get practising programmers to
adopt program proof. The success of approaches like model-checking
seems to stem from their "black-boxing" of proof and the use of
model-checking as a form of "symbolic debugging".
There is a small and growing body of work on program testing in the
literature. But it is quite fragmentary and lacks coherence. We argue
that by taking on testing as a "grand challenge" we could: help
synthesise the approaches of diverse areas of work, raise
intellectually challenging questions, and perhaps make a significant
contribution to the solution of a practical problem.
Potential Areas of Work
There is already some work in each of these areas, but it is poorly
integrated, and a synthesis of approaches would benefit the field.
- test case generation from formal specifications.
- test case generation for wider classes of system
(e.g. distributed systems).
- self-testing systems.
- self-correcting systems.
- language design for testable systems.
- the relationship between proof and testing.
- measures of the test complexity of systems.
- test oracle generation.
- the control of retesting after repairing errors.
- the significance of tests.
- the development of tools to support these approaches.
- case studies and industrial application.
Existing and Potential Approaches to Testing
These are a few approaches to the testing problem. Taken
individually, each could lead to good insights into testing, but a
synergy of approaches could provide the basis for a more
comprehensive theory.
- Complexity: Work by Blum and others on self-testing, and by Lipton
on random sampling, provides some direction but is primarily aimed at
numerical problems. The connection between learning and testing
seems close; for example, there has been recent work on adapting
VC-dimension to give a measure of testing complexity for programs.
(A sketch of a Blum-style self-test appears after this list.)
- Type Systems/Abstract Interpretation: Work from compilation and
type systems directed towards the optimisation of code provides good
information for directing the selection of test cases. Features like
polymorphism and linearity in type systems help to characterise the
program under inspection more tightly. (A sketch of how polymorphism
can cut down the test space appears after this list.)
- Program Specification: There is already a body of work on the
generation of test cases from axiomatic and model-based program
specifications, but this work is still in need of significant
development. (A sketch of specification-driven test generation
appears after this list.)
- Semantics: There has been very little work on the semantics of
testing; one might consider developing semantic structures that
facilitate testing. Another approach might come from proof theory,
where program proofs provide significant information on the structure
of the program. Using proofs as the basis for testing complexity
measures looks like an interesting avenue of enquiry.
- Concurrency: Testing has featured in the theory of concurrency in
the area of testing-based equivalences. These may have some relevance
for the testing of distributed systems, an area in which
practitioners face many problems. (A sketch of may/must testing
appears after this list.)
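
To make the "Complexity" item concrete, the following is a minimal
sketch, in Python, of Blum-style self-testing and self-correction for
a program claimed to compute a linear function f(x) = A*x mod P. The
modulus P, the multiplier A, and the deliberately faulty
implementation are illustrative assumptions, not part of this
proposal; the linearity check f(x+y) = f(x) + f(y) and the
random-offset correction are the standard ingredients.

import random

P = 1000003          # prime modulus for the example
A = 42               # intended multiplier: f(x) = A*x mod P

def faulty_times_a(x):
    # Hypothetical faulty implementation: wrong on roughly 1% of inputs.
    if x % 97 == 0:
        return (A * x + 1) % P
    return (A * x) % P

def self_test(f, trials=200):
    # BLR-style linearity test: check f(x+y) = f(x) + f(y) (mod P) at
    # random points.  A program that is wrong on a noticeable fraction
    # of inputs fails some trial with high probability.
    for _ in range(trials):
        x, y = random.randrange(P), random.randrange(P)
        if f((x + y) % P) != (f(x) + f(y)) % P:
            return False
    return True

def self_correct(f, x, votes=15):
    # Random self-correction: evaluate f(x) via random offsets and take
    # a majority vote; correct with high probability when f is right on
    # most inputs.
    results = [(f((x + r) % P) - f(r)) % P
               for r in (random.randrange(P) for _ in range(votes))]
    return max(set(results), key=results.count)

if __name__ == "__main__":
    print("self-test passed:", self_test(faulty_times_a))   # almost surely False
    x = 97 * 5                                               # a faulty input
    print("raw:", faulty_times_a(x),
          "corrected:", self_correct(faulty_times_a, x))

The point is that the self-test rejects the faulty program without
knowing A, using only the linearity the specification promises, and
the self-corrector then recovers correct answers at individual points
from a mostly-correct program.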
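
For the "Type Systems/Abstract Interpretation" item, this sketch
illustrates one way polymorphism tightens the characterisation of a
program and so reduces the number of tests needed. It assumes the
function under test really has the parametric type List[a] -> List[a]
(Python cannot enforce this, so the assumption is stated rather than
checked): such a function can only select and rearrange elements of
its input, so a single run on a list of distinct sentinel values
determines its behaviour on every list of that length. All helper
names are hypothetical.

def characterise(f, n):
    # Run f once on the sentinel list [0, 1, ..., n-1]; the resulting
    # index pattern determines f on every list of length n, provided f
    # really is parametric in its element type.
    return f(list(range(n)))

def predicted(f, xs):
    # Predict f(xs) from the recorded index pattern for len(xs).
    return [xs[i] for i in characterise(f, len(xs))]

def my_reverse(xs):
    # Function under test, claimed to have type List[a] -> List[a].
    return xs[::-1]

if __name__ == "__main__":
    xs = ["alpha", "beta", "gamma", "delta"]
    # One structural test per list length, rather than one per element type:
    assert my_reverse(xs) == predicted(my_reverse, xs)
    print("index pattern for length 4:", characterise(my_reverse, 4))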
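
For the "Program Specification" item, here is a small sketch of
generating test cases from a model-based specification, with the
postcondition acting as the oracle. The "withdraw" operation and its
deliberately buggy implementation are invented for illustration; the
pattern (generate inputs satisfying the precondition, check the
postcondition on the result) is the general one.

import random

def withdraw(balance, amount):
    # Implementation under test, with a deliberate off-by-one bug:
    # the comparison should be <=, not <.
    if amount < balance:
        return balance - amount
    return balance

def pre(balance, amount):
    # Precondition from the specification: 0 <= amount <= balance.
    return 0 <= amount <= balance

def post(balance, amount, result):
    # Postcondition from the specification: result = balance - amount.
    return result == balance - amount

def generate_case():
    # Draw inputs until the precondition holds, deliberately including
    # the boundary case amount == balance.
    while True:
        balance = random.randint(0, 100)
        amount = random.choice([0, balance, random.randint(0, 100)])
        if pre(balance, amount):
            return balance, amount

def run_tests(n=500):
    failures = []
    for _ in range(n):
        b, a = generate_case()
        r = withdraw(b, a)
        if not post(b, a, r):
            failures.append((b, a, r))
    return failures

if __name__ == "__main__":
    print("failing cases found:", len(run_tests()))

Because the generator includes the boundary case amount == balance,
the bug is very likely to be exposed within a few hundred cases; the
specification supplies both the test data (via the precondition) and
the oracle (via the postcondition).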
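
Finally, for the "Concurrency" item, a small sketch of testing-based
equivalence in the style of De Nicola and Hennessy. Processes and
tests are finite labelled transition systems given as Python
dictionaries, and the test signals success with the distinguished
action "w". The two processes below, a.(b + c) and a.b + a.c, have
the same traces but are distinguished by must-testing; this is the
standard example. The encoding and names are illustrative.

# Finite labelled transition systems: state -> list of (action, next state).
P = {"p0": [("a", "p1")],
     "p1": [("b", "p2"), ("c", "p3")],
     "p2": [], "p3": []}                              # a.(b + c)
Q = {"q0": [("a", "q1"), ("a", "q2")],
     "q1": [("b", "q3")], "q2": [("c", "q4")],
     "q3": [], "q4": []}                              # a.b + a.c
T = {"t0": [("a", "t1")], "t1": [("b", "t2")],
     "t2": [("w", "t3")], "t3": []}                   # test a.b.w ("w" = success)

def success(test, t):
    return any(act == "w" for act, _ in test[t])

def moves(proc, test, p, t):
    # Joint transitions: process and test synchronise on a common action.
    return [(p2, t2) for a1, p2 in proc[p] for a2, t2 in test[t]
            if a1 == a2 and a1 != "w"]

def may_pass(proc, test, p, t, seen=None):
    # Some computation of the composition reaches success.
    seen = set() if seen is None else seen
    if (p, t) in seen:
        return False
    seen.add((p, t))
    return success(test, t) or any(may_pass(proc, test, p2, t2, seen)
                                   for p2, t2 in moves(proc, test, p, t))

def must_pass(proc, test, p, t):
    # Every maximal computation reaches success (finite acyclic systems only).
    if success(test, t):
        return True
    ms = moves(proc, test, p, t)
    return bool(ms) and all(must_pass(proc, test, p2, t2) for p2, t2 in ms)

if __name__ == "__main__":
    print(may_pass(P, T, "p0", "t0"), must_pass(P, T, "p0", "t0"))   # True True
    print(may_pass(Q, T, "q0", "t0"), must_pass(Q, T, "q0", "t0"))   # True False

The two processes have the same traces, yet the second may refuse the
test: after its first "a" it can commit to the branch offering only
"c", and the composition deadlocks without success. This is exactly
the kind of distinction a theory of testing for distributed systems
would need to capture.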
LFCS