Testing presents an interesting anomaly for the software engineer. During earlier software engineering activities, the engineer attempts to build software from an abstract concept to a tangible product. Now comes testing. The engineer creates a series of test cases that are intended to "demolish" the software that has been built. In fact, testing is the one step in the software process that could be viewed (psychologically, at least) as destructive rather than constructive.
Software engineers are by their nature constructive people. Testing requires that the developer discard preconceived notions of the "correctness" of software just developed and overcome a conflict of interest that occurs when errors are uncovered. Beizer describes this situation effectively when he states:
There's a myth that if we were really good at programming, there would be no bugs to catch. If only we could really concentrate, if only everyone used structured programming, top-down design, decision tables, if programs were written in SQUISH, if we had the right silver bullets, then there would be no bugs. So goes the myth. There are bugs, the myth says, because we are bad at what we do; and if we are bad at it, we should feel guilty about it. Therefore, testing and test case design is an admission of failure, which instills a goodly dose of guilt. And the tedium of testing is just punishment for our errors. Punishment for what? For being human? Guilt for what? For failing to achieve inhuman perfection? For not distinguishing between what another programmer thinks and what he says? For failing to be telepathic? For not solving human communications problems that have been kicked around . . . for forty centuries?

Should testing instill guilt? Is testing really destructive? The answer to these questions is "No!" However, the objectives of testing are somewhat different than we might expect.
Testing Objectives
In an excellent book on software testing, Glen Myers states a number of rules that can serve well as testing objectives:
1. Testing is a process of executing a program with the intent of finding an error.
2. A good test case is one that has a high probability of finding an as-yet-undiscovered error.
3. A successful test is one that uncovers an as-yet-undiscovered error.
These objectives imply a dramatic change in viewpoint. They move counter to the commonly held view that a successful test is one in which no errors are found. Our objective is to design tests that systematically uncover different classes of errors and to do so with a minimum amount of time and effort.
If testing is conducted successfully (according to the objectives stated previously), it will uncover errors in the software. As a secondary benefit, testing demonstrates that software functions appear to be working according to specification and that behavioral and performance requirements appear to have been met. In addition, data collected as testing is conducted provide a good indication of software reliability and some indication of software quality as a whole. But testing cannot show the absence of errors and defects; it can show only that software errors and defects are present. It is important to keep this (rather gloomy) statement in mind as testing is being conducted.
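To make the error-seeking mindset concrete, consider a minimal, hypothetical sketch (the `average` function, its assumed behavior on an empty sequence, and the use of pytest are illustrative assumptions, not details from the text). The first test merely confirms behavior we already expect to work; the second is the kind of test Myers' rules favor, because it probes a plausible, as-yet-undiscovered failure and "succeeds" precisely by exposing it.

```python
import pytest

def average(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

def test_average_typical_case():
    # Confirms behavior we already expect to work; low probability of
    # revealing a new error.
    assert average([2, 4, 6]) == 4

def test_average_empty_sequence():
    # Probes a likely failure class. If the (assumed) specification calls
    # for ValueError on empty input, this test fails against the current
    # implementation, which raises ZeroDivisionError instead -- a
    # "successful" test in Myers' sense, because it uncovers an error.
    with pytest.raises(ValueError):
        average([])
```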
Testing Principles
Before applying methods to design effective test cases, a software engineer must understand the basic principles that guide software testing. Davis suggests a set of testing principles that have been adapted for use here:
• All tests should be traceable to customer requirements. As we have seen, the objective of software testing is to uncover errors. It follows that the most severe defects (from the customer’s point of view) are those that cause the program to fail to meet its requirements.
• Tests should be planned long before testing begins. Test planning can begin as soon as the requirements model is complete. Detailed definition of test cases can begin as soon as the design model has been solidified. Therefore, all tests can be planned and designed before any code has been generated.
• The Pareto principle applies to software testing. Stated simply, the Pareto principle implies that 80 percent of all errors uncovered during testing will likely be traceable to 20 percent of all program components. The problem, of course, is to isolate these suspect components and to thoroughly test them.
• Testing should begin “in the small” and progress toward testing “in the large.” The first tests planned and executed generally focus on individual components. As testing progresses, focus shifts in an attempt to find errors in integrated clusters of components and ultimately in the entire system.
• Exhaustive testing is not possible. The number of path permutations for even a moderately sized program is exceptionally large (see the sketch after this list). For this reason, it is impossible to execute every combination of paths during testing. It is possible, however, to adequately cover program logic and to ensure that all conditions in the component-level design have been exercised.
• To be most effective, testing should be conducted by an independent third party. By most effective, we mean testing that has the highest probability of finding errors (the primary objective of testing).
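To see why exhaustive path testing is hopeless, here is a back-of-the-envelope sketch (the loop bound and the number of branch paths per iteration are illustrative assumptions, not figures from the text): a single loop that may execute up to 20 times, with 5 distinct branch paths through its body, already produces on the order of 10^14 distinct execution paths.

```python
# Illustrative path-counting sketch; the constants are assumptions chosen
# only to show the scale of the problem.
PATHS_PER_ITERATION = 5   # distinct branch paths through the loop body
MAX_ITERATIONS = 20       # the loop may execute 0..20 times

# A run of k iterations has PATHS_PER_ITERATION ** k distinct paths;
# summing over every possible iteration count gives the total.
total_paths = sum(PATHS_PER_ITERATION ** k for k in range(MAX_ITERATIONS + 1))

print(f"{total_paths:.2e}")  # roughly 1.2e14 distinct paths
```

Even at a million test executions per second, running each of these paths once would take several years, which is why the practical goal is coverage of program logic and component-level conditions rather than of every path.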
Testability
In ideal circumstances, a software engineer designs a computer program, a system, or a product with “testability” in mind. This enables the individuals charged with testing to design effective test cases more easily. But what is testability? James Bach describes testability in the following manner.
Software testability is simply how easily [a computer program] can be tested. Since testing is so profoundly difficult, it pays to know what can be done to streamline it. Sometimes programmers are willing to do things that will help the testing process and a checklist of possible design points, features, etc., can be useful in negotiating with them.
There are certainly metrics that could be used to measure testability in most of its aspects. Sometimes, testability is used to mean how adequately a particular set of tests will cover the product. It's also used by the military to mean how easily a tool can be checked and repaired in the field. Those two meanings are not the same as software testability. The checklist that follows provides a set of characteristics that lead to testable software.
Operability. "The better it works, the more efficiently it can be tested."
• The system has few bugs (bugs add analysis and reporting overhead to the test process).
• No bugs block the execution of tests.
• The product evolves in functional stages (allows simultaneous development and testing).
Observability. "What you see is what you test."
• Distinct output is generated for each input.
• System states and variables are visible or queriable during execution.
• Past system states and variables are visible or queriable (e.g., transaction logs).
• All factors affecting the output are visible.
• Incorrect output is easily identified.
• Internal errors are automatically detected through self-testing mechanisms.
• Internal errors are automatically reported.
• Source code is accessible.
Controllability. "The better we can control the software, the more the testing can be automated and optimized."
• All possible outputs can be generated through some combination of input.
• All code is executable through some combination of input.
• Software and hardware states and variables can be controlled directly by the test engineer (see the sketch following this list).
• Input and output formats are consistent and structured.
• Tests can be conveniently specified, automated, and reproduced.
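As a concrete (and entirely hypothetical) illustration of controllability, the sketch below shows a small SafeHome-style arming decision in which the system clock is injected rather than read directly. The class and method names are invented for this example; the point is the controllable seam, which lets an automated test put the software into any time-of-day state directly instead of waiting for that state to occur.

```python
from datetime import datetime, time

class NightArming:
    """Decides whether a (hypothetical) SafeHome-style alarm should arm itself."""

    def __init__(self, clock=datetime.now):
        # The clock is injected: a controllable seam for the test engineer.
        self._clock = clock

    def should_arm(self):
        now = self._clock().time()
        # Arm between 22:00 and 06:00.
        return now >= time(22, 0) or now < time(6, 0)

# Tests control the "time of day" state directly instead of waiting for it.
def test_arms_after_ten_pm():
    assert NightArming(clock=lambda: datetime(2024, 1, 1, 22, 30)).should_arm() is True

def test_does_not_arm_at_noon():
    assert NightArming(clock=lambda: datetime(2024, 1, 1, 12, 0)).should_arm() is False
```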
Decomposability. "By controlling the scope of testing, we can more quickly isolate problems and perform smarter retesting."
• The software system is built from independent modules.
• Software modules can be tested independently.
Simplicity. "The less there is to test, the more quickly we can test it."
• Functional simplicity (e.g., the feature set is the minimum necessary to meet requirements).
• Structural simplicity (e.g., architecture is modularized to limit the propagation of faults).
• Code simplicity (e.g., a coding standard is adopted for ease of inspection and maintenance).
Stability. "The fewer the changes, the fewer the disruptions to testing."
• Changes to the software are infrequent.
• Changes to the software are controlled.
• Changes to the software do not invalidate existing tests.
• The software recovers well from failures.
Understandability. "The more information we have, the smarter we will test."
• The design is well understood.
• Dependencies between internal, external, and shared components are well understood.
• Changes to the design are communicated.
• Technical documentation is instantly accessible.
• Technical documentation is well organized.
• Technical documentation is specific and detailed.
• Technical documentation is accurate.
The attributes suggested by Bach can be used by a software engineer to develop a software configuration (i.e., programs, data, and documents) that is amenable to testing.
And what about the tests themselves? Kaner, Falk, and Nguyen suggest the following attributes of a “good” test:
1. A good test has a high probability of finding an error. To achieve this goal, the tester must understand the software and attempt to develop a mental picture of how the software might fail. Ideally, the classes of failure are probed. For example, one class of potential failure in a GUI (graphical user interface) is a failure to recognize proper mouse position. A set of tests would be designed to exercise the mouse in an attempt to demonstrate an error in mouse position recognition.
2. A good test is not redundant. Testing time and resources are limited. There is no point in conducting a test that has the same purpose as another test. Every test should have a different purpose (even if it is subtly different). For example, a module of the SafeHome software is designed to recognize a user password to activate and deactivate the system. In an effort to uncover an error in password input, the tester designs a series of tests that input a sequence of passwords. Valid and invalid passwords (four-digit sequences) are input as separate tests. However, each valid/invalid password should probe a different mode of failure. For example, the invalid password 1234 should not be accepted by a system programmed to recognize 8080 as the valid password. If it is accepted, an error is present. Another test input, say 1235, would have the same purpose as 1234 and is therefore redundant. However, the invalid input 8081 or 8180 has a subtle difference, attempting to demonstrate that an error exists for passwords “close to” but not identical with the valid password. A minimal sketch of such a test set appears after this list.
3. A good test should be “best of breed.” In a group of tests that have a similar intent, time and resource limitations may dictate the execution of only a subset of these tests. In such cases, the test that has the highest likelihood of uncovering a whole class of errors should be used.
4. A good test should be neither too simple nor too complex. Although it is sometimes possible to combine a series of tests into one test case, the possible side effects associated with this approach may mask errors. In general, each test should be executed separately.
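Returning to the SafeHome password example in point 2, the sketch below shows what a non-redundant test set might look like. The `check_password` function and the pytest-style test functions are hypothetical; each invalid input is chosen to probe a different potential failure mode, and a second "wholly different" value such as 1235 is deliberately omitted because it would duplicate the purpose of 1234.

```python
VALID_PASSWORD = "8080"  # the valid password from the example

def check_password(entered):
    """Hypothetical SafeHome password check: accept only the valid password."""
    return entered == VALID_PASSWORD

def test_valid_password_accepted():
    assert check_password("8080") is True

def test_unrelated_invalid_password_rejected():
    # A wholly different value; 1235 would be redundant with this test.
    assert check_password("1234") is False

def test_near_miss_last_digit_rejected():
    # Differs from the valid password only in its last digit.
    assert check_password("8081") is False

def test_near_miss_interior_digit_rejected():
    # Differs from the valid password in a single interior digit.
    assert check_password("8180") is False
```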