Software Testing and Quality Assurance
SEng5441
Chapter 4: Structural (White Box) Testing
(Data flow testing)
KIoT
Department of Software Engineering
1
Basic Idea
• A program unit such as a function, accepts input values, performs computations while assigning new
values to local and global variables, and, finally, produces output values.
• Therefore, one can imagine a kind of “flow” of data values between variables along a path of program
execution
• A data value computed in a certain step of program execution is expected to be used in a later step.
Example: Obtain a file pointer ……. use it later.
– If the later use is never verified, we do not know if the earlier assignment is acceptable.
• Two motivations of data flow testing
1. a memory location corresponding to a program variable is accessed in a desirable way. For
example, a memory location may not be read before writing into the location.
2. it is desirable to verify the correctness of a data value generated for a variable—this is
performed by observing that all the uses of the value produce the desired results.
• Idea: A programmer can perform a number of tests on data values.
– These tests are collectively known as data flow testing.
2
General Idea
• Data flow testing can be performed at two conceptual levels.
– Static data flow testing
– Dynamic data flow testing
• Static data flow testing
– Identify potential defects, commonly known as data flow anomaly.
– Analyze source code.
– Do not execute code.
• Dynamic data flow testing
– Involves actual program execution.
– Bears similarity with control flow testing.
• Identify paths to execute them.
• Paths are identified based on data flow testing criteria. 3
Data Flow Anomaly
• Anomaly: It is an abnormal way of doing something.
– For example, it is an abnormal situation to successively assign two values to
a variable without using the first value.
– Example 1: The second definition of x overrides the first.
x = f1(y);
x = f2(z);
• Three types of abnormal situations with using variable.
– Type 1: Defined and then defined again
– Type 2: Undefined but referenced
– Type 3: Defined but not referenced 4
Data Flow Anomaly
• Type 1: Defined and then defined again (Example 1 previous slide)
– Four interpretations of Example 1
• The first statement is redundant.
• The first statement has a fault -- the intended one might be: w = f1(y).
• The second statement has a fault – the intended one might be: v = f2(z).
• There is a missing statement in between the two: v = f3(x) may be the desired
statement that should go in between the two given statements.
– Note: It is for the programmer to make the desired interpretation.
– However, it can be said that there is a data flow anomaly in those two statements. 5
Data Flow Anomaly…..
• Type 2: Undefined but referenced
– A second form of data flow anomaly is to use an undefined variable in a computation.
– Example: int x, y;
x = x – y – w; /* where the variable w has not been initialized/defined by the programmer. */
– Two interpretations
• The programmer made a mistake in using w.
• The programmer wants to use the compiler assigned value of w.
• Type 3: Defined but not referenced
– A third kind of data flow anomaly is to define a variable and then to undefine it without using it in
any subsequent computation.
– Example: Consider x = f(x, y). in which a new value is assigned to the variable x.
– If the value of x is not used in any subsequent computation, then we should be suspicious of the
computation represented by x = f (x, y). Hence, this form of anomaly is called “defined but not
referenced.
6
Data Flow Anomaly…
• The concept of a state-transition diagram is used to model a program variable to identify
data flow anomaly.
• Components of the state-transition diagrams
– The states
• U: Undefined state: meaning that just a memory location has been allocated to the variable but no
value has yet been assigned.
• D: Defined but not referenced
• R: Defined and referenced: The variable remains in the R state as long as the programmer keeps
referencing the value of the variable. If the programmer assigns a new value to the variable, the
variable moves back to the D state.
• A: Abnormal, a programming anomaly has occurred.
– The actions
• d: define the variable, a process of assigning a value to the variable
• r: reference (or, read) read the value of the variable
• u: undefine the variable, the programmer can take an action to the variable
7
Data Flow Anomaly
Figure 1: State transition diagram of a program variable [10] (©[1979] IEEE).
8
Data Flow Anomaly…
• Obvious question: What is the relationship between the Type 1, Type 2, and
Type 3 anomalies and Figure 1?
• The three types of anomalies (Type 1, Type 2, and Type 3) are found in the
diagram in the form of action sequences:
– Type 1: dd
– Type 2: ur
– Type 3: du
• Detection of data flow anomaly via program instrumentation
– Program instrumentation: Insert new code to monitor the states of variables.
– If the state sequence contains dd, ur, or du sequence, a data flow anomaly is said to occur.
• What to do after detecting a data flow anomaly?
– Investigate the cause of the anomaly.
– To fix an anomaly, write new code or modify the existing code.
9
Overview of Dynamic Data Flow Testing
• A programmer manipulates/uses variables in several ways.
– Initialization, assignment, using in a computation, using in a condition
• Motivation for data flow testing?
– One should not feel confident that a variable has been assigned the correct value, if no test case
causes the execution of a path from the point of assignment to a point where the value is used.
– Note
• Assignment of correct value means whether or not a value has been correctly generated.
• Use of a variable means
– If new values of the same variable or other variables are generated.
– If the variable is used in a conditional statement to alter the flow of control.
• The above motivation indicates that certain kinds of paths are executed in data flow
10
testing.
Overview of Dynamic Data Flow Testing…
• Data flow testing is outlined as follows:
– Draw a data flow graph from a program.
– Select one or more data flow testing criteria.
– Identify paths in the data flow graph satisfying the selection criteria.
– Derive path predicate expressions from the selected paths .
– Solve the path predicate expressions to derive test inputs.
11
Data Flow Graph
• A data flow graph is drawn with the objective of identifying data definitions and their uses
• Each occurrence of a data variable(The life cycle of data in programming code) is classified as follows:
– Definition: it includes defining, creation and initialization of data variables and the allocation of the memory to its
data object.
– Example :
Figure 2: Definition and uses of variables
i = x; /* The variable i gets a new value/ is an example of definition of the variable i */
12
Data Flow Graph….
– Undefinition or kill: This occurs when the value and the location become unbound.
– Referring to the C function VarTypes(), the first statement
(iptr = malloc(sizeof(int));
initializes the integer pointer variable iptr and
iptr = i + x;
initializes the value of the location pointed to by iptr. The second statement
iptr = malloc(sizeof(int));
redefines variable iptr, thereby undefining the location previously pointed to by iptr.
– Usage: This occurs when the value is fetched from the memory location of the variable. There are two forms of
uses of a variable.
• Computation use (c-use): This directly affects the computation being performed. In a c-use, a potentially
new value of another variable or of the same variable is produced.
– Example: x = 2*y; /* y has been used to compute a value of x. */
• Predicate use (p-use): This refers to the use of a variable in a predicate controlling the flow of execution.
13
– Example: if (y > 100) { …} /* y has been used in a condition. */
Data Flow Graph
• A data flow graph is a directed graph constructed as follows.
– A sequence of definitions and c-uses is associated with each node of the graph.
– A set of p-uses is associated with each edge of the graph.
– The entry node has a definition of each edge parameter and each nonlocal variable used in the
program.
– The exit node has an un-definition of each local variable.
14
Data Flow Graph
• Example code: ReturnAverage() from Chapter 3
public static double ReturnAverage(int value[], int AS, int MIN, int MAX ) { /* Function: ReturnAverage Computes the average of all
those numbers in the input array in the positive range [MIN,
MAX]. The maximum size of the array is AS. But, the array size could be smaller than AS in which case the end of input is represented by -999. */
int i, ti, tv, sum;
double av;
i = 0; ti = 0; tv = 0; sum = 0;
while (ti < AS && value[i] != -999) {
ti++;
if (value[i] >= MIN && value[i] <= MAX)
{
tv++;
sum = sum + value[i];
}
i++;
}
if (tv > 0)
av = (double)sum/tv;
else
av = (double) -999;
return (av);
}
Figure 3: A function to compute the average of selected integers in an array.
15
Data Flow Graph
Figure 4: A data flow graph of ReturnAverage() example. 16
Data Flow Terms
• Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has
been defined before in a node other than node i.
– Example: The c-use of variable tv in node 9 (Figure 4) is a global c-use.
• Definition clear path: A path (i – n1 – … nm – j), m ≥ 0, is called a definition clear
path (def-clear path) with respect to variable x
from node i to node j, and
from node i to edge (nm, j),
if x has been neither defined nor undefined in nodes n1 – … nm.
– Example: (2 – 3 – 4 – 6 – 3 – 4 – 6 – 3 – 4 – 5) is a def-clear path w.r.t. tv in Fig. 4.
– Example: (2 – 3 – 4 – 5) and (2 – 3 – 4 – 6) are def-clear paths w.r.t. variable tv from node 2 to 5
and from node 2 to 6, respectively, in Fig. 4.
17
Data Flow Terms
• Global definition: A node i has a global definition of variable x if node i has a definition of x and
there is a def-clear path w.r.t. x from node i to some
node containing a global c-use, or
edge containing a p-use of variable x.
TABLE 1 Def() and c-use() Sets of Nodes in Figure 4 18
TABLE 2 Predicates and p-use() Set of Edges in Figure .4
Data Flow Terms
• Simple path: A simple path is a path in which all nodes, except possibly the first
and the last, are distinct.
– Example: Paths (2 – 3 – 4 – 5) and (3 – 4 – 6 – 3) are simple paths.
• Loop-free paths: A loop-free path is a path in which all nodes are distinct.
• Complete path: A complete path is a path from the entry node to the exit node.
19
Data Flow Terms
• Definition-use path (Du-path): A path (n1 – n2 – … – nj – nk) is a du-path
w.r.t. variable x if node n1 has a global definition of x and either
• node nk has a global c-use of x and (n1 – n2 – … – nj – nk) is a def-clear simple path
w.r.t. x, or
• Edge (nj, nk) has a p-use of x and (n1 – n2 – … – nj – nk) is a def-clear, loop-free path
w.r.t. x.
– Example: Considering the global definition and global c-use of variable tv in nodes 2 and
5, respectively, (2 – 3 – 4 – 5) is a du-path.
– Example: Considering the global definition and p-use of variable tv in nodes 2 and on edge
(7, 9), respectively, (2 – 3 – 7 – 9) is a du-path.
20
Data Flow Testing Criteria
• Seven data flow testing criteria
– All-defs
– All-c-uses
– All-p-uses
– All-p-uses/some-c-uses
– All-c-uses/some-p-uses
– All-uses
– All-du-paths
21
Data Flow Testing Criteria
1. All-defs
– For each variable x and each node i, such that x has a global definition in node i, select a
complete path which includes a def-clear path from node i to
• node j having a global c-use of x, or
• edge (j, k) having a p-use of x.
– Example (partial): Consider tv with its global definition in node 2. Variable tv has a global c-use
in node 5, and there is a def-clear path (2 – 3 – 4 – 5) from node 2 to node 5. Choose a complete
path (1 – 2 – 3 – 4 – 5 – 6 – 3 – 7 – 9 – 10) that includes the def-clear path (2 – 3 – 4 – 5) to
satisfy the all-defs criterion.
22
Data Flow Testing Criteria
2. All-c-uses
– For each variable x and each node i, such that x has a global definition in node i, select
complete paths which include def-clear paths from node i to all nodes j such that there is a
global c-use of x in j.
– Example (partial): Consider variable ti, which has a global definition in 2 and a global c-use
in node 4. From node 2, the def-clear path to 4 is (2 – 3 – 4). One may choose the complete
path (1 – 2 – 3 – 4 – 6 – 3 – 7 – 8 – 10). (There are three other complete paths.)
3. All-p-uses
– For each variable x and each node i, such that x has a global definition in node i, select
complete paths which include def-clear paths from node i to all edges (j, k) such that there is a
p-use of x on (j, k).
– Example (partial): Consider variable tv, which has a global definition in 2 and p-uses on
edges (7, 8) and (7, 9). From node 2, there are def-clear paths to (7, 8) and (7, 9), namely (2 –
3 – 7 – 8) and (2 – 3 – 7 – 9). The two complete paths are: (1 – 2 – 3 – 7 – 8 – 10) and (1 – 2 –
3 – 7 – 9 – 10).
23
Data Flow Testing Criteria
4. All-p-uses/some-c-uses
– This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x
has no p-use, then this criterion reduces to the some-c-uses criterion.
– Some-c-uses: For each variable x and each node i, such that x has a global definition in node i,
select complete paths which include def-clear paths from node i to some nodes j such that there
is a global c-use of x in j.
– Example (partial): Consider variable i, which has a global definition in 2. There is no p-use of
i. Corresponding to the global definition of I in node 2, there is a global c-use of I in 6. The def-
clear path from node 2 to 6 is (2 – 3 – 4 – 5 – 6). A complete path that includes the above def-
clear path is (1 – 2 – 3 – 4 – 5 – 6 – 3-7 – 9 – 10).
24
Data Flow Testing Criteria
5. All-c-uses/some-p-uses
– This criterion is identical to the all-c-uses criterion except when a variable x has no c-use. If x has no
global c-use, then this criterion reduces to the some-p-uses criterion.
– Some-p-uses: For each variable x and each node i, such that x has a global definition in node i,
select complete paths which include def-clear paths from node i to some edges (j, k) such that there
is a p-use of x on (j, k).
– Example: Let us obtain paths to satisfy the all-c-uses/some-p-uses criterion with respect to variable
AS. We find just one global definition of AS in node 1. There is no global c-use of AS in Figure 4.
Thus, we consider some p-uses of AS. Corresponding to the global definition of AS in node 1, there
are p-uses of AS on edges (3, 7) and (3, 4), and there are def-clear paths from node 1 to those two
edges, namely, 1-2-3-7 and 1-2-3-4, respectively. There are many complete paths that include those
two def-clear paths. One such example path is given as 1-2-3-4-5-6-3-7-9-10.
25
Data Flow Testing Criteria
6. All-uses
This criterion is the conjunction to the all-p-uses criterion and the all-c-uses criterion.
7. All-du-paths
• For each variable x and for each node i, such that x has a global definition in node i,
select complete paths which include all du-paths from node i
– To all nodes j such that there is a global c-use of x in j, and
– To all edges (j, k) such that there is a p-use of x on (j, k).
26
Comparison of Data Flow Testing Criteria
• Comparison of two testing criteria c1 and c2
– We need a way to compare the two.
• Includes relationship: Given two test selection criteria c1 and c2, c1 includes c2
if for every def/use graph, any set of complete paths of the graph that
satisfies c1 also satisfies c2.
• Strictly includes relationship: Given two test selection criteria c1 and c2, c1
strictly includes c2 provided c1 includes c2 and for some def/use graph, there
is a set of complete paths of the graph that satisfies c2 but not c1.
27
Comparison of Data Flow Testing Criteria…
For example, the all-paths
selection criterion strictly includes
the all-du-paths criterion.
Similarly, the all-c-uses/some-p-
uses criterion strictly includes the
all-defs criterion.
Figure 5: The relationship among DF (data flow) testing criteria [6] (©[1988] IEEE).
28
Feasible Paths and Test Selection Criteria
• Executable (feasible) path
– A complete path is executable if there exists an assignment of values to input
variables and global variables such that all the path predicates evaluate to true.
• Considering the feasibility of paths, the “includes” relationships of Fig. 5
change to Fig. 6.
29
Comparison of Testing Techniques
30
Figure 6: The relationship among the FDF (feasible data flow) testing criteria [6] (©[1988] IEEE).
Comparison of Testing Techniques…
Figure 7: Limitations of different fault detection techniques. 31
Summary
• Data flow is a readily identifiable concept in a program unit.
• Data flow testing
– Static
– Dynamic
• Static data flow analysis
– Data flow anomaly can occur due to programming errors.
– Three types of data flow anomalies
• (Type 1: dd), (Type 2: ur), (Type 3, du)
• Analyze the code to identify and remove data flow anomalies.
• Dynamic data flow analysis
– Obtain a data flow graph from a program unit.
– Select paths based on DF criteria: all-defs, all-c-uses, all-p-uses, all-uses, all-c-
uses/some-p-uses, all-p-uses/some-c-uses, all-du-paths.
– The includes relationship is useful in comparing selection criteria.
32