VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY
PHAM VAN CUONG
A METHOD AND TOOL SUPPORT FOR AUTOMATED DATA FLOW TESTING OF JAVA
PROGRAMS
Major: Computer Science
Supervisor: Dr Pham Ngoc Hung
Hanoi 2014
ACKNOWLEDGEMENT
First of all, I would like to express my sincere gratitude to my supervisor, Dr Pham Ngoc Hung, University of Engineering and Technology, Viet Nam National University (VNU), Ha Noi, for his enthusiastic guidance, warm encouragement, and helpful research experience.
I am grateful to all the teachers at the University of Engineering and Technology, VNU, who provided invaluable knowledge and life skills during my four academic years.
I would also like to thank my friends in the K18-CS class who helped me during the four academic years.
Last but not least, my family is my biggest motivation. My parents always encourage me in times of stress and difficulty. I would like to send them my great love and gratitude.
AUTHORSHIP
I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.
Signature
1.4 The goal of research
1.5 The organization of this thesis
2.3.1 Overview of dynamic data flow testing
2.3.2 Data flow graph
2.3.3 Data Flow Terms
Chapter 3: A Method for Data Flow Testing
3.2 Test path generation
LIST OF FIGURES AND TABLES
Fig.1.1 Limitation of different fault detection techniques [10]
Fig.2.1 State transition diagram of a program variable [2]
Fig.3.1 Basic components of control flow
Fig.3.2 Structure of control statements
Fig.3.3 A source code and its data flow graph
Fig.3.4 Relative Strength of Testing Strategies
Fig.3.5 Process of generating test input data for data flow testing
Fig.3.6 Model of a test oracle using an alternative program
Fig.3.7 Methods written in two different ways return the same results
Fig.4.1 Open a Java file
Fig.4.2 Display data flow of a Java program
Fig.4.3 Generating test data
Fig.4.4 The test results of a Java program
Fig.4.5 The source code of programmer 1
Fig.4.6 The source code of programmer 2
Fig.4.7 Data flow of the application to calculate bill
Table 3.1 Def and c-use Sets of Nodes in Figure 3.3
Table 3.2 Predicates and P-use Set of Edges in Figure 3.3
Table 3.3 A test case of U is created from the path p
Table 3.4 The results of the two methods
Table 4.1 Billing rules
Table 4.2 The results of testing the bill application
Abstract
This thesis proposes a method and tool support for automated data flow testing of Java programs. The key purpose of this method is to detect improper uses of data values due to coding errors. Given source code of a Java program, the proposed method analyzes and visualizes the program as a data flow graph. All test paths for covering all definition-use pairs of all variables are then generated. A test case corresponding to each generated test path is produced by identifying values for the input parameters so that the test path is executable. The expected outputs of these test cases are identified automatically. An implemented tool supporting the proposed method and experimental results are also presented. This tool is promising for application in practice.
Key words: software testing, data flow testing, white-box testing, data flow anomaly, data flow coverage
Chapter 1: Introduction
1.1 Introduction to data flow testing
The main goals of software testing are to reveal bugs and to ensure that the system being developed complies with the customer's requirements. To make testing effective, it is recommended that test planning and development begin at the onset of the project. Software testing techniques can be divided into two kinds: black box and white box techniques. Black box testing is mainly a validation technique that checks whether the product meets the customer's requirements. White box testing, on the other hand, is a verification technique which uses the source code to guide the selection of test data. Software testing has also been considered the major solution for improving the quality of software systems. Currently, many software companies focus only on black box testing techniques in order to validate whether software products meet the customer's requirements. With this approach, they detect only the errors that can be observed by users. As a result, many potential errors in the program code may go undetected. Moreover, detecting such errors is recognized as a difficult and expensive task in practice. In addition, the testers in charge of this task need a high level of knowledge and skill in analyzing source code. These issues are still open problems in software companies, especially in Vietnam.
Data flow testing is a key white box testing technique that can be used to detect improper uses of data values due to coding errors [4]. These errors are inadvertently introduced into a program by programmers. For instance, a programmer might use a variable without defining it, or might declare a variable without initializing it and then use that variable in a predicate (e.g. int x; if(x==100);) [4]. Such errors in the use of variables are common among programmers.
Each occurrence of a variable is classified as either a definition occurrence or a use occurrence. A definition occurrence of a variable is where a value is associated with the variable. A use occurrence of a variable is where the value of the variable is referred to. Each use occurrence is further classified as a computational use (c-use) or a predicate use (p-use). If the value of the variable is used to decide whether a predicate is true for selecting execution paths, the occurrence is called a predicate use. Otherwise, the occurrence is called a computational use. Data flow testing criteria require that the test data cause the traversal of sub-paths from a variable definition to either some or all of the p-uses, c-uses, or their combination.
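The classification above can be illustrated on a small method. The following is a minimal sketch; the class `OccurrenceDemo` and the method `absDiff` are hypothetical names introduced only for illustration and are not taken from the thesis. Each comment classifies the variable occurrences on its line, and the return statement is treated as a c-use because it outputs the value.

```java
class OccurrenceDemo {
    // Hypothetical method used only to label definition, c-use and p-use occurrences.
    static int absDiff(int a, int b) {   // def of a and b (parameters are defined at entry)
        int d = a - b;                   // c-use of a and b; def of d
        if (d < 0) {                     // p-use of d (its value selects the execution path)
            d = -d;                      // c-use of d, then a new def of d
        }
        return d;                        // c-use of d
    }

    public static void main(String[] args) {
        System.out.println(absDiff(3, 10));   // prints 7
        System.out.println(absDiff(10, 3));   // prints 7
    }
}
```

Covering both the def of d on the first line of the body and the def inside the branch requires at least two test inputs, one for each outcome of the predicate.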
We may note that there is much similarity between control flow testing and data flow testing. However, there is also a key difference between the two approaches. The similarities stem from the fact that both approaches identify program paths and emphasize generating test cases from those program paths. The difference lies in the fact that control flow test selection criteria are used in the former, whereas data flow test selection criteria are used in the latter.
1.2 Applications of data flow testing
The primary purpose of dynamic data flow testing is to uncover possible bugs in data usage during the execution of the code. To achieve this, test cases are created which trace every definition to each of its uses and every use to each of its definitions.
Ntafos [10] has reported on the results of an experiment comparing the effectiveness of three test selection techniques. Data flow testing, control flow testing, and random testing detected 90%, 85.5%, and 79.5%, respectively, of the known defects. Furthermore, Fig.1.1 shows the limitation of different fault detection techniques [10]. These facts imply that data flow testing is one of the most effective methods for examining the structure of programs.
Fig.1.1 Limitation of different fault detection techniques [10] (regions shown: total number of faults in a program; random testing; control flow-based testing; data flow-based testing; new testing techniques)
1.3 Related work
Much of the formalization of define/use testing was done in the early 1980s [1,2]; the definitions in this thesis are compatible with those in [1,2], which summarize most of define/use testing theory.
Firstly, with regard to test data, Tonella [21] generated test cases for the unit testing of classes using a genetic algorithm. McMinn and Holcombe [20] proposed a solution for the state problem in evolutionary testing using an ant colony model, and Cheon et al. [18] proposed automation of Java program testing at unit level using evolutionary approaches. However, the data types supported by these methods are limited.
Secondly, with regard to generating the expected output, also known as the test oracle, the oracles proposed so far compare the expected outputs exactly with the actual outputs. However, this is not always feasible. Statistical methods [13] and artificial neural networks (ANNs) [14] have been used to identify the expected output.
These methods, however, are not always feasible and require the implementation under test (IUT).
Applying metamorphic testing to situations in which there is no test oracle has previously been studied by Chen et al. [22]. In some cases, these works have looked at situations in which there cannot be an oracle for a particular application; in others, the work has considered the case in which the oracle is simply absent or difficult to implement.
Although some tools support data flow testing, such as BPAS-ATCGS (Basic Program Analyzer System - Automatic Test Case Generation System) [8], JaBUTi [9], and DFC (Data Flow Coverage) [3], these tools only generate all paths for covering the given source code. In fact, we need a tool that assists the tester in creating test data [5] that include expected outputs. Some free versions only allow testing of the programs that are fixed in these tools, and they are difficult to extend in order to satisfy the specific data flow testing purposes of a certain software company.
1.4 The goal of research
One of the major difficulties in software testing is the automatic generation of test data and expected outputs. In this thesis, we present a method to create test data and expected outputs based on data flow testing of Java programs. Given source code of a program, this method analyzes and visualizes the program as a data flow graph. All test paths corresponding to paths of the data flow graph that cover all definition-use pairs of all variables in the program are then generated. All test cases of the generated test paths are produced by giving values to the input parameters. The values of the input parameters and the expected outputs of the produced test cases are also generated automatically. In order to show the practical usefulness of the proposed method, a tool supporting the method is implemented. The experimental results obtained by applying this tool to some typical programs show that it is reliable in detecting errors in the use of data variables. In addition, this tool is free, open source, and promising for application in practice.
1.5 The organization of this thesis
The thesis is organized as follows. We first review some background in Chapter 2. Chapter 3 describes a method for data flow testing of Java programs. Chapter 4 shows the implemented tool and experimental results. Finally, we conclude the thesis in Chapter 5.
Chapter 2: Theory of Data Flow Testing
2.1 Basic idea
A program unit, such as a function, accepts input values, performs computations while assigning new values to local and global variables, and, finally, produces output values. Therefore, one can imagine a kind of “flow” of data values between variables along a path of program execution. A data value computed in a certain step of program execution is expected to be used in a later step. For example, a program may open a file, thereby obtaining a value for a file pointer; in a later step, the file pointer is expected to be used. Intuitively, if the later use of the file pointer is never verified, then we do not know whether or not the earlier assignment of a value to the file pointer variable is correct. Sometimes, a variable may be defined twice without a use of the variable in between. One may wonder why the first definition of the variable is never used.
There are two motivations for data flow testing. First, a memory location corresponding to a program variable should be accessed in a desirable way. For example, a memory location should not be read before it has been written. Second, it is desirable to verify the correctness of a data value generated for a variable. This is performed by observing that all the uses of the value produce the desired results.
Data flow testing can be performed at two conceptual levels: static data flow testing and dynamic data flow testing. As the name suggests, static data flow testing is performed by analyzing the source code, and it does not involve actual execution of the source code. On the other hand, dynamic data flow testing involves identifying program paths from source code based on a class of data flow testing criteria.
In this chapter, we first study the concept of data flow anomaly as identified by Fosdick and Osterweil [17]. Next, we discuss dynamic data flow testing in detail.
2.2 Static data flow testing
Static data flow testing is concerned with detecting data flow anomalies. An anomaly is a deviant or abnormal way of doing something. For example, it is an abnormal situation to successively assign two values to a variable without using the first value. Similarly, it is abnormal to use a value of a variable before assigning a value to the variable. Another abnormal situation is to generate a data value and never use it.
The three abnormal situations are called type 1, type 2, and type 3 anomalies [1]. These anomalies could be manifestations of potential programming errors.
Fig.2.1 State transition diagram of a program variable [2] (legend recovered from the figure: A: Abnormal)
• Type 1: Defined and then defined again
E.g. x = f(y);
x = f(2);
• Type 2: Undefined but referenced
E.g. int x = 0, y = 0;
int w;
x = x - y - w; /* w has not been defined by the programmer */
• Type 3: Defined but not referenced. For example, consider x = f(x, y). If x is not used subsequently, we have a Type 3 anomaly.
Huang [16] introduced the idea of “states” of program variables to identify data flow anomalies. It is useful to make an association between the type 1, type 2, and type 3 anomalies and the state transition diagram shown in Fig.2.1. The type 1, type 2, and type 3 anomalies are denoted by the action sequences dd, ur, and du, respectively, in Fig.2.1.
Data flow anomalies can be detected by using the idea of program instrumentation. Intuitively, program instrumentation means incorporating additional code in a program to monitor its execution status. For example, we can write additional code in a program to monitor the sequence of states, namely U, D, R, and A, traversed by a variable. If the corresponding action sequence contains a dd, ur, or du subsequence, then a data flow anomaly is said to have occurred.
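The monitoring idea can be sketched as a small state machine over the actions d (define), r (reference), and u (undefine). The transition table below is an assumption: it is reconstructed from the dd, ur, and du anomalies named in the text, with A treated as an absorbing state, and it may differ in detail from the diagram in Fig.2.1. The class and method names are hypothetical.

```java
class VariableStateMonitor {
    // U: undefined, D: defined but not referenced, R: defined and referenced, A: abnormal.
    enum State { U, D, R, A }

    // Next state for one action: 'd' (define), 'r' (reference), 'u' (undefine).
    // Assumed transition table, not copied from Fig.2.1.
    static State next(State s, char action) {
        switch (s) {
            case U:
                if (action == 'd') return State.D;
                if (action == 'r') return State.A;   // ur: referenced while undefined
                return State.U;
            case D:
                if (action == 'r') return State.R;
                return State.A;                       // dd or du: value never referenced
            case R:
                if (action == 'd') return State.D;
                if (action == 'u') return State.U;
                return State.R;
            default:
                return State.A;                       // the abnormal state is absorbing
        }
    }

    // Replays a whole action sequence for one variable, starting from U.
    static State run(String actions) {
        State s = State.U;
        for (char a : actions.toCharArray()) {
            s = next(s, a);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run("drru"));  // U: defined, referenced twice, undefined
        System.out.println(run("ddr"));   // A: type 1 anomaly (dd)
        System.out.println(run("r"));     // A: type 2 anomaly (ur)
        System.out.println(run("du"));    // A: type 3 anomaly (du)
    }
}
```

In an instrumented program, each definition, reference, and scope exit of a monitored variable would feed one action into such a monitor.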
Why is static data flow testing not enough?
Static data flow testing fails in situations where the state of a data variable cannot be determined by just analyzing the code. This happens when the data variable is used as an index for a collection of data elements. For example, in the case of arrays, the index might be generated dynamically during execution, hence we cannot guarantee the state of the array element referenced by that index. Moreover, static data flow testing might flag a piece of code as anomalous even though it is never executed and hence is not truly anomalous.
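The array case can be made concrete with a short sketch. The class `DynamicIndexDemo` and its method are hypothetical; the point is only that the element being defined and the element being used are known at run time, not from the source text alone.

```java
class DynamicIndexDemo {
    // Hypothetical example: which element is defined depends on run-time values.
    static int readAfterWrite(int writeIndex, int readIndex) {
        int[] a = new int[4];     // Java zero-initializes the elements
        a[writeIndex] = 42;       // def of a[writeIndex]; the index is known only at run time
        return a[readIndex];      // use of a[readIndex]; same element only if the indices match
    }

    public static void main(String[] args) {
        System.out.println(readAfterWrite(1, 1));   // prints 42: the use reads the value just defined
        System.out.println(readAfterWrite(1, 2));   // prints 0: the use reads a different element
    }
}
```

A static analysis sees one def and one use of the array in both calls; only execution reveals whether they refer to the same element.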
2.3 Dynamic data flow testing
2.3.1 Overview of dynamic data flow testing
In the process of writing code, a programmer manipulates variables in order to achieve the desired computational effect. Variable manipulation occurs in several ways, such as initialization of the variable, assignment of a new value to the variable, computing a value of another variable using the value of the variable, and controlling the flow of program execution.
Rapps and Weyuker [1] convincingly tell us that one should not feel confident that a variable has been assigned the correct value if no test case causes the execution of a path from the assignment to a point where the value of the variable is used. In the above motivation for data flow testing, (i) assignment of a correct value means whether or not a value for the variable has been correctly generated and (ii) use of a variable refers to further generation of values for the same or other variables and/or control of flow. A variable can be used in a predicate, that is, a condition, to choose an appropriate flow of control.
The above idea gives us an indication of the involvement of certain kinds of program paths in data flow testing. Data flow testing involves selecting entry-exit paths with the objective of covering certain data definition and use patterns, commonly known as data flow testing criteria. Specifically, certain program paths are selected on the basis of data flow testing criteria. Data flow testing can be outlined in the following steps:
• Draw a data flow graph from a program.
• Select one or more data flow testing criteria.
• Identify paths in the data flow graph satisfying the selection criteria.
• Derive path predicate expressions from the selected paths and solve those expressions to derive test input.
• Based on these input values, identify the expected outputs.
2.3.2 Data flow graph
A data flow graph is drawn with the objective of identifying data definitions and their uses, as motivated in the preceding section. Each occurrence of a data variable is classified as follows.
Definition: A statement storing a value in a memory location of a variable creates a definition (def) of the variable [1].
Use: A statement drawing a value from the memory location of a variable is a use of the currently active definition of the variable. In particular, when the variable appears on the right-hand side of an assignment statement it is called a computational use (c-use); when the variable appears in the predicate of a conditional statement it is called a predicate use (p-use).
A set of p-uses is associated with each edge of the graph.
The entry node has a definition of each parameter and each nonlocal variable which occurs in the subprogram.
The exit node has an undefinition of each local variable.
2.3.3 Data Flow Terms
A variable defined in a statement is used in another statement which may occur immediately or several statements after the definition. We are interested in finding paths that include pairs of definition and use of variables. In this section, we explain a family of path selection criteria that allow us to select paths with varying strength. Note that for every feasible path we can generate a test case. In the following, first we explain a few terms, and then we explain a few selection criteria using those terms.
• Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has been defined before in a node other than node i [1].
• Definition Clear Path: A path $(i, n_1, \ldots, n_m, j)$, $m \geq 0$, is called a definition clear path (def-clear path) with respect to variable x from node i to node j and from node i to edge $(n_m, j)$ if x is not defined in any of the nodes $n_1, \ldots, n_m$ [1].
• Global Definition: A node i has a global definition of a variable x if node i has a definition of x and there is a def-clear path with respect to x from node i to some node containing a global c-use or edge containing a p-use of x.
• Du-path: A path $(n_1, n_2, \ldots, n_j, n_k)$ is a definition-use path (du-path) with respect to (w.r.t.) variable x if node $n_1$ has a global definition of x and either node $n_k$ has a global c-use of x and $(n_1, n_2, \ldots, n_j, n_k)$ is a def-clear simple path w.r.t. x, or edge $(n_j, n_k)$ has a p-use of x and $(n_1, n_2, \ldots, n_j)$ is a def-clear loop-free path w.r.t. x [1].
2.3.4 Data Flow Testing Criteria
In this section, we explain seven types of data flow testing criteria. These criteria are based on two fundamental concepts, namely, definitions and uses (both c-uses and p-uses) of variables.
• All-defs: For each variable x and for each node i such that x has a global definition in node i, select a complete path which includes a def-clear path from node i to a node j having a global c-use of x or to an edge (j, k) having a p-use of x [1].
• All-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j [1].
• All-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all edges (j, k) such that there is a p-use of x on edge (j, k) [1].
• All-p-uses/Some-c-uses: This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x has no p-use, then this criterion reduces to the some-c-uses criterion explained below [1].
o Some-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some nodes j such that there is a global c-use of x in node j.
• All-c-uses/Some-p-uses: This criterion is identical to the all-c-uses criterion except when a variable x has no global c-use. If x has no global c-use, then this criterion reduces to the some-p-uses criterion explained below [1].
o Some-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some edges (j, k) such that there is a p-use of x on edge (j, k).
• All-uses: This criterion is the conjunction of the all-p-uses criterion and the all-c-uses criterion discussed above [1].
• All-du-paths: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include all du-paths from node i to all nodes j such that there is a global c-use of x in j and to all edges (j, k) such that there is a p-use of x on (j, k) [1].
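The difference in strength between the criteria can be seen by checking a set of selected paths against them. The following is a simplified sketch, not the tool described in this thesis: it considers a single variable x, models only global c-uses (p-uses on edges are left out for brevity), and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CriteriaCheck {
    // One definition-use pair of x: defined in defNode, global c-use in useNode.
    static final class DuPair {
        final int defNode;
        final int useNode;
        DuPair(int defNode, int useNode) { this.defNode = defNode; this.useNode = useNode; }
    }

    // True if the path contains a def-clear subpath from pair.defNode to pair.useNode.
    static boolean covers(List<Integer> path, DuPair pair, Set<Integer> defNodes) {
        for (int i = 0; i < path.size(); i++) {
            if (path.get(i) != pair.defNode) continue;
            for (int j = i + 1; j < path.size(); j++) {
                if (path.get(j) == pair.useNode) return true;
                if (defNodes.contains(path.get(j))) break;   // x is redefined before the use
            }
        }
        return false;
    }

    static boolean coveredBySome(List<List<Integer>> paths, DuPair pair, Set<Integer> defNodes) {
        for (List<Integer> p : paths) {
            if (covers(p, pair, defNodes)) return true;
        }
        return false;
    }

    // All-c-uses: every (def, c-use) pair is covered by some selected path.
    static boolean allCUses(List<List<Integer>> paths, List<DuPair> pairs, Set<Integer> defNodes) {
        for (DuPair pair : pairs) {
            if (!coveredBySome(paths, pair, defNodes)) return false;
        }
        return true;
    }

    // All-defs: every definition reaches at least one of its uses on some selected path.
    static boolean allDefs(List<List<Integer>> paths, List<DuPair> pairs, Set<Integer> defNodes) {
        for (int d : defNodes) {
            boolean hasPair = false;
            boolean reached = false;
            for (DuPair pair : pairs) {
                if (pair.defNode != d) continue;
                hasPair = true;
                if (coveredBySome(paths, pair, defNodes)) { reached = true; break; }
            }
            if (hasPair && !reached) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical graph: node 0 defines x, node 1 branches to node 2 or node 3
        // (both have a c-use of x), and node 4 is the exit.
        Set<Integer> defNodes = new HashSet<>(Arrays.asList(0));
        List<DuPair> pairs = Arrays.asList(new DuPair(0, 2), new DuPair(0, 3));
        List<List<Integer>> onePath = Arrays.asList(Arrays.asList(0, 1, 2, 4));
        System.out.println(allDefs(onePath, pairs, defNodes));    // true
        System.out.println(allCUses(onePath, pairs, defNodes));   // false
    }
}
```

With the single path 0-1-2-4 the definition in node 0 reaches one use, so all-defs holds, while all-c-uses additionally requires the path through node 3.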
2.4 Summary
Flow of data in a program can be visualized by considering the fact that a program unit accepts input data, transforms the input data through a sequence of computations, and, finally, produces the output data. Therefore, one can imagine data values to be flowing from one assignment statement defining a variable to another assignment statement or a predicate where the value is used.
The program path is a fundamental concept in testing. One test case can be generated from one executable path. The number of different paths selected for execution is a measure of the extent of testing performed. Path selection based on statement coverage and branch coverage leads to a small number of paths being chosen for execution. Therefore, there exists a large gap between control flow testing and exhaustive testing. The concept of data flow testing gives us a way to bridge this gap.
The concept of data flow testing gives us new selection criteria for choosing more program paths to test than what we can choose by using the idea of control flow testing. Specifically, the data flow test selection criteria are all-du-paths, all-defs, all-c-uses, all-p-uses, all-uses, all-c-uses/some-p-uses, and all-p-uses/some-c-uses. To compare two selection criteria, the concept of a strictly includes relationship is found to be useful.
Chapter 3: A Method for Data Flow Testing
This chapter presents a method for data flow testing of Java programs. Given source code of a Java program, this method analyzes and visualizes the program as a data flow graph. Next, it finds paths in the generated data flow graph that together cover all definition-use pairs of all variables in the program. Finally, all test cases corresponding to the generated test paths are created by giving values to the input parameters. The expected outputs of these test cases are also computed automatically.
Let U be a program, $V = \{v_1, v_2, \ldots, v_n\}$ be the set of variables of U, and P be a set of du-paths.
3.1 Data Flow Graph
A data flow graph (DFG) is a directed graph G = (N, E), where N is a finite set of nodes and each node represents a c-use or def, and E is a finite set of directed edges and each edge represents a p-use. $n_0, n_e \in N$ are the entry node and exit node, respectively. Let $P_G$ be the set of complete paths of G.
First, U can be uniquely decomposed into a set of basic blocks, where a basic block is a part of code that executes without branching. Each basic block corresponds to a node in the graph G. The directed edges of G connect the nodes according to the rules shown in Fig.3.1 and Fig.3.2.
Fig.3.1 Basic components of control flow (start point, process block, decision point, connection point, end point)
Fig.3.2 Structure of control statements (sequence, if, switch, while c do, do while c)
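The graph G = (N, E) described in this section can be represented with a few maps. The following is a minimal sketch under stated assumptions: the class `DataFlowGraph`, its methods, and the three-node example are hypothetical and are not the data structures of the implemented tool, and the example graph is not the one in Fig.3.3. Nodes carry the sets of defined and c-used variables, and edges carry the set of p-used variables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DataFlowGraph {
    final Map<Integer, Set<String>> defs = new HashMap<>();         // node -> variables defined there
    final Map<Integer, Set<String>> cUses = new HashMap<>();        // node -> variables c-used there
    final Map<List<Integer>, Set<String>> pUses = new HashMap<>();  // edge [from, to] -> variables p-used
    final Map<Integer, List<Integer>> successors = new HashMap<>();

    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    void addNode(int n, Set<String> defined, Set<String> cUsed) {
        defs.put(n, defined);
        cUses.put(n, cUsed);
        successors.put(n, new ArrayList<Integer>());
    }

    // Both end nodes must have been added before the edge.
    void addEdge(int from, int to, Set<String> pUsed) {
        successors.get(from).add(to);
        pUses.put(Arrays.asList(from, to), pUsed);
    }

    boolean def(String v, int n) { return defs.containsKey(n) && defs.get(n).contains(v); }

    boolean cUse(String v, int n) { return cUses.containsKey(n) && cUses.get(n).contains(v); }

    boolean pUse(String v, int from, int to) {
        Set<String> used = pUses.get(Arrays.asList(from, to));
        return used != null && used.contains(v);
    }

    // Hypothetical graph for: int d = a - b; if (d < 0) d = -d; return d;
    static DataFlowGraph example() {
        DataFlowGraph g = new DataFlowGraph();
        g.addNode(0, vars("a", "b", "d"), vars("a", "b"));  // entry: a, b defined; d = a - b
        g.addNode(1, vars("d"), vars("d"));                 // d = -d
        g.addNode(2, vars(), vars("d"));                    // return d
        g.addEdge(0, 1, vars("d"));                         // taken when d < 0
        g.addEdge(0, 2, vars("d"));                         // taken when d >= 0
        g.addEdge(1, 2, vars());                            // unconditional
        return g;
    }

    public static void main(String[] args) {
        DataFlowGraph g = example();
        System.out.println(g.def("d", 0));      // true
        System.out.println(g.cUse("d", 2));     // true
        System.out.println(g.pUse("d", 1, 2));  // false
    }
}
```

The three query methods play the role of simple membership tests on which definitions over nodes and edges can be built.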
Definition 1. (Def) A definition of a variable $v \in V$ at node $n \in N$ is denoted Def(v, n), where Def(v, n) = true if variable v is defined at node n and Def(v, n) = false otherwise.
Definition 2. (C-use) A computational use of a variable $v \in V$ at node $n \in N$ is denoted C-use(v, n), where C-use(v, n) = true if variable v is used in a computation at node n and C-use(v, n) = false otherwise.
Definition 3. (P-use) A predicate use of a variable $v \in V$ at edge $e \in E$ is denoted P-use(v, e), where P-use(v, e) = true if variable v is used in the predicate at edge e and P-use(v, e) = false otherwise.
Definition 4. (Def-c-path) Let $p = (n_1, n_2, \ldots, n_m)$ be a path and $v \in V$ a variable. If Def(v, $n_i$) = false for all i with $1 < i < m$, then Def-c-path(v, p) = true, that is, p is a def-clear path of variable v; otherwise Def-c-path(v, p) = false.
Definition 5. (Pc) For each $v \in V$ and $\forall m, n \in N$, if Def(v, m) = true and C-use(v, n) = true, there exists a set of paths, denoted Pc, where every $p \in$ Pc has first node m and last node n. If Def-c-path(v, p) = true, then p is a du-path.
Definition 6. (Pp) For each $v \in V$, $\forall m \in N$ and $\forall e \in E$, if Def(v, m) = true and P-use(v, e) = true, there exists a set of paths, denoted Pp, where every $p \in$ Pp has first node m and last edge e. If Def-c-path(v, p) = true, then p is a du-path.
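Definitions 4 and 5 can be sketched as a check on interior nodes plus a depth-first enumeration of paths from a def node m to a c-use node n. This is an illustrative sketch, not the algorithm of the implemented tool. Restricting the search to loop-free paths is an assumption made here to keep the enumeration finite, and the class `DuPathFinder`, its methods, and the three-node example graph are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DuPathFinder {
    // Definition 4: no interior node of the path defines the variable.
    static boolean defClear(List<Integer> path, Set<Integer> defNodes) {
        for (int i = 1; i < path.size() - 1; i++) {
            if (defNodes.contains(path.get(i))) return false;
        }
        return true;
    }

    // Definition 5 (restricted to loop-free paths): def-clear paths from m to n.
    static List<List<Integer>> duPaths(Map<Integer, List<Integer>> succ, int m, int n,
                                       Set<Integer> defNodes) {
        List<List<Integer>> result = new ArrayList<>();
        Deque<Integer> current = new ArrayDeque<>();
        current.addLast(m);
        extend(succ, n, defNodes, current, result);
        return result;
    }

    private static void extend(Map<Integer, List<Integer>> succ, int target,
                               Set<Integer> defNodes, Deque<Integer> current,
                               List<List<Integer>> result) {
        int last = current.peekLast();
        if (last == target && current.size() > 1) {
            List<Integer> path = new ArrayList<>(current);
            if (defClear(path, defNodes)) result.add(path);
            return;
        }
        List<Integer> next = succ.get(last);
        if (next == null) return;
        for (int s : next) {
            if (current.contains(s)) continue;   // keep the path loop-free
            current.addLast(s);
            extend(succ, target, defNodes, current, result);
            current.removeLast();
        }
    }

    // Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2.
    static Map<Integer, List<Integer>> exampleGraph() {
        Map<Integer, List<Integer>> succ = new HashMap<>();
        succ.put(0, Arrays.asList(1, 2));
        succ.put(1, Arrays.asList(2));
        succ.put(2, new ArrayList<Integer>());
        return succ;
    }

    public static void main(String[] args) {
        // Assume the variable is defined in nodes 0 and 1 and c-used in node 2.
        Set<Integer> defNodes = new HashSet<>(Arrays.asList(0, 1));
        System.out.println(duPaths(exampleGraph(), 0, 2, defNodes));  // [[0, 2]]
        System.out.println(duPaths(exampleGraph(), 1, 2, defNodes));  // [[1, 2]]
    }
}
```

The path 0-1-2 is rejected for the definition in node 0 because node 1 redefines the variable; the set Pp of Definition 6 could be enumerated in the same way by ending the search at the source node of the edge e.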
For example, the source code of a program and its data flow graph are shown in Fig.3.3. By Definition 1, Def(x, 0) = true and Def(x, 3) = false. Similarly, by Definition 2, C-use(x, 0) = false and C-use(x, 3) = true. By Definition 3, P-use(x, (0, 2)) = false and P-use(x, (2, 3)) = true.
Fig.3.3 A source code (the method public int getValue(int x, int y)) and its data flow graph
Table 3.1 shows all the definitions and c-uses appearing in the data flow graph of Figure 3.3. Def(i) denotes the set of variables which have definitions in node i. Similarly, C-use(i) denotes the set of variables which have c-uses in node i. Table 3.2 shows all the predicates and p-uses appearing in the data flow graph of Figure 3.3.