VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
PHAM VAN CUONG
A METHOD AND TOOL SUPPORT FOR
AUTOMATED DATA FLOW TESTING OF JAVA
PROGRAMS
Major: Computer Science
Supervisor: Dr Pham Ngoc Hung
Hanoi 2014
ACKNOWLEDGEMENT
First of all, I would like to express my sincere gratitude to my supervisor, Dr Pham Ngoc Hung, University of Engineering and Technology, Viet Nam National University (VNU), Ha Noi, for his enthusiastic guidance, warm encouragement and helpful research experiences.
I am grateful to all the teachers at the University of Engineering and Technology, VNU, who provided invaluable knowledge and life skills to me during the four academic years.
I would also like to thank my friends in the K18-CS class who helped me during the four academic years.
Last, but not least, my family is really the biggest motivation for me. My parents always encourage me when I face stress and difficulty. I would like to send them great love and gratefulness.
I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.
Signature: ………
Table of contents
Abstract
Chapter 1: Introduction
1.1 Introduction to data flow testing
1.2 Applications of data flow testing
1.3 Related work
1.4 The goal of research
1.5 The organization of this thesis
Chapter 2: Theory of Data Flow Testing
2.1 Basic idea
2.2 Static data flow testing
2.3 Dynamic data flow testing
2.3.1 Overview of dynamic data flow testing
2.3.2 Data flow graph
2.3.3 Data Flow Terms
2.3.4 Data Flow Testing Criteria
2.4 Summary
Chapter 3: A Method for Data Flow Testing
3.1 Data Flow Graph
3.2 Test path generation
3.3 Test case generation
3.3.1 Test data
3.3.2 Expected output
Chapter 4: Experiment and Discussion
4.1 Experiment
4.2 Discussion
Chapter 5: Conclusion
References
LIST OF FIGURES AND TABLES
Fig.1.1 Limitation of different fault detection techniques [10]
Fig.2.1 State transition diagram of a program variable [2]
Fig.3.1 Basic components of control flow
Fig.3.2 Structure of control statements
Fig.3.3 A source code and its data flow graph
Fig.3.4 Relative strength of testing strategies
Fig.3.5 Process of generating test input data for data flow testing
Fig.3.6 Model of a test oracle using an alternative program
Fig.3.7 Methods written in two different ways return the same results
Fig.4.1 Opening a Java file
Fig.4.2 Display of the data flow of a Java program
Fig.4.3 Generating test data
Fig.4.4 The test results of a Java program
Fig.4.5 The source code of programmer 1
Fig.4.6 The source code of programmer 2
Fig.4.7 Data flow of the application to calculate a bill
Table 3.1 Def and c-use sets of nodes in Fig.3.3
Table 3.2 Predicates and p-use sets of edges in Fig.3.3
Table 3.3 A test case of U created from the path p
Table 3.4 The results of the two methods
Table 4.1 Billing rules
Table 4.2 The results of testing the bill application
Abstract
This thesis proposes a method and tool support for automated data flow testing of Java programs. The key purpose of this method is to detect improper uses of data values due to coding errors. Given source code of a Java program, the proposed method analyzes and visualizes the program as a data flow graph. All test paths for covering all definition-use pairs of all variables are then generated. A test case corresponding to each generated test path is produced by assigning values to the input parameters so that the test path is executable. The expected outputs of these test cases are identified automatically. An implemented tool supporting the method and experimental results are also presented. This tool is promising to be applied in practice.
Key words: software testing, data flow testing, white-box testing, data flow anomaly, data flow coverage
Chapter 1: Introduction
1.1 Introduction to data flow testing
The main goals of software testing are to reveal bugs and to ensure that the system being developed complies with the customer's requirements. To make testing effective, it is recommended that test planning and development begin at the onset of the project. Software testing techniques can be divided into two kinds: black box and white box techniques. Black box testing is mainly a validation technique that checks whether the product meets the customer's requirements.
On the other hand, white box testing is a verification technique which uses the source code to guide the selection of test data. Furthermore, software testing has been considered the major solution for improving the quality of software systems. Currently, many software companies focus only on black box testing techniques in order to validate whether software products meet the customer's requirements.
With this approach, they only detect the errors that can be observed by users. As a result, not all potential errors in the program code can be detected. Moreover, detecting such errors has been recognized as a difficult and expensive task in practice. In addition, the testers in charge of this task are required to have a high level of knowledge and skill in analyzing source code. These issues are still open problems in software companies, especially in Vietnam.
Data flow testing is known as a key white box testing technique that can be used to detect improper uses of data values due to coding errors [4]. These errors are inadvertently introduced into a program by programmers. For instance, a programmer might use a variable without defining it, or might declare a variable without initializing it and then use that variable in a predicate (e.g. int x; if(x==100);) [4]. Such errors in the use of variables are common among programmers.
Each occurrence of a variable is classified as either a definition occurrence or a use occurrence. A definition occurrence of a variable is where a value is associated with the variable. A use occurrence of a variable is where the value of the variable is referred to. Each use occurrence is further classified as a computational use (c-use) or a predicate use (p-use). If the value of the variable is used to decide whether a predicate is true for selecting execution paths, the occurrence is called a predicate use. Otherwise, the occurrence is called a computational use. Data flow testing criteria require that test data be included which cause the traversal of sub-paths from a variable definition to either some or all of the p-uses, c-uses, or their combination.
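To make this classification concrete, the following minimal sketch annotates every variable occurrence in a small Java method. The method sumPositive and the class OccurrenceDemo are hypothetical examples introduced here for illustration; they are not taken from the programs studied in this thesis.

```java
// Hypothetical example: every occurrence of a variable is annotated as a
// definition (def), a computational use (c-use), or a predicate use (p-use).
class OccurrenceDemo {
    static int sumPositive(int[] values) {   // def: values (parameter)
        int sum = 0;                          // def: sum
        for (int i = 0;                       // def: i
             i < values.length;               // p-use: i, values
             i++) {                           // c-use: i, then def: i
            if (values[i] > 0) {              // p-use: values, i
                sum = sum + values[i];        // c-use: sum, values, i; def: sum
            }
        }
        return sum;                           // c-use: sum (output)
    }

    public static void main(String[] args) {
        System.out.println(sumPositive(new int[] {3, -1, 4}));
    }
}
```

A data flow test for this method would try to exercise, for example, the sub-path from the definition "int sum = 0" to the c-use in "return sum" both with and without passing through the redefinition inside the loop.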
We may note that there is much similarity between control flow testing and data flow testing. However, there is also a key difference between the two approaches. The similarities stem from the fact that both approaches identify program paths and emphasize generating test cases from those program paths. The difference lies in the fact that control flow test selection criteria are used in the former, whereas data flow test selection criteria are used in the latter.
1.2 Applications of data flow testing
The primary purpose of dynamic data flow testing is to uncover possible bugs in data usage during the execution of the code. To achieve this, test cases are created which trace every definition to each of its uses and every use to each of its definitions.
Ntafos [10] has reported on the results of an experiment comparing the effectiveness of three test selection techniques. Data flow testing, control flow testing, and random testing detected 90%, 85.5%, and 79.5%, respectively, of the known defects. Furthermore, Fig.1.1 shows the limitation of different fault detection techniques [10]. These facts suggest that data flow testing is one of the most effective methods for examining the structure of programs.
Fig.1.1 Limitation of different fault detection techniques [10]
1.3 Related work
Much of the formalization of define/use testing was done in the early 1980s [1,2]; the definitions in this thesis are compatible with those in [1,2], an article which summarizes most of define/use testing theory.
Firstly, with regard to test data, Tonella [21] performed unit testing of classes using a genetic algorithm. In this approach, test cases for unit testing of classes are generated by the algorithm. McMinn and Holcombe [20] proposed a solution for the state problem in evolutionary testing using an ant colony model. Cheon et al. [18] proposed automation of Java program testing at the unit level using evolutionary approaches. However, the data types supported by these methods are limited.
Secondly, with regard to generating the expected output, which is also known as the test oracle problem, the oracles used so far compare the expected outputs exactly with the actual outputs. However, this is not always feasible. Statistical methods [13] and artificial neural networks (ANNs) [14] have been used to identify the expected output. These methods, however, are not always feasible and require the implementation under test (IUT).
Applying metamorphic testing to situations in which there is no test oracle has previously been studied by Chen et al. [22]. In some cases, these works have looked at situations in which there cannot be an oracle for a particular application; in others, the work has considered the case in which the oracle is simply absent or difficult to implement.
Although there are some tools supporting data flow testing, such as BPAS-ATCGS (Basic Program Analyzer System, Automatic Test Case Generation System) [8], JaBUTi [9], and DFC (Data Flow Coverage) [3], these tools only generate paths for covering the given source code. In fact, we need a tool that assists the tester in creating test data [5] that include the expected output. Some free versions only allow testing the programs that are fixed in these tools, and they are difficult to extend in order to satisfy the specific data flow testing purposes of a certain software company.
1.4 The goal of research
One of the major difficulties in software testing is the automatic generation of test data and expected outputs. In this thesis, we present a method to create test data and expected outputs based on data flow testing of Java programs. Given source code of a program, this method analyzes and visualizes the program as a data flow graph. All test paths corresponding to paths of the data flow graph that cover all definition-use pairs of all variables in the program are then generated. Test cases for the generated test paths are produced by giving values to the input parameters. The values of the input parameters and the expected outputs of the produced test cases are also generated automatically. In order to show the practical usefulness of the proposed method, a tool supporting the method is implemented. The experimental results obtained by applying this tool to some typical programs indicate that it is reliable in detecting errors in the use of data variables. In addition, this tool is free, open source, and promising to be applied in practice.
1.5 The organization of this thesis
The thesis is organized as follows. We first review some background in Chapter 2. Chapter 3 describes a method for data flow testing of Java programs. Chapter 4 shows the implemented tool and experimental results. Finally, we conclude the thesis in Chapter 5.
Chapter 2: Theory of Data Flow Testing
2.1 Basic idea
A program unit, such as a function, accepts input values, performs computations while assigning new values to local and global variables, and, finally, produces output values. Therefore, one can imagine a kind of "flow" of data values between variables along a path of program execution. A data value computed in a certain step of program execution is expected to be used in a later step. For example, a program may open a file, thereby obtaining a value for a file pointer; in a later step, the file pointer is expected to be used. Intuitively, if the later use of the file pointer is never verified, then we do not know whether or not the earlier assignment of a value to the file pointer variable is correct. Sometimes, a variable may be defined twice without a use of the variable in between. One may wonder why the first definition of the variable is never used.
There are two motivations for data flow testing. First, a memory location corresponding to a program variable should be accessed in a desirable way. For example, a memory location should not be read before it has been written. Second, it is desirable to verify the correctness of a data value generated for a variable. This is performed by observing that all the uses of the value produce the desired results.
Data flow testing can be performed at two conceptual levels: static data flow testing and dynamic data flow testing. As the name suggests, static data flow testing is performed by analyzing the source code, and it does not involve actual execution of the source code. On the other hand, dynamic data flow testing involves identifying program paths from source code based on a class of data flow testing criteria.
In this chapter, first we study the concept of data flow anomaly as identified by Fosdick and Osterweil [17]. Next, we discuss dynamic data flow testing in detail.
2.2 Static data flow testing
Static data flow testing is concerned with detecting data flow anomalies. An anomaly is a deviant or abnormal way of doing something. For example, it is an abnormal situation to successively assign two values to a variable without using the first value. Similarly, it is abnormal to use the value of a variable before assigning a value to the variable. Another abnormal situation is to generate a data value and never use it.
The three abnormal situations are called type 1, type 2, and type 3 anomalies [1]. These anomalies could be manifestations of potential programming errors.
Fig.2.1 State transition diagram of a program variable [2]
• Type 1: Defined and then defined
E.g. x = f(y);
     x = f(z);
• Type 2: Undefined but referenced
E.g. int x = 0, y = 0;
     int w;
     x = x - y - w; /* w has not been defined by the programmer */
• Type 3: Defined but not referenced. For example, consider x = f(x, y). If x is not used subsequently, we have a Type 3 anomaly.
Huang [16] introduced the idea of "states" of program variables to identify data flow anomalies. It is useful to make an association between the type 1, type 2, and type 3 anomalies and the state transition diagram shown in Fig.2.1. The type 1, type 2, and type 3 anomalies are denoted by the action sequences dd, ur, and du, respectively, in Fig.2.1.
Data flow anomalies can be detected by using the idea of program instrumentation. Intuitively, program instrumentation means incorporating additional code in a program to monitor its execution status. For example, we can write additional code in a program to monitor the sequence of states, namely U, D, R, and A, traversed by a variable. If the action sequence contains a dd, ur, or du subsequence, then a data flow anomaly is said to have occurred.
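The monitoring idea can be sketched in Java as a small state machine attached to one variable. This is only an illustrative sketch: the class VariableMonitor and its method names are hypothetical, and the transition table is an assumption based on the usual reading of the U (undefined), D (defined), R (referenced), and A (abnormal) states, since Fig.2.1 is not reproduced in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical instrumentation helper: tracks the state of ONE variable.
// Actions: d = define, r = reference, u = undefine.
// Assumed transitions: U-d->D, U-r->A (ur), D-r->R, D-d->A (dd), D-u->A (du),
// R-r->R, R-d->D, R-u->U; once in A, the monitor stays in A.
class VariableMonitor {
    enum State { U, D, R, A }

    private State state = State.U;
    private final List<String> anomalies = new ArrayList<>();

    void define()    { step('d'); }
    void reference() { step('r'); }
    void undefine()  { step('u'); }

    private void step(char action) {
        switch (state) {
            case U:
                if (action == 'd') state = State.D;
                else if (action == 'r') flag("ur");   // type 2 anomaly
                break;                                 // 'u' keeps the variable in U
            case D:
                if (action == 'r') state = State.R;
                else if (action == 'd') flag("dd");   // type 1 anomaly
                else flag("du");                       // type 3 anomaly
                break;
            case R:
                if (action == 'd') state = State.D;
                else if (action == 'u') state = State.U;
                break;                                 // 'r' keeps the variable in R
            case A:
                break;                                 // abnormal state is absorbing
        }
    }

    private void flag(String kind) {
        anomalies.add(kind);
        state = State.A;
    }

    State state() { return state; }
    List<String> anomalies() { return anomalies; }

    public static void main(String[] args) {
        // Instrumented version of: x = f(y); x = f(z);
        VariableMonitor x = new VariableMonitor();
        x.define();
        x.define();
        System.out.println(x.anomalies() + " state=" + x.state());
    }
}
```

In an instrumented program, a call to define(), reference(), or undefine() would be inserted next to each statement that assigns, reads, or releases the monitored variable.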
Why is static data flow testing not enough?
Static data flow testing fails in situations where the state of a data variable cannot be determined by just analyzing the code. This happens when the data variable is used as an index for a collection of data elements. For example, in the case of arrays, the index might be generated dynamically during execution; hence we cannot determine the state of the array element that is referenced by that index. Moreover, static data flow testing might flag a piece of code as anomalous even though that code is never executed, in which case it is not truly anomalous.
2.3 Dynamic data flow testing
2.3.1 Overview of dynamic data flow testing
In the process of writing code, a programmer manipulates variables in order to achieve the desired computational effect. Variable manipulation occurs in several ways, such as initialization of the variable, assignment of a new value to the variable, computing a value of another variable using the value of the variable, and controlling the flow of program execution.
Rapps and Weyuker [1] convincingly tell us that one should not feel confident that a variable has been assigned the correct value if no test case causes the execution of a path from the assignment to a point where the value of the variable is used. In the above motivation for data flow testing, (i) assignment of a correct value means whether or not a value for the variable has been correctly generated and (ii) use of a variable refers to further generation of values for the same or other variables and/or control of flow. A variable can be used in a predicate, that is, a condition, to choose an appropriate flow of control.
The above idea gives us an indication of the involvement of certain kinds of program paths in data flow testing. Data flow testing involves selecting entry–exit paths with the objective of covering certain data definition and use patterns, commonly known as data flow testing criteria. Specifically, certain program paths are selected on the basis of data flow testing criteria. The overall procedure consists of the following steps:
• Draw a data flow graph from a program.
• Select one or more data flow testing criteria.
• Identify paths in the data flow graph satisfying the selection criteria.
• Derive path predicate expressions from the selected paths and solve those expressions to derive test input.
• Based on these input values, identify the expected outputs.
2.3.2 Data flow graph
A data flow graph is drawn with the objective of identifying data definitions and their uses as motivated in the preceding section. Each occurrence of a data variable is classified as follows:
• Definition: A statement storing a value in a memory location of a variable creates a definition (def) of the variable [1].
• Use: A statement drawing a value from the memory location of a variable is a use of the currently active definition of the variable. In particular, when the variable appears on the right-hand side of an assignment statement it is called a computational use (c-use); when the variable appears in the predicate of a conditional statement it is called a predicate use (p-use) [1].
A data flow graph is a directed graph constructed as follows:
• A sequence of definitions and c-uses is associated with each node of the graph.
• A set of p-uses is associated with each edge of the graph.
• The entry node has a definition of each parameter and each nonlocal variable which occurs in the subprogram.
• The exit node has an undefinition of each local variable.
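The construction rules above suggest a simple in-memory representation. The sketch below is one possible encoding and not the data structure of the tool described in this thesis: the class names DfgNode, DfgEdge, and DataFlowGraph are hypothetical, definitions and c-uses are stored as sets rather than ordered sequences for brevity, and undefinitions at the exit node are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node type: one basic block with its definitions and c-uses.
class DfgNode {
    final int id;
    final Set<String> defs = new LinkedHashSet<>();
    final Set<String> cUses = new LinkedHashSet<>();

    DfgNode(int id) { this.id = id; }

    DfgNode def(String... vars)  { defs.addAll(Arrays.asList(vars));  return this; }
    DfgNode cUse(String... vars) { cUses.addAll(Arrays.asList(vars)); return this; }
}

// Hypothetical edge type: a branch labelled with its predicate and p-uses.
class DfgEdge {
    final DfgNode from;
    final DfgNode to;
    final String predicate;                       // "true" for unconditional edges
    final Set<String> pUses = new LinkedHashSet<>();

    DfgEdge(DfgNode from, DfgNode to, String predicate, String... pUses) {
        this.from = from;
        this.to = to;
        this.predicate = predicate;
        this.pUses.addAll(Arrays.asList(pUses));
    }
}

class DataFlowGraph {
    final List<DfgNode> nodes = new ArrayList<>();
    final List<DfgEdge> edges = new ArrayList<>();

    DfgNode node() {
        DfgNode n = new DfgNode(nodes.size());    // ids follow creation order
        nodes.add(n);
        return n;
    }

    DfgEdge edge(DfgNode from, DfgNode to, String predicate, String... pUses) {
        DfgEdge e = new DfgEdge(from, to, predicate, pUses);
        edges.add(e);
        return e;
    }

    // Graph for: int max(int a, int b) { int m = a; if (b > a) m = b; return m; }
    static DataFlowGraph maxExample() {
        DataFlowGraph g = new DataFlowGraph();
        DfgNode entry = g.node().def("a", "b", "m").cUse("a"); // parameters, m = a
        DfgNode then  = g.node().def("m").cUse("b");           // m = b
        DfgNode exit  = g.node().cUse("m");                    // return m
        g.edge(entry, then, "b > a", "a", "b");
        g.edge(entry, exit, "!(b > a)", "a", "b");
        g.edge(then, exit, "true");
        return g;
    }

    public static void main(String[] args) {
        for (DfgEdge e : maxExample().edges) {
            System.out.println(e.from.id + " -> " + e.to.id
                    + " [" + e.predicate + "] p-uses=" + e.pUses);
        }
    }
}
```

Keeping p-uses on edges rather than on nodes mirrors the rule that a predicate use is associated with the branch it selects.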
2.3.3 Data Flow Terms
A variable defined in a statement is used in another statement which may occur immediately or several statements after the definition. We are interested in finding paths that include pairs of definition and use of variables. In this section, we explain a family of path selection criteria that allow us to select paths with varying strength. Note that for every feasible path we can generate a test case. In the following, first we explain a few terms, and then we explain a few selection criteria using those terms.
• Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has been defined before in a node other than node i [1].
• Definition Clear Path: A path (i, n1, ..., nm, j), m ≥ 0, containing no definitions of x in nodes n1, ..., nm is called a definition clear path (def-clear path) with respect to variable x from node i to node j and from node i to edge (nm, j) [1].
• Global Definition: A node i has a global definition of a variable x if node i has a definition of x and there is a def-clear path with respect to x from node i to some node containing a global c-use or edge containing a p-use of variable x [1].
• Simple Path: A simple path is a path in which all nodes, except possibly the first and the last, are distinct [1].
• Loop-Free Path: A loop-free path is a path in which all nodes are distinct [1].
• Complete Path: A complete path is a path from the entry node to the exit node [1].
• Du-path: A path (n1, n2, ..., nj, nk) is a definition-use path (du-path) with respect to (w.r.t.) variable x if node n1 has a global definition of x and either node nk has a global c-use of x and (n1, n2, ..., nj, nk) is a def-clear simple path w.r.t. x, or edge (nj, nk) has a p-use of x and (n1, n2, ..., nj) is a def-clear loop-free path w.r.t. x [1].
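The path terms above translate into short checks over a path given as a list of node identifiers. The following is a minimal sketch under stated assumptions: the class PathChecks and its method names are hypothetical, nodes are plain integers, and the map defs (node id to the set of variables defined in that node) is an assumed input format.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers for the path terms defined in this section.
class PathChecks {
    // Def-clear w.r.t. x: no interior node of the path defines x.
    // The first and last nodes are not inspected, matching (i, n1, ..., nm, j).
    static boolean isDefClear(List<Integer> path, String x,
                              Map<Integer, Set<String>> defs) {
        for (int k = 1; k < path.size() - 1; k++) {
            Set<String> d = defs.get(path.get(k));
            if (d != null && d.contains(x)) {
                return false;
            }
        }
        return true;
    }

    // Loop-free: all nodes are distinct.
    static boolean isLoopFree(List<Integer> path) {
        return new HashSet<>(path).size() == path.size();
    }

    // Simple: all nodes are distinct, except that the first may equal the last.
    static boolean isSimple(List<Integer> path) {
        if (path.size() < 2) {
            return true;
        }
        boolean closed = path.get(0).equals(path.get(path.size() - 1));
        return isLoopFree(closed ? path.subList(0, path.size() - 1) : path);
    }
}
```

With these helpers, checking whether a candidate path is a du-path for a c-use reduces to isDefClear combined with isSimple, and for a p-use to isDefClear combined with isLoopFree on the path without its final node.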
2.3.4 Data Flow Testing Criteria
In this section, we explain seven types of data flow testing criteria. These criteria are based on two fundamental concepts, namely, definitions and uses (both c-uses and p-uses) of variables.
• All-defs: For each variable x and for each node i such that x has a global definition in node i, select a complete path which includes a def-clear path from node i to node j having a global c-use of x or edge (j, k) having a p-use of x [1].
• All-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j [1].
• All-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all edges (j, k) such that there is a p-use of x on edge (j, k) [1].
• All-p-uses/Some-c-uses: This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x has no p-use, then this criterion reduces to the some-c-uses criterion explained below [1].
  o Some-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some nodes j such that there is a global c-use of x in node j.
• All-c-uses/Some-p-uses: This criterion is identical to the all-c-uses criterion except when a variable x has no global c-use. If x has no global c-use, then this criterion reduces to the some-p-uses criterion explained below [1].
  o Some-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some edges (j, k) such that there is a p-use of x on edge (j, k).
• All-uses: This criterion is the conjunction of the all-p-uses criterion and the all-c-uses criterion discussed above [1].
• All-du-paths: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include all du-paths from node i to all nodes j such that there is a global c-use of x in j and to all edges (j, k) such that there is a p-use of x on (j, k) [1].
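All of the criteria above quantify over pairs of a definition and a use that it can reach along a def-clear path, so a tool first needs to enumerate those pairs. The sketch below computes the pairs that the all-c-uses criterion must cover for one variable. It is an illustration only, not the path generation algorithm of this thesis: the class DuPairFinder is hypothetical, the graph is given as plain successor lists, and the map globalCUses is assumed to list only global c-uses (uses not preceded by a definition in the same node).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds (definition node, c-use node) pairs for variable x
// that are connected by at least one def-clear path.
class DuPairFinder {
    static Set<String> cUsePairs(String x,
                                 Map<Integer, List<Integer>> succ,
                                 Map<Integer, Set<String>> defs,
                                 Map<Integer, Set<String>> globalCUses) {
        Set<String> pairs = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> entry : defs.entrySet()) {
            if (!entry.getValue().contains(x)) {
                continue;                       // this node does not define x
            }
            int d = entry.getKey();
            Deque<Integer> work = new ArrayDeque<>();
            Set<Integer> seen = new HashSet<>();
            work.push(d);
            while (!work.isEmpty()) {
                int n = work.pop();
                List<Integer> next = succ.get(n);
                if (next == null) {
                    continue;                   // exit node: no successors
                }
                for (int s : next) {
                    if (has(globalCUses, s, x)) {
                        pairs.add(d + "->" + s);   // use reached def-clear
                    }
                    // A redefinition of x in s kills the definition from d,
                    // so the search does not continue past s.
                    if (!has(defs, s, x) && seen.add(s)) {
                        work.push(s);
                    }
                }
            }
        }
        return pairs;
    }

    private static boolean has(Map<Integer, Set<String>> m, int node, String x) {
        Set<String> s = m.get(node);
        return s != null && s.contains(x);
    }

    public static void main(String[] args) {
        // 0: def x -> 1: c-use x -> {2: def x, 3: c-use x}, 2 -> 3
        Map<Integer, List<Integer>> succ = new HashMap<>();
        succ.put(0, Arrays.asList(1));
        succ.put(1, Arrays.asList(2, 3));
        succ.put(2, Arrays.asList(3));
        Map<Integer, Set<String>> defs = new HashMap<>();
        defs.put(0, Collections.singleton("x"));
        defs.put(2, Collections.singleton("x"));
        Map<Integer, Set<String>> uses = new HashMap<>();
        uses.put(1, Collections.singleton("x"));
        uses.put(3, Collections.singleton("x"));
        System.out.println(cUsePairs("x", succ, defs, uses)); // [0->1, 0->3, 2->3]
    }
}
```

The same traversal could be extended to p-uses by checking the edge (n, s) instead of the node s; complete paths covering each reported pair would then satisfy all-c-uses for x.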
2.4 Summary
Flow of data in a program can be visualized by considering the fact that a program unit accepts input data, transforms the input data through a sequence of computations, and, finally, produces the output data. Therefore, one can imagine data values to be flowing from one assignment statement defining a variable to another assignment statement or a predicate where the value is used.
The program path is a fundamental concept in testing. One test case can be generated from one executable path. The number of different paths selected for execution is a measure of the extent of testing performed. Path selection based on statement coverage and branch coverage leads to a small number of paths being chosen for execution. Therefore, there exists a large gap between control flow testing and exhaustive testing. The concept of data flow testing gives us a way to bridge this gap.
Data flow testing also gives us new selection criteria for choosing more program paths to test than we can choose by using the idea of control flow testing. Specifically, the data flow test selection criteria are all-du-paths, all-defs, all-c-uses, all-p-uses, all-uses, all-c-uses/some-p-uses, and all-p-uses/some-c-uses. To compare two selection criteria, the concept of a "strictly includes" relationship is found to be useful.
Chapter 3: A Method for Data Flow Testing
This chapter presents a method for data flow testing of Java programs. Given source code of a Java program, this method analyzes and visualizes the program as a data flow graph. Next, it finds paths in the generated data flow graph so that together they cover all definition-use pairs of all variables in the program. Finally, test cases corresponding to the generated test paths are created by giving values to the input parameters. The expected outputs of these test cases are also computed automatically.
Let U be a program, V = {v1, v2, ..., vn} be the set of variables of U, and P be a set of du-paths.
3.1 Data Flow Graph
A data flow graph (DFG) is a directed graph G = (N, E), where N is a finite set of nodes and each node represents a c-use or def, E is a finite set of directed edges and each edge represents a p-use, and n0, nf ∈ N are the entry node and the exit node, respectively. Let Pcpt be the set of complete paths of G.
First, U can be uniquely decomposed into a set of basic blocks, where a basic block is a part of the code that executes without branching. Each basic block corresponds to a node in the graph G. The directed edges of G that connect the nodes follow the rules shown in Fig.3.1 and Fig.3.2.
Fig.3.1 Basic components of control flow
Fig.3.2 Structure of control statements
Definition 1 (Def). A definition of a variable v ∈ V at node n ∈ N is denoted Def(v,n), where Def(v,n) = true if variable v is defined at node n and Def(v,n) = false otherwise.
Definition 2 (C-use). A computational use of a variable v ∈ V at node n ∈ N is denoted C-use(v,n), where C-use(v,n) = true if variable v is used in a computation at node n and C-use(v,n) = false otherwise.
Definition 3 (P-use). A predicate use of a variable v ∈ V at edge e ∈ E is denoted P-use(v,e), where P-use(v,e) = true if variable v is used in the predicate at edge e and P-use(v,e) = false otherwise.
Definition 4 (Def-c-path). For a path p(n1, n2, ..., nm) and a variable v ∈ V, Def-c-path(v,p) = true if p is a def-clear path of variable v and Def-c-path(v,p) = false otherwise.
Definition 5 (Pc). For each v ∈ V and ∀m, n ∈ N, if Def(v,m) = true and C-use(v,n) = true, there exists a set of paths, denoted Pc, where each p ∈ Pc has m as its first node and n as its last node; if Def-c-path(v,p) = true, then p is a du-path.
Definition 6 (Pp). For each v ∈ V, ∀m ∈ N and ∀e ∈ E, if Def(v,m) = true and P-use(v,e) = true, there exists a set of paths, denoted Pp, where each p ∈ Pp has m as its first node and e as its last edge; if Def-c-path(v,p) = true, then p is a du-path.
For example, the source code of a program and its data flow graph are shown in Fig.3.3. By Definition 1, Def(x,0) = true and Def(x,3) = false. Similarly, by Definition 2, C-use(x,0) = false and C-use(x,3) = true. By Definition 3, P-use(x,(0,2)) = false and P-use(x,(2,3)) = true.
Fig.3.3 A source code and its data flow graph
Table 3.1 shows all the definitions and c-uses appearing in the data flow graph of Fig.3.3; Def(i) denotes the set of variables which have definitions in node i. Similarly, C-use(i) denotes the set of variables which have c-uses in node i. Table 3.2 shows all the predicates and p-uses appearing in the data flow graph of Fig.3.3.
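Definitions 1 to 3 can be read as lookups into tables such as Table 3.1 and Table 3.2. The sketch below shows this reading in Java. The class DefUsePredicates and the string encoding of an edge as "m->n" are hypothetical, and the sample data encodes only the facts stated in the example above (x is defined at node 0, has a c-use at node 3, and has a p-use on edge (2,3)); it is not a full transcription of Fig.3.3.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical table-backed implementation of Def, C-use, and P-use.
class DefUsePredicates {
    final Map<Integer, Set<String>> defSets = new HashMap<>();   // node -> Def(node)
    final Map<Integer, Set<String>> cUseSets = new HashMap<>();  // node -> C-use(node)
    final Map<String, Set<String>> pUseSets = new HashMap<>();   // "m->n" -> P-use(edge)

    boolean def(String v, int n) {
        Set<String> s = defSets.get(n);
        return s != null && s.contains(v);
    }

    boolean cUse(String v, int n) {
        Set<String> s = cUseSets.get(n);
        return s != null && s.contains(v);
    }

    boolean pUse(String v, int from, int to) {
        Set<String> s = pUseSets.get(from + "->" + to);
        return s != null && s.contains(v);
    }

    // Partial data: only the facts about x quoted in the text.
    static DefUsePredicates partialExample() {
        DefUsePredicates t = new DefUsePredicates();
        t.defSets.put(0, new HashSet<>(Arrays.asList("x")));
        t.cUseSets.put(3, new HashSet<>(Arrays.asList("x")));
        t.pUseSets.put("2->3", new HashSet<>(Arrays.asList("x")));
        return t;
    }

    public static void main(String[] args) {
        DefUsePredicates t = partialExample();
        System.out.println("Def(x,0)=" + t.def("x", 0)
                + " C-use(x,3)=" + t.cUse("x", 3)
                + " P-use(x,(2,3))=" + t.pUse("x", 2, 3));
    }
}
```

Missing table entries are treated as empty sets, so each predicate returns false for any node or edge that the tables do not mention.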