A Method and Tool Support for Automated Data Flow Testing of Java Programs : M.A Thesis Information Technology : 60 48 01



VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY

PHAM VAN CUONG

A METHOD AND TOOL SUPPORT FOR

AUTOMATED DATA FLOW TESTING OF JAVA

PROGRAMS

Major: Computer Science

Supervisor: Dr Pham Ngoc Hung

Hanoi 2014


ACKNOWLEDGEMENT

First of all, I would like to express my sincere gratitude to my supervisor, Dr. Pham Ngoc Hung, University of Engineering and Technology, Vietnam National University (VNU), Hanoi, for his enthusiastic guidance, warm encouragement, and helpful research experience.

I am grateful to all the teachers at the University of Engineering and Technology, VNU, who provided invaluable knowledge and life skills during my four academic years.

I would also like to thank my friends in the K18-CS class who helped me during the four academic years.

Last but not least, my family is truly my biggest motivation. My parents always encourage me when I face stress and difficulty. I would like to send them my great love and gratitude.


I hereby declare that this submission is my own work and that, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signature: ………


Table of contents

Abstract

Chapter 1: Introduction

1.1 Introduction to data flow testing

1.2 Applications of data flow testing

1.3 Related work

1.4 The goal of research

1.5 The organization of this thesis

Chapter 2: Theory of Data Flow Testing

2.1 Basic idea

2.2 Static data flow testing

2.3 Dynamic data flow testing

2.3.1 Overview of dynamic data flow testing

2.3.2 Data flow graph

2.3.3 Data Flow Terms

2.3.4 Data Flow Testing Criteria

2.4 Summary

Chapter 3: A Method for Data Flow Testing

3.1 Data Flow Graph

3.2 Test path generation

3.3 Test case generation

3.3.1 Test data

3.3.2 Expected output

Chapter 4: Experiment and Discussion

4.1 Experiment

4.2 Discussion

Chapter 5: Conclusion

References


LIST OF FIGURES AND TABLES

Fig.1.1 Limitation of different fault detection techniques [10]

Fig.2.1 State transition diagram of a program variable [2]

Fig.3.1 Basic components of control flow

Fig.3.2 Structure of control statements

Fig.3.3 A source code and its data flow graph

Fig.3.4 Relative Strength of Testing Strategies

Fig.3.5 Process of generating test input data for data flow testing

Fig.3.6 Model of test oracle use an alternative program

Fig.3.7 Methods written by two different ways return the same results

Fig.4.1 Open a file java

Fig.4.2 Display data flow of a Java program

Fig.4.3 Generating test data

Fig.4.4 The test results of a Java program

Fig.4.5 The source code of the programmer 1

Fig.4.6 The source code of the programmer 2

Fig.4.7 Data flow of the application to calculate bill

Table 3.1 Def and c-use Sets of Nodes in Figure 3.3

Table 3.2 Predicates and P-use Set of Edges in Figure 3.3

Table 3.3 A test case of U is created from the path p

Table 3.4 The results of the two methods

Table 4.1 Billing rules

Table 4.2 The results of testing bill application


Abstract

This thesis proposes a method and tool support for automated data flow testing of Java programs. The key purpose of this method is to detect improper uses of data values due to coding errors. Given the source code of a Java program, the proposed method analyzes and visualizes the program as a data flow graph. All test paths covering all definition-use pairs of all variables are then generated. A test case corresponding to each generated test path is produced by assigning values to the input parameters so that the test path is executable. The expected outputs of these test cases are identified automatically. An implemented tool supporting the method and experimental results are also presented. This tool is promising for application in practice.

Key words: software testing, data flow testing, white-box testing, data flow anomaly, data flow coverage


Chapter 1: Introduction

1.1 Introduction to data flow testing

The main goals of software testing are to reveal bugs and to ensure that the system being developed complies with the customer's requirements. To make testing effective, it is recommended that test planning and development begin at the onset of the project. Software testing techniques can be divided into two kinds: black box and white box techniques. Black box testing is mainly a validation technique that checks whether the product meets the customer's requirements.

On the other hand, white box testing is a verification technique which uses the source code to guide the selection of test data. Furthermore, software testing has been considered the major solution for improving the quality of software systems. Currently, many software companies focus mainly on black box testing techniques in order to validate whether software products meet the customer's requirements.

With this approach, they detect only the errors that can be observed by users. As a result, many potential errors in the program code may go undetected. Moreover, detecting such errors is recognized as a difficult and expensive task in practice. In addition, the testers in charge of this task need a high level of knowledge and skill in analyzing source code. These issues remain open problems in software companies, especially in Vietnam.

Data flow testing is known as a key white box testing technique that can be used to detect improper uses of data values due to coding errors [4]. These errors are inadvertently introduced into a program by programmers. For instance, a programmer might use a variable without defining it, or might declare a variable without initializing it and then use that variable in a predicate (e.g. int x; if(x==100);) [4]. Such errors in the use of variables are common among programmers.

Each occurrence of a variable is classified as either a definition occurrence or a use occurrence. A definition occurrence of a variable is where a value is associated with the variable. A use occurrence of a variable is where the value of the variable is referred to. Each use occurrence is further classified as a computational use (c-use) or a predicate use (p-use). If the value of the variable is used to decide whether a predicate is true for selecting execution paths, the occurrence is called a predicate use. Otherwise, the occurrence is called a computational use. Data flow testing criteria require that test data be included which cause the traversal of sub-paths from a variable definition to either some or all of the p-uses, c-uses, or their combination.
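As a minimal illustration of these occurrence kinds, the following Java sketch annotates each definition, c-use, and p-use in a small method. The class and method names are hypothetical and are not taken from the thesis or its tool.

```java
// Hypothetical example: names are illustrative only.
class OccurrenceDemo {
    // Returns the sum of the strictly positive elements of a.
    static int sumOfPositives(int[] a) {   // def of a (parameter)
        int sum = 0;                       // def of sum
        int i = 0;                         // def of i
        while (i < a.length) {             // p-use of i and a
            if (a[i] > 0) {                // p-use of a and i
                sum = sum + a[i];          // c-use of sum, a, i; then def of sum
            }
            i = i + 1;                     // c-use of i; then def of i
        }
        return sum;                        // c-use of sum
    }
}
```

A test that only passes an empty array never exercises the path from the definition of sum in the loop body to its use in the return statement, which is the kind of gap the data flow criteria are meant to expose.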

We may note that there is much similarity between control flow testing and data flow testing. However, there is also a key difference between the two approaches. The similarities stem from the fact that both approaches identify program paths and emphasize generating test cases from those program paths. The difference between the two lies in the fact that control flow test selection criteria are used in the former, whereas data flow test selection criteria are used in the latter approach.

1.2 Applications of data flow testing

The primary purpose of dynamic data flow testing is to uncover possible bugs in data usage during the execution of the code. To achieve this, test cases are created which trace every definition to each of its uses and every use to each of its definitions.

Ntafos [10] has reported on the results of an experiment comparing the effectiveness of three test selection techniques. Data flow testing, control flow testing, and random testing detected 90%, 85.5%, and 79.5%, respectively, of the known defects. Furthermore, Fig.1.1 shows the limitation of different fault detection techniques [10]. These facts suggest that data flow testing is one of the most effective methods for examining the structure of programs.


Fig.1.1 Limitation of different fault detection techniques [10]

1.3 Related work

Much of the formalization of define/use testing was done in the early 1980s [1,2]; the definitions in this thesis are compatible with those in [1,2], which summarize most of define/use testing theory.

Firstly, with regard to test data, Tonella [21] performed unit testing of classes using a genetic algorithm. In this approach, test cases for unit testing of classes are generated by the algorithm. McMinn and Holcombe [20] proposed a solution for the state problem in evolutionary testing using an ant colony model. Cheon et al. [18] proposed automation of Java program testing at the unit level using evolutionary approaches. However, the types of data supported by these methods are limited.

Secondly, with regard to generating the expected output, also known as the test oracle problem, the oracles so far compared the expected outputs exactly with the actual outputs. However, this is not always feasible. Statistical methods [13] and artificial neural networks (ANNs) [14] have been used to identify the expected output. These methods, however, are not always feasible and require the implementation under test (IUT).

Applying metamorphic testing to situations in which there is no test oracle has previously been studied by Chen et al. [22]. In some cases, these works have looked at situations in which there cannot be an oracle for a particular application; in others, the work has considered the case in which the oracle is simply absent or difficult to implement.

Although there are some tools supporting data flow testing, such as BPAS - ATCGS (Basic Program Analyzer System Automatic Test Case Generation System) [8], JaBUTi [9], and DFC (Data Flow Coverage) [3], these tools only generate paths for covering the given source code. In fact, we need a tool that assists the tester in creating test data [5] that include expected outputs. Some free versions only allow testing the programs that are fixed in these tools, and they are difficult to extend in order to satisfy the specific data flow testing purposes of a certain software company.

1.4 The goal of research

One of the major difficulties in software testing is the automatic generation of test data and expected outputs. In this thesis, we present a method to create test data and expected outputs based on data flow testing of Java programs. Given the source code of a program, this method analyzes and visualizes the program as a data flow graph. All test paths of the data flow graph needed to cover all definition-use pairs of all variables in the program are then generated. The test cases for the generated test paths are produced by giving values to the input parameters. The values of the input parameters and the expected outputs of the produced test cases are also generated automatically. In order to show the practical usefulness of the proposed method, a tool that supports the method is implemented. The experimental results obtained by applying this tool to some typical programs indicate that it is reliable in detecting errors in the use of data variables. In addition, this tool is free, open source, and promising for application in practice.

1.5 The organization of this thesis

The thesis is organized as follows. We first review some background in Chapter 2. Chapter 3 describes a method for data flow testing of Java programs. Chapter 4 shows the implemented tool and experimental results. Finally, we conclude the thesis in Chapter 5.


Chapter 2: Theory of Data Flow Testing

2.1 Basic idea

A program unit, such as a function, accepts input values, performs computations while assigning new values to local and global variables, and, finally, produces output values. Therefore, one can imagine a kind of "flow" of data values between variables along a path of program execution. A data value computed in a certain step of program execution is expected to be used in a later step. For example, a program may open a file, thereby obtaining a value for a file pointer; in a later step, the file pointer is expected to be used. Intuitively, if the later use of the file pointer is never verified, then we do not know whether or not the earlier assignment of a value to the file pointer variable is all right. Sometimes, a variable may be defined twice without a use of the variable in between. One may wonder why the first definition of the variable is never used.

There are two motivations for data flow testing. First, a memory location corresponding to a program variable should be accessed in a desirable way. For example, a memory location may not be read before writing into the location. Second, it is desirable to verify the correctness of a data value generated for a variable. This is performed by observing that all the uses of the value produce the desired results.

Data flow testing can be performed at two conceptual levels: static data flow testing and dynamic data flow testing. As the name suggests, static data flow testing is performed by analyzing the source code, and it does not involve actual execution of the source code. On the other hand, dynamic data flow testing involves identifying program paths from source code based on a class of data flow testing criteria.

In this chapter, we first study the concept of data flow anomaly as identified by Fosdick and Osterweil [17]. Next, we discuss dynamic data flow testing in detail.


2.2 Static data flow testing

Static data flow testing aims to detect data flow anomalies. An anomaly is a deviant or abnormal way of doing something. For example, it is an abnormal situation to successively assign two values to a variable without using the first value. Similarly, it is abnormal to use a value of a variable before assigning a value to the variable. Another abnormal situation is to generate a data value and never use it.

These three abnormal situations are called type 1, type 2, and type 3 anomalies [1]. These anomalies could be manifestations of potential programming errors.

Fig.2.1 State transition diagram of a program variable[2]

• Type 1: Defined and then defined again.

E.g. x = f(y);
x = f(z);

• Type 2: Undefined but referenced.

E.g. int x = 0, y = 0;
int w;
x = x - y - w; /* w has not been defined by the programmer */

• Type 3: Defined but not referenced. For example, consider x = f(x, y). If x is not used subsequently, we have a type 3 anomaly.

Huang [16] introduced the idea of "states" of program variables to identify data flow anomalies. It is useful to make an association between the type 1, type 2, and type 3 anomalies and the state transition diagram shown in Fig.2.1. The type 1, type 2, and type 3 anomalies are denoted by the action sequences dd, ur, and du, respectively, in Fig.2.1.

Data flow anomalies can be detected by using the idea of program instrumentation. Intuitively, program instrumentation means incorporating additional code in a program to monitor its execution status. For example, we can write additional code in a program to monitor the sequence of states, namely U, D, R, and A, traversed by a variable. If the state sequence contains a dd, ur, or du subsequence, then a data flow anomaly is said to have occurred.
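The monitoring idea can be sketched as a small state machine for a single variable. The sketch below is only illustrative: the states U, D, R, A and the anomaly labels dd, ur, du come from the text, while the class name, the action encoding ('d' = define, 'r' = reference, 'u' = undefine), and the exact transition table are assumptions, since Fig.2.1 is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the transition table is assumed, not copied from Fig.2.1.
class AnomalyMonitor {
    enum State { U, D, R, A }   // Undefined, Defined, Referenced, Abnormal

    private State state = State.U;
    private final List<String> anomalies = new ArrayList<>();

    // Feed one action: 'd' = define, 'r' = reference, 'u' = undefine.
    void onAction(char action) {
        switch (state) {
            case U:
                if (action == 'd') state = State.D;
                else if (action == 'r') flag("ur");   // type 2: referenced while undefined
                break;                                // 'u' leaves the variable undefined
            case D:
                if (action == 'd') flag("dd");        // type 1: defined twice in a row
                else if (action == 'u') flag("du");   // type 3: defined, never referenced
                else if (action == 'r') state = State.R;
                break;
            case R:
                if (action == 'd') state = State.D;
                else if (action == 'u') state = State.U;
                break;                                // 'r' stays in R
            case A:
                break;                                // abnormal state is absorbing
        }
    }

    private void flag(String kind) {
        anomalies.add(kind);
        state = State.A;
    }

    State state() { return state; }

    List<String> anomalies() { return anomalies; }

    // Convenience: replay a whole action sequence such as "drru".
    static AnomalyMonitor run(String actions) {
        AnomalyMonitor m = new AnomalyMonitor();
        for (char c : actions.toCharArray()) m.onAction(c);
        return m;
    }
}
```

In an instrumented program, each definition, reference, and scope exit of the monitored variable would call onAction with the corresponding action.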

Why is static data flow testing not enough?

Static data flow testing will fail in situations where the state of a data variable cannot be determined by just analyzing the code. This is possible when the data variable is used as an index for a collection of data elements. For example, in the case of arrays, the index might be generated dynamically during execution, hence we cannot guarantee the state of the array element referenced by that index. Moreover, static data flow testing might flag as anomalous a piece of code that is never executed and hence is not truly anomalous.


2.3 Dynamic data flow testing

2.3.1 Overview of dynamic data flow testing

In the process of writing code, a programmer manipulates variables in order to achieve the desired computational effect. Variable manipulation occurs in several ways, such as initialization of the variable, assignment of a new value to the variable, computing a value of another variable using the value of the variable, and controlling the flow of program execution.

Rapps and Weyuker [1] convincingly argue that one should not feel confident that a variable has been assigned the correct value if no test case causes the execution of a path from the assignment to a point where the value of the variable is used. In the above motivation for data flow testing, (i) assignment of a correct value means whether or not a value for the variable has been correctly generated and (ii) use of a variable refers to further generation of values for the same or other variables and/or control of flow. A variable can be used in a predicate, that is, a condition, to choose an appropriate flow of control.

The above idea gives us an indication of the involvement of certain kinds of program paths in data flow testing. Data flow testing involves selecting entry-exit paths with the objective of covering certain data definition and use patterns, commonly known as data flow testing criteria. Specifically, certain program paths are selected on the basis of data flow testing criteria. Data flow testing is performed in the following steps:

• Draw a data flow graph from a program.

• Select one or more data flow testing criteria.

• Identify paths in the data flow graph satisfying the selection criteria.

• Derive path predicate expressions from the selected paths and solve those expressions to derive test input.

• Based on these input values, identify the expected outputs.
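To make the fourth step concrete, the sketch below treats a path predicate as a Java predicate over a single integer input and searches a small range for a value that satisfies it. This brute-force search is a stand-in chosen for illustration and is not the solving technique of the thesis; the class name, the method names, and the unit under test are all hypothetical.

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Illustrative only: exhaustive search over a small range replaces a real constraint solver.
class PathInputSearch {
    // Hypothetical unit under test with two entry-exit paths.
    static int abs(int x) {
        int r = x;
        if (x < 0) {
            r = -x;
        }
        return r;
    }

    // Returns the first input in [lo, hi] that satisfies the path predicate, if any.
    static OptionalInt findInput(IntPredicate pathPredicate, int lo, int hi) {
        return IntStream.rangeClosed(lo, hi).filter(pathPredicate).findFirst();
    }
}
```

The path through the assignment r = -x has the path predicate x < 0, and the other path has the predicate x >= 0. An empty result indicates that no input in the searched range drives execution along the path, which may mean the path is infeasible.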


2.3.2 Data flow graph

A data flow graph is drawn with the objective of identifying data definitions and their uses as motivated in the preceding section. Each occurrence of a data variable is classified as follows:

• Definition: A statement storing a value in a memory location of a variable creates a definition (def) of the variable [1].

• Use: A statement drawing a value from the memory location of a variable is a use of the currently active definition of the variable. In particular, when the variable appears on the right-hand side of an assignment statement, it is called a computational use (c-use); when the variable appears in the predicate of a conditional statement, it is called a predicate use (p-use) [1].

A data flow graph is a directed graph constructed as follows:

• A sequence of definitions and c-uses is associated with each node of the graph.

• A set of p-uses is associated with each edge of the graph.

• The entry node has a definition of each parameter and each nonlocal variable which occurs in the subprogram.

• The exit node has an undefinition of each local variable.
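One possible in-memory representation of such a graph is sketched below. All class, field, and method names are hypothetical and do not describe the data structures of the thesis tool; for brevity, the sketch stores sets rather than ordered sequences in each node and omits the undefinitions at the exit node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical representation of a data flow graph; names are illustrative.
class DataFlowGraph {
    static final class Node {
        final int id;
        final Set<String> defs = new LinkedHashSet<>();   // variables defined in this node
        final Set<String> cUses = new LinkedHashSet<>();  // variables with a c-use in this node
        Node(int id) { this.id = id; }
    }

    static final class Edge {
        final int from;
        final int to;
        final Set<String> pUses = new LinkedHashSet<>();  // variables used in the edge predicate
        Edge(int from, int to) { this.from = from; this.to = to; }
    }

    final Map<Integer, Node> nodes = new LinkedHashMap<>();
    final List<Edge> edges = new ArrayList<>();

    // Returns the node with the given id, creating it on first access.
    Node node(int id) {
        return nodes.computeIfAbsent(id, k -> new Node(k));
    }

    // Adds a directed edge and records the variables that have a p-use on it.
    Edge edge(int from, int to, String... pUses) {
        node(from);
        node(to);
        Edge e = new Edge(from, to);
        for (String v : pUses) e.pUses.add(v);
        edges.add(e);
        return e;
    }

    // Graph for: int abs(int x) { int r = x; if (x < 0) r = -x; return r; }
    static DataFlowGraph absExample() {
        DataFlowGraph g = new DataFlowGraph();
        g.node(0).defs.add("x");        // entry node defines the parameter
        g.node(1).defs.add("r");        // int r = x;
        g.node(1).cUses.add("x");
        g.node(2).defs.add("r");        // r = -x;
        g.node(2).cUses.add("x");
        g.node(3).cUses.add("r");       // return r;
        g.edge(0, 1);
        g.edge(1, 2, "x");              // taken when x < 0
        g.edge(1, 3, "x");              // taken when x >= 0
        g.edge(2, 3);
        return g;
    }
}
```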

2.3.3 Data Flow Terms

A variable defined in a statement is used in another statement which may occur immediately or several statements after the definition. We are interested in finding paths that include pairs of definition and use of variables. In this chapter, we explain a family of path selection criteria that allow us to select paths with varying strength. Note that for every feasible path we can generate a test case. In the following, we first explain a few terms, and then we explain a few selection criteria using those terms.


• Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has been defined before in a node other than node i [1].

• Definition Clear Path: A path (i, n1, ..., nm, j), m ≥ 0, is called a definition clear path (def-clear path) with respect to variable x from node i to node j, and from node i to edge (nm, j), if x has been neither defined nor undefined in nodes n1, ..., nm [1].

• Global Definition: A node i has a global definition of a variable x if node i has a definition of x and there is a def-clear path with respect to x from node i to some node containing a global c-use or edge containing a p-use of variable x [1].

• Simple Path: A simple path is a path in which all nodes, except possibly the first and the last, are distinct [1].

• Loop-Free Path: A loop-free path is a path in which all nodes are distinct [1].

• Complete Path: A complete path is a path from the entry node to the exit node [1].

• Du-path: A path (n1, n2, ..., nj, nk) is a definition-use path (du-path) with respect to (w.r.t.) variable x if node n1 has a global definition of x and either node nk has a global c-use of x and (n1, n2, ..., nj, nk) is a def-clear simple path w.r.t. x, or edge (nj, nk) has a p-use of x and (n1, n2, ..., nj) is a def-clear loop-free path w.r.t. x [1].
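The path properties above translate directly into small checks. The sketch below assumes that a path is a list of node identifiers and that definitions are given as a map from node identifier to the set of variables defined there; this encoding and all names are hypothetical. Undefinitions are not modeled, so the def-clear check only looks for redefinitions in the interior nodes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative checks for def-clear, simple, and loop-free paths.
class PathChecks {
    // True if no interior node of the path (first and last excluded) defines x.
    static boolean isDefClear(List<Integer> path, String x, Map<Integer, Set<String>> defs) {
        for (int k = 1; k < path.size() - 1; k++) {
            Set<String> d = defs.get(path.get(k));
            if (d != null && d.contains(x)) return false;
        }
        return true;
    }

    // True if all nodes are distinct, except that the first may equal the last.
    static boolean isSimple(List<Integer> path) {
        int last = path.size() - 1;
        for (int a = 0; a < path.size(); a++) {
            for (int b = a + 1; b < path.size(); b++) {
                boolean endpoints = (a == 0 && b == last);
                if (path.get(a).equals(path.get(b)) && !endpoints) return false;
            }
        }
        return true;
    }

    // True if all nodes on the path are distinct.
    static boolean isLoopFree(List<Integer> path) {
        return new HashSet<>(path).size() == path.size();
    }
}
```

A du-path ending in a c-use combines isDefClear with isSimple, while a du-path ending in a p-use combines isDefClear with isLoopFree on the path up to the source node of the edge.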

2.3.4 Data Flow Testing Criteria

In this section, we explain seven types of data flow testing criteria. These criteria are based on two fundamental concepts, namely, definitions and uses (both c-uses and p-uses) of variables.

• All-defs: For each variable x and for each node i such that x has a global definition in node i, select a complete path which includes a def-clear path from node i to node j having a global c-use of x or edge (j,k) having a p-use of x [1].

• All-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j [1].

• All-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all edges (j,k) such that there is a p-use of x on edge (j,k) [1].

• All-p-uses/Some-c-uses: This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x has no p-use, then this criterion reduces to the some-c-uses criterion explained below [1].

o Some-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some nodes j such that there is a global c-use of x in node j.

• All-c-uses/Some-p-uses: This criterion is identical to the all-c-uses criterion except when a variable x has no global c-use. If x has no global c-use, then this criterion reduces to the some-p-uses criterion explained below [1].

o Some-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some edges (j,k) such that there is a p-use of x on edge (j,k).

• All-uses: This criterion is the conjunction of the all-p-uses criterion and the all-c-uses criterion discussed above [1].

• All-du-paths: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include all du-paths from node i to all nodes j such that there is a global c-use of x in j and to all edges (j,k) such that there is a p-use of x on (j,k) [1].
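As a rough illustration of what the all-c-uses criterion asks test paths to cover, the sketch below enumerates every (definition node, c-use node) association that is connected by a def-clear path. It is a plain graph search written for this explanation, not the path generation algorithm of the thesis. The map-based encoding and all names are hypothetical, and the c-use sets are assumed to contain only global c-uses.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative enumeration of def/c-use associations; encoding is assumed.
class DefUsePairs {
    // Returns strings "x:i->j": x is defined in node i and has a c-use in node j
    // that is reachable from i without passing through a redefinition of x.
    static Set<String> cUsePairs(Map<Integer, List<Integer>> succ,
                                 Map<Integer, Set<String>> defs,
                                 Map<Integer, Set<String>> cUses) {
        Set<String> pairs = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> entry : defs.entrySet()) {
            int i = entry.getKey();
            for (String x : entry.getValue()) {
                Deque<Integer> work =
                    new ArrayDeque<>(succ.getOrDefault(i, Collections.<Integer>emptyList()));
                Set<Integer> seen = new HashSet<>();
                while (!work.isEmpty()) {
                    int j = work.pop();
                    if (!seen.add(j)) continue;   // node already explored for this definition
                    if (cUses.getOrDefault(j, Collections.<String>emptySet()).contains(x)) {
                        pairs.add(x + ":" + i + "->" + j);
                    }
                    // A redefinition of x in j ends every def-clear path through j.
                    if (defs.getOrDefault(j, Collections.<String>emptySet()).contains(x)) continue;
                    work.addAll(succ.getOrDefault(j, Collections.<Integer>emptyList()));
                }
            }
        }
        return pairs;
    }

    // Example graph for: int abs(int x) { int r = x; if (x < 0) r = -x; return r; }
    static Set<String> demo() {
        Map<Integer, List<Integer>> succ = new HashMap<>();
        succ.put(0, Arrays.asList(1));
        succ.put(1, Arrays.asList(2, 3));
        succ.put(2, Arrays.asList(3));
        Map<Integer, Set<String>> defs = new HashMap<>();
        defs.put(0, Collections.singleton("x"));   // entry: parameter x
        defs.put(1, Collections.singleton("r"));   // int r = x;
        defs.put(2, Collections.singleton("r"));   // r = -x;
        Map<Integer, Set<String>> cUses = new HashMap<>();
        cUses.put(1, Collections.singleton("x"));
        cUses.put(2, Collections.singleton("x"));
        cUses.put(3, Collections.singleton("r"));  // return r;
        return cUsePairs(succ, defs, cUses);
    }
}
```

For the example graph, demo() yields the associations r:1->3, r:2->3, x:0->1, and x:0->2. Covering all four requires both entry-exit paths of the method, since r:1->3 is only exercised when the branch is skipped and r:2->3 only when it is taken.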


2.4 Summary

The flow of data in a program can be visualized by considering the fact that a program unit accepts input data, transforms the input data through a sequence of computations, and, finally, produces the output data. Therefore, one can imagine data values to be flowing from one assignment statement defining a variable to another assignment statement or a predicate where the value is used.

The program path is a fundamental concept in testing. One test case can be generated from one executable path. The number of different paths selected for execution is a measure of the extent of testing performed. Path selection based on statement coverage and branch coverage leads to a small number of paths being chosen for execution. Therefore, there exists a large gap between control flow testing and exhaustive testing. The concept of data flow testing gives us a way to bridge the gap between control flow testing and exhaustive testing.

The concept of data flow testing gives us new selection criteria for choosing more program paths to test than what we can choose by using the idea of control flow testing. Specifically, the data flow test selection criteria are all-du-paths, all-defs, all-c-uses, all-p-uses, all-uses, all-c-uses/some-p-uses, and all-p-uses/some-c-uses. To compare two selection criteria, the concept of a "strictly includes" relationship is found to be useful.

Chapter 3: A Method for Data Flow Testing


This chapter presents a method for data flow testing of Java programs. Given the source code of a Java program, this method analyzes and visualizes the program as a data flow graph. Next, it finds all paths in the generated data flow graph that cover the definition-use pairs of all variables in the program. Finally, all test cases corresponding to the generated test paths are created by giving values to the input parameters. The expected outputs of these test cases are also computed automatically.

Let U be a program, V = {v1, v2, ..., vn} be the set of variables of U, and P be a set of du-paths.

3.1 Data Flow Graph

A data flow graph (DFG) is a directed graph G = (N, E), where N is a finite set of nodes, each of which represents a def or c-use; E is a finite set of directed edges, each of which represents a p-use; and n_0, n_f ∈ N are the entry node and exit node, respectively. Let P_cpt be the set of complete paths of G.

First, U can be uniquely decomposed into a set of basic blocks, where a basic block is a part of the code that executes without branching. Each basic block corresponds to a node in the graph G. The directed edges of G, which connect the nodes together, follow the rules shown in Fig.3.1 and Fig.3.2.

Fig.3.1 Basic components of control flow


Fig.3.2 Structure of control statements

Definition 1 (Def). A definition of a variable v ∈ V at node n ∈ N is denoted Def(v,n), where Def(v,n) = true if variable v is defined at node n and Def(v,n) = false otherwise.

Definition 2 (C-use). A computation use of a variable v ∈ V at node n ∈ N is denoted C-use(v,n), where C-use(v,n) = true if variable v is used in a computation at node n and C-use(v,n) = false otherwise.

Definition 3 (P-use). A predicate use of a variable v ∈ V at edge e ∈ E is denoted P-use(v,e), where P-use(v,e) = true if variable v is used at edge e and P-use(v,e) = false otherwise.

Definition 4 (Def-c-path). For a path p(n1, n2, ..., nm) and a variable v ∈ V, Def-c-path(v,p) = true if p is a def-clear path of variable v and Def-c-path(v,p) = false otherwise.

Definition 5 (Pc). For each v ∈ V and ∀m, n ∈ N, if Def(v,m) = true and C-use(v,n) = true, there exists a set of paths, denoted Pc, where every p ∈ Pc has first node m and last node n; if Def-c-path(v,p) = true, then p is a du-path.

Definition 6 (Pp). For each v ∈ V, ∀m ∈ N and ∀e ∈ E, if Def(v,m) = true and P-use(v,e) = true, there exists a set of paths, denoted Pp, where every p ∈ Pp has first node m and last edge e; if Def-c-path(v,p) = true, then p is a du-path.

For example, the source code of a program and its data flow graph are shown in Fig.3.3. By Definition 1, Def(x,0) = true and Def(x,3) = false. Similarly, by Definition 2, C-use(x,0) = false and C-use(x,3) = true. By Definition 3, P-use(x,(0,2)) = false and P-use(x,(2,3)) = true.

Fig.3.3 A source code and its data flow graph

Table 3.1 shows all the definitions and c-uses appearing in the data flow graph of Fig.3.3; Def(i) denotes the set of variables which have definitions in node i. Similarly, C-use(i) denotes the set of variables which have c-uses in node i. Table 3.2 shows all the predicates and p-uses appearing in the data flow graph of Fig.3.3.
