Thesis: A Method and Tool Support for Automated Data Flow Testing of Java Programs


VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY

PHAM VAN CUONG

A METHOD AND TOOL SUPPORT FOR AUTOMATED DATA FLOW TESTING OF JAVA PROGRAMS

Major: Computer Science

Supervisor: Dr. Pham Ngoc Hung

Hanoi 2014

ACKNOWLEDGEMENT

First of all, I would like to express my sincere gratitude to my supervisor, Dr. Pham Ngoc Hung, University of Engineering and Technology, Vietnam National University (VNU), Hanoi, for his enthusiastic guidance, warm encouragement, and helpful research experience.

I am grateful to all the teachers at the University of Engineering and Technology, VNU, who provided invaluable knowledge and life skills during my four academic years.

I would also like to thank my friends in the K18-CS class, who helped me during the four academic years.

Last but not least, my family is my biggest motivation. My parents always encourage me when I face stress and difficulty. I would like to send them my great love and gratitude.

AUTHORSHIP

I hereby declare that this submission is my own work and that, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at the University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signature

1.4 The goal of research

1.5 The organization of this thesis

2.3.1 Overview of dynamic data flow testing

2.3.2 Data flow graph

2.3.3 Data Flow Terms

Chapter 3: A Method for Data Flow Testing

3.2 Test path generation

LIST OF FIGURES AND TABLES

Fig.1.1 Limitation of different fault detection techniques [10]

Fig.2.1 State transition diagram of a program variable[2]

Fig.3.1 Basic components of control flow

Fig.3.2 Structure of control statements

Fig.3.3 A source code and its data flow graph

Fig.3.4 Relative Strength of Testing Strategies

Fig.3.5 Process of generating test input data for data flow testing

Fig.3.6 Model of a test oracle using an alternative program

Fig.3.7 Methods written in two different ways return the same results

Fig.4.1 Open a Java file

Fig.4.2 Display the data flow of a Java program

Fig.4.3 Generating test data

Fig.4.4 The test results of a Java program

Fig.4.5 The source code of programmer 1

Fig.4.6 The source code of programmer 2

Fig.4.7 Data flow of the application to calculate bill

Table 3.1 Def and c-use Sets of Nodes in Figure 3.3

Table 3.2 Predicates and P-use Set of Edges in Figure 3.3

Table 3.3 A test case of U is created from the path p

Table 3.4 The results of the two methods

Table 4.1 Billing rules

Table 4.2 The results of testing the bill application

Abstract

This thesis proposes a method and tool support for automated data flow testing of Java programs. The key purpose of this method is to detect improper uses of data values due to coding errors. Given the source code of a Java program, the proposed method analyzes and visualizes the program as a data flow graph. All test paths needed to cover all definition-use pairs of all variables are then generated. A test case corresponding to each generated test path is produced by identifying values for the input parameters so that the test path is executable. The expected outputs of these test cases are identified automatically. An implemented tool supporting the method and experimental results are also presented. This tool is promising for application in practice.

Keywords: software testing, data flow testing, white-box testing, data flow anomaly, data flow coverage

Chapter 1: Introduction

1.1 Introduction to data flow testing

The main goals of software testing are to reveal bugs and to ensure that the system being developed complies with the customer’s requirements. To make testing effective, it is recommended that test planning and development begin at the onset of the project. Software testing techniques can be divided into two kinds: black box and white box techniques. Black box testing is mainly a validation technique that checks whether the product meets the customer’s requirements. White box testing, on the other hand, is a verification technique which uses the source code to guide the selection of test data.

Software testing is considered a major means of improving the quality of software systems. Currently, software companies focus only on black box testing techniques in order to validate whether software products meet the customer’s requirements. With this approach, they only detect the errors that can be observed by users. As a result, potential errors in the program code may remain undetected. Moreover, detecting such errors is recognized as a difficult and expensive task in practice. In addition, the testers in charge of this task need a high level of knowledge and skill in analyzing source code. These issues are still open problems in software companies, especially in Vietnam.

Data flow testing is known as a key white box testing technique that can be used to detect improper uses of data values due to coding errors [4]. These errors are inadvertently introduced into a program by programmers. For instance, a programmer might use a variable without defining it, or might define a variable without initializing it and then use that variable in a predicate (e.g. int x; if(x==100);) [4]. Such errors in the use of variables are common among programmers.

Each occurrence of a variable is classified as either a definition occurrence or a use occurrence. A definition occurrence of a variable is where a value is associated with the variable. A use occurrence of a variable is where the value of the variable is referred to. Each use occurrence is further classified as a computational use (c-use) or a predicate use (p-use). If the value of the variable is used to decide whether a predicate is true for selecting execution paths, the occurrence is called a predicate use. Otherwise, the occurrence is called a computational use. Data flow testing criteria require that the test data cause the traversal of sub-paths from a variable definition to either some or all of the p-uses, c-uses, or their combination.
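The classification above can be illustrated on a small method. The following is only a sketch: the class `OccurrenceDemo` and the method `scaledAbs` are hypothetical names that do not come from the thesis, and the comments label each occurrence as a def, c-use, or p-use.

```java
// Hypothetical example (not from the thesis): each variable occurrence is
// labelled as a definition (def), computational use (c-use), or
// predicate use (p-use).
final class OccurrenceDemo {
    static int scaledAbs(int x, int factor) { // def of x and factor (parameters)
        int result;                            // declaration only, no def yet
        if (x < 0) {                           // p-use of x
            result = -x * factor;              // def of result; c-use of x and factor
        } else {
            result = x * factor;               // def of result; c-use of x and factor
        }
        return result;                         // c-use of result
    }

    public static void main(String[] args) {
        System.out.println(scaledAbs(-3, 2));  // takes the true branch
        System.out.println(scaledAbs(4, 2));   // takes the false branch
    }
}
```

Covering the sub-paths from the definition of `x` to both outcomes of its p-use requires at least one negative and one non-negative input, which is why `main` calls the method twice. Note that the Java compiler already rejects a read of a local variable that is not definitely assigned, so the `int x; if(x==100);` example from the text fails at compile time when `x` is a local variable.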

We may note that there is much similarity between control flow testing and data flow testing, but there is also a key difference between the two approaches. The similarities stem from the fact that both approaches identify program paths and emphasize generating test cases from those program paths. The difference lies in the fact that control flow test selection criteria are used in the former, whereas data flow test selection criteria are used in the latter.

1.2 Applications of data flow testing

The primary purpose of dynamic data flow testing is to uncover possible bugs in data usage during the execution of the code. To achieve this, test cases are created which trace every definition to each of its uses and every use to each of its definitions.

Ntafos [10] reported the results of an experiment comparing the effectiveness of three test selection techniques. Data flow testing, control flow testing, and random testing detected 90%, 85.5%, and 79.5%, respectively, of the known defects. Furthermore, Fig.1.1 shows the limitation of different fault detection techniques [10]. These facts imply that data flow testing is one of the most effective methods for examining the structure of programs.

[Figure: total number of faults in a program, with regions for random testing, control flow-based testing, data flow-based testing, and new testing techniques]

Fig.1.1 Limitation of different fault detection techniques [10]

1.3 Related work

Much of the formalization of define/use testing was done in the early 1980s [1,2]; the definitions in this thesis are compatible with those in [1,2], which summarize most of define/use testing theory.

Firstly, with regard to test data, Tonella [21] performed unit testing of classes using a genetic algorithm; in this approach, the test cases for unit testing of classes are generated by the algorithm. McMinn and Holcombe [20] proposed a solution for the state problem in evolutionary testing using an ant colony model. Cheon et al. [18] proposed automation of Java program testing at the unit level using evolutionary approaches. However, the data types handled by these methods are limited.

Secondly, with regard to generating expected outputs, also known as the test oracle problem, the oracles so far compare the expected outputs exactly with the actual outputs. However, this is not always feasible. Statistical methods [13] and artificial neural networks (ANNs) [14] have been used to identify the expected output. These methods, however, are not always feasible and require the implementation under test (IUT).

Applying metamorphic testing to situations in which there is no test oracle has previously been studied by Chen et al. [22]. In some cases, these works have looked at situations in which there cannot be an oracle for a particular application; in others, the work has considered the case in which the oracle is simply absent or difficult to implement.

Although there are some tools supporting data flow testing, such as BPAS-ATCGS (Basic Program Analyzer System - Automatic Test Case Generation System) [8], JaBUTi [9], and DFC (Data Flow Coverage) [3], these tools only generate paths for covering the given source code. In fact, we need a tool that assists the tester in creating test data [5] that include expected outputs. Some free versions only allow testing the programs that are bundled with these tools, and they are difficult to extend in order to satisfy the specific data flow testing purposes of a particular software company.

1.4 The goal of research

One of the major difficulties in software testing is the automatic generation of test data and expected outputs. In this thesis, we present a method to create test data and expected outputs based on data flow testing of Java programs. Given the source code of a program, this method analyzes and visualizes the program as a data flow graph. All test paths in the data flow graph needed to cover all definition-use pairs of all variables in the program are then generated. Test cases for the generated test paths are produced by giving values to the input parameters. The input values and the expected outputs of the produced test cases are also generated automatically. In order to show the practical usefulness of the proposed method, a tool supporting the method has been implemented. The experimental results obtained by applying this tool to some typical programs show that it reliably detects all errors in the use of data variables in those programs. In addition, this tool is free, open source, and promising for application in practice.

1.5 The organization of this thesis

The thesis is organized as follows. We first review some background in Chapter 2. Chapter 3 describes a method for data flow testing of Java programs. Chapter 4 shows the implemented tool and experimental results. Finally, we conclude the thesis in Chapter 5.

Chapter 2: Theory of Data Flow Testing

2.1 Basic idea

A program unit, such as a function, accepts input values, performs computations while assigning new values to local and global variables, and, finally, produces output values. Therefore, one can imagine a kind of “flow” of data values between variables along a path of program execution. A data value computed in a certain step of program execution is expected to be used in a later step. For example, a program may open a file, thereby obtaining a value for a file pointer; in a later step, the file pointer is expected to be used. Intuitively, if the later use of the file pointer is never verified, then we do not know whether or not the earlier assignment of a value to the file pointer variable is all right. Sometimes, a variable may be defined twice without a use of the variable in between. One may wonder why the first definition of the variable is never used.

There are two motivations for data flow testing. First, a memory location corresponding to a program variable should be accessed in a desirable way. For example, a memory location should not be read before it is written. Second, it is desirable to verify the correctness of a data value generated for a variable. This is performed by observing that all the uses of the value produce the desired results.

Data flow testing can be performed at two conceptual levels: static data flow testing and dynamic data flow testing. As the name suggests, static data flow testing is performed by analyzing the source code, and it does not involve actual execution of the source code. On the other hand, dynamic data flow testing involves identifying program paths from source code based on a class of data flow testing criteria.

In this chapter, we first study the concept of data flow anomaly as identified by Fosdick and Osterweil [17]. Next, we discuss dynamic data flow testing in detail.

2.2 Static data flow testing

Static data flow testing is concerned with detecting data flow anomalies. An anomaly is a deviant or abnormal way of doing something. For example, it is an abnormal situation to successively assign two values to a variable without using the first value. Similarly, it is abnormal to use a value of a variable before assigning a value to the variable. Another abnormal situation is to generate a data value and never use it.

The three abnormal situations are called type 1, type 2, and type 3 anomalies [1]. These anomalies could be manifestations of potential programming errors.

[Figure: state transition diagram with states U, D, R, and A (A: Abnormal)]

Fig.2.1 State transition diagram of a program variable [2]

• Type 1: Defined and then defined again

E.g. x = f(y);

x = f(2);

• Type 2: Undefined but referenced

E.g. int x = 0, y = 0;

int w;

x = x - y - w; /* w has not been defined by the programmer */

• Type 3: Defined but not referenced. For example, consider x = f(x, y). If x is not used subsequently, we have a Type 3 anomaly.

Huang [16] introduced the idea of “states” of program variables to identify data flow anomalies. Now it is useful to make an association between the type 1, type 2, and type 3 anomalies and the state transition diagram shown in Fig.2.1. The type 1, type 2, and type 3 anomalies are denoted by the action sequences dd, ur, and du, respectively, in Fig.2.1.

Data flow anomalies can be detected by using the idea of program instrumentation. Intuitively, program instrumentation means incorporating additional code in a program to monitor its execution status. For example, we can write additional code in a program to monitor the sequence of states, namely U, D, R, and A, traversed by a variable. If the state sequence contains the dd, ur, or du subsequence, then a data flow anomaly is said to have occurred.
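The monitoring idea can be sketched as a small state machine that instrumentation code would call on every define, reference, and undefine action of one variable. This is a minimal sketch under stated assumptions: the class `VariableStateMonitor` is a hypothetical name, and the transition table is an assumption chosen to be consistent with the dd, ur, and du anomalies named above; it is not copied from Fig.2.1.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: tracks the state of a single program variable.
// States: U (undefined), D (defined but not referenced),
// R (defined and referenced), A (abnormal).
// Actions: 'd' (define), 'r' (reference), 'u' (undefine).
// ASSUMPTION: the transition table below is one table consistent with the
// dd, ur, and du anomalies named in the text; it is not copied from Fig.2.1.
final class VariableStateMonitor {
    enum State { U, D, R, A }

    private State state = State.U;
    private final List<State> trace = new ArrayList<>();

    VariableStateMonitor() {
        trace.add(state);
    }

    // Applies one action and records the resulting state.
    void apply(char action) {
        if (action != 'd' && action != 'r' && action != 'u') {
            throw new IllegalArgumentException("unknown action: " + action);
        }
        switch (state) {
            case U:
                if (action == 'd') state = State.D;
                else if (action == 'r') state = State.A; // ur anomaly
                break;                                    // 'u' stays in U
            case D:
                state = (action == 'r') ? State.R : State.A; // dd or du anomaly
                break;
            case R:
                if (action == 'd') state = State.D;
                else if (action == 'u') state = State.U;
                break;                                    // 'r' stays in R
            default:
                break;                                    // A is absorbing
        }
        trace.add(state);
    }

    // The sequence of states visited so far, starting with U.
    List<State> trace() {
        return trace;
    }

    // Convenience: true if the action string drives the variable into state A.
    static boolean isAnomalous(String actions) {
        VariableStateMonitor monitor = new VariableStateMonitor();
        for (char c : actions.toCharArray()) {
            monitor.apply(c);
        }
        return monitor.state == State.A;
    }

    public static void main(String[] args) {
        System.out.println(isAnomalous("drdru")); // normal define/reference pattern
        System.out.println(isAnomalous("dd"));    // type 1
        System.out.println(isAnomalous("ur"));    // type 2
        System.out.println(isAnomalous("du"));    // type 3
    }
}
```

Making A an absorbing state means that a single anomaly is never masked by later well-formed actions on the same variable.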

Why is static data flow testing not enough?

Static data flow testing fails in situations where the state of a data variable cannot be determined just by analyzing the code. This happens when the data variable is used as an index into a collection of data elements. For example, in the case of arrays, the index might be generated dynamically during execution, so we cannot guarantee the state of the array element referenced by that index. Moreover, static data flow testing might flag as anomalous a piece of code that is never executed and hence is not really anomalous.

2.3 Dynamic data flow testing

2.3.1 Overview of dynamic data flow testing

In the process of writing code, a programmer manipulates variables in order to achieve the desired computational effect. Variable manipulation occurs in several ways, such as initialization of the variable, assignment of a new value to the variable, computing a value of another variable using the value of the variable, and controlling the flow of program execution.

Rapps and Weyuker [1] convincingly argue that one should not feel confident that a variable has been assigned the correct value if no test case causes the execution of a path from the assignment to a point where the value of the variable is used. In the above motivation for data flow testing, (i) assignment of a correct value means whether or not a value for the variable has been correctly generated and (ii) use of a variable refers to further generation of values for the same or other variables and/or control of flow. A variable can be used in a predicate, that is, a condition, to choose an appropriate flow of control.

The above idea gives us an indication of the involvement of certain kinds of program paths in data flow testing. Data flow testing involves selecting entry-exit paths with the objective of covering certain data definition and use patterns, commonly known as data flow testing criteria. Specifically, certain program paths are selected on the basis of data flow testing criteria.

• Identify paths in the data flow graph satisfying the selection criteria.

• [...] expressions to derive test input.

• Based on these input values, we identify the expected outputs.

2.3.2 Data flow graph

A data flow graph is drawn with the objective of identifying data definitions and their uses, as motivated in the preceding section. Each occurrence of a data variable is classified as follows.

Definition: A statement storing a value in a memory location of a variable creates a definition (def) of the variable [1].

Use: A statement drawing a value from the memory location of a variable is a use of the currently active definition of the variable. In particular, when the variable appears on the right-hand side of an assignment statement it is called a computational use (c-use); when the variable appears in the predicate of a conditional statement it is called a predicate use (p-use).

A set of p-uses is associated with each edge of the graph.

The entry node has a definition of each parameter and each nonlocal variable which occurs in the subprogram.

The exit node has an undefinition of each local variable.

Data Flow Terms

A variable defined in a statement is used in another statement, which may occur immediately or several statements after the definition. We are interested in finding paths that include pairs of definition and use of variables. In this section, we explain a family of path selection criteria that allow us to select paths with varying strength. Note that for every feasible path we can generate a test case. In the following, first we explain a few terms, and then we explain a few selection criteria using those terms.

• Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has been defined before in a node other than node i [1]

• Definition Clear Path: A path $(i, n_1, \ldots, n_m, j)$, $m \ge 0$, is called a definition clear path (def-clear path) with respect to variable x from node i to node j, and from node i to edge $(n_m, j)$, if x is not defined in any of the nodes $n_1, \ldots, n_m$ [1]

• Global Definition: A node i has a global definition of a variable x if node i has a definition of x and there is a def-clear path with respect to x from node i to some node containing a global c-use or edge containing a p-use of x

• Du-path: A path $(n_1, n_2, \ldots, n_j, n_k)$ is a definition-use path (du-path) with respect to (w.r.t.) variable x if node $n_1$ has a global definition of x and either node $n_k$ has a global c-use of x and $(n_1, n_2, \ldots, n_j, n_k)$ is a def-clear simple path w.r.t. x, or edge $(n_j, n_k)$ has a p-use of x and $(n_1, n_2, \ldots, n_j)$ is a def-clear loop-free path w.r.t. x [1]
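The terms above can be turned into simple checks on a candidate path. The following is a minimal sketch, assuming Java 9 or later; the class `DuPathCheck`, the integer node numbering, and the map from nodes to defined variables are all assumptions of this example. It uses the usual reading that a loop-free path visits every node at most once and that a simple path does the same except that its first and last nodes may coincide. The caller is assumed to have checked separately that the last node (or last edge) really uses x.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: nodes are integers, and defs maps each node to the set of
// variables it defines. Both representations are assumptions of this example.
final class DuPathCheck {
    // Def-clear: no interior node of the path defines x.
    static boolean isDefClear(List<Integer> path, Map<Integer, Set<String>> defs, String x) {
        for (int k = 1; k < path.size() - 1; k++) {
            Set<String> d = defs.getOrDefault(path.get(k), Collections.emptySet());
            if (d.contains(x)) {
                return false;
            }
        }
        return true;
    }

    // Loop-free: every node occurs at most once.
    static boolean isLoopFree(List<Integer> path) {
        return new HashSet<>(path).size() == path.size();
    }

    // Simple: every node occurs at most once, except that the first and
    // last nodes may be the same.
    static boolean isSimple(List<Integer> path) {
        if (path.size() < 2) {
            return true;
        }
        return isLoopFree(path.subList(0, path.size() - 1))
            && isLoopFree(path.subList(1, path.size()));
    }

    // Du-path whose last node is assumed to hold a global c-use of x.
    static boolean isDuPathToCUse(List<Integer> path, Map<Integer, Set<String>> defs, String x) {
        return path.size() >= 2
            && defs.getOrDefault(path.get(0), Collections.emptySet()).contains(x)
            && isSimple(path)
            && isDefClear(path, defs, x);
    }

    // Du-path whose last edge (n_j, n_k) is assumed to hold a p-use of x.
    static boolean isDuPathToPUse(List<Integer> path, Map<Integer, Set<String>> defs, String x) {
        return path.size() >= 2
            && defs.getOrDefault(path.get(0), Collections.emptySet()).contains(x)
            && isLoopFree(path.subList(0, path.size() - 1))
            && isDefClear(path, defs, x);
    }

    public static void main(String[] args) {
        // Hypothetical graph: node 0 defines x and y, node 2 redefines x.
        Map<Integer, Set<String>> defs = Map.of(0, Set.of("x", "y"), 2, Set.of("x"));
        System.out.println(isDuPathToCUse(List.of(0, 1, 3), defs, "x"));
        System.out.println(isDuPathToCUse(List.of(0, 2, 3), defs, "x"));
        System.out.println(isDuPathToCUse(List.of(0, 2, 3), defs, "y"));
    }
}
```

In the hypothetical graph used by `main`, the path through node 2 is a du-path for y but not for x, because node 2 redefines x before the use is reached.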

2.3.3 Data Flow Testing Criteria

In this section, we explain seven types of data flow testing criteria. These criteria are based on two fundamental concepts, namely, definitions and uses (both c-uses and p-uses) of variables.

• All-defs: For each variable x and for each node i such that x has a global definition in node i, select a complete path which includes a def-clear path from node i to node j having a global c-use of x or edge (j, k) having a p-use of x [1]

• All-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j [1]

• All-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all edges (j, k) such that there is a p-use of x on edge (j, k) [1]

• All-p-uses/Some-c-uses: This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x has no p-use, then this criterion reduces to the some-c-uses criterion explained below [1]

  o Some-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some nodes j such that there is a global c-use of x in node j

• All-c-uses/Some-p-uses: This criterion is identical to the all-c-uses criterion except when a variable x has no global c-use. If x has no global c-use, then this criterion reduces to the some-p-uses criterion explained below [1]

  o Some-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some edges (j, k) such that there is a p-use of x on edge (j, k)

• All-uses: This criterion is the conjunction of the all-p-uses criterion and the all-c-uses criterion discussed above [1]

• All-du-paths: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include all du-paths from node i to all nodes j such that there is a global c-use of x in j and to all edges (j, k) such that there is a p-use of x on (j, k) [1]
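To make the all-uses criterion concrete, the sketch below lists every definition-use association that a test suite would have to cover: for each definition of a variable, every c-use node and p-use edge that can be reached along a def-clear path. This is only an illustration assuming Java 9 or later; the class `AllUsesPairs`, the map-based graph encoding, and the `"j,k"` edge keys are assumptions of this example and not the thesis tool's data structures. Every c-use in the input is assumed to be a global c-use.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: collects the def-use associations required by all-uses.
// succ:  node -> successor nodes
// defs:  node -> variables defined in that node
// cUses: node -> variables with a (global) c-use in that node
// pUses: edge key "j,k" -> variables with a p-use on edge (j, k)
final class AllUsesPairs {
    static Set<String> pairs(Map<Integer, List<Integer>> succ,
                             Map<Integer, Set<String>> defs,
                             Map<Integer, Set<String>> cUses,
                             Map<String, Set<String>> pUses) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> entry : defs.entrySet()) {
            for (String x : entry.getValue()) {
                int defNode = entry.getKey();
                Deque<Integer> work = new ArrayDeque<>();
                Set<Integer> expanded = new HashSet<>();
                work.push(defNode);
                expanded.add(defNode);
                while (!work.isEmpty()) {
                    int u = work.pop();
                    for (int w : succ.getOrDefault(u, List.of())) {
                        if (pUses.getOrDefault(u + "," + w, Set.of()).contains(x)) {
                            out.add(x + ": def " + defNode + " -> p-use (" + u + "," + w + ")");
                        }
                        if (cUses.getOrDefault(w, Set.of()).contains(x)) {
                            out.add(x + ": def " + defNode + " -> c-use " + w);
                        }
                        // A redefinition of x in w ends the def-clear path.
                        boolean redefined = defs.getOrDefault(w, Set.of()).contains(x);
                        if (!redefined && expanded.add(w)) {
                            work.push(w);
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1 -> 3 and 0 -> 2 -> 3.
        Set<String> result = pairs(
            Map.of(0, List.of(1, 2), 1, List.of(3), 2, List.of(3)),
            Map.of(0, Set.of("x", "y"), 1, Set.of("y"), 2, Set.of("x")),
            Map.of(1, Set.of("x"), 3, Set.of("x", "y")),
            Map.of("0,1", Set.of("x"), "0,2", Set.of("x")));
        result.forEach(System.out::println);
    }
}
```

Restricting the output to the c-use lines gives the requirements of all-c-uses, and restricting it to the p-use lines gives those of all-p-uses, which mirrors the statement that all-uses is the conjunction of the two.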


2.4 Summary

Flow of data in a program can be visualized by considering the fact that a program unit accepts input data, transforms the input data through a sequence of computations, and, finally, produces the output data. Therefore, one can imagine data values flowing from one assignment statement defining a variable to another assignment statement or a predicate where the value is used.

The program path is a fundamental concept in testing. One test case can be generated from one executable path. The number of different paths selected for execution is a measure of the extent of testing performed. Path selection based on statement coverage and branch coverage leads to a small number of paths being chosen for execution. Therefore, there exists a large gap between control flow testing and exhaustive testing. The concept of data flow testing gives us a way to bridge this gap.

The concept of data flow testing gives us new selection criteria for choosing more program paths to test than we can choose by using the idea of control flow testing. Specifically, the data flow test selection criteria are all-du-paths, all-defs, all-c-uses, all-p-uses, all-uses, all-c-uses/some-p-uses, and all-p-uses/some-c-uses. To compare two selection criteria, the concept of a “strictly includes” relationship is found to be useful.

Chapter 3: A Method for Data Flow Testing

This chapter presents a method for data flow testing of Java programs. Given the source code of a Java program, this method analyzes and visualizes the program as a data flow graph. Next, it finds the paths in the generated data flow graph that cover all definition-use pairs of all variables in the program. Finally, test cases corresponding to the generated test paths are created by giving values to the input parameters. The expected outputs of these test cases are also computed automatically.

Let U be a program, $V = \{v_1, v_2, \ldots, v_n\}$ be the set of variables of U, and P be a set of du-paths.

3.1 Data Flow Graph

A data flow graph (DFG) is a directed graph G = {N, E}, where N is a finite set of nodes, each of which represents a c-use or a def, and E is a finite set of directed edges, each of which represents a p-use. N contains an entry node $n_0$ and an exit node. Let $P_U$ be the set of complete paths of G.

First, U can be uniquely decomposed into a set of basic blocks, where a basic block is a part of the code that executes without branching. Each basic block corresponds to a node in the graph G. The directed edges of G, which connect the nodes together, follow the rules shown in Fig.3.1 and Fig.3.2.

[Figure: start point, process block, decision point, connection point, end point]

Fig.3.1 Basic components of control flow

[Figure: sequence, if, switch, while c do, do while c]

Fig.3.2 Structure of control statements

Definition 1 (Def). A definition of a variable $v \in V$ at node $n \in N$ is denoted Def(v, n), where Def(v, n) = true if variable v is defined at node n and Def(v, n) = false otherwise.

Definition 2 (C-use). A computation use of a variable $v \in V$ at node $n \in N$ is denoted C-use(v, n), where C-use(v, n) = true if variable v is used in a computation at node n and C-use(v, n) = false otherwise.

Definition 3 (P-use). A predicate use of a variable $v \in V$ at edge $e \in E$ is denoted P-use(v, e), where P-use(v, e) = true if variable v is used at edge e and P-use(v, e) = false otherwise.

Definition 4 (Def-c-path). Let $p = (n_1, n_2, \ldots, n_m)$ be a path and $v \in V$ a variable. If Def(v, $n_i$) = false for all $1 < i < m$, then Def-c-path(v, p) = true, that is, p is a def-clear path of variable v; Def-c-path(v, p) = false otherwise.

Definition 5 (Pc). For each $v \in V$ and for all $m, n \in N$, if Def(v, m) = true and C-use(v, n) = true, there exists a set of paths, denoted $P_c$, where every $p \in P_c$ has m as its first node and n as its last node. If Def-c-path(v, p) = true, then p is a du-path.

Definition 6 (Pp). For each $v \in V$, for all $m \in N$ and for all $e \in E$, if Def(v, m) = true and P-use(v, e) = true, there exists a set of paths, denoted $P_p$, where every $p \in P_p$ has m as its first node and e as its last edge. If Def-c-path(v, p) = true, then p is a du-path.
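Definitions 1, 2, 4, and 5 can be read as a recipe for generating the du-paths of a c-use. The sketch below, assuming Java 9 or later, enumerates the paths from a defining node m to a c-use node n whose interior nodes do not redefine v. The class `DuPathEnumerator` and the map-based graph encoding are assumptions of this example, not the implementation of the thesis tool, and the search is restricted to paths that do not repeat nodes so that it terminates on graphs with loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: enumerates the du-paths in P_c for one variable v,
// one defining node m, and one c-use node n.
final class DuPathEnumerator {
    // Definition 1 as a predicate: Def(v, n).
    static boolean def(Map<Integer, Set<String>> defs, String v, int n) {
        return defs.getOrDefault(n, Set.of()).contains(v);
    }

    // Definition 2 as a predicate: C-use(v, n).
    static boolean cUse(Map<Integer, Set<String>> cUses, String v, int n) {
        return cUses.getOrDefault(n, Set.of()).contains(v);
    }

    // Returns every def-clear path from m to n that does not repeat a node.
    static List<List<Integer>> duPaths(Map<Integer, List<Integer>> succ,
                                       Map<Integer, Set<String>> defs,
                                       Map<Integer, Set<String>> cUses,
                                       String v, int m, int n) {
        List<List<Integer>> result = new ArrayList<>();
        if (!def(defs, v, m) || !cUse(cUses, v, n)) {
            return result; // the preconditions of Definition 5 do not hold
        }
        List<Integer> path = new ArrayList<>();
        path.add(m);
        extend(succ, defs, v, n, path, result);
        return result;
    }

    // Depth-first extension of the current path towards the target node.
    private static void extend(Map<Integer, List<Integer>> succ,
                               Map<Integer, Set<String>> defs,
                               String v, int target,
                               List<Integer> path, List<List<Integer>> result) {
        int last = path.get(path.size() - 1);
        for (int next : succ.getOrDefault(last, List.of())) {
            if (next == target) {
                List<Integer> done = new ArrayList<>(path);
                done.add(next);
                result.add(done);
            } else if (!path.contains(next) && !def(defs, v, next)) {
                // Interior nodes must be new and must not redefine v (Definition 4).
                path.add(next);
                extend(succ, defs, v, target, path, result);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1 -> 3 and 0 -> 2 -> 3; node 2 redefines x.
        Map<Integer, List<Integer>> succ = Map.of(0, List.of(1, 2), 1, List.of(3), 2, List.of(3));
        Map<Integer, Set<String>> defs = Map.of(0, Set.of("x"), 2, Set.of("x"));
        Map<Integer, Set<String>> cUses = Map.of(3, Set.of("x"));
        System.out.println(duPaths(succ, defs, cUses, "x", 0, 3));
        System.out.println(duPaths(succ, defs, cUses, "x", 2, 3));
    }
}
```

The same search adapted to stop at an edge instead of a node would give the set $P_p$ of Definition 6.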

For example, the source code of a program and its data flow graph are shown in Fig.3.3. By Definition 1, Def(x, 0) = true and Def(x, 3) = false. Similarly, by Definition 2, C-use(x, 0) = false and C-use(x, 3) = true. By Definition 3, P-use(x, (0, 2)) = false and P-use(x, (2, 3)) = true.

[Figure: source code of the method public int getValue(int x, int y) and its data flow graph]

Fig.3.3 A source code and its data flow graph

Table 3.1 shows all the definitions and c-uses appearing in the data flow graph of Fig.3.3. Def(i) denotes the set of variables which have definitions in node i. Similarly, C-use(i) denotes the set of variables which have c-uses in node i. Table 3.2 shows all the predicates and p-uses appearing in the data flow graph of Fig.3.3.

Title	A Method And Tool Support For Automated Data Flow Testing Of Java Programs
Author	Pham Van Cuong
Supervisor	Dr. Pham Ngoc Hung
Institution	Vietnam National University, Hanoi University of Engineering and Technology
Major	Computer Science
Type	Graduation project
Year of publication	2014
City	Hanoi
