Automatic Path-oriented Test Data Generation Using a Multi-population Genetic Algorithm


Yong Chen1,2, Yong Zhong1,2

1Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China

2Graduate school, Chinese Academy of Sciences, Beijing 100080, China

czchenyong@163.com, zhongyong@cimslabsoft.com

Abstract

Automatic path-oriented test data generation is an undecidable problem, and genetic algorithms (GAs) have been applied to test data generation since 1992. Using MATLAB, we implemented a multi-population genetic algorithm (MPGA) that selects individuals for free migration based on their fitness values. Applying MPGA to path-oriented test data generation is a new and meaningful attempt. After describing how to transform path-oriented test data generation into an optimization problem, we present the basic process flow of path-oriented test data generation using a GA. Using a triangle classifier as the program under test, experimental results show that the MPGA-based approach generates path-oriented test data more effectively and efficiently than the simple GA (SGA) based approach does.

1 Introduction

Software quality is more important than ever, and software testing is the most significant means of assuring it. However, software testing is laborious and costly because it is mostly done manually [1]. In general, software testing accounts for approximately 50 percent of the elapsed time and more than 50 percent of the total cost of software development [2]. Thus, automated software testing is a promising way to cut time and cost.

Automatic structural test data generation is a crucial problem in software testing automation; solving it can significantly improve the effectiveness and efficiency of software testing and reduce its high cost. We focus on path-oriented test data generation because various structural test data generation problems can be transformed into a path-oriented test data generation problem. Furthermore, a path testing strategy can detect almost 65 percent of the errors in the program under test [3].

Although path-oriented test data generation is an undecidable problem [4], researchers have developed various methods and made some progress. These methods fall into two types: static methods and dynamic methods.

Static methods include symbolic execution [5] and domain reduction [6, 7]. These methods suffer from a number of problems when they handle indefinite loops, arrays, procedure calls and pointer references [8].

Dynamic methods include random testing, the local search approach [9], the goal-oriented approach [10], the chaining approach [11] and the evolutionary approach [8, 12-14]. Since the values of input variables are determined when the program executes, dynamic test data generation avoids the problems that static methods face.

As a robust search method for complex spaces, the genetic algorithm was first applied to test data generation in 1992 [12], and interest in the evolutionary approach has grown steadily since then. Related works [15], [8] and [16] indicate that GA-based test data generation outperforms other dynamic approaches, e.g. random testing and local search. As far as we know, applying a multi-population genetic algorithm to path-oriented test data generation is a new attempt. The rest of the paper is organized as follows. Section 2 describes how to transform path-oriented test data generation into an optimization problem. Section 3 introduces a multi-population genetic algorithm. Section 4 presents the basic process flow of path-oriented test data generation using a GA. Section 5 gives experimental results based on a triangle classification program. Finally, Section 6 summarizes the paper with conclusions and directions for future work.

2 Path-oriented test data generation as an optimization problem

To make use of a genetic algorithm, a path-oriented test data generation problem must first be transformed into an optimization problem.

First, the program under test is represented by its control flow graph (CFG). A CFG is a directed graph G = (N, A, s, e), where N is a set of nodes, A is a set of edges, and s and e are the unique entry and exit nodes respectively. Each decision node is associated with a branch predicate, which is a logical expression. The edges leaving a decision node are labeled with the true or false value of the corresponding branch predicate. To cause a path to be covered during execution, it is necessary to find values for the input variables that satisfy the related branch predicates.

Fourth International Conference on Natural Computation

A simple way to do this is the branch distance based approach. For example, if a branch predicate C is (a=b) and the test goal is to ensure that C is false during program execution, then the branch distance function is f(C) = -|a-b|. Achieving a desired branch is thus transformed into searching for an input vector that minimizes its branch distance function. Table 1 gives some commonly used branch distance functions, each of which is minimized when C is false during program execution. To achieve a desired path P, we define F(P) as the sum of all related branch distance functions. Consequently, generating path-oriented test data is transformed into searching for an input vector that minimizes F(P). A more detailed description of the branch distance based approach is given in Section 5.2.

Table 1 Branch distance functions

Branch predicate C    Branch distance function
a≠b                   f(C) = |a-b|
C1∧C2                 f(C) = min(f(C1), f(C2))
C1∨C2                 f(C) = f(C1)+f(C2)
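The branch distance functions translate directly into code. The following Python sketch is only an illustration: the function names (dist_eq, dist_ne, dist_and, dist_or, path_fitness) are assumptions introduced here and are not taken from the authors' MATLAB implementation. It covers the (a=b) example from the text and the three rows of Table 1, and computes F(P) as the sum of the branch distances along a path.

```python
# Branch distance functions in the style of Table 1. Each one is meant to take
# its smallest values when the predicate C is FALSE.
# All names are illustrative assumptions, not the paper's code.

def dist_eq(a, b):
    """C is (a = b): negative whenever a != b, i.e. whenever C is false."""
    return -abs(a - b)

def dist_ne(a, b):
    """C is (a != b): zero, its minimum, exactly when a == b (C false)."""
    return abs(a - b)

def dist_and(f1, f2):
    """C is (C1 and C2): falsifying either conjunct falsifies C."""
    return min(f1, f2)

def dist_or(f1, f2):
    """C is (C1 or C2): both disjuncts have to be false."""
    return f1 + f2

def path_fitness(branch_distances):
    """F(P): sum of the branch distances collected along the desired path P."""
    return sum(branch_distances)
```

A search procedure would then look for an input vector that minimizes path_fitness over the predicates on the target path.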

In addition to branch distance based approaches, there are approaches that use the Hamming distance or an extended Hamming distance. All of these approaches let us treat path-oriented test data generation as an optimization problem, so that genetic algorithms can be applied to it.

3 A multi-population genetic algorithm

Genetic algorithms were developed by Holland [17], inspired by Darwin's theory of evolution, and are widely applied to various problems. A genetic algorithm begins with a population of initial individuals represented by chromosomes. The chromosomes undergo a process of evolution according to rules of selection, reproduction and mutation. More details on genetic algorithms can be found in [18].

Based on the idea of parallel computing, the multi-population genetic algorithm (MPGA) was proposed [19, 20]. In most cases, because individuals migrate among multiple subpopulations, MPGA not only converges faster but also produces better results than a simple genetic algorithm (SGA) with a single population.

There are two ways in which individuals are selected for migration. Uniform selection chooses the individuals to migrate uniformly at random and replaces randomly chosen individuals in the receiving subpopulation with the immigrants. Fitness-based selection picks individuals with high fitness for migration and replaces individuals in the receiving subpopulation uniformly at random.

There are also three migration strategies, which determine where the selected individuals migrate. Adjacent migration transfers individuals between adjacent subpopulations in one direction only. Neighborhood migration takes place between nearest subpopulations in either direction. Free migration means that individuals may migrate from any subpopulation to any other.

Using MATLAB 7.1, we implemented a multi-population genetic algorithm (MPGA) that selects individuals for free migration based on their fitness values. For the sake of brevity, we introduce only the main idea here and do not describe the implementation in more detail.
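Since the implementation itself is not described, the following Python sketch shows one way a migration step with fitness-based selection and free migration could look. It is not the authors' MATLAB code. The function name migrate, its parameters, the choice to copy emigrants rather than remove them from their source subpopulation, and the convention that a smaller fitness value is better (matching the minimization of F(P)) are all assumptions made for illustration.

```python
import random

def migrate(subpops, fitness, rate=0.2, rng=random):
    """One migration step: fitness-based selection, free migration (a sketch).

    subpops : list of subpopulations, each a list of individuals
    fitness : callable; smaller values are treated as better (assumption)
    rate    : fraction of each subpopulation selected as emigrants
    """
    # Select emigrants from a snapshot so the result does not depend on the
    # order in which subpopulations are visited.
    emigrants = []
    for pop in subpops:
        k = int(rate * len(pop))
        emigrants.append(sorted(pop, key=fitness)[:k])  # k best individuals

    new_subpops = [list(pop) for pop in subpops]
    for src, movers in enumerate(emigrants):
        others = [i for i in range(len(subpops)) if i != src]
        if not others:
            continue  # a single population has nowhere to migrate to
        for ind in movers:
            dst = rng.choice(others)                     # free migration: any other subpopulation
            slot = rng.randrange(len(new_subpops[dst]))  # resident to replace, chosen uniformly
            new_subpops[dst][slot] = ind
    return new_subpops
```

With a migration period of g generations, such a step would be called once every g generations between otherwise independent GA runs on each subpopulation.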

4 Basic process flow of path-oriented test data generation using genetic algorithms

Generating path-oriented test data for the program under test using a GA involves five steps.

(1) Control flow graph (CFG) construction. The CFG of the program under test may be constructed manually or automatically with related tools. It helps testers select representative target paths.

(2) Target path selection. In general, a program under test has too many paths to test completely. Thus, testers have to select meaningful paths as target paths.

(3) Fitness function construction. A fitness function has to be constructed in order to evaluate the distance between the executed path and the target path.

(4) Program instrumentation. Probes are inserted at the beginning of every block of source code to monitor program execution and collect related information (e.g. fitness values of individuals).

(5) Test data generation and execution of the instrumented program. The original test data are chosen from their domain at random, and the GA generates new test data in order to achieve the target path. The process ends either when suitable test data that execute along the target path are generated, or when no suitable test data have been found by the time the maximum number of generations is exceeded.
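Step (5) can be sketched as a loop around the instrumented program. The Python code below is a minimal illustration under stated assumptions: generate_test_data, run_instrumented, random_input and breed are hypothetical names, the instrumented program is assumed to return the executed path together with a fitness value, and the toy stand-ins at the bottom (one integer input, target reached when x equals 7) replace a real program under test and real GA operators.

```python
import random

def generate_test_data(run_instrumented, random_input, breed,
                       target_path, pop_size=20, max_generation=50):
    """Return (test_datum, generation) for target_path, or (None, max_generation)."""
    population = [random_input() for _ in range(pop_size)]  # random initial test data
    for generation in range(max_generation):
        scored = []
        for x in population:
            path, fitness = run_instrumented(x)  # probes report path and fitness
            if path == target_path:
                return x, generation             # suitable test datum found
            scored.append((fitness, x))
        population = breed(scored, pop_size)     # GA operators create new test data
    return None, max_generation                  # max generation exceeded

# Toy stand-ins (hypothetical, for illustration only).
def toy_run(x):
    return ("target" if x == 7 else "other", abs(x - 7))

def toy_breed(scored, pop_size):
    # Keep the better half and derive two neighbours from each survivor.
    best = [x for _, x in sorted(scored)[: pop_size // 2]]
    return [x - 1 for x in best] + [x + 1 for x in best]

rng = random.Random(1)
datum, generation = generate_test_data(
    toy_run, lambda: rng.randint(0, 100), toy_breed,
    target_path="target", pop_size=20, max_generation=200)
print(datum, generation)
```

In a real setting, toy_breed would be replaced by selection, crossover and mutation, and toy_run by the instrumented program from step (4).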

5 Experimental studies

5.1 A triangle classifier

The triangle classifier is a widely used program in software testing research [13, 21-25]. It determines whether three input sides can form a triangle and, if so, what type of triangle they form. Figure 1 gives the source code of the program in MATLAB 7.1 and Figure 2 shows its CFG, which consists of four paths: <d>, <ae>, <abf> and <abc>. They represent ‘Not-a-triangle’, ‘Scalene’, ‘Isosceles’ and ‘Equilateral’ respectively.

Figure 1 An example program

Figure 2 CFG of the example program

Figure 3 The instrumented program

According to probability theory, path <abc> is the most difficult path to cover in path testing, so <abc> is selected as the target path.
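For readers without access to the figures, the Python function below is a hypothetical counterpart of such a triangle classifier; it is not the authors' MATLAB program. The nesting of the three decisions is an assumption chosen so that the four outcomes line up with the paths <d>, <ae>, <abf> and <abc> named in the text; the exact predicates of the original second and third decisions are not recoverable from the text.

```python
def classify(x, y, z):
    """Return (triangle type, assumed CFG path) for side lengths x, y, z."""
    # First decision: can the three sides form a triangle at all?
    if x + y > z and y + z > x and z + x > y and x > 0 and y > 0 and z > 0:
        # Second decision (assumed form): are at least two sides equal?
        if x == y or y == z or z == x:
            # Third decision (assumed form): are all three sides equal?
            if x == y and y == z:
                return "Equilateral", "abc"
            return "Isosceles", "abf"
        return "Scalene", "ae"
    return "Not-a-triangle", "d"
```

Only inputs with x = y = z reach the innermost branch, which is why <abc> is the hardest path to hit by chance.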

5.2 Fitness function construction and instrumentation

Using the branch distance based approach of Section 2, the branch distance functions and the fitness function of the target path can be constructed. For example, to execute the target path, the predicate in the first ‘if’ statement of the program must evaluate to true when the program executes; otherwise, it is impossible to reach the goal. Let C=(x+y>z)∧(y+z>x)∧(z+x>y)∧(x>0)∧(y>0)∧(z>0).

Making C true is equivalent to making ¬C false:

\[
\begin{aligned}
\neg C &= \neg(x+y>z) \lor \neg(y+z>x) \lor \neg(z+x>y) \lor \neg(x>0) \lor \neg(y>0) \lor \neg(z>0) \\
       &= (x+y \le z) \lor (y+z \le x) \lor (z+x \le y) \lor (x \le 0) \lor (y \le 0) \lor (z \le 0)
\end{aligned}
\]

Let C1=(x+y≤z), C2=(y+z≤x), C3=(z+x≤y), C4=(x≤0), C5=(y≤0), C6=(z≤0). For the branch distance functions of C1 and C4 to reach their minimum when C1 and C4 are false respectively, we define f(C1)=(z-x-y) and f(C4)=-x. For the same reason, we define f(C2)=(x-y-z), f(C3)=(y-z-x), f(C5)=-y and f(C6)=-z. Thus, f(¬C)=f(C1)+f(C2)+f(C3)+f(C4)+f(C5)+f(C6)=-2*(x+y+z). The branch distance functions of the other predicates are defined in a similar way, and the sum of these branch distance functions is used as the fitness function of the target path. These functions appear in Figure 3, which gives the instrumented triangle classifier.
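The simplification of f(¬C) can be checked numerically. The short Python sketch below uses hypothetical helper names (branch_distances, f_not_c) and takes each disjunct of ¬C in the order (x+y≤z), (y+z≤x), (z+x≤y), (x≤0), (y≤0), (z≤0) together with the branch distance functions defined in the text.

```python
def branch_distances(x, y, z):
    """Branch distances of the six disjuncts of not-C, in order C1..C6."""
    return [
        z - x - y,  # f(C1), C1 = (x + y <= z)
        x - y - z,  # f(C2), C2 = (y + z <= x)
        y - z - x,  # f(C3), C3 = (z + x <= y)
        -x,         # f(C4), C4 = (x <= 0)
        -y,         # f(C5), C5 = (y <= 0)
        -z,         # f(C6), C6 = (z <= 0)
    ]

def f_not_c(x, y, z):
    """f(not C): sum of the six branch distances (disjunction rule of Table 1)."""
    return sum(branch_distances(x, y, z))
```

The first three terms sum to -(x+y+z) and the last three contribute another -(x+y+z), which gives -2*(x+y+z); for example, f_not_c(3, 4, 5) evaluates to -24.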

5.3 Parameters settings

The settings of SGA are as follows:

(1) Coding: standard binary string
(2) Length of chromosome = 15 bits * 3 = 45 bits; each input variable ranges from 1 to 32768
(3) Population size = 1000
(4) Rank-based fitness assignment and roulette wheel selection strategy
(5) Single-point crossover probability = 0.9
(6) Mutation probability = 1/45
(7) Max generation = 400
(8) Generation gap = 0.9

MPGA requires the following additional parameters:

(9) No. of subpopulations = 4
(10) No. of individuals per subpopulation = 250
(11) Insertion rate = 0.8
(12) Migration rate = 0.2
(13) Migration period = 10 generations
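To make the encoding concrete, the Python sketch below decodes a 45-bit chromosome into three side lengths and applies single-point crossover. The names decode and single_point_crossover are hypothetical, and the mapping of the 15-bit value 0..32767 onto the stated range 1..32768 by adding 1 is an assumption; the paper does not say how the range is obtained.

```python
BITS_PER_VAR = 15  # 15 bits per input variable
N_VARS = 3         # x, y, z

def decode(chromosome):
    """Map a 45-character string of '0'/'1' to three integers in 1..32768."""
    assert len(chromosome) == BITS_PER_VAR * N_VARS
    return tuple(
        int(chromosome[i:i + BITS_PER_VAR], 2) + 1  # assumed shift from 0..32767 to 1..32768
        for i in range(0, len(chromosome), BITS_PER_VAR)
    )

def single_point_crossover(parent1, parent2, point):
    """Swap the tails of two chromosomes after position `point`."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```

A mutation probability of 1/45 per bit means that, on average, one bit of each chromosome is flipped per generation.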

5.4 Experimental results

5.4.1 Experimental results with identical initial population

The initial population in this section is the same in every experiment. This population was generated at random; 496 of its individuals form a scalene triangle and the other 504 cannot form a triangle. After one hundred experiments with each algorithm, the average results in Table 2 show that the SGA-based approach achieves the target path <abc> within 74441 test data on average, while the MPGA-based approach requires only 21073 test data on average. By comparison, according to probability theory, the probability that a randomly chosen test datum achieves the target path is $2^{-30}$ (that is, $(2^{15} \cdot 1 \cdot 1)/(2^{15} \cdot 2^{15} \cdot 2^{15})$, where each input variable has 15 bits), which means that random testing needs about $2^{30}$ tests on average to achieve the objective. Moreover, the elapsed time of MPGA is only 1/8 of that of SGA.
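The arithmetic behind these figures can be reproduced exactly. The Python sketch below (variable names are illustrative) computes the probability that three independent, uniformly chosen 15-bit inputs are all equal, the corresponding expected number of random tests, and the ratio of the average test data counts reported for MPGA and SGA.

```python
from fractions import Fraction

BITS = 15
values = 2 ** BITS  # 32768 possible values per input variable

# x is free; y and z must each match it: (2^15 * 1 * 1) / (2^15)^3
p_equilateral = Fraction(values * 1 * 1, values ** 3)

# Mean of a geometric distribution with success probability p
expected_random_tests = 1 / p_equilateral

# Average number of test data reported in the text
sga_tests, mpga_tests = 74441, 21073
ratio_percent = round(100 * mpga_tests / sga_tests, 2)
```

Here p_equilateral equals 1/2^30 and expected_random_tests equals 2^30 (about 1.07 billion), while ratio_percent reproduces the 28.31% test data entry of Table 2.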

Table 2 Average results with identical initial population

Type        Generations    Test data    Time (seconds)
MPGA/SGA    29.33%         28.31%       12.5%

Figure 4 shows the average number of test data covering each of the four paths for SGA and MPGA after one hundred experiments each. About 18.7% of the individuals in MPGA reach the path <abf>, while the corresponding number of individuals in SGA is less than 1/10 of that in MPGA. Since the path <abf> is the closest to the target path <abc>, MPGA can reach the target path more quickly than SGA. Moreover, the test data in the final populations generated by MPGA all form a triangle, whereas 4% of the test data in the final populations of SGA still cannot form a triangle.

Figure 4 Average no. of test data covering four paths with identical initial population

5.4.2 Experimental results with random initial populations

To investigate the influence of initial populations on MPGA and SGA, the initial populations in this section are generated randomly in each experiment. The other settings are identical to those of the experiments in Section 5.4.1. After one hundred experiments with each algorithm, the average results of SGA and MPGA in Table 3 are slightly higher than the corresponding results in Table 2. However, the proportional relation between the two algorithms is very close to that in Table 2. Moreover, the average number of test data covering the four paths with random initial populations in Figure 5 is similar to the results in Figure 4. In fact, only in the 80th experiment did the final population generated by MPGA contain test data (4 of them) that cannot form a triangle. Therefore, the initial population has little influence on the performance of MPGA and SGA in path-oriented test data generation.

In conclusion, under the guidance of a branch distance based fitness function, both MPGA and SGA can generate path-oriented test data effectively, but MPGA outperforms SGA to a considerable extent. MPGA seems to be a promising approach to automated software testing.

Table 3 Average results with random initial populations

Type        Generations    Test data    Time (seconds)
MPGA/SGA    29.63%         28.73%       17.65%

Figure 5 Average no of test data covering four paths with random initial populations

6 Conclusion

Applying a multi-population genetic algorithm to path-oriented test data generation is a new attempt. Using a triangle classifier as the program under test, and guided by a branch distance based fitness function, the experimental results show that MPGA generates path-oriented test data more effectively and efficiently than SGA does. Future work will include investigating target path selection strategies, comparing the branch distance based fitness function with others, and running experiments on larger and more complex programs. Furthermore, we intend to build an automated structural testing tool based on MPGA.

Acknowledgements

This research was supported by the Sichuan Province Technical Innovation Foundation (Grant No. 07PT001).

References


[1] A. Bertolino, "Software Testing Research: Achievements, Challenges, Dreams," in 2007 Future of Software Engineering: IEEE Computer Society, 2007.

[2] G. J. Myers, The Art of Software Testing, 2nd ed.: John Wiley & Sons Inc., 2004.

[3] B. W. Kernighan and P. J. Plauger, The Elements of Programming Style: McGraw-Hill, Inc., New York, NY, USA, 1982.

[4] E. J. Weyuker, "The applicability of program schema results to programs," International Journal of Parallel Programming, vol. 8, 1979, pp. 387-403.

[5] J. C. King, "A new approach to program testing," in Proceedings of the international conference on Reliable software, Los Angeles, California: ACM, 1975.

[6] T. Y. Chen, T. H. Tse, and Z. Zhou, "Semi-proving: an integrated method based on global symbolic evaluation and metamorphic testing," in Proceedings of the 2002 ACM SIGSOFT international symposium on Software testing and analysis, Roma, Italy: ACM, 2002.

[7] N. T. Sy and Y. Deville, "Consistency techniques for interprocedural test data generation," ACM SIGSOFT Software Engineering Notes, vol. 28, 2003, pp. 108-117.

[8] C. C. Michael, G. McGraw, and M. A. Schatz, "Generating software test data by evolution," IEEE Transactions on Software Engineering, vol. 27, 2001, pp. 1085-1110.

[9] B. Korel, "Automated software test data generation," IEEE Transactions on Software Engineering, vol. 16, 1990, pp. 870-879.

[10] B. Korel, "Dynamic method for software test data generation," Software Testing, Verification & Reliability, vol. 2, 1992, pp. 203-213.

[11] B. Korel, "Automated test data generation for programs with procedures," in Proceedings of the 1996 ACM SIGSOFT international symposium on Software testing and analysis, San Diego, California, United States: ACM, 1996.

[12] S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios, "Application of genetic algorithms to software testing (Application des algorithmes genetiques au test des logiciels)," in Proceedings of 5th International Conference on Software Engineering and its Applications, Toulouse, France, 1992, pp. 625-636.

[13] J. Wegener, A. Baresel, and H. Sthamer, "Evolutionary test environment for automatic structural testing," Information and Software Technology, vol. 43, 2001, pp. 841-854.

[14] J. Wegener, K. Buhr, and H. Pohlheim, "Automatic Test Data Generation For Structural Testing Of Embedded Software Systems By Evolutionary Testing," in Proceedings of the Genetic and Evolutionary Computation Conference: Morgan Kaufmann Publishers Inc., 2002.

[15] S. Levin and A. Yehudai, "Evolutionary Testing: A Case Study," in Hardware and Software, Verification and Testing, 2007, pp. 155-165.

[16] J. Wegener, A. Baresel, and H. Sthamer, "Suitability of Evolutionary Algorithms for Evolutionary Testing," in Proceedings of the 26th International Computer Software and Applications Conference on Prolonging Software Life: Development and Redevelopment: IEEE Computer Society, 2002.

[17] J. H. Holland, Adaptation in Natural and Artificial Systems: MIT Press, Cambridge, MA, USA, 1975.

[18] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms: Wiley-Interscience, 2004.

[19] H. Muhlenbein, M. Schomisch, and J. Born, "The Parallel Genetic Algorithm as a Function Optimizer," Parallel Computing, vol. 17, 1991, pp. 619-632.

[20] E. Cantu-Paz and D. E. Goldberg, "Efficient parallel genetic algorithms: theory and practice," Computer Methods in Applied Mechanics and Engineering, vol. 186, 2000, pp. 221-238.

[21] A. L. Watkins, "The automatic generation of test data using genetic algorithms," in Proceedings of the Fourth Software Quality Conference, vol. 2, 1995, pp. 300-309.

[22] R. P. Pargas, M. J. Harrold, and R. Peck, "Test-Data Generation Using Genetic Algorithms," Software Testing, Verification and Reliability, vol. 9, 1999, pp. 263-282.

[23] D. Berndt, J. Fisher, L. Johnson, J. Pinglikar, and A. Watkins, "Breeding Software Test Cases with Genetic Algorithms," in Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 9 - Volume 9: IEEE Computer Society, 2003.

[24] M. A. Ahmed and I. Hermadi, "GA-based multiple paths test data generator," Computers and Operations Research, 2007, pp.

[25] J. C. Lin and P. L. Yeh, "Automatic test data generation for path testing using GAs," Information Sciences, vol. 131, 2001, pp. 47-64.
