Automatic Path-oriented Test Data Generation Using a Multi-population Genetic Algorithm
Yong Chen1,2, Yong Zhong1,2
1Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China
2Graduate school, Chinese Academy of Sciences, Beijing 100080, China
czchenyong@163.com, zhongyong@cimslabsoft.com
Abstract
Automatic path-oriented test data generation is an undecidable problem, and genetic algorithms (GA) have been applied to test data generation since 1992. Using MATLAB, we implemented a multi-population genetic algorithm (MPGA) that selects individuals for free migration based on their fitness values. Applying MPGA to path-oriented test data generation is a new and meaningful attempt. After describing how to transform path-oriented test data generation into an optimization problem, we present the basic process flow of path-oriented test data generation using GA. Using a triangle classifier as the program under test, experimental results show that the MPGA-based approach can generate path-oriented test data more effectively and efficiently than the simple GA-based approach does.
1 Introduction
Software quality is more important than ever, and software testing is the most significant measure for ensuring it. However, software testing is very laborious and costly because it is mostly done manually [1]. In general, software testing accounts for approximately 50 percent of the elapsed time and more than 50 percent of the total cost of software development [2]. Thus, automated software testing is a promising way to cut down time and cost.
Automatic structural test data generation is a crucial problem in software testing automation; solving it can not only significantly improve the effectiveness and efficiency of software testing but also reduce its high cost. We focus on path-oriented test data generation because various structural test data generation problems can be transformed into a path-oriented test data generation problem. Furthermore, a path testing strategy can detect almost 65 percent of the errors in the program under test [3].
Although path-oriented test data generation is an undecidable problem [4], researchers have developed various methods and made some progress. These methods can be classified into two types: static methods and dynamic methods.
Static methods include symbolic execution [5] and domain reduction [6, 7]. These methods suffer from a number of problems when they handle indefinite loops, arrays, procedure calls and pointer references [8].
Dynamic methods include random testing, the local search approach [9], the goal-oriented approach [10], the chaining approach [11] and the evolutionary approach [8, 12-14]. Since the values of input variables are determined when programs execute, dynamic test data generation can avoid the problems that static methods are confronted with.
As a robust search method in complex spaces, the genetic algorithm was applied to test data generation in 1992 [12], and the evolutionary approach has attracted growing interest since then. Related works [15], [8] and [16] indicate that GA-based test data generation outperforms other dynamic approaches, e.g. random testing and local search. As far as we know, it is a new attempt to apply a multi-population genetic algorithm to path-oriented test data generation.
The rest of the paper is organized as follows. Section 2 describes how to transform path-oriented test data generation into an optimization problem. Section 3 introduces a multi-population genetic algorithm. Section 4 presents the basic process flow of path-oriented test data generation using GA. Section 5 gives experimental results based on a triangle classification program. Finally, Section 6 summarizes the paper with conclusions and directions for future work.
2 Path-oriented test data generation as an optimization problem
To make use of a genetic algorithm, a path-oriented test data generation problem must be transformed into an optimization problem.
First, the program under test should be represented by its control flow graph (CFG). A CFG is a directed graph G = (N, A, s, e), where N is a set of nodes, A is a set of edges, and s and e are the unique entry node and the unique exit node respectively. Each decision node is associated with a branch predicate, which is a logical expression. The edges leaving a decision node are labeled with the true or false value of the corresponding branch predicate. To cause a path to be covered during execution, it is necessary to find values for the input variables that satisfy the related branch predicates.
Fourth International Conference on Natural Computation
A simple way is the branch distance based approach. For example, if a branch predicate C is (a=b) and the test goal is to ensure that C is false during program execution, then the branch distance function is f(C) = -|a-b|. Achieving a desired branch is thus transformed into searching for an input vector that minimizes its branch distance function. Table 1 gives some commonly used branch distance functions, each of which reaches its minimum when C is false during program execution. To achieve a desired path P, we can define F(P) as the sum of all related branch distance functions. Consequently, generating path-oriented test data can be transformed into searching for an input vector that minimizes F(P). A more detailed description of the branch distance based approach is given in Section 5.2.
Table 1 Branch distance functions
Branch predicate C    Branch distance function
a≠b                   f(C) = |a-b|
C1∧C2                 f(C) = min(f(C1), f(C2))
C1∨C2                 f(C) = f(C1)+f(C2)
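The rules above can be written down directly. The following Python sketch is only an illustration of the (a=b) example and the three rows of Table 1; the function names are hypothetical and do not come from the authors' MATLAB implementation.

```python
# Branch distance functions for the goal "predicate C is false".
# All function names here are illustrative assumptions.

def dist_eq(a, b):
    """C is (a = b): f(C) = -|a-b|, negative whenever C is false."""
    return -abs(a - b)

def dist_ne(a, b):
    """C is (a != b): f(C) = |a-b|, reaching its minimum 0 when C is false."""
    return abs(a - b)

def dist_and(f_c1, f_c2):
    """C is (C1 and C2): f(C) = min(f(C1), f(C2))."""
    return min(f_c1, f_c2)

def dist_or(f_c1, f_c2):
    """C is (C1 or C2): f(C) = f(C1) + f(C2)."""
    return f_c1 + f_c2

def path_fitness(branch_distances):
    """F(P): the sum of the branch distances along the desired path P."""
    return sum(branch_distances)
```

A search procedure would then look for an input vector that minimizes path_fitness over the branch predicates of the target path.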
In addition to branch distance based approaches, there are other approaches that use Hamming distance or extended Hamming distance. These approaches help us handle a path-oriented test data generation problem as an optimization problem, so that genetic algorithms can play a vital role in solving it.
3 A multi-population genetic algorithm
Genetic algorithms were developed by Holland [17], inspired by Darwin's theory of evolution, and are widely applied to various problems. A genetic algorithm begins with a population of initial individuals represented by chromosomes. The chromosomes undergo a process of evolution according to rules of selection, reproduction and mutation. More details of genetic algorithms can be found in [18].
Based on the idea of parallel computing, the multi-population genetic algorithm (MPGA) was proposed [19, 20]. In most cases, MPGA not only accelerates convergence but also improves the quality of the results compared to a simple genetic algorithm (SGA) with a single population, because individuals migrate among multiple subpopulations.
There are two ways in which individuals are selected for migration. Uniform selection chooses individuals for migration at random and replaces randomly chosen individuals in a subpopulation with the immigrants. Fitness-based selection picks individuals with a high fitness level for migration and replaces individuals in a subpopulation uniformly at random.
In addition, there are three migration strategies, which stipulate how the selected individuals migrate. Adjacent migration transfers individuals between adjacent subpopulations in one direction only. Neighborhood migration takes place between nearest subpopulations in either direction. Free migration means that individuals may migrate from any subpopulation to any other.
Using MATLAB 7.1, we implemented a multi-population genetic algorithm (MPGA) that selects individuals for free migration based on their fitness values. For the sake of brevity, we only introduce the main idea here and do not describe it in more detail.
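As an illustration only, one migration step that combines fitness-based selection with free migration might look like the Python sketch below. It is not the authors' MATLAB code: the function name migrate, the convention that a lower fitness value means a fitter individual, and the choice to copy (rather than move) emigrants are all assumptions.

```python
import random

def migrate(subpops, fitness, rate=0.2, rng=random):
    """One migration step (illustrative sketch, needs at least two subpopulations).

    Assumptions: lower fitness value = fitter individual; emigrants are
    copied, so every subpopulation keeps its size.
    """
    # Fitness-based selection: the fittest individuals of each subpopulation emigrate.
    emigrants = []
    for pop in subpops:
        k = max(1, int(rate * len(pop)))
        emigrants.append(sorted(pop, key=fitness)[:k])

    new_subpops = [list(pop) for pop in subpops]
    for src, movers in enumerate(emigrants):
        others = [i for i in range(len(subpops)) if i != src]
        for individual in movers:
            # Free migration: any other subpopulation may be the destination.
            dst = rng.choice(others)
            # The replaced individual is chosen uniformly at random.
            slot = rng.randrange(len(new_subpops[dst]))
            new_subpops[dst][slot] = individual
    return new_subpops
```

Restricting the destination list to the next subpopulation only, or to the two nearest ones, would turn this sketch into adjacent or neighborhood migration respectively.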
4 Basic process flow of path-oriented test data generation using genetic algorithms
To generate path-oriented test data for the program under test using GA, there are five steps:
(1) Control flow graph (CFG) construction. The CFG of the program under test may be constructed manually or automatically with related tools. It helps testers select representative target paths.
(2) Target path selection. In general, a program under test has too many paths to test completely. Thus, testers have to select meaningful paths as target paths.
(3) Fitness function construction. To evaluate the distance between the executed path and the target path, a fitness function has to be constructed.
(4) Program instrumentation. This means inserting probes at the beginning of every block of source code to monitor program execution and collect related information (e.g. the fitness values of individuals).
(5) Test data generation and execution of the instrumented program. The initial test data are chosen from their domain at random, and GA generates new test data in order to achieve the target path. Finally, either suitable test data that execute along the target path are generated, or no suitable test data are found because the maximum number of generations is exceeded.
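Step (5) can be sketched as a small search loop. The Python code below is a deliberately simplified illustration, not the algorithm used in the experiments: it uses truncation selection instead of rank-based fitness assignment with roulette wheel selection, and the names decode, generate_test_data, fitness and reached are hypothetical. The callbacks fitness and reached stand for the information collected by the instrumented program.

```python
import random

def decode(bits, n_vars=3, width=15):
    """Split a bit list into n_vars integers in the range 1..2**width."""
    values = []
    for i in range(n_vars):
        chunk = bits[i * width:(i + 1) * width]
        values.append(int("".join(map(str, chunk)), 2) + 1)
    return tuple(values)

def generate_test_data(fitness, reached, n_vars=3, width=15,
                       pop_size=100, max_gen=400, p_cross=0.9, seed=None):
    """Return (test_data, generation), or (None, max_gen) if the search fails."""
    rng = random.Random(seed)
    length = n_vars * width
    # Initial test data are chosen from their domain at random.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for gen in range(max_gen):
        # "Execute" every individual and stop once the target path is covered.
        for chrom in pop:
            data = decode(chrom, n_vars, width)
            if reached(*data):
                return data, gen
        # Simplification: keep the better half as parents (lower fitness is better).
        ranked = sorted(pop, key=lambda c: fitness(*decode(c, n_vars, width)))
        parents = ranked[:max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = list(a)
            if rng.random() < p_cross:          # single-point crossover
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability 1/length per bit.
            children.append([bit ^ 1 if rng.random() < 1.0 / length else bit
                             for bit in child])
        pop = children
    return None, max_gen
```

The second return value counts the generations evaluated before success, so the number of executed test data is roughly that count times the population size.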
5 Experimental studies
5.1 A triangle classifier
The triangle classifier is a widely used program in software testing research [13, 21-25]. It determines whether three input sides can form a triangle and, if so, what type of triangle they form. Figure 1 gives the source code of the program in MATLAB 7.1 and Figure 2 shows its CFG, which consists of four paths: <d>, <ae>, <abf> and <abc>. They represent 'Not-a-triangle', 'Scalene', 'Isosceles' and 'Equilateral' respectively.
Figure 1 An example program
Figure 2 CFG of the example program
Figure 3 The instrumented program
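The figures are not reproduced in this text. As a rough stand-in for Figure 1, the Python sketch below classifies a triangle and records the path label it follows. Only the first predicate is taken from Section 5.2; the form of the second and third decisions and the mapping of edges a to f onto the code are assumptions chosen to be consistent with the four paths listed above.

```python
def classify_triangle(x, y, z):
    """Return (type, path); the path labels follow the four paths of the CFG."""
    # First decision: the predicate C of Section 5.2.
    if x + y > z and y + z > x and z + x > y and x > 0 and y > 0 and z > 0:
        path = "a"
        # Assumed second decision: at least two sides are equal.
        if x == y or y == z or z == x:
            path += "b"
            # Assumed third decision: all three sides are equal.
            if x == y and y == z:
                return "Equilateral", path + "c"
            return "Isosceles", path + "f"
        return "Scalene", path + "e"
    return "Not-a-triangle", "d"
```

For example, classify_triangle(2, 2, 2) follows the target path <abc>, whereas classify_triangle(1, 2, 3) follows <d> because 1+2 is not greater than 3.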
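According to probability theory, path <abc> is the least likely of the four paths to be covered by randomly chosen inputs, because all three sides must be equal. It is therefore the most difficult path to cover in path testing, so <abc> is selected as the target path.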
5.2 Fitness function construction and instrumentation
Using the branch distance based approach of Section 2, the branch distance functions and the fitness function of the target path can be constructed. For example, to execute the target path, the predicate in the first 'if' statement of the program must evaluate to true during execution; otherwise, it is impossible to reach the goal. Let C=(x+y>z)∧(y+z>x)∧(z+x>y)∧(x>0)∧(y>0)∧(z>0). Making C true is equivalent to making ¬C false.
$$\neg C = \neg(x+y>z) \lor \neg(y+z>x) \lor \neg(z+x>y) \lor \neg(x>0) \lor \neg(y>0) \lor \neg(z>0)$$
$$\phantom{\neg C} = (x+y \le z) \lor (y+z \le x) \lor (z+x \le y) \lor (x \le 0) \lor (y \le 0) \lor (z \le 0)$$
Let C1=(x+y≤z), C2=(y+z≤x), C3=(z+x≤y), C4=(x≤0), C5=(y≤0), C6=(z≤0). For the branch distance functions of C1 and C4 to reach their minimum when C1 and C4 are false respectively, we can define f(C1)=(z-x-y) and f(C4)=-x. For the same reason, we can define f(C2)=(x-y-z), f(C3)=(y-z-x), f(C5)=-y and f(C6)=-z. Thus, f(¬C)=f(C1)+f(C2)+f(C3)+f(C4)+f(C5)+f(C6)=-2*(x+y+z). In a similar way, the branch distance functions of the other predicates can be defined, and the sum of those branch distance functions is used as the fitness function of the target path. These functions can be seen in Figure 3, which gives the instrumented triangle classifier.
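The simplification to -2*(x+y+z) can be checked numerically. The short Python sketch below sums the six branch distances term by term, using the definitions f(C1)=(z-x-y), f(C2)=(x-y-z), f(C3)=(y-z-x), f(C4)=-x, f(C5)=-y and f(C6)=-z; the function name f_not_c is an illustrative assumption and is not taken from Figure 3.

```python
def f_not_c(x, y, z):
    """Branch distance of the disjunction of C1..C6: the sum of the six terms."""
    f1 = z - x - y   # C1 = (x+y <= z)
    f2 = x - y - z   # C2 = (y+z <= x)
    f3 = y - z - x   # C3 = (z+x <= y)
    f4 = -x          # C4 = (x <= 0)
    f5 = -y          # C5 = (y <= 0)
    f6 = -z          # C6 = (z <= 0)
    return f1 + f2 + f3 + f4 + f5 + f6

# The first three terms sum to -(x+y+z), and so do the last three.
for x, y, z in [(3, 4, 5), (2, 2, 2), (1, 2, 3), (100, 1, 7)]:
    assert f_not_c(x, y, z) == -2 * (x + y + z)
```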
5.3 Parameters settings
The settings of SGA are as follows:
(1) Coding: standard binary string
(2) Length of chromosome = 15 bits * 3 = 45 bits; each input variable ranges from 1 to 32768
(3) Population size = 1000
(4) Rank-based fitness assignment and roulette wheel selection strategy
(5) Single-point crossover probability = 0.9
(6) Mutation probability = 1/45
(7) Max generation = 400
(8) Generation gap = 0.9
MPGA requires the following additional parameters:
(9) No. of subpopulations = 4
(10) No. of individuals per subpopulation = 250
(11) Insertion rate = 0.8
(12) Migration rate = 0.2
(13) Migration period = 10 generations
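A few quantities follow directly from these settings. The Python sketch below derives them; the variable names are hypothetical, and reading the migration rate as the fraction of each subpopulation that migrates is an assumption about the parameter's meaning rather than something stated in the text.

```python
from fractions import Fraction

# Settings taken from the list above.
bits_per_variable = 15
n_variables = 3
sga_population = 1000
max_generation = 400
n_subpopulations = 4
subpopulation_size = 250
migration_rate = Fraction(2, 10)
migration_period = 10

# Derived quantities.
chromosome_bits = bits_per_variable * n_variables        # 45 bits
values_per_variable = 2 ** bits_per_variable             # 32768 values, i.e. 1..32768
mutation_probability = Fraction(1, chromosome_bits)      # one expected bit flip per chromosome
mpga_population = n_subpopulations * subpopulation_size  # same total size as SGA

# Assumption: the migration rate is a fraction of each subpopulation.
migrants_per_subpopulation = int(migration_rate * subpopulation_size)
# Upper bound on migration events within the generation limit.
migration_events = max_generation // migration_period
```

Under these assumptions both algorithms evaluate the same number of individuals per generation, so generation counts and test data counts remain comparable between SGA and MPGA.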
5.4 Experimental results
5.4.1 Experimental results with identical initial population
The initial population in this section is the same in each experiment. This population was generated at random; 496 of its individuals form a scalene triangle and the other 504 cannot form a triangle. Over one hundred experiments for each algorithm, the average results in Table 2 show that the SGA-based approach achieves the target path <abc> after 74441 test data on average, while the MPGA-based approach requires only 21073 test data on average. By comparison, according to probability theory, the probability that a randomly chosen input achieves the target path is $2^{-30}$ (that is, $(2^{15} \cdot 1 \cdot 1)/(2^{15} \cdot 2^{15} \cdot 2^{15})$, where each input variable has 15 bits), which means that random testing needs $2^{30}$ tests on average to achieve the objective. Moreover, the elapsed time of MPGA is only 1/8 of that of SGA.
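The arithmetic behind these figures can be reproduced with a few lines of Python. The sketch assumes that random testing draws each side independently and uniformly from its 15-bit range; the variable names are illustrative.

```python
from fractions import Fraction

values_per_side = 2 ** 15

# The first side is free; the other two must match it exactly.
p_target = Fraction(values_per_side * 1 * 1, values_per_side ** 3)
assert p_target == Fraction(1, 2 ** 30)

# Expected number of independent random tests until the first hit is 1/p.
expected_random_tests = 1 / p_target
assert expected_random_tests == 2 ** 30

# Ratio of the average numbers of test data reported for MPGA and SGA.
mpga_over_sga = 21073 / 74441
assert round(100 * mpga_over_sga, 2) == 28.31
```

The last value agrees with the test data ratio of 28.31% reported in Table 2.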
Table 2 Average results with identical initial population
Type        Generations    Test data    Time (seconds)
MPGA/SGA    29.33%         28.31%       12.5%
Figure 4 shows the average number of test data covering each of the four paths for SGA and MPGA over one hundred experiments each. About 18.7% of the individuals in MPGA reach the path <abf>, whereas the corresponding number of individuals in SGA is less than 1/10 of that in MPGA. Since the path <abf> is closest to the target path <abc>, MPGA can reach the target path more quickly than SGA. In addition, the test data in the final populations generated by MPGA all form a triangle, whereas 4% of the test data in the final populations of SGA cannot form a triangle.
Figure 4 Average no. of test data covering four paths with identical initial population
5.4.2 Experimental results with random initial populations
To investigate the influence of initial populations on MPGA and SGA, the initial populations in this section are generated randomly in each experiment. The other settings are identical to those of the experiments in Section 5.4.1. Over one hundred experiments for each algorithm, the average results of SGA and MPGA in Table 3 are slightly higher than the corresponding results in Table 2. However, the proportional relation between the two algorithms is very close to that in Table 2. Moreover, the average number of test data covering the four paths with random initial populations in Figure 5 is similar to the results in Figure 4. In fact, only in the 80th experiment did the final population generated by MPGA contain 4 test data that cannot form a triangle. Therefore, the initial population has little influence on the performance of MPGA and SGA in path-oriented test data generation.
In conclusion, under the guidance of a branch distance based fitness function, both MPGA and SGA can generate path-oriented test data effectively, but MPGA outperforms SGA to a considerable extent. MPGA seems a promising approach in automated software testing.
Table 3 Average results with random initial populations
Type        Generations    Test data    Time (seconds)
MPGA/SGA    29.63%         28.73%       17.65%
Figure 5 Average no. of test data covering four paths with random initial populations
6 Conclusion
It is a new attempt to apply a multi-population genetic algorithm to path-oriented test data generation. Using a triangle classifier as the program under test, with the guidance of a branch distance based fitness function, experimental results show that MPGA can generate path-oriented test data more effectively and efficiently than SGA does. Future work will include investigating target path selection strategies, comparing the branch distance based fitness function with others, and running experiments on larger and more complex programs. Furthermore, we intend to build an automated structural testing tool based on MPGA.
Acknowledgements
This research was supported by the Sichuan Province Technical Innovation Foundation (Grant No. 07PT001).
References
[1] A. Bertolino, "Software Testing Research: Achievements, Challenges, Dreams," in 2007 Future of Software Engineering: IEEE Computer Society, 2007.
[2] G. J. Myers, The Art of Software Testing, 2nd ed.: John Wiley & Sons Inc, 2004.
[3] B. W. Kernighan and P. J. Plauger, The Elements of Programming Style: McGraw-Hill, Inc., New York, NY, USA, 1982.
[4] E. J. Weyuker, "The applicability of program schema results to programs," International Journal of Parallel Programming, vol. 8, 1979, pp. 387-403.
[5] J. C. King, "A new approach to program testing," in Proceedings of the International Conference on Reliable Software, Los Angeles, California: ACM, 1975.
[6] T. Y. Chen, T. H. Tse, and Z. Zhou, "Semi-proving: an integrated method based on global symbolic evaluation and metamorphic testing," in Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis, Roma, Italy: ACM, 2002.
[7] N. T. Sy and Y. Deville, "Consistency techniques for interprocedural test data generation," ACM SIGSOFT Software Engineering Notes, vol. 28, 2003, pp. 108-117.
[8] C. C. Michael, G. McGraw, and M. A. Schatz, "Generating software test data by evolution," IEEE Transactions on Software Engineering, vol. 27, 2001, pp. 1085-1110.
[9] B. Korel, "Automated software test data generation," IEEE Transactions on Software Engineering, vol. 16, 1990, pp. 870-879.
[10] B. Korel, "Dynamic method for software test data generation," Software Testing, Verification & Reliability, vol. 2, 1992, pp. 203-213.
[11] B. Korel, "Automated test data generation for programs with procedures," in Proceedings of the 1996 ACM SIGSOFT International Symposium on Software Testing and Analysis, San Diego, California, United States: ACM, 1996.
[12] S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios, "Application of genetic algorithms to software testing (Application des algorithmes genetiques au test des logiciels)," in Proceedings of the 5th International Conference on Software Engineering and its Applications, Toulouse, France, 1992, pp. 625-636.
[13] J. Wegener, A. Baresel, and H. Sthamer, "Evolutionary test environment for automatic structural testing," Information and Software Technology, vol. 43, 2001, pp. 841-854.
[14] J. Wegener, K. Buhr, and H. Pohlheim, "Automatic Test Data Generation For Structural Testing Of Embedded Software Systems By Evolutionary Testing," in Proceedings of the Genetic and Evolutionary Computation Conference: Morgan Kaufmann Publishers Inc., 2002.
[15] S. Levin and A. Yehudai, "Evolutionary Testing: A Case Study," in Hardware and Software, Verification and Testing, 2007, pp. 155-165.
[16] J. Wegener, A. Baresel, and H. Sthamer, "Suitability of Evolutionary Algorithms for Evolutionary Testing," in Proceedings of the 26th International Computer Software and Applications Conference on Prolonging Software Life: Development and Redevelopment: IEEE Computer Society, 2002.
[17] J. H. Holland, Adaptation in Natural and Artificial Systems: MIT Press, Cambridge, MA, USA, 1975.
[18] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms: Wiley-Interscience, 2004.
[19] H. Muhlenbein, M. Schomisch, and J. Born, "The Parallel Genetic Algorithm as a Function Optimizer," Parallel Computing, vol. 17, 1991, pp. 619-632.
[20] E. Cantu-Paz and D. E. Goldberg, "Efficient parallel genetic algorithms: theory and practice," Computer Methods in Applied Mechanics and Engineering, vol. 186, 2000, pp. 221-238.
[21] A. L. Watkins, "The automatic generation of test data using genetic algorithms," in Proceedings of the Fourth Software Quality Conference, vol. 2, 1995, pp. 300-309.
[22] R. P. Pargas, M. J. Harrold, and R. Peck, "Test-Data Generation Using Genetic Algorithms," Software Testing, Verification and Reliability, vol. 9, 1999, pp. 263-282.
[23] D. Berndt, J. Fisher, L. Johnson, J. Pinglikar, and A. Watkins, "Breeding Software Test Cases with Genetic Algorithms," in Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 9 - Volume 9: IEEE Computer Society, 2003.
[24] M. A. Ahmed and I. Hermadi, "GA-based multiple paths test data generator," Computers and Operations Research, 2007.
[25] J. C. Lin and P. L. Yeh, "Automatic test data generation for path testing using GAs," Information Sciences, vol. 131, 2001, pp. 47-64.