Automated Software Test Data Generation Using a Genetic Algorithm
Min Pei
Professor, Beijing Union University; Visiting Professor, Michigan State University; Visiting Professor, Beijing University of Aeronautics and Astronautics
Erik D. Goodman
Professor and Director, Case Center for Computer-Aided Engineering & Manufacturing, Michigan State University; Visiting Professor, Beijing University of Aeronautics and Astronautics
Zongyi Gao
Professor, Computer Science Department, Beijing University of Aeronautics and Astronautics
Kaixiang Zhong
Graduate Research Assistant, Computer Science Department, Beijing University of Aeronautics and Astronautics
Abstract
Software testing is an important way of improving the quality of software. It accounts for approximately half of all software engineering cost. The critical point is to increase the degree of automation of testing and of its test data generation. In the 1980s most test data generation used symbolic evaluation to derive test data. Because of the complexity of solving the resulting set of predicate equations, this could not be put into practice. Recent methods use actual program execution and a function minimization search method, but the local nature of these methods needs to be improved. In this paper we propose a new approach: automated test data generation using a genetic algorithm (GA). Compared with traditional search algorithms, its efficiency and efficacy are much better. We introduce the new method and apply it to a typical sample program. The results of the test data generation experiment using a GA, and their analysis, are presented in this paper. Further work is needed to extend the method to dynamic data structures.
Key words:
software testing, test data generation, path selection, genetic algorithm, fitness
    path[tmp] = 0;
    tmp++;
    while (i < high[0])
    {
        if (step[0] == 0) {
            printf("\n********step == 0\n");
            break;
        }
        loop_counter++;
        if (loop_counter <= Max_loop) path[tmp++] = 1;
        if (max < a[i]) {
            max = a[i];
            if (loop_counter <= Max_loop) path[tmp++] = 2;
        } else {
            if (loop_counter <= Max_loop) path[tmp++] = 3;
        }
        if (min > a[i]) {
            min = a[i];
            if (loop_counter <= Max_loop) path[tmp++] = 4;
        } else {
            if (loop_counter <= Max_loop) path[tmp++] = 5;
        }
        i += step[0];
        if (loop_counter <= Max_loop) path[tmp++] = 6;
    }
    path[tmp++] = 7;
    printf("\nThe path executed is:");
    for (i = 0; i < tmp; i++)
        printf(" %d ", path[i]);
    i = 0;
    while (path[i] == ideal_path[i])
    {
        fit = fit - 10 - scale;
        scale = scale + 5;
        if ((path[i] == ideal_path[i]) && (path[i] == -1))
        {
            printf("\nI made it!!!\n\n\n\n");
            /*** exit(0); ***/
        }
        i++;
    }
    /* printf("the fittness is %d", fit); */
    return (fit);
}
Appendix: The evaluation function of the sample program.
#include <stdio.h>
#define PATH_LENGTH 200
#define Max_loop 2
int fat(a,low,high,step)
int a[128];
int low[1];
int high[1];
int step[1];
{
int min,max,i;
int tmp;
static int path[PATH_LENGTH];
int fit = 400;
static int ideal_path[PATH_LENGTH];
static int flag_input = 0;
int loop_counter = 0;
int path_no = 0;
int scale=0;
FILE *data_file;
/******** initialize the path array ********/
for (tmp = 0; tmp < PATH_LENGTH; tmp++) path[tmp] = -1;
/******** input the ideal path ***********/
if (!flag_input)
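{ if ((data_file = fopen("input.dat","r")) == (FILE *) NULL)
  { fprintf( stderr, "Can not open the data file!!!\n");
    return -1;
  }
  else
  { tmp = 0;
    /* read node numbers up to the -1 terminator; stop early on a
       read failure or when the ideal_path array is full */
    while (path_no != -1 && tmp < PATH_LENGTH)
    { if (fscanf(data_file,"%d",&path_no) != 1) break;
      ideal_path[tmp] = path_no;
      tmp ++;
    }
    fclose (data_file);
  }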
flag_input = 1;
}
min = a[*low];
max = a[*low];
i = low[0] + step[0];
tmp = 0;
References
[1] G. Myers, The Art of Software Testing. New York: Wiley, 1979.
[2] L. G. Stucki, “Guest editorial: A case for software testing,” IEEE Trans. Software Eng., vol. SE-2, no. 3, p. 194, 1976.
[3] J. Bicevskis, J. Borzovs, U. Straujums, A. Zarins, and E. Miller, “SMOTL - A system to construct samples for data processing program debugging,” IEEE Trans. Software Eng., vol. SE-5, no. 1, pp. 60-66, Jan. 1979.
[4] L. Clarke, “A system to generate test data and symbolically execute programs,” IEEE Trans. Software Eng., vol. SE-2, no. 3, pp. 215-222, 1976.
[5] W. Howden, “Symbolic testing and the DISSECT symbolic evaluation system,” IEEE Trans. Software Eng., vol. SE-4, no. 4, pp. 266-278, 1977.
[6] C. Ramamoorthy, S. Ho, and W. Chen, “On the automated generation of program test data,” IEEE Trans. Software Eng., vol. SE-2, no. 4, pp. 293-300, 1976.
[7] B. Korel, “Automated software test data generation,” IEEE Trans. Software Eng., vol. 16, no. 8, pp. 870-879, Aug. 1990.
[8] P. Gill and W. Murray, Eds., Numerical Methods for Constrained Optimization. New York: Academic, 1974.
[9] N. Lyons, “An automatic generation system for data base simulation and testing,” Data Base, vol. 8, no. 4, pp. 10-13, 1977.
[10] E. Miller, Jr. and R. Melton, “Automated generation of testcase datasets,” SIGPLAN Notices, vol. 10, no. 6, pp. 51-58, June 1975.
[11] D. Bird and C. Munoz, “Automatic generation of random self-checking test cases,” IBM Syst. J., vol. 22, no. 3, pp. 229-245, 1983.
[12] W. Howden, “Reliability of the path analysis testing strategy,” IEEE Trans. Software Eng., vol. SE-2, no. 3, pp. 208-215, Sept. 1976.
[13] B. W. Kernighan and P. J. Plauger, The Elements of Programming Style. New York: McGraw-Hill, 1974.
[14] D. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning. Addison Wesley, 1989.
[15] L. Davis, Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold, 1991.
[16] N. Schraudolph and J. Grefenstette, “A user’s guide to GAucsd 1.4,” Technical Report CS92-249, CSE Department, U.C. San Diego, July 1992.
#include "stdio.h"
typedef struct node
{ int data;
struct node *left;
struct node *right; };
struct node *find(l, y)
struct node *l;
int y;
{
struct node *tmp, *p;
p = l;
tmp = NULL;
while(p != NULL)
{
if(y== p->data)
{ tmp = p;
p = NULL;
}else
if(y < p->data)
{
p = p->left;
}else
{
p = p->right;
};
}
return(tmp);
}
Fig. 4 A sample program with a dynamic data structure
5 Brief Conclusions
The new approach to software testing and test data generation is based on a genetic algorithm and program execution. We treat the genetic algorithm as a sophisticated optimization technique and use actual execution of the program on the selected path as part of the evaluation function during the search. The method works on the whole path to derive input test data that will finally cause traversal of the selected path, fulfilling the goal of software testing, or to find that a certain path is infeasible. The results of this experiment are encouraging. The new method appears superior to most existing methods for testing programs with sequential input data. We plan to embed this new approach in an experimental software testing system, apply it to more practical programs to check the results, and then analyze and test the real robustness of the method. How to apply the method to dynamic data structures remains an open problem, and additional research is required. One possible way to solve this problem is to hybridize the new approach with effective existing methods such as dynamic data flow analysis. We hope that new ways of generating test data for a wide spectrum of programs will emerge in the near future.
test the same path. This is one of the advantages of using a GA.
Table 3 Some results of program testing
4 Further work on Dynamic data structures.
Until now we have solved a large class of programs with sequential data input. The further extension of automated test data generation involves structures and pointers. A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling. A pointer is a variable that contains the address of another variable; it actually represents two variables: the pointer itself and the variable pointed at. The problem here is how to perform test data generation using a GA in the presence of pointers to structures. Consider the function find of Fig. 4. This function accepts an integer element y and a dynamic data structure pointed to by the pointer variable l. The goal of test data generation is to find a program input, i.e., in this case a shape of the input data structure, that will cause traversal of the selected path.
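To make the dependence of the traversed path on the shape of the input structure concrete, the following is a minimal sketch, not part of the original experiment. It restates find from Fig. 4 in ANSI C and records which branch is taken at each node. The name find_traced, the trace buffer, and the letters 'E', 'L' and 'R' are hypothetical choices made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Same search as find in Fig. 4, but each loop iteration also appends one
   letter to trace: 'E' (equal), 'L' (went left) or 'R' (went right).
   trace is always NUL-terminated; letters beyond cap-1 are dropped. */
struct node *find_traced(struct node *l, int y, char *trace, size_t cap)
{
    struct node *p = l, *tmp = NULL;
    size_t n = 0;

    assert(trace != NULL && cap > 0);
    while (p != NULL) {
        char step;
        if (y == p->data) {
            tmp = p;
            p = NULL;
            step = 'E';
        } else if (y < p->data) {
            p = p->left;
            step = 'L';
        } else {
            p = p->right;
            step = 'R';
        }
        if (n + 1 < cap)
            trace[n++] = step;
    }
    trace[n] = '\0';
    return tmp;
}
```

For a three-node structure with 5 at the root, 3 on the left and 8 on the right, searching for 8 yields the trace "RE", while searching for 4 yields "LR" and a NULL result. A different shape gives a different trace for the same y, which is why the search space here is the set of shapes rather than a fixed-length vector of values.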
In this example, assume we have no knowledge about the desired shape or class of shapes of the input data structure (e.g., a binary tree, a doubly linked list, a directed graph, etc.). Since the only information available is the declaration part of the dynamic data structure, it is assumed that the input data structure is a directed graph. The approach to finding an input data structure would be to search the space of all input data structure “shapes” until a solution is found. The main problem for a search procedure using a GA is to find a suitable representation or coding for the structures and to design recombination operators corresponding to the new representation. Keeping every new offspring legal is the critical point of this design; otherwise we need to create a repair method to get rid of illegal structures. If a large number of illegal structures is created in the process, the search slows down and the whole procedure becomes inefficient.
Further work on dynamic structures needs to focus on the design of the new coding and new operators. We are now planning to investigate new developments in the GA field, such as the representations used in Genetic Programming and others, in order to solve our problem thoroughly.
                           parameter
No  path                   low  high  step   Max   Min  imax  imin
1   0-7                     89   106    61  -458  -458    89    89
2   0-1-2-5-6-7             13   123   107   407  -485   120    13
3   0-1-3-4-6-7              8    70    55   376   -40     8    63
4   0-1-3-5-6-7            112   121     7  -257  -257   119   119
5   0-1-2-5-6-1-2-5-6-7     43   119    14   398  -486    71    43
6   0-1-2-5-6-1-3-4-6-7     15   126    37  -146  -346    52    89
7   0-1-3-4-6-1-2-5-6-7     36    61     6   372  -145    48    42
8   0-1-3-5-6-1-3-4-6-7      4   122    55   323  -123     4   114
9   0-1-3-5-6-1-3-5-6-7     79   126    18  -426  -426    79    79
Analysis of running results and practical testing
The running results are shown in Table 2. In total there are 21 paths. The results indicate that no input data can cause a path to pass through branches 2 and 4 simultaneously, so 8 paths are infeasible: Nos. 2, 6, 7, 8, 9, 10, 14, and 18. Each of these paths requires passing through branches 2 and 4 in the same loop iteration, so they are all infeasible.
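The counts of 21 paths and 8 infeasible paths can be checked by a short calculation, sketched here under the assumption (taken from the appendix listing) that at most Max_loop = 2 loop iterations are recorded and that each recorded iteration chooses node 2 or 3 and then node 4 or 5. The function name count_paths is hypothetical. With k recorded iterations there are 4^k paths, of which 3^k avoid the 2-4 combination, so 4^k - 3^k contain it at least once.

```c
#include <assert.h>

/* Count all paths with 0..max_loop recorded iterations, and those that
   contain the branch pair 2-4 in at least one iteration. */
void count_paths(int max_loop, int *total, int *with_2_and_4)
{
    int k;
    int all = 1;      /* 4^k: paths with exactly k recorded iterations */
    int without = 1;  /* 3^k: those that never combine branches 2 and 4 */

    assert(total != 0 && with_2_and_4 != 0 && max_loop >= 0);
    *total = 0;
    *with_2_and_4 = 0;
    for (k = 0; k <= max_loop; k++) {
        *total += all;
        *with_2_and_4 += all - without;
        all *= 4;
        without *= 3;
    }
}
```

For max_loop = 2 this gives 1 + 4 + 16 = 21 paths, of which 0 + 1 + 7 = 8 contain the pair 2-4, matching the numbers reported above. One plausible reading of why these paths cannot be covered: in the sample program min and max both start at a[low] and min never exceeds max, so a[i] cannot be greater than max and smaller than min in the same iteration.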
We can integrate test data generation and practical software testing. When an optimum input value is found, the evaluation procedure is in fact a dynamic execution of the program on the required ideal path. From this execution we can check whether or not the results of the program are correct. Several solutions for the sample program are listed in Table 3.
Table 2 The results of test data generation using a GA
During the evolution period, the rules of transition from one set of input values to the next are probabilistic rather than deterministic, so with different seeds we can get different test data to
No  path                 population  trial   crossover  mutation  results
                         size        number  rate       rate      (generation number, trials)
1   0-7                  30          1000    0.60       0.001     0-gen (30 trials)
2   0-1-2-4-6-7          1000        50000   0.60       0.001     no cover
3   0-1-2-5-6-7          50          3000    0.60       0.001     0-gen (50 trials)
4   0-1-3-4-6-7          50          3000    0.60       0.001     0-gen (50 trials)
5   0-1-3-5-6-7          300         3000    0.60       0.001     1-gen (568 trials)
                         50          10000   0.60       0.001     91-gen (4043 trials)
6   0-1-2-4-6-1-2-4-6-7  1000        50000   0.60       0.001     no cover
7   0-1-2-4-6-1-2-5-6-7  1000        50000   0.60       0.001     no cover
8   0-1-2-4-6-1-3-4-6-7  1000        50000   0.60       0.001     no cover
9   0-1-2-4-6-1-3-5-6-7  1000        50000   0.60       0.001     no cover
10  0-1-2-5-6-1-2-4-6-7  1000        50000   0.60       0.001     no cover
11  0-1-2-5-6-1-2-5-6-7  50          3000    0.60       0.001     0-gen (50 trials)
12  0-1-2-5-6-1-3-4-6-7  50          500     0.60       0.001     6-gen (296 trials)
13  0-1-2-5-6-1-3-5-6-7  30          500     0.60       0.001     15-gen (443 trials)
14  0-1-3-4-6-1-2-4-6-7  1000        50000   0.60       0.001     no cover
15  0-1-3-4-6-1-2-5-6-7  50          3000    0.60       0.001     4-gen (227 trials)
16  0-1-3-4-6-1-3-4-6-7  50          3000    0.60       0.001     0-gen (50 trials)
17  0-1-3-4-6-1-3-5-6-7  50          3000    0.60       0.001     0-gen (50 trials)
18  0-1-3-5-6-1-2-4-6-7  1000        50000   0.60       0.001     no cover
19  0-1-3-5-6-1-2-5-6-7  100         3000    0.60       0.001     15-gen (1429 trials)
                         50          3000    0.60       0.001     68-gen (3042 trials)
20  0-1-3-5-6-1-3-4-6-7  500         10000   0.60       0.001     2-gen (1385 trials)
                         50          10000   0.60       0.001     79-gen (3512 trials)
21  0-1-3-5-6-1-3-5-6-7  1000        20000   0.40       0.01      14-gen (14986 trials)
GA running parameters and fitness function
According to the size of the program or the possible number of paths in the program, the user can select different population sizes. Usually, the more paths the program has, the larger the population size we select. In recent runs we used population sizes of 30, 50, 300, 500 and 1000, and kept the crossover rate at 0.60-0.70 and the mutation rate at 0.001. Preliminary experiments showed that the population size is the most important factor and that the other parameter settings have no great influence on the running results so far.
We can design different fitness functions for evaluating a chromosome, that is, a set of input values. The simplest fitness function we use is given by the following expression:
Fitness = C - [ 10 × n + 5 × n(n-1)/2 ] (1)
where C is a large number and n is the number of matching subpaths between the executed path and the required ideal path. The third term is a scaling factor.
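As a rough illustration of expression (1), the sketch below computes n as the length of the matching prefix of the two paths and evaluates the fitness both in closed form and by step-by-step accumulation. The function names and the PATH_END sentinel are assumptions made for this example; in particular, the sentinel is not counted as a match here, which is one possible convention and may differ from the appendix listing.

```c
#include <assert.h>

#define PATH_END (-1)  /* assumed sentinel that terminates a path array */

/* Number of leading nodes on which the executed path agrees with the
   ideal path (the sentinel itself is not counted). */
int matching_prefix(const int *path, const int *ideal, int len)
{
    int n = 0;
    assert(path != 0 && ideal != 0 && len >= 0);
    while (n < len && path[n] == ideal[n] && path[n] != PATH_END)
        n++;
    return n;
}

/* Closed form of expression (1): Fitness = C - [10n + 5n(n-1)/2]. */
int fitness_closed_form(int c, int n)
{
    return c - (10 * n + 5 * n * (n - 1) / 2);
}

/* The same value accumulated one match at a time:
   the k-th match (k = 0, 1, ...) subtracts 10 + 5k. */
int fitness_iterative(int c, int n)
{
    int fit = c, scale = 0, k;
    for (k = 0; k < n; k++) {
        fit -= 10 + scale;
        scale += 5;
    }
    return fit;
}
```

With C = 400, two matching nodes give 400 - (20 + 5) = 375, and six matching nodes give 400 - (60 + 75) = 265, so a longer match always yields a smaller value and the scaling term makes each further match count more than the previous one.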
Another fitness function we can use is the sum of the branch functions on the path. Suppose all the branch predicates are of the following form:
E1 op E2
where E1 and E2 are arithmetic expressions, and op is one of {<, ≤, >, ≥, =, ≠}. Each branch predicate can be transformed into an equivalent branch function F, as listed in the table of equivalent branch functions.
The fitness function can be the sum of the branch functions along the required ideal path:
Fitness = F1 + F2 + ⋅⋅⋅ + Fn (2)
This fitness function (2) may be more sensitive than the simple one (1) above. By minimizing the fitness function, the GA search procedure can obtain the optimal result, i.e., test data that execute the selected path.
Equivalent branch functions

branch predicate      equivalent branch function F
E1 > E2, E1 ≥ E2      F = E1 - E2         if E1 - E2 > 0
                      F = 0               if E1 - E2 < 0
E1 < E2, E1 ≤ E2      F = E2 - E1         if E2 - E1 > 0
                      F = 0               if E2 - E1 < 0
E1 = E2, E1 ≠ E2      F = abs(E1 - E2)    if abs(E1 - E2) > 0
                      F = 0               if abs(E1 - E2) < 0
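The table can be transcribed almost literally into code. The sketch below does so for integer-valued expressions and adds the summation of expression (2). The function names are hypothetical, and the value returned when the two expressions are equal (a case the table leaves open for the first two rows) is assumed to be 0.

```c
#include <assert.h>

/* Row 1 of the table (listed for E1 > E2 and E1 >= E2):
   F = E1 - E2 when that difference is positive, otherwise 0. */
int branch_row1(int e1, int e2)
{
    int d = e1 - e2;
    return d > 0 ? d : 0;   /* d == 0 is assumed to give 0 */
}

/* Row 2 (listed for E1 < E2 and E1 <= E2):
   F = E2 - E1 when that difference is positive, otherwise 0. */
int branch_row2(int e1, int e2)
{
    int d = e2 - e1;
    return d > 0 ? d : 0;   /* d == 0 is assumed to give 0 */
}

/* Row 3 (listed for E1 = E2 and E1 != E2): F = abs(E1 - E2). */
int branch_row3(int e1, int e2)
{
    int d = e1 - e2;
    return d < 0 ? -d : d;
}

/* Expression (2): sum of the branch function values along a path. */
int path_fitness(const int *f, int n)
{
    int i, sum = 0;
    assert(n >= 0 && (f != 0 || n == 0));
    for (i = 0; i < n; i++)
        sum += f[i];
    return sum;
}
```

For example, with E1 = 5 and E2 = 3 the first row gives 2 and the second gives 0, while the third row gives 2 for either order of the operands. Summing the values 2, 0 and 2 along a three-branch path gives a fitness of 4.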
value from 0 to 1 or from 1 to 0. It prevents the loss of useful genetic material and plays a secondary role.
We do not introduce these operators in detail here; please refer to the book and related GA papers [14], [15].
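The three operators can be sketched compactly as follows. This is an illustrative sketch, not the GAucsd implementation. Chromosomes are assumed to be arrays holding one bit (0 or 1) per element, and the random draws (the crossover point k, the mutated position, and the wheel position r) are passed in as arguments so that the behaviour is easy to follow; a real GA would draw them from a random number generator. The roulette weights are assumed to be already arranged so that a larger weight means a fitter string.

```c
#include <assert.h>
#include <stddef.h>

/* One-point crossover: swap bits k+1..l (1-based) of the two parents,
   where 1 <= k <= l-1. */
void crossover(unsigned char *a, unsigned char *b, size_t l, size_t k)
{
    size_t i;
    assert(a != NULL && b != NULL && l >= 2 && k >= 1 && k <= l - 1);
    for (i = k; i < l; i++) {       /* 0-based index k is position k+1 */
        unsigned char t = a[i];
        a[i] = b[i];
        b[i] = t;
    }
}

/* Mutation of a single position: change 0 to 1 or 1 to 0. */
void mutate_bit(unsigned char *s, size_t l, size_t pos)
{
    assert(s != NULL && pos < l);
    s[pos] = (unsigned char)(1 - s[pos]);
}

/* Roulette wheel selection: string i owns a slot proportional to
   weight[i]; r in [0, 1) is the position where the wheel stops. */
size_t roulette(const double *weight, size_t n, double r)
{
    double total = 0.0, acc = 0.0, target;
    size_t i;
    assert(weight != NULL && n > 0 && r >= 0.0 && r < 1.0);
    for (i = 0; i < n; i++)
        total += weight[i];
    assert(total > 0.0);
    target = r * total;
    for (i = 0; i < n; i++) {
        acc += weight[i];
        if (target < acc)
            return i;
    }
    return n - 1;                   /* guard against rounding */
}
```

Crossing 11111 with 00000 at k = 2 gives 11000 and 00111. With weights 1 and 3, the second string owns three quarters of the wheel, so r = 0.5 selects it while r = 0.1 selects the first.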
The general flowchart of test data generation using a GA is shown in Fig. 3. We use the GA system GAucsd 1.4 [16] to implement the algorithm applied to the sample problem.
[Figure: flowchart with the elements “Selected ideal path Pi”, “Initialize Population (generate random sets of input values)”, “Evaluation: Dynamic execution for ideal path Pi”, the decision “Satisfied?” (No / Yes), “GA Operation: Selection, Crossover, Mutation”, and “Optimum Chromosome: Path domain (input values) for ideal path Pi”]
Fig. 3 General flowchart of test data generation using a GA
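Before the evaluation step of the flowchart can execute the sample program, a chromosome has to be decoded into input values. The sketch below shows one way this could look for the coding described in Section 3 (128 array elements of 10 bits and three 7-bit variables). The gene order (array first, then low, high, step), the most-significant-bit-first layout, and the offset of 512 that maps a 10-bit gene to the range -512..511 are all assumptions made for illustration; the paper does not specify them.

```c
#include <assert.h>

#define ARRAY_LEN   128
#define ELEM_BITS   10
#define PARAM_BITS  7
#define CHROM_BITS  (ARRAY_LEN * ELEM_BITS + 3 * PARAM_BITS)  /* 1301 */

/* Read nbits bits, most significant first, starting at *pos,
   and advance *pos past them. Each element of bits holds 0 or 1. */
static int read_bits(const unsigned char *bits, int *pos, int nbits)
{
    int v = 0, i;
    for (i = 0; i < nbits; i++)
        v = (v << 1) | (bits[(*pos)++] & 1);
    return v;
}

/* Decode a CHROM_BITS-long chromosome into the inputs of the
   sample program. */
void decode(const unsigned char *bits, int a[ARRAY_LEN],
            int *low, int *high, int *step)
{
    int pos = 0, i;
    assert(bits != 0 && a != 0 && low != 0 && high != 0 && step != 0);
    for (i = 0; i < ARRAY_LEN; i++)
        a[i] = read_bits(bits, &pos, ELEM_BITS) - 512;  /* assumed offset */
    *low  = read_bits(bits, &pos, PARAM_BITS);          /* 0..127 */
    *high = read_bits(bits, &pos, PARAM_BITS);
    *step = read_bits(bits, &pos, PARAM_BITS);
    assert(pos == CHROM_BITS);
}
```

A 7-bit gene covers 0..127, which is exactly the index range of the 128-element array, so any decoded low or high is a valid index under this layout. The decoded values would then be passed to the evaluation function, which executes the program and returns the fitness.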
solved step by step.
2. The search for the minimum of the branch function proceeds over the input variables one at a time. All input variables x1, x2, ⋅⋅⋅, xn are explored in turn until a solution is found or the search fails.
3. The search procedure consists of two major phases, an “exploratory search” and a “pattern search”. This is essentially a trial-and-error procedure.
4. This is a one-dimensional search procedure based on a direct search method. It can only find a local minimum; when the function has several local minima, this can prevent it from finding a solution.
The method of automated software test data generation using a GA is an entirely new approach. It can overcome most of the above shortcomings. The problem is solved by an evolutionary procedure that works on the whole path directly and searches over all input variables. It is a directed randomized technique using evolution and genetics, not a trial-and-error procedure. The new approach is a high-dimensional search procedure and can find near-optimal solutions. We explain the new approach briefly in the next section and use it to solve the sample program of Fig. 1.
3 Test data generation and Software testing using a GA
Outline of test data generation using a GA
In a genetic algorithm, the test data problem is represented in an abstract form as a chromosome, which is directly analogous to a chromosome in a living organism. The chromosome is composed of genes, each of which may assume one of a number of possible values, or alleles. While in an organism a gene may represent sex or eye color, a gene in the test data generation representation is one of the input variables. The genetic algorithm manipulates the coding of the set of gene values making up the chromosome at the binary string level. This is equivalent to operating on the set of values of all input variables, which is one of the basic distinctions from traditional methods such as direct optimization. The GA operates on a population of sets of input values rather than on a single set of input values.
The overall suitability of a chromosome, that is, the degree of matching between the path actually executed and the required ideal path we set, is termed its fitness. The value of the fitness function of a chromosome reflects how well the path executed by the program on the input values represented by the chromosome complies with the user-selected path.
Here the coding system is quite simple: we take the binary string of the values of all input variables as a chromosome. For the sample problem of Fig. 1, the chromosome is the coding of the values of an array and three variables. In total there are 131 input variables, of which 128 are the elements of the array and the rest are the variables high, low and step. We encode each array element as a 10-bit binary string and each of the remaining variables as a 7-bit binary string, giving a chromosome of 128 × 10 + 3 × 7 = 1301 bits.
There are three basic operations in the GA evolution procedure: reproduction, crossover, and mutation.
• Reproduction, or selection, is based on spinning a weighted roulette wheel in which each current chromosome (binary string) in the population has a slot sized in proportion to its fitness. Spinning the weighted wheel creates more offspring of highly fit strings in the succeeding generation.
• Crossover takes members of the newly reproduced strings in the mating pool and mates them. An integer position k along the string is chosen at random between 1 and the string length less one, [1, l-1], and all bits between k+1 and l inclusive are swapped. This is the main operator.
• Mutation randomly walks through the string space and occasionally changes the