Automated software test data generation using a genetic algorithm



Min Pei

Professor, Beijing Union University; Visiting Professor, Michigan State University; Visiting Professor, Beijing University of Aeronautics and Astronautics

Erik D. Goodman

Professor and Director, Case Center for Computer-Aided Engineering & Manufacturing, Michigan State University; Visiting Professor, Beijing University of Aeronautics and Astronautics

Zongyi Gao

Professor, Computer Science Department, Beijing University of Aeronautics and Astronautics

Kaixiang Zhong

Graduate Research Assistant, Computer Science Department, Beijing University of Aeronautics and Astronautics

Abstract

Software testing is an important way of improving the quality of software. It accounts for approximately half of all software engineering cost. The critical point is to increase the degree of automation of testing and of its test data generation. In the 1980s most test data generation used symbolic evaluation to derive test data. Because of the complexity of solving the resulting set of predicate equations, it could not be put into practice. Recent methods use actual program execution and a function-minimization search method. The local nature of these methods needs to be improved. In this paper we propose a new approach: automated test data generation using a genetic algorithm (GA). Compared with traditional search algorithms, its efficiency and efficacy are much better.

We introduce the new method and apply it to a typical sample program. The results of the experiment in test data generation using a GA, and their analysis, are presented in this paper. Further work is needed to extend the method to dynamic data structures.

Key words:

software testing, test data generation, path selection, genetic algorithm, fitness


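    /* Nodes 2, 3, 5 and 6 are not recorded in the listing as printed;
       the statements marked "restored" follow the node numbering used
       for the paths in Table 2. */
    path[tmp] = 0;                                  /* node 0: entry */
    tmp++;

    while (i < high[0]) {
        if (step[0] == 0) {
            printf("\n********step == 0\n");
            break;
        }
        loop_counter++;
        if (loop_counter <= Max_loop) path[tmp++] = 1;

        if (max < a[i]) {
            max = a[i];
            if (loop_counter <= Max_loop) path[tmp++] = 2;   /* restored */
        } else {
            if (loop_counter <= Max_loop) path[tmp++] = 3;   /* restored */
        }
        if (min > a[i]) {
            min = a[i];
            if (loop_counter <= Max_loop) path[tmp++] = 4;
        } else {
            if (loop_counter <= Max_loop) path[tmp++] = 5;   /* restored */
        }
        i += step[0];
        if (loop_counter <= Max_loop) path[tmp++] = 6;       /* restored */
    }
    path[tmp++] = 7;                                /* node 7: exit */

    printf("\nThe path executed is:");
    for (i = 0; i < tmp; i++)
        printf(" %d ", path[i]);

    /* compare the executed path with the ideal path, node by node */
    i = 0;
    while (i < PATH_LENGTH && path[i] == ideal_path[i]) {
        fit = fit - 10 - scale;
        scale = scale + 5;
        if (path[i] == -1) {        /* both paths ended together */
            printf("\nI made it!!!\n\n\n\n");
            /*** exit(0); ***/
        }
        i++;
    }
    /* printf("the fitness is %d", fit); */
    return fit;
}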


Appendix: The evaluation function of the sample program

#include <stdio.h>

#define PATH_LENGTH 200
#define Max_loop 2

int fat(int a[128], int low[1], int high[1], int step[1])
{
    int min, max, i;
    int tmp;
    static int path[PATH_LENGTH];
    int fit = 400;
    static int ideal_path[PATH_LENGTH];
    static int flag_input = 0;
    int loop_counter = 0;
    int path_no = 0;
    int scale = 0;
    FILE *data_file;

    /******** initialize the path array ********/
    for (tmp = 0; tmp < PATH_LENGTH; tmp++) path[tmp] = -1;

    /******** input the ideal path ***********/
    if (!flag_input) {
        if ((data_file = fopen("input.dat", "r")) == NULL) {
            fprintf(stderr, "Can not open the data file!!!\n");
            return -1;
        } else {
            tmp = 0;
            /* read node numbers up to and including the terminating -1;
               stop early on a full buffer or a failed read */
            while (path_no != -1 && tmp < PATH_LENGTH) {
                if (fscanf(data_file, "%d", &path_no) != 1) path_no = -1;
                ideal_path[tmp] = path_no;
                tmp++;
            }
            fclose(data_file);
        }
        flag_input = 1;
    }

    min = a[*low];
    max = a[*low];
    i = low[0] + step[0];
    tmp = 0;


References

[1] G. Myers, The Art of Software Testing. New York: Wiley, 1979.

[2] L. G. Stucki, “Guest editorial: A case for software testing,” IEEE Trans. Software Eng., vol. SE-2, no. 3, p. 194, 1976.

[3] J. Bicevskis, J. Borzovs, U. Straujums, A. Zarins, and E. Miller, “SMOTL - A system to construct samples for data processing program debugging,” IEEE Trans. Software Eng., vol. SE-5, no. 1, pp. 60-66, Jan. 1979.

[4] L. Clarke, “A system to generate test data and symbolically execute programs,” IEEE Trans. Software Eng., vol. SE-2, no. 3, pp. 215-222, 1976.

[5] W. Howden, “Symbolic testing and the DISSECT symbolic evaluation system,” IEEE Trans. Software Eng., vol. SE-4, no. 4, pp. 266-278, 1977.

[6] C. Ramamoorthy, S. Ho, and W. Chen, “On the automated generation of program test data,” IEEE Trans. Software Eng., vol. SE-2, no. 4, pp. 293-300, 1976.

[7] B. Korel, “Automated software test data generation,” IEEE Trans. Software Eng., vol. 16, no. 8, pp. 870-879, Aug. 1990.

[8] P. Gill and W. Murray, Eds., Numerical Methods for Constrained Optimization. New York: Academic, 1974.

[9] N. Lyons, “An automatic generation system for data base simulation and testing,” Data Base, vol. 8, no. 4, pp. 10-13, 1977.

[10] E. Miller, Jr. and R. Melton, “Automated generation of testcase datasets,” SIGPLAN Notices, vol. 10, no. 6, pp. 51-58, June 1975.

[11] D. Bird and C. Munoz, “Automatic generation of random self-checking test cases,” IBM Syst. J., vol. 22, no. 3, pp. 229-245, 1983.

[12] W. Howden, “Reliability of the path analysis testing strategy,” IEEE Trans. Software Eng., vol. SE-2, no. 3, pp. 208-215, Sept. 1976.

[13] B. W. Kernighan and P. J. Plauger, The Elements of Programming Style. New York: McGraw-Hill, 1974.

[14] D. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, 1989.

[15] L. Davis, Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold, 1991.

[16] N. Schraudolph and J. Grefenstette, “A user’s guide to GAucsd 1.4,” Technical Report CS92-249, CSE Department, U.C. San Diego, July 1992.


#include <stdio.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Search the structure rooted at l for a node whose data equals y.
   Returns the node found, or NULL if there is none. */
struct node *find(struct node *l, int y)
{
    struct node *tmp, *p;

    p = l;
    tmp = NULL;
    while (p != NULL) {
        if (y == p->data) {
            tmp = p;            /* found: remember the node and stop */
            p = NULL;
        } else if (y < p->data) {
            p = p->left;
        } else {
            p = p->right;
        }
    }
    return tmp;
}

Fig. 4 A sample program with a dynamic data structure

5 Brief Conclusions

The new approach to software testing and test data generation is based on a genetic algorithm and program execution. We take the genetic algorithm as a sophisticated optimization technique and use actual execution of the program on the selected path as part of the evaluation function during the search process. The method works on the whole path to derive input test data that will finally cause traversal of the selected path, fulfilling the goal of software testing, or else to find that a certain path is infeasible. The results of this experiment are encouraging. In our experiment the new method appears superior to most existing methods for testing programs with sequential input data. We plan to embed this new approach in a software testing experiment system and apply it to more practical programs to check the results, and then to analyze and test the real robustness of the method. How to apply the method to dynamic data structures remains an open problem, and additional research is required. One possible way to solve this problem is to hybridize the new approach with effective existing methods such as dynamic data flow analysis. We hope that new ways of generating a wide spectrum of test data will emerge in the near future.


test the same path. This is one of the advantages of using a GA.

Table 3 Some results of program testing

4 Further work on dynamic data structures

So far we have solved a large class of programs with sequential input data. Further extension of automated test data generation involves structures and pointers. A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling. A pointer is a variable that contains the address of another variable; in effect it represents two variables, the pointer itself and the variable pointed to. The problem here is how to generate test data using a GA in the presence of pointers to structures. Consider the function find of Fig. 4. This function accepts an integer element y and a dynamic data structure pointed to by the pointer variable l. The goal of test data generation is to find a program input, in this case a shape of the input data structure, that will cause traversal of the selected path.

In this example, assume we have no knowledge of the desired shape or class of shapes of the input data structure (e.g., a binary tree, a doubly linked list, a directed graph, etc.). Since the only information available is the declaration of the dynamic data structure, the input data structure is assumed to be a directed graph. The approach to finding an input data structure is to search the space of all input data structure “shapes” until a solution is found. The main problem for a search procedure using a GA is to find a suitable representation, or coding, for the structures and to design recombination operators that suit the new representation. Keeping every new offspring legal is the critical point of this design; otherwise a repair method is needed to get rid of illegal structures. If a large number of illegal structures are created in the process, the search slows down and the whole procedure becomes inefficient.

Further work on dynamic structures needs to focus on the design of the new coding and the new operators. We plan to investigate new developments in the GA field, such as the representation used in Genetic Programming, in order to solve this problem thoroughly.

                                  parameter
No  path                      low  high  step   Max   Min  imax  imin
1   0-7                        89   106    61  -458  -458    89    89
2   0-1-2-5-6-7                13   123   107   407  -485   120    13
3   0-1-3-4-6-7                 8    70    55   376   -40     8    63
4   0-1-3-5-6-7               112   121     7  -257  -257   119   119
5   0-1-2-5-6-1-2-5-6-7        43   119    14   398  -486    71    43
6   0-1-2-5-6-1-3-4-6-7        15   126    37  -146  -346    52    89
7   0-1-3-4-6-1-2-5-6-7        36    61     6   372  -145    48    42
8   0-1-3-5-6-1-3-4-6-7         4   122    55   323  -123     4   114
9   0-1-3-5-6-1-3-5-6-7        79   126    18  -426  -426    79    79


Analysis of running results and practical testing

The running results are shown in Table 2. In total there are 21 paths. The results show that no input data can drive execution through branches 2 and 4 in the same loop iteration. Branch 2 is taken only when a[i] is greater than max, and since max is never less than min, a[i] then cannot be less than min, which is the condition for branch 4. Consequently 8 paths are infeasible: Nos. 2, 6, 7, 8, 9, 10, 14, and 18. Each of them requires passing through branches 2 and 4 in the same iteration.
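The infeasibility argument can be checked mechanically. The sketch below isolates one pass through the loop body of the sample program; the function name loop_body and its flag parameters are illustrative and not part of the original program.

```c
#include <assert.h>

/* One pass through the loop body of the sample program for element x.
   Sets *took2 when the "max < x" branch (node 2) is taken and *took4
   when the "min > x" branch (node 4) is taken, updating max and min
   in the same order as the sample program does. */
void loop_body(int *max, int *min, int x, int *took2, int *took4)
{
    assert(*min <= *max);   /* holds at entry: min == max initially,
                               and each pass preserves it */
    *took2 = (*max < x);
    if (*took2) *max = x;

    *took4 = (*min > x);
    if (*took4) *min = x;
}
```

Because min <= max on entry, x > max implies x > min, so the two flags can never both be set by the same call.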

We can integrate test data generation with practical software testing. When the optimum input values are found, the evaluation procedure is in fact the dynamic execution of the program on the required ideal path, so from that execution we can check whether the results of the program are correct. Several solutions for the sample program are listed in Table 3.

Table 2 The results of test data generation using a GA

During evolution, the rules of transition from one set of input values to the next are probabilistic rather than deterministic, so with different seeds we can get different test data to

No  Path                   Population  Trial   Crossover  Mutation  Result
                           size        number  rate       rate      (generation number)
1   0-7                    30          1000    0.60       0.001     0-gen (30 trials)
2   0-1-2-4-6-7            1000        50000   0.60       0.001     no cover
3   0-1-2-5-6-7            50          3000    0.60       0.001     0-gen (50 trials)
4   0-1-3-4-6-7            50          3000    0.60       0.001     0-gen (50 trials)
5   0-1-3-5-6-7            300         3000    0.60       0.001     1-gen (568 trials)
                           50          10000   0.60       0.001     91-gen (4043 trials)
6   0-1-2-4-6-1-2-4-6-7    1000        50000   0.60       0.001     no cover
7   0-1-2-4-6-1-2-5-6-7    1000        50000   0.60       0.001     no cover
8   0-1-2-4-6-1-3-4-6-7    1000        50000   0.60       0.001     no cover
9   0-1-2-4-6-1-3-5-6-7    1000        50000   0.60       0.001     no cover
10  0-1-2-5-6-1-2-4-6-7    1000        50000   0.60       0.001     no cover
11  0-1-2-5-6-1-2-5-6-7    50          3000    0.60       0.001     0-gen (50 trials)
12  0-1-2-5-6-1-3-4-6-7    50          500     0.60       0.001     6-gen (296 trials)
13  0-1-2-5-6-1-3-5-6-7    30          500     0.60       0.001     15-gen (443 trials)
14  0-1-3-4-6-1-2-4-6-7    1000        50000   0.60       0.001     no cover
15  0-1-3-4-6-1-2-5-6-7    50          3000    0.60       0.001     4-gen (227 trials)
16  0-1-3-4-6-1-3-4-6-7    50          3000    0.60       0.001     0-gen (50 trials)
17  0-1-3-4-6-1-3-5-6-7    50          3000    0.60       0.001     0-gen (50 trials)
18  0-1-3-5-6-1-2-4-6-7    1000        50000   0.60       0.001     no cover
19  0-1-3-5-6-1-2-5-6-7    100         3000    0.60       0.001     15-gen (1429 trials)
                           50          3000    0.60       0.001     68-gen (3042 trials)
20  0-1-3-5-6-1-3-4-6-7    500         10000   0.60       0.001     2-gen (1385 trials)
                           50          10000   0.60       0.001     79-gen (3512 trials)
21  0-1-3-5-6-1-3-5-6-7    1000        20000   0.40       0.01      14-gen (14986 trials)


GA running parameters and fitness function

The user can select the population size according to the size of the program or its possible number of paths. Usually, the more paths the program has, the larger the population size we select. In recent runs we used population sizes of 30, 50, 300, 500 and 1000, and kept the crossover rate at 0.60-0.70 and the mutation rate at 0.001. Preliminary experiments showed that population size is the most important factor and that the other parameter settings have no large influence on the running results.

We can design different fitness functions for evaluating a chromosome, that is, a set of input values. The simplest fitness function we use is the following expression:

Fitness = C - [ 10 × n + 5 × n(n-1)/2 ]     (1)

where C is a large number and n is the number of matching subpaths between the executed path and the required ideal path. The third term is a scaling factor.
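As a worked instance of expression (1), the sketch below computes the fitness in closed form and, for comparison, by the running subtraction used in the appendix's evaluation function. It assumes C = 400, the initial value of fit in the appendix listing; the names FITNESS_C, fitness_closed_form and fitness_incremental are illustrative and do not come from the original program.

```c
#include <assert.h>

/* Assumption: C = 400, the initial value of `fit` in the appendix listing. */
#define FITNESS_C 400

/* Closed form of expression (1): C - [10n + 5n(n-1)/2],
   where n is the number of matching subpaths. */
int fitness_closed_form(int n)
{
    assert(n >= 0);
    return FITNESS_C - (10 * n + 5 * n * (n - 1) / 2);
}

/* The same value accumulated match by match, as the appendix loop does:
   each match subtracts 10 plus a scale that grows by 5 per match. */
int fitness_incremental(int n)
{
    int fit = FITNESS_C;
    int scale = 0;
    int k;

    assert(n >= 0);
    for (k = 0; k < n; k++) {
        fit = fit - 10 - scale;
        scale = scale + 5;
    }
    return fit;
}
```

For example, three matches give 400 - (30 + 15) = 355 by either function; the value decreases as more of the executed path matches the ideal path.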

Another fitness function we can use is the sum of the branch functions along the path. Suppose all branch predicates are of the form

E1 op E2

where E1 and E2 are arithmetic expressions and op is one of {<, ≤, >, ≥, =, ≠}. Each branch predicate can be transformed to an equivalent branch function F, as listed in the Equivalent Branch Function table.

The fitness function can then be the sum of the branch functions along the required ideal path:

Fitness = F1 + F2 + ⋅⋅⋅ + Fn     (2)

Fitness function (2) may be more sensitive than the simple one (1) above. By minimizing the fitness function, the GA search procedure can obtain the optimal result, i.e., the test data that execute the selected path.

Equivalent Branch Function

branch predicate        branch function
E1 > E2, E1 ≥ E2        F = E1 - E2          if E1 - E2 > 0
                        F = 0                if E1 - E2 ≤ 0
E1 < E2, E1 ≤ E2        F = E2 - E1          if E2 - E1 > 0
                        F = 0                if E2 - E1 ≤ 0
E1 = E2, E1 ≠ E2        F = abs(E1 - E2)     if abs(E1 - E2) > 0
                        F = 0                if abs(E1 - E2) = 0
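A minimal sketch of fitness function (2) follows. It assumes the convention commonly used with function minimization, in which a branch function is zero when its predicate already holds and otherwise grows with the distance from satisfying it. That sign convention, the restriction to integer operands (hence the +1 for strict inequalities), the unit penalty for ≠, and all names (rel_op, branch, branch_function, path_fitness) are assumptions of this sketch; they are not taken from the original program and may differ from the table above.

```c
#include <assert.h>
#include <stdlib.h>

/* Relational operators allowed in a branch predicate E1 op E2. */
typedef enum { OP_LT, OP_LE, OP_GT, OP_GE, OP_EQ, OP_NE } rel_op;

/* One branch predicate with its two operands already evaluated. */
typedef struct { long e1; rel_op op; long e2; } branch;

/* Assumed convention: 0 when the predicate holds, positive otherwise. */
long branch_function(long e1, rel_op op, long e2)
{
    switch (op) {
    case OP_LT: return (e1 <  e2) ? 0 : e1 - e2 + 1;
    case OP_LE: return (e1 <= e2) ? 0 : e1 - e2;
    case OP_GT: return (e1 >  e2) ? 0 : e2 - e1 + 1;
    case OP_GE: return (e1 >= e2) ? 0 : e2 - e1;
    case OP_EQ: return labs(e1 - e2);
    case OP_NE: return (e1 != e2) ? 0 : 1;
    }
    assert(0);      /* unreachable for valid operators */
    return 0;
}

/* Fitness (2): the sum of the branch functions along the ideal path.
   Under the assumed convention it is 0 exactly when every predicate holds. */
long path_fitness(const branch *b, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += branch_function(b[i].e1, b[i].op, b[i].e2);
    return sum;
}
```

For the sample program's predicate max < a[i] with max = 10 and a[i] = 4, branch_function(10, OP_LT, 4) returns 7, the amount by which the inputs miss the branch.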


value from 0 to 1 or from 1 to 0. It prevents the loss of useful genetic material and plays a secondary role.

We do not describe these operators in detail here; please refer to the book and related GA papers [14], [15].
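The three operators can nevertheless be sketched in a few lines each. The code below is illustrative only and is not the GAucsd implementation: the function names are hypothetical, bits are stored one per char ('0' or '1'), the caller supplies the random draws (for example from rand()) so that each operator is deterministic and easy to check, and the selection weights are assumed to be nonnegative with a larger weight meaning a fitter string. How a minimized fitness value is mapped to such weights is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Roulette-wheel selection: returns the index of the slot that contains
   `spin`, a random number in [0, sum of all weights). */
size_t roulette_select(const double *weight, size_t n, double spin)
{
    double acc = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        acc += weight[i];
        if (spin < acc) return i;
    }
    return n - 1;   /* guard against rounding at the upper edge */
}

/* One-point crossover: swap bits k+1..l (1-based, inclusive) of two
   strings of length l, for a cut point k in [1, l-1]. */
void crossover(char *a, char *b, size_t l, size_t k)
{
    assert(k >= 1 && k < l);
    for (size_t i = k; i < l; i++) {    /* 0-based index k is bit k+1 */
        char t = a[i];
        a[i] = b[i];
        b[i] = t;
    }
}

/* Mutation: flip the bit at 0-based position pos. */
void mutate(char *s, size_t pos)
{
    s[pos] = (s[pos] == '0') ? '1' : '0';
}
```

With strings 00000000 and 11111111 and cut point k = 3, crossover produces 00011111 and 11100000.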

The general flowchart of test data generation using a GA is shown in Fig. 3. We use the GA system GAucsd 1.4 [16] to implement the algorithm applied to the sample problem.

[Figure: flowchart. Boxes: Selected ideal path Pi; Initialize Population (generate random sets of input values); Evaluation (dynamic execution for ideal path Pi); Satisfied? (No / Yes); GA Operation (Selection, Crossover, Mutation); Optimum Chromosome (path domain, i.e. input values, for ideal path Pi)]

Fig. 3 General flowchart of test data generation using a GA


solved step by step.

2. The search for the minimum of the branch function proceeds over the input variables one at a time. The input variables x1, x2, ⋅⋅⋅, xn are explored in turn until a solution is found or the search fails.

3. The search procedure consists of two major phases, an “exploratory search” and a “pattern search”. In effect this is a trial-and-error method.

4. It is a one-dimensional search procedure based on a direct search method. It can find only a local minimum; when the function has several local minima, this can prevent it from finding a solution.

The method of automated software test data generation using a GA is an entirely new approach. It can overcome most of the shortcomings above. The problem is solved by an evolutionary procedure that works on the whole path directly and searches over all input variables together. It is a directed randomized technique based on evolution and genetics, not a trial-and-error method. The new approach is a high-dimensional search procedure and can find near-optimal solutions. We explain the new approach briefly in the next section and use it to solve the sample program of Fig. 1.

3 Test data generation and Software testing using a GA

Outline of test data generation using a GA

A test data problem is presented to a genetic algorithm in an abstract form, as a chromosome, which is directly analogous to a chromosome in a living organism. The chromosome is composed of genes, each of which may assume one of a number of possible values, or alleles. Whereas in an organism a gene may represent sex or eye color, a gene in the test data generation representation is one of the input variables. The genetic algorithm manipulates the coding of the set of gene values making up the chromosome at the binary string level, which is equivalent to operating on the set of values of all input variables. This is one of the basic distinctions from traditional methods such as direct optimization. In addition, the GA operates on a population of sets of input values rather than on a single set of input values.

The overall suitability of a chromosome, that is, the degree of matching between the path actually executed and the required ideal path, is termed its fitness. The fitness value of a chromosome reflects how well the path that the program executes on the input values represented by the chromosome complies with the user-selected path.

Here the coding system is quite simple: we take the binary string of the values of all input variables as a chromosome. For the sample problem of Fig. 1, the chromosome is the coding of the values of an array and three variables. In total there are 131 input variables, of which 128 are the elements of the array and the rest are the variables high, low and step. We code each array element as a 10-bit binary string and each of the remaining variables as a 7-bit binary string, giving a chromosome of 128 × 10 + 3 × 7 = 1301 bits.

There are three basic operations used in the GA evolution procedure: Reproduction, Crossover, and Mutation.
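The coding just described can be sketched as a decoding routine that turns a 1301-bit chromosome back into the inputs of the sample program. Several details are assumptions of this sketch and are not stated in the text: the order of the fields (array first, then low, high, step), most-significant-bit-first coding, one char ('0' or '1') per bit, and an offset of 512 that maps each 10-bit value to the signed range -512..511. The names test_input, read_bits and decode_chromosome are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN   128
#define ELEM_BITS   10
#define VAR_BITS    7
#define CHROM_BITS  (ARRAY_LEN * ELEM_BITS + 3 * VAR_BITS)   /* 1301 */

typedef struct {
    int a[ARRAY_LEN];
    int low, high, step;
} test_input;

/* Read `width` bits starting at *pos (most significant first) as an
   unsigned integer and advance *pos past them. */
static int read_bits(const char *bits, size_t *pos, int width)
{
    int v = 0;
    for (int i = 0; i < width; i++) {
        char c = bits[(*pos)++];
        assert(c == '0' || c == '1');
        v = (v << 1) | (c - '0');
    }
    return v;
}

/* Decode a chromosome of CHROM_BITS bits into the program inputs. */
void decode_chromosome(const char *bits, test_input *out)
{
    size_t pos = 0;

    for (int i = 0; i < ARRAY_LEN; i++)
        out->a[i] = read_bits(bits, &pos, ELEM_BITS) - 512;  /* assumed offset */
    out->low  = read_bits(bits, &pos, VAR_BITS);   /* 0..127 */
    out->high = read_bits(bits, &pos, VAR_BITS);
    out->step = read_bits(bits, &pos, VAR_BITS);
    assert(pos == CHROM_BITS);
}
```

The 7-bit fields cover 0..127, which matches the index range of the 128-element array.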

• Reproduction (selection) is based on spinning a weighted roulette wheel in which each current chromosome (binary string) in the population has a slot sized in proportion to its fitness. Spinning the weighted wheel creates more offspring of highly fit strings in the succeeding generation.

• Crossover takes members of the newly reproduced strings in the mating pool and mates them in pairs. An integer position k along the string is chosen at random between 1 and the string length less one, [1, l-1], and all bits between positions k+1 and l inclusive are swapped. This is the main operator.

• Mutation walks randomly through the string space and occasionally changes a bit
