Genetic algorithms for solving bounded diameter minimum spanning tree problem


MINISTRY OF EDUCATION AND TRAINING

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

GENETIC ALGORITHMS FOR SOLVING BOUNDED DIAMETER MINIMUM SPANNING TREE PROBLEM

By Huynh Thi Thanh Binh

Supervisor: Associate Professor Nguyen Duc Nghia

A Dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Engineering

Table of Contents

1 Introduction
1.1 Motivation
1.2 Methodologies
1.3 Scope of research
1.4 Contributions
1.5 Outline
2 Bounded Diameter Minimum Spanning Tree and Related Works
2.1 Problem formulation
2.2 Related Optimization and Decision Problems
2.3 Related works
2.3.1 Exact approaches
2.3.2 Heuristic Methods
2.3.2.1 One Time Tree Construction Algorithm
2.3.2.2 Center-Based Tree Construction Algorithm
2.3.2.3 Randomized Greedy Heuristic Algorithm
2.3.2.4 Improved Greedy Heuristics (RGH-I and CBTC-I)
2.3.2.5 Hierarchical clustering heuristic algorithm (HCH)
2.3.2.6 Comments
2.3.3 Metaheuristic algorithms
2.3.4 Conclusion
3 Center-Based Recursive Clustering Heuristic Algorithm
3.1 The new greedy heuristic: Center-Based Recursive Clustering (CBRC)
3.2 The improvement of Center-Based Recursive Clustering: CBRC-I
3.3 Experiments
3.3.1 Problem instances
3.3.2 Experiment setup
3.3.3 Result
3.4 Discussion
3.5 Conclusion
4 Genetic algorithm with multi-parent recombination operator
4.1 Individual representation and genetic operators
4.2 Experiments
4.2.3 System setting
4.2.4 Results and discussion
4.3 Conclusion
5 Multi-population Genetic Algorithm
5.1 Structure of the genetic algorithm
5.2 Experiments
5.2.3 System setting
5.2.4 Result
5.2.5 Discussion
5.3 Conclusion
6 Steady-state genetic algorithm
6.1 Steady state genetic algorithm structure
6.1.1 Individual representation and initial population
6.1.2 Crossover
6.1.3 Mutation
6.1.4 Selection
6.2 Replacement policy
6.3 Experiments
6.3.3 Parameter
6.3.4 Result
6.4 Conclusion

List of Figures

1.1 Scheme of genetic algorithm
2.1 The BDST with 19 vertices and bounded diameter D = 4; v is the center of the tree
2.2 The BDST with 19 vertices and bounded diameter D = 5; v1, v2 are the centers of the tree
2.3 The best BDST found by the OTTC algorithm on the Euclidean problem instance with n = 100, D = 5
2.4 The best BDST found by the CBTC algorithm on the Euclidean problem instance with n = 100, D = 10
2.5 The best BDST found by the CBTC-I algorithm on the Euclidean problem instance with n = 100, D = 10
2.6 The best BDST found by the RGH algorithm on the Euclidean problem instance with n = 100, D = 10
2.7 The best BDST found by the RGH-I algorithm on the Euclidean problem instance with n = 100, D = 10
2.8 A spanning tree on twelve nodes and its edge-set representation
2.9 A spanning tree on eleven nodes and its permutation-code representation
2.10 Center Move Mutation
2.11 Edge Delete Mutation
2.12 Subtree-Optimize Mutation
2.13 The best BDST found by the JR-ESEA algorithm on the Euclidean problem instance with n = 250, D = 15
2.14 The best BDST found by the JR-PEA algorithm on the Euclidean problem instance with n = 250, D = 15
2.15 The best BDST found by the PEA-I algorithm on the Euclidean problem instance with n = 250, D = 15
3.1 A star-like structure of a typical solution to the BDMST problem
3.2 Greedy Edge Delete Local search
3.3 The best BDST found by the CBRC heuristic on the Euclidean problem instance with n = 100, D = 10
3.4 The best BDST found by the CBRC-I heuristic on the Euclidean problem instance with n = 100, D = 10
4.1 Comparison between the sum of the best solutions found by the EA-xy2 algorithm on all the problem instances
4.2 Comparison between the sum of the best solutions found by the EA-xy5 algorithm on all the problem instances
4.3 Comparison between the sum of the best solutions found by the EA-xy7 algorithm on all the problem instances
4.4 Comparison between the sum of the best solutions found by the EA-xy9 algorithm on all the problem instances
4.5 Comparison between the sum of the best solutions found by the EA-xdk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.6 Comparison between the sum of the best solutions found by the EA-xgk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.7 Comparison between the sum of the best solutions found by the EA-xmk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.8 Comparison between the sum of the average solutions found by the EA-xy2 algorithm on all the problem instances (x = b, r, l)
4.9 Comparison between the sum of the average solutions found by the EA-xy5 algorithm on all the problem instances
4.10 Comparison between the sum of the average solutions found by the EA-xy7 algorithm on all the problem instances
4.11 Comparison between the sum of the average solutions found by the EA-xy9 algorithm on all the problem instances (x = b, r, l)
4.12 Comparison between the sum of the average solutions found by the EA-xdk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.13 Comparison between the sum of the average solutions found by the EA-xgk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.14 Comparison between the sum of the average solutions found by the EA-xmk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.15 Comparison between the best solutions found by GA1, GA2, GA3, GA4, GA5, GA6 on all the problem instances
4.16 Comparison between the standard deviations of the solutions found by GA1, GA2, GA3, GA4, GA5, GA6 on all the problem instances
5.1 Multi-population model
5.2 Comparison between the best results found by GA11, GA12, GA13, GA14 and HGA on the instance with n = 250, D = 15, instance 1
5.3 Comparison between the mean results found by GA11, GA12, GA13, GA14 and HGA on the instance with n = 250, D = 15, instance 1
5.4 The number of individuals from GA11, GA12, GA13, GA14 that migrate to GAfinal
6.1 PEA-I algorithm

List of Tables

3.1 Diameter Bound
3.2 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 100 and D = 5, 7, 9, 11, 13, 15
3.3 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 250 and D = 5, 10, 13, 15, 17, 20, 25
3.4 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 500 and D = 10, 15, 18, 20, 22, 25, 30
3.5 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 1000 and D = 15, 20, 23, 25, 27, 30, 35
3.6 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 100 and D = 5, 7, 9, 11, 13, 15
3.7 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 250 and D = 5, 10, 13, 15, 17, 20, 25
3.8 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 500 and D = 10, 15, 18, 20, 22, 25, 30
3.9 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 1000 and D = 15, 20, 23, 25, 27, 30, 35
4.1 The rate of the heuristic algorithms used for initialization of the population in each experimental genetic algorithm
4.2 Comparison between the results found by EA-xy2 (x = d, g, m; y = l, r, b) on the 20 Euclidean problem instances
4.3 Comparison between the results found by EA-xy5 (x = d, g, m; y = l, r, b) on the 20 Euclidean problem instances
5.1 Comparison between the results with different crossover probabilities on the Euclidean problem instance with 250 vertices, D = 15
5.2 Comparison between the results with different crossover probabilities on the Euclidean problem instance with 250 vertices, D = 15
5.3 Comparison between the results found by RJ-ESEA, PEA-I, HGA, MHGA on the 20 Euclidean problem instances
5.4 Comparison between the results found by RJ-ESEA, PEA-I, HGA, MHGA on the 20 Non-Euclidean problem instances
5.5 Results of GA11, GA12, GA13, GA14 and HGA on 20 Euclidean BDMST problem instances
5.6 Results of GA11, GA12, GA13, GA14 and HGA on 20 Non-Euclidean BDMST problem instances
5.7 Results of GA21, GA22, GA23, GA24 and MHGA on 20 Euclidean BDMST problem instances
5.8 Results of GA21, GA22, GA23, GA24 and MHGA on 20 Euclidean BDMST problem instances
6.1 Results of PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I on the 20 Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
6.2 Average number of iterations required by PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I to reach the best solution on the 20 Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
6.3 Results of PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I on the 20 Non-Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
6.4 Average number of iterations required by PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I to reach the best solution on the 20 Non-Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000

The Bounded Diameter Minimum Spanning Tree (BDMST) problem is a combinatorial optimization problem that arises in many applications, such as the design of wire-based communication networks under quality of service requirements and the design of linear lightwave networks, where it can minimize interference in the network by limiting the traffic in the network lines. Another practical application requiring a BDMST arises in data compression, where some algorithms compress a file utilizing a tree data structure and decompress a path in the tree to access a record. Further applications arise in ad-hoc wireless networks and in distributed mutual exclusion algorithms.

Let G = (V, E) be a connected undirected graph with positive edge weights w(e), where e is an edge of the graph. The BDMST problem can be formulated as follows: among the spanning trees of G whose diameters do not exceed a given upper bound D ≥ 2, find a spanning tree with minimal cost (the sum of the weights of the edges of the tree). As in almost all studies of the BDMST problem, and without loss of generality, we assume that G is a complete graph.

This problem is known to be NP-hard for 4 ≤ D < |V| − 1. Moreover, the BDMST problem has been shown to be hard to approximate, in that there is no polynomial-time algorithm that can guarantee to find a solution whose cost is within log(|V|) of the optimum, unless P = NP. Therefore, heuristic and metaheuristic techniques are currently the only practical methods for improving the solution quality in solving the BDMST problem, especially when |V| is large.

In this thesis, we survey the literature on the BDMST and then present new algorithms for solving this problem.

First, we propose a greedy heuristic algorithm called Center-Based Recursive Clustering (CBRC). We extend the concept of center to each level of the partially constructed spanning tree. The algorithm can be seen as recursively clustering the vertices of the graph: every internal node of the spanning tree is the center of the sub-graph in the subtree rooted at this node, and we recurse to find the best center. The new heuristic is compared with other well-known heuristics for solving the BDMST problem, namely, the One-Time-Tree-Construction (OTTC), the Randomized Greedy Heuristic (RGH) of Raidl and Julstrom, the Center-Based Tree Construction (CBTC) of Julstrom, and the Randomized Greedy Heuristic with post-improvement (RGH-I) and Center-Based Tree Construction with post-improvement (CBTC-I) of Singh and Gupta.

Next, we introduce a multi-parent recombination operator in Genetic Algorithms (GAs) for solving the BDMST problem. The proposed multi-parent recombination operator allows more than two parents to be used to create offspring. We consider three different methods for choosing parents. Three new methods for adding edges from the parents to the offspring are also considered. For each combination of the three methods of choosing parents and the three ways of adding edges, we also experiment with genetic algorithms that use different numbers of parents. We discuss and analyze the efficiency of using different heuristic algorithms to initialize the population in a genetic algorithm for solving the BDMST problem.

We present a new genetic algorithm (GA) which uses multiple populations, where each population is initialized with a different well-known heuristic. The individuals in each population subsequently compete for positions in a selection population, using a simulated annealing mechanism based on proportionate selection. In the selection population, they combine and evolve toward the optimum. We compare our results with those of other GAs.

Besides generational genetic algorithms, many researchers have recently become interested in steady-state genetic algorithms. We present steady-state genetic algorithms which use different heuristic algorithms for decoding. We modify the decoder and the replacement policy used in PEA-I so as to improve its performance. We use four decoders based on different well-known heuristic algorithms: RGH, RGH-I, CBRC, CBRC-I.

Experimental results are also reported to compare the efficiency of different heuristic and genetic algorithms for solving the BDMST problem.


Dr Truong Thi Dieu Linh, Dr Le Minh Hoang, Dr Le Trong Vinh.

I would like to give special thanks to my parents, my husband and my daughters, who gave me unconditional support and encouragement during the long time I needed to conduct research and write this thesis.

I also thank the Ministry of Education and Training, Hanoi University of Science and Technology, and the National Foundation for Science and Technology Development for funding my research. I would like to thank my colleagues at the School of Information and Communication Technology, and my friends, for their comments and encouragement.

Chapter 1

Introduction

Network design problems are active topics in research. The selection of an optimal configuration or design of a network occurs in many different application contexts, including transportation (airline, railroad, traffic, and mass transit), communication (telephone and computer networks), electric power systems, and oil and gas pipelines. Many real-world problems can be mapped to a formulation dealing with nodes and edges within a graph. For example, telephone companies are particularly interested in the minimum spanning tree, because the minimum spanning tree of a set of sites defines the wiring scheme that connects the sites using as little wire as possible. It is the mother of all network design problems. The minimum spanning tree problem is fundamental and can be solved in polynomial time using Prim's or Kruskal's algorithm.

Another example concerns a traffic network whose nodes represent both origin and destination areas for the vehicular traffic of a city and also intersections in the road network. The arcs correspond to streets in the city, and the arc flows are the amounts of traffic traversing the streets. A typical network design problem would be to select a subset of the possible road improvements subject to a budget constraint. The design objective would be to minimize the total travel cost for all travelers in the city network.

It is interesting to see the wide range of network models that are related to the fixed charge design problem. If all arc construction costs are set to zero, then the fixed charge design model becomes a series of shortest path problems. If all arc routing costs are set to zero, the fixed charge design model becomes a Steiner tree problem on a graph. Since the fixed charge design problem contains the Steiner problem as a special case, we can be confident that it is very difficult to solve. If the arc construction costs are all equal and totally dominate the routing costs (i.e., the optimal network design must be a tree), then the fixed charge design problem becomes the optimum communication spanning tree problem defined by Hu [25].

Scott [4] has introduced another network synthesis problem, called the "optimal network" problem, that is closely related to the fixed charge design problem. The arc routing costs in this problem are all linear functions of the total flow. Arc capacities, which are all initially zero, can be raised to infinity. The objective is to minimize total routing cost subject to the usual capacity and flow routing constraints and the added constraint that the total construction costs cannot exceed a given budget. The optimal network problem is NP-hard.

In communication network design, the requirements may include, for example, a limit on the maximum communication delay or a guaranteed minimum signal-to-noise ratio. The number of relaying nodes on any path between two communication partners then needs to be restricted. This problem is the BDMST.

The BDMST problem has many applications: in the design of wire-based communication networks under quality of service requirements; in linear lightwave networks, where it can minimize interference in the network by limiting the traffic in the network lines; in data compression, where some algorithms compress a file utilizing a tree data structure and decompress a path in the tree to access a record; in ad-hoc wireless networks; and in distributed mutual exclusion algorithms. More details about the applications of the BDMST are presented in the next section.

1.1 Motivation

The BDMST problem has applications in several areas, such as communication network design, distributed mutual exclusion, linear lightwave networks and bit-compression for information retrieval. The theses of Abdalla [1] and Gruber [17] present detailed information about the motivation for the BDMST. Additional fields of application are described in [34], where the BDMST appears as a subproblem within the vehicle routing problem. Paper [3] deals with ad-hoc wireless networks, while paper [6] presents dynamic routing algorithms for multicasting in a linear lightwave network. We consider several applications below.

In communication network design, the requirements can be a limit on the maximum communication delay or a guaranteed minimum signal-to-noise ratio. Thus, the number of relaying nodes on any path between two communication partners needs to be limited by a given constant.
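The hop limit described above amounts to bounding the diameter of the tree, counted as the number of edges on its longest path. The following is a minimal sketch, not taken from the thesis, of how that diameter can be checked for a given spanning tree. It assumes vertices are numbered 0..n-1 and the tree is given as a list of edges; the names `tree_diameter` and `within_bound` are illustrative.

```python
from collections import deque


def tree_diameter(n, edges):
    """Diameter (number of edges on the longest path) of a tree on vertices 0..n-1."""
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    def farthest_from(start):
        # Breadth-first search; the last vertex dequeued is one of the farthest.
        distance = [-1] * n
        distance[start] = 0
        queue = deque([start])
        last = start
        while queue:
            last = queue.popleft()
            for neighbor in adjacency[last]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[last] + 1
                    queue.append(neighbor)
        return last, distance[last]

    # Two sweeps: in a tree, the farthest vertex from any start is an endpoint
    # of a longest path, so a second sweep from it yields the diameter.
    endpoint, _ = farthest_from(0)
    _, diameter = farthest_from(endpoint)
    return diameter


def within_bound(n, edges, bound):
    """True if the tree's diameter does not exceed the given bound D."""
    return tree_diameter(n, edges) <= bound
```

For example, a star on five vertices has diameter 2, while a path on five vertices has diameter 4 and would violate a bound of D = 3.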

In distributed mutual exclusion, before entering a critical section a computer in a distributed environment has to signal its intention and ask for permission. A relevant part of the costs for these operations is the length of the longest path the messages between the computers have to travel. Thus, when a tree structure is used as the underlying communication infrastructure, as proposed in [43], its diameter has a direct influence on the efficiency of the mutual exclusion algorithm.

In a distributed system, messages pass from one node to another. In [43], Raymond uses a logical spanning tree structure on a network of processors. Messages are passed among processors requesting entrance to a critical section and processors granting the privilege to enter. The maximum number of messages generated per critical-section execution is 2d, where d is the diameter of the spanning tree. Therefore a small diameter is essential for the efficiency of the algorithm. Minimizing edge weights reduces the cost of the network.

Another application can be found in information retrieval systems, where large data structures called bitmaps are used in compressing large files; see [9]. The files must be compressed so that they occupy less memory space, while still allowing reasonably fast access.

In a first step, similar vectors are clustered. To further increase the compression rate, not only are vectors within a cluster coded relative to a representative, but the cluster representatives themselves are also coded relative to each other, where the relation of the clusters is expressed by a graph spanning them all. The decoding process leads to the problem of creating a minimum spanning tree where the Hamming distance between two clusters is used as the cost function. The length of the paths within this tree has a considerable impact on the time required to decompress the bit-vectors that are part of the corresponding clusters. As a consequence, there has to be a trade-off between the compression rate (cost of the spanning tree) and the (de-)compression time (diameter of the tree).

The BDMST problem has many applications and is NP-hard, so solving it is a challenge. We propose new algorithms for solving this problem that find good solutions in reasonable time.

1.2 Methodologies

... by heuristics. Heuristics, and especially metaheuristics, can be seen as an alternative when large instances have to be solved in reasonable time, although these approaches are not able to guarantee reaching the optimum.

There are many heuristic algorithms based on different approaches, such as greedy heuristics, local search and evolutionary algorithms. These approaches can only be applied to specific problems. Recently, researchers have used metaheuristic algorithms: computational methods that optimize a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Metaheuristics make few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics do not guarantee that an optimal solution is ever found. Examples of metaheuristic algorithms are Iterated Local Search [31], Tabu Search [14], Variable Neighborhood Search (VNS) [23], Simulated Annealing [30], Ant Colony Optimization (ACO) [11], Evolutionary Algorithms (EA) [5], and Memetic Algorithms [32].

We briefly review greedy heuristic algorithms, local search and genetic algorithms, which we use to develop new algorithms for solving the BDMST problem.

A greedy heuristic algorithm is an algorithm that follows the problem-solving metaheuristic of making the locally optimal choice at each stage with the hope of finding the global optimum.

In general, greedy algorithms have five pillars:

1. A candidate set, from which a solution is created.

2. A selection function, which chooses the best candidate to be added to the solution.

3. A feasibility function, which is used to determine whether a candidate can be used to contribute to a solution.

4. An objective function, which assigns a value to a solution or a partial solution.

5. A solution function, which indicates when a complete solution has been discovered.

Greedy algorithms produce good solutions on some mathematical problems, but not on others. Most problems for which they work well have two properties:

• Greedy choice property: the choice made by a greedy algorithm may depend on choices made so far, but not on future choices or on all the solutions to the subproblem. The algorithm iteratively makes one greedy choice after another, reducing each given problem to a smaller one.

• Optimal substructure: a problem has optimal substructure if an optimal solution to the problem contains optimal solutions to its subproblems.

Greedy algorithms mostly (but not always) fail to find the globally optimal solution, because they usually do not operate exhaustively on all the data. They can commit to certain choices too early, which prevents them from finding the best overall solution later. For example, no known greedy algorithm for the graph coloring problem, or for any other NP-complete problem, consistently finds optimum solutions. Nevertheless, greedy algorithms are useful because they are quick to think up and often give good approximations to the optimum.
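To make the five pillars concrete, here is a minimal sketch of Kruskal's algorithm for the unconstrained minimum spanning tree, one of the problems where the greedy choice is provably optimal. It is an illustration, not one of the heuristics studied in this thesis; the function name `kruskal` and the `(weight, u, v)` edge format are assumptions made for the example. With a diameter bound added, the same greedy scheme no longer guarantees an optimal tree.

```python
def kruskal(n, weighted_edges):
    """Greedy MST on vertices 0..n-1; weighted_edges holds (weight, u, v) tuples.

    Returns (total_cost, tree_edges).
    """
    parent = list(range(n))  # union-find forest used by the feasibility test

    def find(x):
        # Find the component representative, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree_edges, total_cost = [], 0
    # Candidate set: all edges. Selection function: cheapest edge first.
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        # Feasibility function: the edge must not close a cycle.
        if root_u != root_v:
            parent[root_u] = root_v
            tree_edges.append((u, v))
            total_cost += weight  # objective function: sum of edge weights
            # Solution function: a spanning tree has exactly n - 1 edges.
            if len(tree_edges) == n - 1:
                break
    return total_cost, tree_edges
```

On a four-vertex graph with edges `(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)`, the sketch selects the three cheapest edges, for a total cost of 6.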

Local search is a metaheuristic for solving computationally hard optimization problems. Local search can be used on problems that can be formulated as finding a solution maximizing a criterion among a number of candidate solutions. Local search algorithms move from solution to solution in the space of candidate solutions (the search space) until a solution deemed optimal is found or a time bound has elapsed.

A local search algorithm starts from a candidate solution and then iteratively moves to a neighbor solution. This is only possible if a neighborhood relation is defined on the search space. As an example, the neighborhood of a vertex cover is another vertex cover differing by only one node. For Boolean satisfiability, the neighbors of a truth assignment are usually the truth assignments that differ from it in the evaluation of a single variable. The same problem may have multiple different neighborhoods defined on it; local optimization with neighborhoods that involve changing up to k components of the solution is often referred to as k-opt.

Figure 1.1: Scheme of genetic algorithm

Termination of local search can be based on a time bound. Another common choice is to terminate when the best solution found by the algorithm has not been improved in a given number of steps. Local search algorithms are typically incomplete algorithms, as the search may stop even if the best solution found by the algorithm is not optimal. This can happen even if termination is due to the impossibility of improving the solution, as the optimal solution can lie far from the neighborhood of the solutions crossed by the algorithm.
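The loop described above can be sketched as follows. This is a generic best-improvement variant written for illustration, under the assumption that solutions are tuples of bits and that neighbors differ in exactly one bit; the names `local_search`, `flip_neighbors` and `deceptive` are hypothetical and do not come from the thesis.

```python
def flip_neighbors(bits):
    """All bit tuples that differ from `bits` in exactly one position."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]


def local_search(initial, neighbors, score, max_steps=1000):
    """Move to the best neighbor until none improves or the step bound is hit."""
    current, current_score = initial, score(initial)
    for _ in range(max_steps):
        best = max(neighbors(current), key=score)
        best_score = score(best)
        if best_score <= current_score:
            break  # local optimum: no neighbor improves the solution
        current, current_score = best, best_score
    return current, current_score


def deceptive(bits):
    """Counts ones, except that the all-zero tuple scores highest of all."""
    return len(bits) + 1 if not any(bits) else sum(bits)
```

Maximizing the number of ones from the all-zero tuple reaches the optimum. With the `deceptive` score, a search started at `(1, 1, 1, 1)` stops immediately with score 4, although the all-zero tuple scores 5; this is the incompleteness noted above.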

The genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.

The general scheme of a GA is given in Figure 1.1.

GAs are useful and efficient when:

• The search space is large, complex or poorly understood.

• Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space.

• No mathematical analysis is available.

• Traditional search methods fail.

Representation: Objects forming possible solutions within the original problem context are called phenotypes; their encodings, the individuals within the GA, are called genotypes. The representation step specifies the mapping from the phenotypes onto a set of genotypes. Candidate solution, phenotype and individual are used to denote points of the space of possible solutions. This space is called the phenotype space. Genotype, chromosome and individual can be used for points in the genotype space.

Mutation Operator: It is applied to one genotype and delivers a modified mutant, its child or offspring. In general, mutation is supposed to cause a random unbiased change. Mutation has a theoretical role: it can guarantee that the space is connected.

Crossover Operator: A binary variation operator is called recombination or crossover. This operator merges information from two parent genotypes into one or two offspring genotypes. Similarly to mutation, crossover is a stochastic operator: the choice of what parts of each parent are combined, and the way these parts are combined, depend on random drawings. The principle behind crossover is simple: by mating two individuals with different but desirable features, we can produce an offspring which combines both of those features.

Parent Selection Mechanism: The role of parent selection (mating selection) is to distinguish among individuals based on their quality, to allow the better individuals to become parents of the next generation. Parent selection is probabilistic. Thus, high-quality individuals get a higher chance to become parents than those with low quality. Nevertheless, low-quality individuals are often given a small but positive chance; otherwise the whole search could become too greedy and get stuck in a local optimum.

Survivor Selection Mechanism: The role of survivor selection is to distinguish among individuals based on their quality. In a GA, the population size is (almost always) constant, thus a choice has to be made on which individuals will be allowed in the next generation. This decision is based on their fitness values, favoring those with higher quality. As opposed to parent selection, which is stochastic, survivor selection is often deterministic, for instance, ranking the unified multiset of parents and offspring and selecting the top segment (fitness-biased), or selecting only from the offspring (age-biased).

Termination Condition: Notice that a GA is stochastic and mostly there are no guarantees of reaching an optimum. Commonly used conditions for termination are the following:

1. The maximum allowed CPU time elapses.

2. The total number of fitness evaluations reaches a given limit.

3. For a given period of time, the fitness improvement remains under a threshold value.

4. The population diversity drops under a given threshold.

Population: The role of the population is to hold possible solutions. A population is a multiset of genotypes. In almost all GA applications, the population size is constant and does not change during the evolutionary search.
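The components above can be assembled into a short generational loop. The sketch below is a toy illustration on bit strings, where fitness is the number of ones; it is not one of the BDMST algorithms of this thesis. It assumes binary tournament parent selection, one-point crossover, bit-flip mutation, and survivor selection that keeps the top of the merged parents and offspring. All names and default parameter values are assumptions made for the example.

```python
import random


def genetic_algorithm(n_bits=20, pop_size=30, generations=60,
                      p_crossover=0.9, seed=0):
    """Toy GA maximizing the number of ones in a bit list (n_bits >= 2)."""
    rng = random.Random(seed)
    p_mutation = 1.0 / n_bits  # on average one flipped bit per child
    fitness = sum

    # Population: a multiset of genotypes with constant size.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def select_parent():
        # Parent selection: binary tournament, so weaker individuals
        # keep a small but positive chance of being chosen.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):  # termination: generation limit
        if max(map(fitness, population)) == n_bits:
            break  # termination: known optimum reached
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select_parent(), select_parent()
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit independently with small probability.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            offspring.append(child)
        # Survivor selection: rank parents and offspring together, keep the top.
        merged = population + offspring
        population = sorted(merged, key=fitness, reverse=True)[:pop_size]

    best = max(population, key=fitness)
    return best, fitness(best)
```

Because the survivor selection in this sketch is deterministic and fitness-biased, the best fitness in the population never decreases from one generation to the next.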

Both exact and heuristic methods have their strengths and weaknesses. In practice, combining them into hybrid algorithms often improves solution quality (faster algorithms and/or better solutions) by exploiting synergies [17]. Classifications and surveys of different hybridizations of exact optimization techniques with metaheuristics can be found in [41, 38, 42].

Nowadays, heuristics and metaheuristics are suitable approaches for solving NP-hard problems. In this thesis, we use local search and genetic algorithms to develop new algorithms for solving the BDMST problem.

1.4 Contributions

Trang 24

Contributions will be presented in four chapters and can be summerized as follow:

1 We propose the Center-Based Recursive Clustering (CBRC) heuristic algorithm.CBRC is based on RGH (and CBT C) We extend the concept of center to each level

of the partially constructed spanning tree The algorithm can be seen as recursivelyclustering the vertices of the graph: every internal node of the spanning tree isthe center of the sub-graph in the subtree rooted at this node and we recursive tofind the best center We also survey the constraint between the weight of tree andbounded diameter We experiment and compare the result between our algorithmand others - RGH, RGH − I, CBT C, OT T C, CBT C − I - on the Euclidean andNon-Euclidean instances up to 1000 vertices On the Euclidean instances, the resultsshow the effectiveness of our algorithms on the best, mean and deviation values Onthe Non-Euclidean instances, the best results found by CBRC − I are the same withthe one found by OT T C

2 We also introduce three multi-parent recombination operators in genetic algorithmfor solving BDM ST problem We consider three different methods for choosingparents: the first one is based on Levenshtein distance between the parents, the sec-ond one uses the best individual in the population and the last one uses randomlychosen individual in the population We also experiment each method of choosing

Trang 25

parents with three ways for adding edges from the parents into the offspring: choosethe edge randomly, choose the edge which have minimum weight, choose the edgewhich have minimum weight in maximum sharing edge from the parents We exper-iment on the Euclidean instances up to 1000 vertices We concentrate on analyzingthe recombination operator in genetic algorithms So we compare the results of ouralgorithms using, respectively, three mentioned multi-parent recombination opera-tors with another genetic algorithm using two-parent recombination operator on thesame problem.

3. We propose a new hybrid genetic algorithm for solving the BDMST problem. The new genetic algorithm uses multiple populations, where each population is initialized with a different well-known heuristic. The individuals in each population subsequently compete for positions in a selection population, using a simulated annealing mechanism based on proportionate selection; in the selection population, they combine and evolve toward the optimum. Our approach therefore employs different initial biases by using different heuristics for initialization, and hybridizes the individuals from these populations to promote the exploratory capacity of the GA. We compare our results with other genetic algorithms, namely the genetic algorithm of Raidl and Julstrom in [40] (called RJ-ESEA), the genetic algorithm of Singh and Gupta in [46] (called PEA-I) and the genetic algorithm in each population, on Euclidean and non-Euclidean instances with up to 1000 vertices. The results show the effectiveness of our algorithm.

4. We propose steady-state genetic algorithms which use different heuristic algorithms for decoding. We modify the decoder and the replacement policy used in PEA-I so as to improve its performance. We use four decoders based on different well-known heuristic algorithms: RGH, RGH-I, CBRC and CBRC-I. We experiment on Euclidean and non-Euclidean instances with up to 1000 vertices, and the results show that our algorithms outperform the others.

1.5 Outline

This dissertation is organized as follows.

In chapter 1, we introduce the motivation of the thesis and the methodologies. The scope of research and the contributions are also presented.

After the introduction, chapter 2 presents the formulation of the BDMST problem and summarizes the related works in the field of the BDMST problem. To the best of our knowledge, each of the existing algorithms for solving BDMST is suitable for only one kind of problem instance: Euclidean or non-Euclidean. In the remaining chapters, we therefore present our algorithms for solving BDMST. We hope that our proposed algorithms can be applied to both Euclidean and non-Euclidean instances to find better solutions.

A new greedy heuristic algorithm (Center-Based Recursive Clustering) is presented in chapter 3.

Evolutionary algorithms have proven effective on several hard spanning tree problems. In chapters 4, 5 and 6, we therefore present our genetic algorithms for solving BDMST.

An EA's recombination operator should provide strong heritability. This means that the tree produced by recombining parent trees should consist mostly of parental edges. It is also beneficial to favor edges that are common to the parents. In chapter 4, we present multi-parent recombination operators in a genetic algorithm for solving BDMST.

Almost all genetic algorithms for solving the BDMST problem strongly depend on their particular heuristics, in that the heuristics are usually used to initialize GA populations and play an important role in the design of genetic operators. However, it has been suggested in the literature that the behaviour of different heuristics varies over different classes of problem instances [46].

In chapter 5, we introduce a new hybrid genetic algorithm for solving the BDMST problem that uses multiple populations, where each population is initialized with a different well-known heuristic. Chapter 6 introduces a steady-state genetic algorithm for solving the BDMST problem which uses different heuristics for decoding the tree.

Finally, the conclusion summarizes the work.


Chapter 2

Bounded Diameter Minimum Spanning Tree and Related Works

This chapter presents the formulation of the BDMST problem and summarizes the related works in the field.

Before introducing the approaches for solving BDMST, we state the problem.

2.1 Problem formulation

We need to introduce some concepts relating to tree diameter and center before the BDMST problem can be formally stated.

Let $T = (V, E_T)$ be a tree with node set $V$ and edge set $E_T$.

Definition 1: (Eccentricity) The eccentricity of a node $v \in V$ is defined as the maximum number of edges on the path between $v$ and any other node within the tree $T$.

Definition 2: (Diameter) The diameter of a tree $T$, denoted as $diam(T)$, is the maximal eccentricity over all nodes in $T$ (i.e. the number of edges on the longest path between two arbitrary vertices in $T$).

Definition 3: (Center of tree) The center of a tree is the single vertex (if the diameter of the tree is even) or the two connected vertices (if the diameter is odd) of minimum eccentricity. Suppose that a diameter of the tree is defined by the path $v_1, v_2, \ldots, v_{[k/2]}, v_{[k/2]+1}, \ldots, v_k$. If the diameter is odd, the edge $(v_{[k/2]}, v_{[k/2]+1})$ joining the two center vertices is called a center edge.

Definition 4: (Radius) The radius of a tree is the minimum eccentricity among all nodes of the tree.
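Definitions 1 to 4 can be illustrated with a minimal Python sketch. The adjacency-dictionary representation and all function names below are illustrative assumptions, not notation from the thesis; the code simply runs a breadth-first search from every node, which is adequate for small examples.

```python
from collections import deque

def eccentricities(adj):
    """Return {v: eccentricity of v} for a tree given as an adjacency dict."""
    ecc = {}
    for source in adj:
        # Breadth-first search counts edges (hops) from source to every node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        ecc[source] = max(dist.values())
    return ecc

def diameter(adj):
    """Maximal eccentricity over all nodes (Definition 2)."""
    return max(eccentricities(adj).values())

def radius(adj):
    """Minimal eccentricity over all nodes (Definition 4)."""
    return min(eccentricities(adj).values())

def center(adj):
    """Nodes of minimum eccentricity: one node or two adjacent nodes."""
    ecc = eccentricities(adj)
    r = min(ecc.values())
    return sorted(v for v, e in ecc.items() if e == r)

# A path on four nodes has odd diameter; a star has even diameter.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

On the path, the two middle nodes form the center and the edge between them is the center edge; on the star, the hub is the single center.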

Definition 5: (Bounded Diameter Minimum Spanning Tree Problem - BDMST) Let $G = (V, E)$ be a connected undirected graph with positive edge weights $w(e)$. The BDMST problem can be formulated as follows: among all spanning trees of $G$ whose diameters do not exceed a given upper bound $D \ge 2$, find the spanning tree with the minimal cost (sum of the weights of the edges of the tree). As in almost all studies of the BDMST problem, and without loss of generality, we will assume that $G$ is a complete graph.

Thus, we can formulate the problem as:

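Find a spanning tree $T = (V, E_T)$ of $G$ that minimizes

$$\sum_{e \in E_T} w(e) \quad \text{subject to} \quad diam(T) \le D.$$

In Figure 2.1, the diameter bound is an even number, so the center of the tree is a single vertex. In Figure 2.2, the diameter bound is an odd number, so $v_1$, $v_2$ are the centers of the tree and $(v_1, v_2)$ is the center edge.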

Definition 6: (Decision BDMST problem) Let $G = (V, E)$ be a connected undirected graph whose edge weights are 0 or 1, and let $D \ge 2$ and $q \ge 2$ be two integers. Does there exist a spanning tree with diameter less than or equal to $D$ and weight $q$?


Figure 2.1: The BDST with 19 vertices and bounded diameter D = 4; v is the center of the tree

For $D = 2$ and $D = 3$, the problem can be solved in polynomial time: in $O(n^2)$ for $D = 2$ and, by iterating over all edges and connecting the remaining nodes, in time $O(m \cdot n)$ for $D = 3$, which is bounded above by $O(n^3)$ for complete graphs. In the case $4 \le D < |V| - 1$, BDMST becomes an NP-hard problem. Details about the special cases with $D < 4$ can be found in [16]. The reduction for BDMST is introduced in [13, 17].
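To make the $O(n^2)$ bound for $D = 2$ concrete: on three or more vertices, a spanning tree of diameter at most 2 is a star, so it suffices to evaluate each of the $n$ possible hubs in $O(n)$ time. The sketch below assumes a complete graph given as a symmetric weight matrix; the function name `best_star` is a hypothetical helper, not taken from [16].

```python
def best_star(w):
    """Return (hub, cost) of the cheapest star spanning a complete graph.

    w is a symmetric n x n weight matrix (list of lists); w[i][i] is ignored.
    """
    n = len(w)
    best_hub, best_cost = None, float("inf")
    for hub in range(n):  # n candidate centers
        # Connecting every other vertex to the hub costs O(n) to evaluate.
        cost = sum(w[hub][v] for v in range(n) if v != hub)
        if cost < best_cost:
            best_hub, best_cost = hub, cost
    return best_hub, best_cost

# Vertex 0 is close to everything; all other pairs are far apart.
w = [
    [0, 1, 1, 1],
    [1, 0, 5, 5],
    [1, 5, 0, 5],
    [1, 5, 5, 0],
]
```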

2.2 Related Optimization and Decision Problems

Some of the well-known constrained minimum spanning tree problems require minimizing the weighted diameter of the spanning tree of a randomly-weighted graph. These problems are closely related to the problems that require optimizing the weighted radius of the spanning tree. The main difference between these problems and the BDMST problem lies in the way they disregard the number of edges in the longest path in the tree. Approaches to solve these problems can sometimes be modified to solve the BDMST problem, and vice versa. In this section, we introduce some optimization and decision problems related to BDMST.
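The difference between the hop-based diameter used in BDMST and the weighted diameter used in the related problems can be seen on a small example. The sketch below is only an illustration: the weighted adjacency-dictionary format and the function names are assumptions, and the diameter is computed with the standard two-sweep search, which is valid for trees with positive edge weights.

```python
def farthest(adj, source, use_weights):
    """Return (node, distance) of the node farthest from source in a tree."""
    dist = {source: 0}
    stack = [source]
    while stack:
        x = stack.pop()
        for y, weight in adj[x]:
            if y not in dist:
                # Count either the edge weight or a single hop.
                dist[y] = dist[x] + (weight if use_weights else 1)
                stack.append(y)
    node = max(dist, key=dist.get)
    return node, dist[node]

def tree_diameter(adj, use_weights):
    """Two sweeps: the node farthest from any start ends a longest path."""
    start = next(iter(adj))
    end, _ = farthest(adj, start, use_weights)
    _, length = farthest(adj, end, use_weights)
    return length

# Two heavy edges at node 0 plus a light chain 0-3-4-5.
tree = {
    0: [(1, 10), (2, 10), (3, 1)],
    1: [(0, 10)],
    2: [(0, 10)],
    3: [(0, 1), (4, 1)],
    4: [(3, 1), (5, 1)],
    5: [(4, 1)],
}
```

Here the longest path in hops runs along the light chain, while the longest weighted path uses the two heavy edges, so a bound on one quantity says little about the other.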

Let $G = (V, E)$ be a connected undirected graph with positive edge weights $w(e)$, and let $T = (V, E_T)$ be a spanning tree of $G$.

Problem 1: Bounded Weighted Diameter Minimum Spanning Tree problem (BWDMST). Among all spanning trees of G whose weighted diameter does not exceed a given upper bound D, find the spanning tree with the minimal cost.

Problem 2: Minimum Weighted Diameter Bounded Spanning Tree problem (MWDBST). Among all spanning trees of G whose weight does not exceed a given upper bound S, find the spanning tree with the minimal weighted diameter.

Problem 3: Bounded Weighted Radius Minimum Spanning Tree problem (BWRMST). Among all spanning trees of G whose weighted radius does not exceed a given upper bound R, find the spanning tree with the minimal cost.

Problem 4: Minimum Weighted Radius Bounded Spanning Tree problem (MWRBST). Among all spanning trees of G whose weight does not exceed a given upper bound S, find the spanning tree with the minimal weighted radius.

Problem 5: Bounded Weighted Diameter Bounded Spanning Tree problem (BWDBST). Among all spanning trees of G whose weighted diameter does not exceed a given upper bound D, find a spanning tree whose weight does not exceed a given upper bound S.

Problem 6: Bounded Weighted Radius Bounded Spanning Tree problem (BWRBST). Among all spanning trees of G whose weighted radius does not exceed a given upper bound R, find a spanning tree whose weight does not exceed a given upper bound S.

Two applications closely related to the BDMST problem are mentioned below.

Problem 7: Hop Constraint Minimum Spanning Tree Problem (HCMST). Given a graph $G = (V, E)$ with positive edge weights $w(e)$, a root $r$ and a hop limit $H$, find a spanning tree $T = (V, E_T)$ of $G$ of minimal cost such that each path in $T$ from $r$ to any other node consists of no more than $H$ edges.

A generalization of HCMST can be defined as follows:

Problem 8: Distance or Delay Constrained Minimum Spanning Tree Problem. Given a graph $G = (V, E)$ with positive edge weights $w(e)$ and delay values $d_e \ge 0$, a root $r$ and a delay bound $L$, find a spanning tree $T = (V, E_T)$ of $G$ of minimal cost such that the total delay of the edges on the path from $r$ to any other node is less than $L$.

The three problems below are other constrained optimization problems concerning spanning trees.

Problem 9: k-Cardinality Tree Problem. Given an undirected graph $G = (V, E)$ with edge weights and a positive integer $k$, the k-Cardinality Tree problem consists of finding a subtree $T$ of $G$ with exactly $k$ edges and the minimum possible weight.

Problem 10: Degree-Constrained Minimum Spanning Tree Problem (DCMST). Let $G = (V, E)$ be a connected undirected graph with positive edge weights $w(e)$. DCMST can be formulated as follows: among the spanning trees of $G$ whose degree does not exceed a given upper bound $d \ge 2$, find the spanning tree with minimum cost.

Problem 11: Capacitated Minimum Spanning Tree Problem (CMST). Given an undirected weighted graph $G$, a node $r$ of $G$ and an integral value $Q$, CMST consists of finding a minimum spanning tree $T$ of $G$ rooted at $r$ such that the number of nodes of each subtree of $T$ does not exceed $Q$.

All of the above problems are NP-hard; see [24].

In the next section, we review the approaches for solving BDMST.

2.3 Related works

The BDMST problem has also been shown to be hard to approximate, in that there is no polynomial-time algorithm that is guaranteed to find a solution with a cost within log(|V|) of the optimum, unless P = NP. Techniques for solving the BDMST problem may be classified into two main categories: exact methods and inexact (heuristic) methods. Exact algorithms are guaranteed to find an optimal solution, but their running time increases dramatically with the instance size, so they are often applicable only to small instances. Heuristic algorithms are used for larger instances and aim to find good solutions in a limited time.

2.3.1 Exact approaches

Exact approaches for solving the BDMST problem are based on mixed integer linear programming [35, 15]. Achuthan et al. [35] presented three branch-and-bound algorithms for it and solved instances with up to 100 vertices. Gouveia and Magnanti [15] described a network flow model that solved instances with up to 100 vertices and 1,000 edges, and Santos et al. [12] extended the methods of Achuthan et al. [35]. They presented a formulation based on lifted Miller-Tucker-Zemlin inequalities, which are responsible for avoiding cycles and enforcing the maximum diameter. This approach is suitable for dense graphs but takes a long time and cannot deal with large problem instances.

More recently, Gruber and Raidl suggested a branch-and-cut algorithm based on compact 0-1 integer linear programming [19]. It is further strengthened by dynamically adding violated connection and cycle elimination constraints within a branch-and-cut environment. They model the BDMST problem in two cases, even diameter and odd diameter, and solve them separately. They experiment on graphs with at most |V| = 40 and |E| = 200. However, being deterministic and exhaustive in nature, exact approaches can only be used to solve small problem instances (e.g. complete graphs with fewer than 100 nodes).

2.3.2 Heuristic Methods

Since exact algorithms are not able to solve instances with thousands of nodes, heuristics have been developed. We briefly summarize some construction heuristic algorithms which can solve instances with up to thousands of nodes.


2.3.2.1 One Time Tree Construction Algorithm

Abdalla et al. [2] presented a greedy heuristic algorithm, the One Time Tree Construction (OTTC), for solving the BDMST problem. OTTC is based on Prim's algorithm [37]. It starts with a set of vertices, initially containing a randomly chosen vertex. The set is then repeatedly extended by adding a new vertex that is nearest (in cost) to the set, as long as the inclusion of the new node does not violate the constraint on the diameter of the tree. The time for appending each new edge is, in the worst case, $O(n^2)$. This step is repeated $n - 1$ times, so the running time of the algorithm is $O(n^3)$. The quality of the tree identified by the algorithm depends heavily on the start vertex. To identify a low-weight BDST, the algorithm should be run starting from each vertex in the target graph; the time of the entire process is then $O(n^4)$. The algorithm is therefore time consuming, and its performance is strongly dependent on the starting vertex.

Figure 2.3 shows the smallest BDST found by OTTC, with diameter bound D = 5, on n = 100 vertices in the unit square. Short edges connect only a few vertices near the center of this tree. The remaining vertices connect via longer edges to this core, forming a star-like structure.

Figure 2.3: The best BDST found by the OTTC algorithm on the Euclidean problem instance with n = 100, D = 5


2.3.2.2 Center-Based Tree Construction Algorithm

In [28], the Center-Based Tree Construction heuristic (CBTC) applies the same Prim-based strategy but uses the start vertex as the center of the spanning tree (if D is even) or as one of the two vertices in the center (if D is odd). This algorithm does not need to bound each vertex's eccentricity. It suffices to bound each vertex's depth, that is, the number of edges on the path from the tree's center to the vertex. No vertex can be more than $\lfloor D/2 \rfloor$ edges from the center, and the depth, and thus the eligibility, of a vertex is fixed when it joins the tree. Updating the algorithm's data structures requires only linear time in the worst case (constant time when the new vertex's depth is $\lfloor D/2 \rfloor$), so the time complexity of the algorithm is $O(n^2)$, and $O(n^3)$ if it is started at each vertex.
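A minimal Python sketch of the depth-bounded Prim strategy is given below. It assumes a complete graph given as a symmetric weight matrix with at least two vertices. For odd D, the choice of the second center as the vertex nearest to the start vertex is an assumption made for this illustration; the function name `cbtc` and the `near` dictionary are likewise hypothetical.

```python
def cbtc(w, D, start):
    """Center-based Prim-like construction with depth limit D // 2.

    Returns (edges, depth), where depth[v] is the hop distance to the center.
    """
    n = len(w)
    max_depth = D // 2
    depth = {start: 0}
    edges = []
    if D % 2 == 1:
        # Assumption: the second center is the vertex nearest to start.
        second = min((v for v in range(n) if v != start),
                     key=lambda v: w[start][v])
        depth[second] = 0
        edges.append((start, second))
    # near[v]: cheapest tree vertex of depth < max_depth that v can attach to.
    near = {v: min(depth, key=lambda u: w[u][v])
            for v in range(n) if v not in depth}
    while near:
        v = min(near, key=lambda x: w[near[x]][x])  # cheapest eligible edge
        u = near.pop(v)
        depth[v] = depth[u] + 1
        edges.append((u, v))
        if depth[v] < max_depth:
            # v may now serve as a parent: refresh the candidates in O(n).
            for x in near:
                if w[v][x] < w[near[x]][x]:
                    near[x] = v
    return edges, depth
```

A vertex that joins at depth $\lfloor D/2 \rfloor$ can never become a parent, so no update is needed for it, which corresponds to the constant-time case mentioned above.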

Julstrom also modified the CBTC algorithm by choosing the starting vertex and all subsequent vertices at random from those not yet in the spanning tree. The connection of each new vertex v to the tree remains greedy: it always uses the lowest-weight edge that connects v to a vertex in the tree whose depth is less than $\lfloor D/2 \rfloor$. The modified algorithm is called Randomized Center-Based Tree Construction (RTC). The time complexity of RTC, like that of CBTC, is $O(n^2)$. Running the randomized heuristic n times and reporting the best solution is thus $O(n^3)$.

2.3.2.3 Randomized Greedy Heuristic Algorithm

Raidl and Julstrom proposed in [40] a modified version of OTTC, called the Randomized Greedy Heuristic (RGH). RGH starts from a center by randomly selecting a vertex and keeping it as the fixed center during the search. It then repeatedly extends the spanning tree from the center by adding a randomly chosen vertex from the remaining vertices, and connecting it to a vertex that is already in the tree via an edge with the smallest weight.

The algorithm also differs from OTTC in that it begins by fixing the center of the tree. The starting vertex $v_0$ is chosen randomly. If D is even, $v_0$ is the center. If D is odd, another vertex $v_1$ is chosen at random and $v_0$, $v_1$ are the centers; the edge joining them is the first in the tree. Instead of maintaining the eccentricity of each vertex and the path lengths between vertices, the randomized heuristic stores the depth of each connected vertex: the number of edges on the path from it to the center. This value is set when a vertex joins the tree and does not subsequently change. No vertex may have a depth greater than $\lfloor D/2 \rfloor$; otherwise the diameter constraint is violated or $v_0$ ($v_1$) is displaced from the center. A sketch of the RGH algorithm is presented in Algorithm 1.

Algorithm 1 Randomized Greedy Heuristic Algorithm

    v ← a random vertex from U;
    u ← vertex from C with smallest w((u, v));

Identifying the vertex u ∈ C that is nearest to v requires time O(|C|) = O(n).
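The RGH description can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: it assumes that U is the set of vertices not yet connected, that C holds the tree vertices of depth less than $\lfloor D/2 \rfloor$, and that the graph is complete and given as a symmetric weight matrix. The function name `rgh` and the `rng` parameter are hypothetical.

```python
import random

def rgh(w, D, rng=random):
    """Randomized greedy construction; returns (edges, depth)."""
    n = len(w)
    max_depth = D // 2
    unconnected = list(range(n))        # the set U
    rng.shuffle(unconnected)            # popping now yields random vertices
    v0 = unconnected.pop()
    depth = {v0: 0}
    edges = []
    if D % 2 == 1:
        v1 = unconnected.pop()          # second center, also chosen at random
        depth[v1] = 0
        edges.append((v0, v1))
    candidates = list(depth)            # the set C: depth < max_depth
    while unconnected:
        v = unconnected.pop()           # a random vertex from U
        u = min(candidates, key=lambda c: w[c][v])  # nearest vertex in C
        depth[v] = depth[u] + 1
        edges.append((u, v))
        if depth[v] < max_depth:
            candidates.append(v)        # v can later act as a parent
    return edges, depth
```

Because the result depends on the random order, the heuristic is typically run several times with different seeds, for example `rgh(w, D, random.Random(seed))`, and the cheapest tree is kept.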


2.3.2.4 Improved Greedy Heuristics (RGH-I and CBTC-I)

Singh and Gupta [46] extended the greedy constructive heuristic with a local search step that reevaluates previous vertex connections after appending each new vertex.

For each vertex v, they check whether it can be connected to a better parent vertex than the one to which it is currently connected without violating the diameter constraint. The vertex which offers the maximum reduction in the cost of the BDST is selected, and the whole subtree rooted at vertex v is deleted from its current location and reconnected to the tree via the selected vertex.

This improvement is also applicable to CBTC, and the resulting algorithm is denoted by CBTC-I.
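The re-parenting step can be sketched as follows. This is a simplified illustration under several assumptions: the tree is stored as a parent list with `None` for center vertices, depths are measured from the center, and the improvement is applied as a repeated pass over a finished tree rather than after each insertion as in [46]. The function names `subtree` and `improve` are hypothetical.

```python
def subtree(parent, v):
    """Vertices whose path to the center passes through v (including v)."""
    members = set()
    for x in range(len(parent)):
        y = x
        while y is not None and y != v:
            y = parent[y]
        if y == v:
            members.add(x)
    return members

def improve(parent, depth, w, max_depth):
    """Re-parent subtrees while a cheaper feasible parent exists.

    parent and depth are updated in place; returns the total cost reduction.
    """
    n = len(parent)
    saved = 0
    improved = True
    while improved:
        improved = False
        for v in range(n):
            if parent[v] is None:
                continue  # center vertices stay fixed
            sub = subtree(parent, v)
            height = max(depth[x] for x in sub) - depth[v]
            # Feasible parents lie outside the subtree and are shallow enough
            # that the deepest moved vertex stays within max_depth.
            options = [p for p in range(n)
                       if p not in sub and depth[p] + 1 + height <= max_depth]
            best = min(options, key=lambda p: w[p][v])
            if w[best][v] < w[parent[v]][v]:
                shift = depth[best] + 1 - depth[v]
                saved += w[parent[v]][v] - w[best][v]
                parent[v] = best
                for x in sub:
                    depth[x] += shift
                improved = True
    return saved
```

The current parent of v always satisfies the feasibility test, so `options` is never empty, and every accepted move strictly lowers the tree cost, so the loop terminates.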

Figures 2.4 and 2.5 show the best BDSTs found by CBTC and CBTC-I, respectively, on the Euclidean instance with 100 vertices and D = 10. The tree in Figure 2.5 is obtained by applying the local search to the best tree found by the CBTC algorithm (Figure 2.4); the changes can be seen at the circle marks.

Figure 2.4: The best BDST found by the CBTC algorithm on the Euclidean problem instance with n = 100, D = 10

Figure 2.5: The best BDST found by the CBTC-I algorithm on the Euclidean problem instance with n = 100, D = 10

Figures 2.6 and 2.7 show the best BDSTs found by RGH and RGH-I, respectively, on the Euclidean problem instance with n = 100, D = 10. The tree in Figure 2.7 is obtained by applying the local search to the best tree found by the RGH algorithm (Figure 2.6); the changes can be seen at the circle marks.

Figure 2.6: The best BDST found by the RGH algorithm on the Euclidean problem instance with n = 100, D = 10

Figure 2.7: The best BDST found by the RGH-I algorithm on the Euclidean problem instance with n = 100, D = 10

Singh and Gupta [46] experiment on Euclidean instances with 50, 100, 250, 500 and 1000 vertices, with the diameter bound set to 5, 10, 15, 20 and 25, respectively.

2.3.2.5 Hierarchical clustering heuristic algorithm - HCH

In [21], Gruber and Raidl propose a constructive heuristic that exploits a hierarchical clustering to guide the process of building a backbone. The clustering heuristic constructs diameter-constrained trees in three steps: determining a hierarchical clustering, reducing the height of this clustering according to the diameter bound, and finally deriving a BDMST from this height-restricted clustering.

They experiment on the Euclidean instances from Beasley's OR-Library [7], using the first 15 instances with |V| = 1000. On large Euclidean instances, the BDMSTs obtained by HCH outperform those of other construction heuristics significantly, especially when the diameter bound is tight, and HCH takes only a few seconds. However, it cannot be applied to non-Euclidean instances.


2.3.2.6 Comments

In [46], Singh and Gupta compare OTTC, RGH, RGH-I, CBTC and CBTC-I on Euclidean and non-Euclidean instances with 50, 100, 250, 500 and 1000 vertices, with the diameter bound set to 5, 10, 15, 20 and 25, respectively. The experimental results show that:

On the non-Euclidean instances, RGH-I and CBTC-I give better best and average results than RGH and CBTC, respectively. Both RGH and RGH-I perform much worse than OTTC, CBTC and CBTC-I; even RGH-I cannot compete with OTTC, CBTC and CBTC-I. On almost all instances, OTTC gives the best minimum and mean values.

In [28], Julstrom experiments on 240 graphs: 120 Euclidean graphs and an equal number with edge weights chosen at random. The Euclidean graphs consist of points randomly placed in the unit square, 30 graphs each of n = 100, 250, 500 and 1,000 points. In each set of graphs, 15 instances can be found in the OR-Library [7], where they are listed as instances of the Euclidean Steiner problem, and 15 more were randomly generated. In each set, the points are the vertices of complete graphs whose edge weights are the Euclidean distances between the points.

Four more sets of 30 complete graphs also consist of n = 100, 250, 500 and 1,000 vertices. The edge weights of these graphs were chosen at random from the interval [0.01, 0.99].

On the Euclidean instances, the diameter bound is set to 5, 10, 15, 25 for |V| = 100; 10, 15, 20, 40 for |V| = 250; 15, 30, 45, 60 for |V| = 500; and 20, 40, 60, 100 for |V| = 1000. On the random edge weight instances, the diameter bound is set to 5, 7, 10, 15 for |V| = 100; 5, 10, 15, 20 for |V| = 250; 10, 15, 20, 30 for |V| = 500; and 10, 20, 30, 50 for |V| = 1000.

The experimental results in [28, 46, 40] show that:

On the Euclidean instances, the best and average results found by RGH-I are better than those of RGH, OTTC, CBTC and CBTC-I. When D is small, CBTC identifies BDSTs that are slightly shorter than those OTTC finds, but the trees found by RTC are much shorter than those of OTTC and CBTC. When OTTC and CBTC are applied to problem instances whose vertices are points in Euclidean space and whose edge weights are the distances between the points, the weights of the BDSTs found by these heuristics are much larger than the minimum, especially when D is smaller than n. OTTC and CBTC build backbones of short edges; the remaining points connect to these backbones via longer edges, so OTTC and CBTC build longer trees than necessary. This observation holds for almost all BDMST problem instances. With larger diameter bounds, the differences between the results of the three algorithms diminish, to the particular advantage of CBTC.

On the random-weight instances, the trees that CBTC identifies have on average lower weights than those of OTTC. RTC is always worse than both OTTC and CBTC. The lack of Euclidean structure in the random-weight instances makes OTTC and CBTC better than RTC.

2.3.3 Metaheuristic algorithms

Besides the greedy construction heuristics, several research groups have developed evolutionary algorithms (EAs) for solving the BDMST problem, in the hope of finding good results within reasonable time.

In an EA, the representation method plays an important role and determines all the operators in the algorithm.

Representation methods: There are many methods for representing individuals, in particular spanning trees: characteristic vectors, predecessor coding, Prüfer numbers, link and node, edge-set encoding and permutation code. In this thesis, we use edge-set encoding and permutation code.

• Edge-set-encoding: The problem of spanning tree representation has been studied extensively in the literature. References [36, 26] and especially [45] contain substantial discussions and analyses of different representations from theoretical and practical perspectives. For the BDMST problem, three representations have been
