MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
GENETIC ALGORITHMS FOR SOLVING BOUNDED DIAMETER MINIMUM SPANNING TREE PROBLEM
By Huynh Thi Thanh Binh
Supervisor: Associate Professor Nguyen Duc Nghia
A Dissertation submitted in partial fulfillment of the requirements
for the Degree of Doctor of Philosophy in Engineering
Table of Contents
1 Introduction
1.1 Motivation
1.2 Methodologies
1.3 Scope of research
1.4 Contributions
1.5 Outline
2 Bounded Diameter Minimum Spanning Tree and Related Works
2.1 Problem formulation
2.2 Related Optimization and Decision Problems
2.3 Related works
2.3.1 Exact approaches
2.3.2 Heuristic Methods
2.3.2.1 One Time Tree Construction Algorithm
2.3.2.2 Center-Based Tree Construction Algorithm
2.3.2.3 Randomized Greedy Heuristic Algorithm
2.3.2.4 Improved Greedy Heuristics (RGH-I and CBTC-I)
2.3.2.5 Hierarchical clustering heuristic algorithm - HCH
2.3.2.6 Comments
2.3.3 Metaheuristic algorithms
2.3.4 Conclusion
3 Center-Based Recursive Clustering Heuristic Algorithm
3.1 The new greedy heuristic - Center-Based Recursive Clustering (CBRC)
3.2 The improvement of Center-Based Recursive Clustering - CBRC-I
3.3 Experiments
3.3.1 Problem instances
3.3.2 Experiment setup
3.3.3 Result
3.4 Discussion
3.5 Conclusion
4 Genetic algorithm with multi-parent recombination operator
4.1 Individual representation and genetic operators
4.2 Experiments
4.2.1 Problem instances
4.2.2 Experiment setup
4.2.3 System setting
4.2.4 Results and discussion
4.3 Conclusion
5 Multi-population Genetic Algorithm
5.1 Structure of the genetic algorithm
5.2 Experiments
5.2.1 Problem instances
5.2.2 Experiment setup
5.2.3 System setting
5.2.4 Result
5.2.5 Discussion
5.3 Conclusion
6 Steady-state genetic algorithm
6.1 Steady state genetic algorithm structure
6.1.1 Individual representation and initial population
6.1.2 Crossover
6.1.3 Mutation
6.1.4 Selection
6.2 Replacement policy
6.3 Experiments
6.3.1 Problem instances
6.3.2 Experiment setup
6.3.3 Parameter
6.3.4 Result
6.4 Conclusion
List of Figures
1.1 Scheme of genetic algorithm
2.1 The BDST with 19 vertices and bounded diameter D = 4; v is the center of the tree
2.2 The BDST with 19 vertices and bounded diameter D = 5; v1, v2 are the centers of the tree
2.3 The best BDST found by the OTTC algorithm on the Euclidean problem instance with n = 100, D = 5
2.4 The best BDST found by the CBTC algorithm on the Euclidean problem instance with n = 100, D = 10
2.5 The best BDST found by the CBTC-I algorithm on the Euclidean problem instance with n = 100, D = 10
2.6 The best BDST found by the RGH algorithm on the Euclidean problem instance with n = 100, D = 10
2.7 The best BDST found by the RGH-I algorithm on the Euclidean problem instance with n = 100, D = 10
2.8 A spanning tree on twelve nodes and its edge-set representation
2.9 A spanning tree on eleven nodes and its permutation-code representation
2.10 Center Move Mutation
2.11 Edge Delete Mutation
2.12 Subtree-Optimize Mutation
2.13 The best BDST found by the JR-ESEA algorithm on the Euclidean problem instance with n = 250, D = 15
2.14 The best BDST found by the JR-PEA algorithm on the Euclidean problem instance with n = 250, D = 15
2.15 The best BDST found by the PEA-I algorithm on the Euclidean problem instance with n = 250, D = 15
3.1 A star-like structure of a typical solution to the BDMST problem
3.2 Greedy Edge Delete Local search
3.3 The best BDST found by the CBRC heuristic on the Euclidean problem instance with n = 100, D = 10
3.4 The best BDST found by the CBRC-I heuristic on the Euclidean problem instance with n = 100, D = 10
4.1 Comparison between the sum of the best solutions found by the EA-xy2 algorithm on all the problem instances
4.2 Comparison between the sum of the best solutions found by the EA-xy5 algorithm on all the problem instances
4.3 Comparison between the sum of the best solutions found by the EA-xy7 algorithm on all the problem instances
4.4 Comparison between the sum of the best solutions found by the EA-xy9 algorithm on all the problem instances
4.5 Comparison between the sum of the best solutions found by the EA-xdk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.6 Comparison between the sum of the best solutions found by the EA-xgk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.7 Comparison between the sum of the best solutions found by the EA-xmk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.8 Comparison between the sum of the average solutions found by the EA-xy2 algorithm on all the problem instances (x = b, r, l)
4.9 Comparison between the sum of the average solutions found by the EA-xy5 algorithm on all the problem instances
4.10 Comparison between the sum of the average solutions found by the EA-xy7 algorithm on all the problem instances
4.11 Comparison between the sum of the average solutions found by the EA-xy9 algorithm on all the problem instances (x = b, r, l)
4.12 Comparison between the sum of the average solutions found by the EA-xdk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.13 Comparison between the sum of the average solutions found by the EA-xgk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.14 Comparison between the sum of the average solutions found by the EA-xmk algorithm on all the problem instances (x = b, r, l; k = 2, 5, 7, 9)
4.15 Comparison between the best solutions found by GA1, GA2, GA3, GA4, GA5, GA6 on all the problem instances
4.16 Comparison between the standard deviations of the solutions found by GA1, GA2, GA3, GA4, GA5, GA6 on all the problem instances
5.1 Multi-population model
5.2 Comparison between the best results found by GA11, GA12, GA13, GA14 and HGA on the instance with n = 250, D = 15, instance 1
5.3 Comparison between the mean results found by GA11, GA12, GA13, GA14 and HGA on the instance with n = 250, D = 15, instance 1
5.4 The number of individuals from GA11, GA12, GA13, GA14 that migrate to GAfinal
6.1 PEA-I algorithm
List of Tables
3.1 Diameter Bound
3.2 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 100 and D = 5, 7, 9, 11, 13, 15
3.3 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 250 and D = 5, 10, 13, 15, 17, 20, 25
3.4 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 500 and D = 10, 15, 18, 20, 22, 25, 30
3.5 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Euclidean instances of the BDMST problem with n = 1000 and D = 15, 20, 23, 25, 27, 30, 35
3.6 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 100 and D = 5, 7, 9, 11, 13, 15
3.7 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 250 and D = 5, 10, 13, 15, 17, 20, 25
3.8 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 500 and D = 10, 15, 18, 20, 22, 25, 30
3.9 Results of OTTC, CBTC, RGH, CBRC, CBRC-I, RGH-I on the Non-Euclidean instances of the BDMST problem with n = 1000 and D = 15, 20, 23, 25, 27, 30, 35
4.1 The rate at which each heuristic algorithm is used to initialize the population in each experimental genetic algorithm
4.2 Comparison between the results found by EA-xy2 (x = d, g, m; y = l, r, b) on the 20 Euclidean problem instances
4.3 Comparison between the results found by EA-xy5 (x = d, g, m; y = l, r, b) on the 20 Euclidean problem instances
4.4 Comparison between the results found by EA-xy7 (x = d, g, m; y = l, r, b) on the 20 Euclidean problem instances
4.5 Comparison between the results found by EA-xy9 (x = d, g, m; y = l, r, b) on the 20 Euclidean problem instances
5.1 Comparison between the results with different crossover probabilities on the Euclidean problem instance with 250 vertices, D = 15
5.2 Comparison between the results with different crossover probabilities on the Euclidean problem instance with 250 vertices, D = 15
5.3 Comparison between the results found by RJ-ESEA, PEA-I, HGA, MHGA on the 20 Euclidean problem instances
5.4 Comparison between the results found by RJ-ESEA, PEA-I, HGA, MHGA on the 20 Non-Euclidean problem instances
5.5 Results of GA11, GA12, GA13, GA14 and HGA on 20 Euclidean BDMST problem instances
5.6 Results of GA11, GA12, GA13, GA14 and HGA on 20 Non-Euclidean BDMST problem instances
5.7 Results of GA21, GA22, GA23, GA24 and MHGA on 20 Euclidean BDMST problem instances
5.8 Results of GA21, GA22, GA23, GA24 and MHGA on 20 Euclidean BDMST problem instances
6.1 Results of PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I on the 20 Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
6.2 Average number of iterations required by PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I to reach the best solution on the 20 Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
6.3 Results of PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I on the 20 Non-Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
6.4 Average number of iterations required by PEA-RGH, PEA-RGHI, PEA-CBRC, PEA-CBRCI, PEA-I to reach the best solution on the 20 Non-Euclidean instances of the BDMST problem of size 100, 250, 500 and 1,000
The Bounded Diameter Minimum Spanning Tree (BDMST) problem is a combinatorial optimization problem that arises in many applications, such as the design of wire-based communication networks under quality of service requirements and the design of linear lightwave networks, where it can minimize interference in the network by limiting the traffic in the network lines. Other practical applications requiring a BDMST arise in data compression, where some algorithms compress a file utilizing a tree data structure and decompress a path in the tree to access a record, in ad-hoc wireless networks, and in distributed mutual exclusion algorithms.
Let G = (V, E) be a connected undirected graph with positive edge weights w(e), where e is an edge of the graph. The BDMST problem can be formulated as follows: among the spanning trees of G whose diameters do not exceed a given upper bound D ≥ 2, find a spanning tree with minimal cost (the sum of the weights of the edges of the tree). As in almost all studies of the BDMST problem, and without loss of generality, we assume that G is a complete graph.
This problem is known to be NP-hard for 4 ≤ D < |V| − 1. Moreover, the BDMST problem has been shown to be hard to approximate: unless P = NP, no polynomial-time algorithm can guarantee a solution whose cost is within log(|V|) of the optimum. Therefore, heuristic and metaheuristic techniques are currently the only practical methods for obtaining good solutions to the BDMST problem, especially when |V| is large.
In this thesis, we survey the literature on the BDMST problem and then present new algorithms for solving it.
First, we propose a greedy heuristic algorithm called Center-Based Recursive Clustering (CBRC). We extend the concept of center to each level of the partially constructed spanning tree. The algorithm can be seen as recursively clustering the vertices of the graph: every internal node of the spanning tree is the center of the subgraph in the subtree rooted at this node, and we recurse to find the best center. The new heuristic is compared with other well-known heuristics for solving the BDMST problem, namely the One-Time-Tree-Construction (OTTC), the Randomized Greedy Heuristic (RGH) of Raidl and Julstrom, the Center-Based Tree Construction (CBTC) of Julstrom, and the Randomized Greedy Heuristic with post-improvement (RGH-I) and Center-Based Tree Construction with post-improvement (CBTC-I) of Singh and Gupta.
Next, we introduce a multi-parent recombination operator in genetic algorithms (GAs) for solving the BDMST problem. The proposed operator allows more than two parents to be used to create offspring. We consider three different methods for choosing parents and three new methods for adding edges from the parents to the offspring. For each combination of a parent-choosing method and an edge-adding method, we also experiment with genetic algorithms that use different numbers of parents. We discuss and analyze the efficiency of using different heuristic algorithms to initialize the population of a genetic algorithm for solving the BDMST problem.
We present a new genetic algorithm that uses multiple populations, where each population is initialized with a different well-known heuristic. The individuals in each population subsequently compete for positions in a selection population, using a simulated annealing mechanism based on proportionate selection. In the selection population, they combine and evolve toward the optimum. We compare our results with other GAs.
Besides generational genetic algorithms, many researchers have recently become interested in steady-state genetic algorithms. We present steady-state genetic algorithms that use different heuristic algorithms for decoding. We modify the decoder and the replacement policy used in PEA-I so as to improve its performance. We use four decoders based on different well-known heuristic algorithms: RGH, RGH-I, CBRC and CBRC-I.
Experimental results are also reported to compare the efficiency of the different heuristic and genetic algorithms for solving the BDMST problem.
Dr Truong Thi Dieu Linh, Dr Le Minh Hoang, Dr Le Trong Vinh.
I would like to give special thanks to my parents, my husband and my daughters, who gave me unconditional support and encouragement during the long time I needed to conduct research and write this thesis.
I also thank the Ministry of Education and Training, Hanoi University of Science and Technology, and the National Foundation for Science and Technology Development for funding my research. I would like to thank my colleagues at the School of Information and Communication Technology and my friends for their comments and encouragement.
Chapter 1
Introduction
Network design problems are active research topics. The selection of an optimal configuration or design of a network occurs in many different application contexts, including transportation (airline, railroad, traffic, and mass transit), communication (telephone and computer networks), electric power systems, and oil and gas pipelines. Many real-world problems can be mapped to a formulation dealing with nodes and edges within a graph. For example, telephone companies are particularly interested in the minimum spanning tree, because the minimum spanning tree of a set of sites defines the wiring scheme that connects the sites using as little wire as possible. It is the mother of all network design problems. The minimum spanning tree problem is fundamental and can be solved in polynomial time using Prim's or Kruskal's algorithm.
Another example concerns a traffic network whose nodes represent both origin and destination areas for the vehicular traffic of a city and also intersections in the road network. The arcs correspond to streets in the city, and the arc flows are the amounts of traffic traversing the streets. A typical network design problem would be to select a subset of the possible road improvements subject to a budget constraint. The design objective would be to minimize the total travel cost for all travelers in the city network.
It is interesting to see the wide range of network models that are related to the fixed charge design problem. If all arc construction costs are set to zero, then the fixed charge design model becomes a series of shortest path problems. If all arc routing costs are set to zero, the fixed charge design model becomes a Steiner tree problem on a graph. Since the fixed charge design problem contains the Steiner problem as a special case, we can be confident that it is very difficult to solve. If the arc construction costs are all equal and totally dominate the routing costs (i.e., the optimal network design must be a tree), then the fixed charge design problem becomes the optimum communication spanning tree problem defined by Hu [25].
Scott [4] has introduced another network synthesis problem, called the "optimal network" problem, that is closely related to the fixed charge design problem. The arc routing costs in this problem are all linear functions of the total flow. Arc capacities, which are all initially zero, can be raised to infinity. The objective is to minimize the total routing cost subject to the usual capacity and flow routing constraints and the added constraint that the total construction cost cannot exceed a given budget. The optimal network problem is NP-hard.
In communication network design, the requirements can be, for example, a limit on the maximum communication delay or a guarantee of a minimum signal-to-noise ratio; thus, the number of relaying nodes on any path between two communication partners needs to be restricted. This is the BDMST problem.
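To make the restriction concrete, the following minimal Python sketch checks whether a given spanning tree respects a diameter bound D. It assumes the usual definition of the diameter as the largest number of edges on any path in the tree; the function names (`tree_diameter`, `tree_cost`, `is_feasible_bdmst`) and the edge-list input format are illustrative choices, not notation from this thesis.

```python
from collections import defaultdict, deque


def _farthest(adj, start):
    """Breadth-first search; return (farthest vertex, its hop distance) from start."""
    dist = {start: 0}
    queue = deque([start])
    far = start
    while queue:
        u = queue.popleft()
        if dist[u] > dist[far]:
            far = u
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return far, dist[far]


def tree_diameter(edges):
    """Diameter (number of edges) of a tree given as (u, v, weight) triples.

    The input is assumed to be a tree; this is not verified here.
    """
    adj = defaultdict(list)
    for u, v, _w in edges:
        adj[u].append(v)
        adj[v].append(u)
    if not adj:
        return 0
    # Two BFS passes: the vertex farthest from any start is one end of a longest path.
    end, _ = _farthest(adj, next(iter(adj)))
    _, diameter = _farthest(adj, end)
    return diameter


def tree_cost(edges):
    """Sum of the edge weights of the tree."""
    return sum(w for _u, _v, w in edges)


def is_feasible_bdmst(edges, bound):
    """True if the tree's diameter does not exceed the bound D."""
    return tree_diameter(edges) <= bound


# Two hypothetical spanning trees on the vertices 0..3.
star = [(0, 1, 4.0), (0, 2, 3.0), (0, 3, 5.0)]
path = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
```

In this made-up example the path is cheaper than the star but has diameter 3, so it is infeasible for D = 2, while the more expensive star is feasible. This is the tension between cost and diameter that the BDMST problem has to resolve.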
The BDMST problem has many applications: in the design of wire-based communication networks under quality of service requirements; in linear lightwave networks, where it can minimize interference in the network by limiting the traffic in the network lines; in data compression, where some algorithms compress a file utilizing a tree data structure and decompress a path in the tree to access a record; in ad-hoc wireless networks; and in distributed mutual exclusion algorithms. More details about the applications of the BDMST problem are presented in the next section.
1.1 Motivation
The BDMST problem has applications in several areas, such as communication network design, distributed mutual exclusion, linear lightwave networks and bit-compression for information retrieval. The theses of Abdalla [1] and Martin Gruber [17] present detailed information about the motivation for the BDMST problem. Additional fields of application are described in [34], where the BDMST appears as a subproblem within the vehicle routing problem. Paper [3] deals with ad-hoc wireless networks, while paper [6] presents dynamic routing algorithms for multicasting in a linear lightwave network. We consider several applications below.
In communication network design, the requirements can be a limit on the maximum communication delay or a guarantee of a minimum signal-to-noise ratio. Thus, the number of relaying nodes on any path between two communication partners needs to be limited by a given constant.
In distributed mutual exclusion, before entering a critical section a computer in a distributed environment has to signal its intention and ask for permission. A relevant part of the cost of these operations is the length of the longest path that the messages between the computers have to travel. Thus, when a tree structure is used as the underlying communication infrastructure, as proposed in [43], its diameter has a direct influence on the efficiency of the mutual exclusion algorithm.
In a distributed system, messages pass from one node to another. In [43], Raymond uses a logical spanning tree structure on a network of processors. Messages are passed among processors requesting entrance to a critical section and processors granting the privilege to enter. The maximum number of messages generated per critical-section execution is 2d, where d is the diameter of the spanning tree. Therefore, a small diameter is essential for the efficiency of the algorithm. Minimizing edge weights reduces the cost of the network.
Another application can be found in information retrieval systems, where large data structures called bitmaps are used in compressing large files; see [9]. It is required to compress the files so that they occupy less memory space while allowing reasonably fast access. In a first step, similar vectors are clustered. To further increase the compression rate, not only are the vectors within a cluster coded relative to a representative, but the cluster representatives themselves are also coded relative to each other, where the relation of the clusters is expressed by a graph spanning them all. The decoding process leads to the problem of creating a minimum spanning tree in which the Hamming distance between two clusters is used as the cost function. The length of the paths within this tree has a considerable impact on the time required to decompress the bit-vectors of the corresponding clusters. As a consequence, there has to be a trade-off between the compression rate (cost of the spanning tree) and the (de-)compression time (diameter of the tree).
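As a small illustration of the cost function mentioned above, the sketch below computes the Hamming distances between cluster representatives stored as equal-length bit strings and lists them as weighted edges of a complete graph. The representatives and the helper names (`hamming`, `hamming_edges`) are invented for this example and are not taken from [9].

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_edges(representatives):
    """All edges (i, j, cost) of the complete graph over the representatives."""
    return [
        (i, j, hamming(representatives[i], representatives[j]))
        for i, j in combinations(range(len(representatives)), 2)
    ]


# Hypothetical cluster representatives (6-bit vectors).
reps = ["110010", "110011", "000011", "111111"]
edges = hamming_edges(reps)
```

A spanning tree built over these edges with low total weight corresponds to a good compression rate, while a small diameter keeps the chains of relative decodings short.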
The BDMST problem has many applications and is NP-hard, so solving it is a challenge. We propose new algorithms for this problem that find good solutions in reasonable time.
1.2 Methodologies
... by heuristics. Heuristics, and especially metaheuristics, can be seen as an alternative when large instances have to be solved in reasonable time, although these approaches cannot guarantee reaching the optimum.
There are many heuristic algorithms based on different approaches, such as greedy heuristics, local search and evolutionary algorithms. These approaches can only be applied to specific problems. Recently, researchers have used metaheuristic algorithms: computational methods that optimize a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Metaheuristics make few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics do not guarantee that an optimal solution is ever found. Examples of metaheuristic algorithms are Iterated Local Search [31], Tabu Search [14], Variable Neighborhood Search (VNS) [23], Simulated Annealing [30], Ant Colony Optimization (ACO) [11], Evolutionary Algorithms (EA) [5] and Memetic Algorithms [32].
We briefly review greedy heuristic algorithms, local search and genetic algorithms, which we use to develop new algorithms for solving the BDMST problem.
A greedy heuristic algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding the global optimum.
In general, greedy algorithms have five pillars:
1. A candidate set, from which a solution is created.
2. A selection function, which chooses the best candidate to be added to the solution.
3. A feasibility function, which is used to determine whether a candidate can be used to contribute to a solution.
4. An objective function, which assigns a value to a solution or a partial solution.
5. A solution function, which indicates when we have discovered a complete solution.
Greedy algorithms produce good solutions on some mathematical problems, but not on others. Most problems for which they work well have two properties:
• Greedy choice property: the choice made by a greedy algorithm may depend on the choices made so far, but not on future choices or on the solutions to all the subproblems. The algorithm iteratively makes one greedy choice after another, reducing each given problem to a smaller one.
• Optimal substructure: a problem has optimal substructure if an optimal solution to the problem contains optimal solutions to its subproblems.
Greedy algorithms mostly (but not always) fail to find the globally optimal solution, because they usually do not operate exhaustively on all the data. They can commit to certain choices too early, which prevents them from finding the best overall solution later. For example, all known greedy coloring algorithms for the graph coloring problem, and greedy algorithms for all other NP-complete problems, do not consistently find optimum solutions. Nevertheless, they are useful because they are quick to think up and often give good approximations to the optimum.
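The five components listed above can be sketched as a generic loop. The Python example below is only an illustration under stated assumptions: the function `greedy` and its parameter names are hypothetical, and the instantiation is Kruskal's algorithm for the unconstrained minimum spanning tree, a problem for which the greedy choice is known to be optimal. For the BDMST problem the same scheme yields heuristics without such a guarantee.

```python
def greedy(candidates, select, feasible, is_complete):
    """Generic greedy loop built from the components listed above."""
    solution = []
    remaining = list(candidates)              # candidate set
    while remaining and not is_complete(solution):
        best = select(remaining)              # selection function
        remaining.remove(best)
        if feasible(solution, best):          # feasibility function
            solution.append(best)
    return solution


def kruskal(n, edges):
    """Minimum spanning tree of a connected graph on the vertices 0..n-1."""
    parent = list(range(n))                   # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    def feasible(_solution, edge):
        u, v, _w = edge
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                      # the edge would close a cycle
        parent[ru] = rv                       # accepted: merge the two components
        return True

    # Repeated min() keeps the sketch short; sorting the edges once would be faster.
    return greedy(
        candidates=edges,
        select=lambda rest: min(rest, key=lambda e: e[2]),
        feasible=feasible,
        is_complete=lambda sol: len(sol) == n - 1,   # solution function
    )


def objective(solution):
    """Objective function: total weight of the chosen edges."""
    return sum(w for _u, _v, w in solution)


# Hypothetical weighted graph given as (u, v, weight) triples.
graph = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
mst = kruskal(4, graph)
```

Here the edge (0, 2, 3) is rejected by the feasibility function because it would close a cycle, and the loop stops as soon as the solution function reports n − 1 edges.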
Local search is a metaheuristic for solving computationally hard optimization problems. Local search can be used on problems that can be formulated as finding a solution maximizing a criterion among a number of candidate solutions. Local search algorithms move from solution to solution in the space of candidate solutions (the search space) until a solution deemed optimal is found or a time bound has elapsed.
A local search algorithm starts from a candidate solution and then iteratively moves to a neighbor solution. This is only possible if a neighborhood relation is defined on the search space. As an example, the neighborhood of a vertex cover is another vertex cover differing by only one node. For boolean satisfiability, the neighbors of a truth assignment are usually the truth assignments that differ from it in the evaluation of a single variable. The same problem may have multiple different neighborhoods defined on it; local optimization with neighborhoods that involve changing up to k components of the solution is often referred to as k-opt.
Figure 1.1: Scheme of genetic algorithm
Termination of local search can be based on a time bound. Another common choice is to terminate when the best solution found by the algorithm has not been improved in a given number of steps. Local search algorithms are typically incomplete algorithms, as the search may stop even if the best solution found by the algorithm is not optimal. This can happen even if termination is due to the impossibility of improving the solution, as the optimal solution can lie far from the neighborhood of the solutions crossed by the algorithm.
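A minimal sketch of such a procedure is given below, assuming a minimization problem and a best-improvement move rule. The function name `local_search`, the toy one-dimensional landscape and the neighborhood (moving one position left or right) are all invented for illustration. The example also shows the incompleteness discussed above: the outcome depends on the starting point.

```python
def local_search(start, neighbors, cost, max_steps=1000):
    """Best-improvement local search for a minimization problem."""
    current, current_cost = start, cost(start)
    for _ in range(max_steps):                # step bound as one termination rule
        best = min(neighbors(current), key=cost, default=None)
        if best is None or cost(best) >= current_cost:
            break                             # no neighbor improves: local optimum
        current, current_cost = best, cost(best)
    return current, current_cost


# Toy search space: positions 0..7 with these costs (global optimum at position 5).
landscape = [5, 3, 4, 6, 2, 0, 1, 7]


def cost(x):
    return landscape[x]


def neighbors(x):
    """Neighborhood relation: the positions directly left and right of x."""
    return [y for y in (x - 1, x + 1) if 0 <= y < len(landscape)]


from_left = local_search(0, neighbors, cost)
from_middle = local_search(3, neighbors, cost)
```

Started at position 0, the search stops at position 1 with cost 3, a local optimum whose neighbors are both worse. Started at position 3, it reaches the global optimum at position 5 with cost 0.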
The genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
The general scheme of a GA is given in Figure 1.1.
GAs are useful and efficient when:
• The search space is large, complex or poorly understood.
• Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space.
• No mathematical analysis is available.
• Traditional search methods fail.
Representation: Objects forming possible solutions within the original problem context are called phenotypes; their encodings, the individuals within the GA, are called genotypes. The representation step specifies the mapping from the phenotypes onto a set of genotypes. The terms candidate solution, phenotype and individual are used to denote points of the space of possible solutions. This space is called the phenotype space. The terms chromosome and individual can be used for points in the genotype space.
Mutation Operator: It is applied to one genotype and delivers a modified mutant, the child or offspring of it. In general, mutation is supposed to cause a random, unbiased change. Mutation has a theoretical role: it can guarantee that the space is connected.
Crossover Operator: A binary variation operator is called recombination or crossover. This operator merges information from two parent genotypes into one or two offspring genotypes. Similarly to mutation, crossover is a stochastic operator: the choice of what parts of each parent are combined, and the way these parts are combined, depend on random drawings. The principle behind crossover is simple: by mating two individuals with different but desirable features, we can produce an offspring which combines both of those features.
Parent Selection Mechanism: The role of parent selection (mating selection) is to distinguish among individuals based on their quality, so as to allow the better individuals to become parents of the next generation. Parent selection is probabilistic. Thus, high-quality individuals get a higher chance to become parents than those with low quality. Nevertheless, low-quality individuals are often given a small but positive chance; otherwise the whole search could become too greedy and get stuck in a local optimum.
Survivor Selection Mechanism: The role of survivor selection is to distinguish among individuals based on their quality. In a GA, the population size is (almost always) constant, thus a choice has to be made on which individuals will be allowed in the next generation. This decision is based on their fitness values, favoring those with higher quality. As opposed to parent selection, which is stochastic, survivor selection is often deterministic: for instance, ranking the unified multiset of parents and offspring and selecting the top segment (fitness-biased), or selecting only from the offspring (age-biased).
Termination Condition: Since a GA is stochastic, there is usually no guarantee of reaching an optimum. Commonly used termination conditions are the following:
1. The maximally allowed CPU time elapses.
2. The total number of fitness evaluations reaches a given limit.
3. For a given period of time, the fitness improvement remains under a threshold value.
4. The population diversity drops under a given threshold.
Population: The role of the population is to hold possible solutions. A population is a multiset of genotypes. In almost all GA applications, the population size is constant and does not change during the evolutionary search.
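The components described above can be put together in a few lines. The following Python sketch is a toy illustration only and is not one of the algorithms proposed in this thesis: it assumes a bit-string genotype, the OneMax fitness (number of ones), binary tournament parent selection, one-point crossover, bit-flip mutation, and survivor selection that ranks parents and offspring together. All function names and parameter values are illustrative assumptions.

```python
import random


def one_max(genotype):
    """Toy fitness: number of ones in the bit string (to be maximized)."""
    return sum(genotype)


def tournament(population, fitness, rng, k=2):
    """Parent selection: the fitter of k randomly drawn individuals wins."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover producing a single offspring."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genotype, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genotype]


def genetic_algorithm(fitness, length=20, pop_size=30, generations=50,
                      stall_limit=15, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    stalled = 0
    for _ in range(generations):              # termination: generation limit
        offspring = [
            mutate(crossover(tournament(population, fitness, rng),
                             tournament(population, fitness, rng), rng),
                   rng, rate=1.0 / length)
            for _ in range(pop_size)
        ]
        # Survivor selection: rank parents and offspring together, keep the top.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
        if fitness(population[0]) > fitness(best):
            best, stalled = population[0], 0
        else:
            stalled += 1
        history.append(fitness(best))
        if stalled >= stall_limit:            # termination: no recent improvement
            break
    return best, history


best, history = genetic_algorithm(one_max)
```

Because the survivors are the top-ranked individuals of the combined parent and offspring multiset, the population size stays constant and the best fitness recorded in `history` never decreases from one generation to the next.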
Both exact and heuristic methods have their strengths and weaknesses. In practice, combining them into hybrid algorithms often yields improvements (faster algorithms and/or better solutions) by exploiting synergies [17]. Classifications and surveys of different hybridizations of exact optimization techniques with metaheuristics can be found in [41, 38, 42].
Nowadays, heuristics and metaheuristics are suitable approaches for solving NP-hard problems. In this thesis, we use local search and genetic algorithms to develop new algorithms for solving the BDMST problem.
1.4 Contributions
The contributions are presented in four chapters and can be summarized as follows:
1 We propose the Center-Based Recursive Clustering (CBRC) heuristic algorithm.CBRC is based on RGH (and CBT C) We extend the concept of center to each level
of the partially constructed spanning tree The algorithm can be seen as recursivelyclustering the vertices of the graph: every internal node of the spanning tree isthe center of the sub-graph in the subtree rooted at this node and we recursive tofind the best center We also survey the constraint between the weight of tree andbounded diameter We experiment and compare the result between our algorithmand others - RGH, RGH − I, CBT C, OT T C, CBT C − I - on the Euclidean andNon-Euclidean instances up to 1000 vertices On the Euclidean instances, the resultsshow the effectiveness of our algorithms on the best, mean and deviation values Onthe Non-Euclidean instances, the best results found by CBRC − I are the same withthe one found by OT T C
2 We also introduce three multi-parent recombination operators in genetic algorithmfor solving BDM ST problem We consider three different methods for choosingparents: the first one is based on Levenshtein distance between the parents, the sec-ond one uses the best individual in the population and the last one uses randomlychosen individual in the population We also experiment each method of choosing
Trang 25parents with three ways for adding edges from the parents into the offspring: choosethe edge randomly, choose the edge which have minimum weight, choose the edgewhich have minimum weight in maximum sharing edge from the parents We exper-iment on the Euclidean instances up to 1000 vertices We concentrate on analyzingthe recombination operator in genetic algorithms So we compare the results of ouralgorithms using, respectively, three mentioned multi-parent recombination opera-tors with another genetic algorithm using two-parent recombination operator on thesame problem.
3. We propose a new hybrid genetic algorithm for solving the BDMST problem. The new genetic algorithm uses multiple populations, where each population is initialized with a different well-known heuristic. The individuals in each population subsequently compete for positions in a selection population, using a simulated annealing mechanism based on proportionate selection; in the selection population, they combine and evolve toward the optimum. Our approach thus employs different initial biases by using different heuristics for initialization, and hybridizes the individuals from these populations to promote the exploratory capacity of the GA. We compare our results with other genetic algorithms, namely the genetic algorithm of Raidl and Julstrom in [40] (called RJ-ESEA), the genetic algorithm of Alok and Gupta in [46] (called PEA-I), and the genetic algorithm in each population, on Euclidean and Non-Euclidean instances with up to 1000 vertices. The results show the effectiveness of our algorithm.
4. We propose steady-state genetic algorithms which use different heuristic algorithms for decoding. We modify the decoder and the replacement policy used in PEA-I so as to improve its performance. We use four decoders based on different well-known heuristic algorithms: RGH, RGH-I, CBRC and CBRC-I. We experiment on Euclidean and Non-Euclidean instances with up to 1000 vertices, and the results show that our algorithms outperform the others.
1.5 Outline
This dissertation is organized as follows.
In chapter 1, we introduce the motivation of the thesis and the methodologies. The scope of the research and the contributions are also presented.
After the introduction, chapter 2 presents the formulation of the BDMST problem and summarizes the related work in the field of the BDMST problem. To the best of our knowledge, each existing algorithm for solving BDMST is suitable for only one kind of problem instance: Euclidean or Non-Euclidean. In the remaining chapters, we therefore present our algorithms for solving BDMST. We hope that our proposed algorithms can be applied to both Euclidean and Non-Euclidean instances to find better solutions.
A new greedy heuristic algorithm (Center-Based Recursive Clustering) is presented in chapter 3.
Evolutionary algorithms have proven effective on several hard spanning tree problems. So, in chapters 4, 5 and 6, we present our genetic algorithms for solving BDMST.
An EA's recombination operator should provide strong heritability. This means that the tree produced by recombining parent trees should consist mostly of parental edges. It is also beneficial to favor edges that are common to the parents. In chapter 4, we present a multi-parent recombination operator in a genetic algorithm for solving BDMST.
Almost all genetic algorithms for solving the BDMST problem depend strongly on their particular heuristics, in that the heuristics are usually used to initialize GA populations and play an important role in the design of genetic operators. However, it has been suggested in the literature that the behaviour of different heuristics varies over different classes of problem instances [46].
In chapter 5, we introduce a new hybrid multi-population genetic algorithm for solving BDMST problems, in which each population is initialized with a different well-known heuristic. Chapter 6 introduces a steady-state genetic algorithm for solving the BDMST problem which uses different heuristics for decoding the tree.
Finally, the conclusion summarizes the work.
Chapter 2
Bounded Diameter Minimum Spanning Tree and Related Works
This chapter presents the formulation of BDMST and summarizes the related work in the field of the BDMST problem.
Before introducing the approaches for solving BDMST, we state the problem.
2.1 Problem formulation
We need to introduce some concepts relating to tree diameter and center before the BDMST problem can be formally stated.
Let T = (V, ET) be a tree with node set V and edge set ET
Definition 1: (Eccentricity) The eccentricity of a node v ∈ V is defined as the maximum number of edges on the path between v and any other node within the tree T.
Definition 2: (Diameter) The diameter of a tree T, denoted as diam(T), is the maximal eccentricity over all nodes in T (i.e., the number of edges on the longest path between any two vertices in T).
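Definition 3: (Center of tree) The center of a tree is the single vertex (if the diameter of the tree is even) or the two connected vertices (if the diameter is odd) of minimum eccentricity. Suppose that a diameter of the tree is defined by the path $v_1, v_2, \ldots, v_{\lfloor k/2 \rfloor}, v_{\lfloor k/2 \rfloor + 1}, \ldots, v_k$. When the diameter is odd, the edge $(v_{\lfloor k/2 \rfloor}, v_{\lfloor k/2 \rfloor + 1})$ joining the two center vertices is called a center edge.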
Definition 4: (Radius) The radius of a tree is the minimum eccentricity among all nodes of the tree.
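Definitions 1 to 4 can be illustrated with a small sketch. The code below is only an illustration, not part of the thesis: the function names and the adjacency-dictionary representation are assumptions made for the example. It computes every eccentricity by breadth-first search and derives the diameter, radius and center from them.

```python
from collections import deque

def eccentricities(adj):
    """Return {v: eccentricity of v} for a tree given as an adjacency dict."""
    ecc = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        ecc[s] = max(dist.values())
    return ecc

def diameter(adj):
    """Maximal eccentricity over all nodes (Definition 2)."""
    return max(eccentricities(adj).values())

def radius(adj):
    """Minimal eccentricity over all nodes (Definition 4)."""
    return min(eccentricities(adj).values())

def center(adj):
    """Sorted list of the one or two vertices of minimum eccentricity (Definition 3)."""
    ecc = eccentricities(adj)
    r = min(ecc.values())
    return sorted(v for v in ecc if ecc[v] == r)

# A path 1-2-3-4 (odd diameter 3) and a star centered at 0 (even diameter 2).
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

On the path, the center consists of the two vertices 2 and 3 joined by the center edge; on the star, it is the single vertex 0. This quadratic-time version favours clarity over speed.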
Definition 5: (Bounded Diameter Minimum Spanning Tree Problem - BDMST) Let G = (V, E) be a connected undirected graph with positive edge weights w(e). The BDMST problem can be formulated as follows: among all spanning trees of G whose diameters do not exceed a given upper bound D ≥ 2, find the spanning tree with the minimal cost (the sum of the weights of the edges of the tree). As in almost all studies of the BDMST problem, and without loss of generality, we will assume that G is a complete graph.
Thus, we can formulate the problem as:
Find a spanning tree $T = (V, E_T)$ of G that minimizes
$$W(T) = \sum_{e \in E_T} w(e)$$
subject to $\mathrm{diam}(T) \le D$.
In figure 2.1, the bounded diameter is an even number (D = 4), so the center of the tree is a single vertex. In figure 2.2, the bounded diameter is an odd number, so v1 and v2 are the centers of the tree and (v1, v2) is the center edge.
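The formulation in Definition 5 can be checked mechanically for a candidate solution. The following minimal sketch assumes a complete graph given as a symmetric weight matrix `w` and a candidate tree given as an edge list; the names `farthest` and `evaluate` are hypothetical. It returns the cost of the candidate and whether it is a spanning tree with diameter at most D, using two breadth-first searches to obtain the diameter.

```python
from collections import deque

def farthest(adj, s):
    """BFS from s; return (a farthest vertex, hop distances from s)."""
    dist = {s: 0}
    queue = deque([s])
    last = s
    while queue:
        last = queue.popleft()
        for y in adj[last]:
            if y not in dist:
                dist[y] = dist[last] + 1
                queue.append(y)
    return last, dist

def evaluate(w, tree_edges, D):
    """Return (cost, feasible) for a candidate tree on vertices 0..n-1."""
    n = len(w)
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    cost = sum(w[u][v] for u, v in tree_edges)
    end, dist = farthest(adj, 0)
    # A spanning tree has exactly n - 1 edges and reaches every vertex.
    if len(tree_edges) != n - 1 or len(dist) != n:
        return cost, False
    # In a tree, a BFS from a farthest vertex yields the diameter.
    _, dist_from_end = farthest(adj, end)
    return cost, max(dist_from_end.values()) <= D

# Illustrative 4-vertex instance (weights chosen for the example only).
w = [[0, 1, 4, 4],
     [1, 0, 1, 4],
     [4, 1, 0, 1],
     [4, 4, 1, 0]]
```

On this instance, the path 0-1-2-3 is the unconstrained minimum spanning tree, but it violates D = 2, whereas the more expensive star around vertex 1 is feasible.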
Definition 6: (Decision BDMST problem) Let G = (V, E) be a connected undirected graph whose edge weights are 0 or 1, and let D ≥ 2 and q ≥ 2 be two integers. Does there exist a spanning tree with diameter less than or equal to D and weight q?
Figure 2.1: The BDST with 19 vertices and bounded diameter D = 4; v is the center of the tree.
The special cases D = 2 and D = 3 can be solved in polynomial time: in $O(n^2)$ for D = 2 and, by iterating over all edges and connecting the remaining nodes, in $O(m \cdot n)$ for D = 3, which is bounded above by $O(n^3)$ for complete graphs. For 4 ≤ D < |V| − 1, BDMST becomes an NP-hard problem. Details about the special cases with D < 4 can be found in [16]. The reduction of BDMST is introduced in [13, 17].
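The D = 2 case is easy to make concrete: on a complete graph with at least three vertices, every spanning tree of diameter at most 2 is a star, so it suffices to try each vertex as the hub. The sketch below assumes a symmetric weight matrix `w`; the function name `best_star` is chosen for the example.

```python
def best_star(w):
    """Return (hub, cost) of the cheapest spanning star of a complete graph."""
    n = len(w)
    best_hub, best_cost = None, float("inf")
    for c in range(n):
        # Cost of the star that connects every other vertex directly to c.
        cost = sum(w[c][v] for v in range(n) if v != c)
        if cost < best_cost:
            best_hub, best_cost = c, cost
    return best_hub, best_cost

# Illustrative 4-vertex instance (weights chosen for the example only).
w = [[0, 1, 4, 4],
     [1, 0, 1, 4],
     [4, 1, 0, 1],
     [4, 4, 1, 0]]
```

The two nested passes over the vertices give the quadratic running time stated above.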
2.2 Related Optimization and Decision Problems
Some of the well-known constrained minimum spanning tree problems require minimizing the weighted diameter of the spanning tree of a randomly-weighted graph. These problems are closely related to the problems that require optimizing the weighted radius of the spanning tree. The main difference between these problems and the BDMST problem lies in the way they disregard the number of edges in the longest path in the tree. Approaches to solve these problems can sometimes be modified to solve the BDMST problem, and vice versa. In this section, we introduce some optimization and decision problems related to BDMST.
Let G = (V, E) be a connected undirected graph with positive edge weights w(e), and let T = (V, ET) be a spanning tree of G.
Problem 1: Bounded Weighted Diameter Minimum Spanning Tree problem (BWDMST). Among all spanning trees of G whose weighted diameter does not exceed a given upper bound D, find the spanning tree with the minimal cost.
Problem 2: Minimum Weighted Diameter Bounded Spanning Tree problem (MWDBST). Among all spanning trees of G whose weight does not exceed a given upper bound S, find the spanning tree with the minimal weighted diameter.
Problem 3: Bounded Weighted Radius Minimum Spanning Tree problem (BWRMST). Among all spanning trees of G whose weighted radius does not exceed a given upper bound R, find the spanning tree with the minimal cost.
Problem 4: Minimum Weighted Radius Bounded Spanning Tree problem (MWRBST). Among all spanning trees of G whose weight does not exceed a given upper bound S, find the spanning tree with the minimal weighted radius.
Problem 5: Bounded Weighted Diameter Bounded Spanning Tree problem (BWDBST). Among all spanning trees of G whose weighted diameter does not exceed a given upper bound D, find a spanning tree whose weight does not exceed a given upper bound S.
Problem 6: Bounded Weighted Radius Bounded Spanning Tree problem (BWRBST). Among all spanning trees of G whose weighted radius does not exceed a given upper bound R, find a spanning tree whose weight does not exceed a given upper bound S.
Two applications closely related to the BDMST problem are described below.
Problem 7: Hop Constraint Minimum Spanning Tree Problem (HCMST). Given a graph G = (V, E) with positive edge weights w(e), a root r and a hop limit H, find a spanning tree T = (V, ET) of G of minimal cost such that each path in T from r to any other node consists of no more than H edges.
A generalization of HCMST can be defined as follows:
Problem 8: Distance or Delay Constrained Minimum Spanning Tree Problem. Given a graph G = (V, E) with positive edge weights w(e) and delay values d(e) ≥ 0, a root r and a delay bound L, find a spanning tree T = (V, ET) of G of minimal cost such that the total delay of the edges on the path from r to any other node is less than L.
The three problems below are constrained optimization problems concerning spanning trees.
Problem 9: k-Cardinality Tree Problem. Given an undirected graph G = (V, E) with edge weights and a positive integer k, the k-Cardinality Tree problem consists of finding a subtree T of G with exactly k edges and the minimum possible weight.
Problem 10: Degree-Constrained Minimum Spanning Tree Problem (DCMST). Let G = (V, E) be a connected undirected graph with positive edge weights w(e). DCMST can be formulated as follows: among the spanning trees of G whose maximum vertex degree does not exceed a given upper bound d ≥ 2, find the spanning tree with minimum cost.
Problem 11: Capacitated Minimum Spanning Tree Problem (CMST). Given an undirected weighted graph G, a node r of G and an integer Q, CMST consists of finding a minimum spanning tree T of G rooted at r such that the number of nodes of each subtree of T does not exceed Q.
All of the above problems are NP-hard; see [24].
In the next section, we review the approaches for solving BDMST.
2.3 Related works
The BDMST problem has also been shown to be hard to approximate, in that there is no polynomial-time algorithm that can guarantee to find a solution whose cost is within a factor of log(|V|) of the optimum, unless P = NP. Techniques for solving the BDMST problem may be classified into two main categories: exact methods and inexact (heuristic) methods. Exact algorithms are guaranteed to find an optimal solution, but their run time increases dramatically with the instance size, so they often apply only to small instances. Heuristic algorithms are used for larger instances; they aim to find good solutions within a limited time, without a guarantee of optimality.
2.3.1 Exact approaches
Exact approaches for solving the BDMST problem are based on mixed integer linear programming [35, 15]. Achuthan et al. [35] presented three branch-and-bound algorithms for it and solved instances with up to 100 vertices. Gouveia and Magnanti [15] described a network flow model that solved instances with up to 100 vertices and 1,000 edges, and Santos et al. [12] extended the methods of Achuthan et al. [35]. They presented a formulation based on lifted Miller-Tucker-Zemlin inequalities, which are responsible for avoiding cycles and enforcing the maximum diameter. This approach is suitable for dense graphs but takes a long time and cannot deal with large problem instances.
More recently, Gruber and Raidl suggested a branch-and-cut algorithm based on compact 0-1 integer linear programming [19]. It is further strengthened by dynamically adding violated connection and cycle elimination constraints within a branch-and-cut environment. They model the BDMST problem in two cases, even diameter and odd diameter, and solve them separately. They experiment on graphs with at most |V| = 40 and |E| = 200.
However, being deterministic and exhaustive in nature, exact approaches can only be used to solve small problem instances (e.g. complete graphs with fewer than 100 nodes).
2.3.2 Heuristic Methods
Since exact algorithms are not able to solve instances with thousands of nodes, heuristics have been developed. We briefly summarize some construction heuristic algorithms which can solve instances with up to thousands of nodes.
2.3.2.1 One Time Tree Construction Algorithm
Abdalla et al. [2] presented a greedy heuristic algorithm, the One Time Tree Construction (OTTC), for solving the BDMST problem. OTTC is based on Prim's algorithm [37]. It starts with a set of vertices, initially containing a randomly chosen vertex. The set is then repeatedly extended by adding a new vertex that is nearest (in cost) to the set, as long as the inclusion of the new node does not violate the constraint on the diameter of the tree. The time for appending each new edge is, in the worst case, $O(n^2)$. This step is repeated n − 1 times, so the running time of the algorithm is $O(n^3)$. The quality of the tree identified by the algorithm depends heavily on the start vertex. To identify a low-weight BDST, the algorithm should be run starting from each vertex in the target graph. The time of the entire process is then $O(n^4)$. This algorithm is time consuming, and its performance is strongly dependent on the starting vertex.
Figure 2.3 shows the smallest BDST found by OTTC, of diameter D = 5 on n = 100 vertices in the unit square. Short edges connect only a few vertices near the center of this tree. The remaining vertices connect via longer edges to this core, forming a star-like structure.
Figure 2.3: The best BDST found by OTTC algorithm on the Euclidean problem instance with n = 100, D = 5
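The OTTC procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Abdalla et al.: the graph is complete and given as a symmetric weight matrix `w`, the start vertex is passed in explicitly, and the hop distances between tree vertices are kept in a full matrix. A vertex u already in the tree may receive a new leaf only if its eccentricity plus one does not exceed D.

```python
def ottc(w, D, start):
    """Prim-like construction that never lets the hop diameter exceed D (D >= 2)."""
    n = len(w)
    in_tree = [False] * n
    in_tree[start] = True
    tree_nodes = [start]
    dist = [[0] * n for _ in range(n)]  # hop distances between tree vertices
    ecc = [0] * n                        # eccentricities within the current tree
    edges = []
    for _ in range(n - 1):
        best = None
        for u in tree_nodes:
            if ecc[u] + 1 > D:           # a new leaf at u would break the bound
                continue
            for v in range(n):
                if not in_tree[v] and (best is None or w[u][v] < best[0]):
                    best = (w[u][v], u, v)
        _, u, v = best                   # a center vertex is always eligible when D >= 2
        for x in tree_nodes:
            dist[x][v] = dist[v][x] = dist[x][u] + 1
            ecc[x] = max(ecc[x], dist[x][v])
        ecc[v] = max(dist[v][x] for x in tree_nodes)
        in_tree[v] = True
        tree_nodes.append(v)
        edges.append((u, v))
    return edges

# Illustrative 4-vertex instance (weights chosen for the example only).
w = [[0, 1, 4, 4],
     [1, 0, 1, 4],
     [4, 1, 0, 1],
     [4, 4, 1, 0]]
```

Each of the n − 1 steps scans all pairs of tree and non-tree vertices, which matches the cubic bound for a single start vertex. Calling `ottc` once per start vertex and keeping the cheapest tree gives the quartic variant.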
2.3.2.2 Center-Based Tree Construction Algorithm
In [28], the Center-Based Tree Construction Heuristic (CBTC) applies the same Prim-based strategy but uses the start vertex as the center of the spanning tree (if D is even) or as one of the two vertices in the center (if D is odd). This algorithm does not need to bound each vertex's eccentricity. It suffices to bound each vertex's depth, that is, the number of edges on the path from the tree's center to the vertex. No vertex can be more than $\lfloor D/2 \rfloor$ edges from the center, and the depth, and thus the eligibility, of a vertex is fixed when it joins the tree. Updating this algorithm's data structures requires only linear time in the worst case (constant time when the new vertex's depth is $\lfloor D/2 \rfloor$), so the time complexity of the algorithm is $O(n^2)$, and $O(n^3)$ if it is started from each vertex.
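A minimal sketch of the depth-bounded Prim strategy is given below. It assumes a complete graph as a symmetric weight matrix `w`. One detail is an assumption of this example rather than something stated above: for odd D, the second center vertex is taken to be the vertex nearest to the start vertex.

```python
def cbtc(w, D, start):
    """Prim-like construction with every depth bounded by D // 2 (D >= 2)."""
    n = len(w)
    max_depth = D // 2
    depth = {start: 0}
    edges = []
    if D % 2 == 1:
        # Assumption for this sketch: the second center is the vertex nearest to start.
        v1 = min((v for v in range(n) if v != start), key=lambda v: w[start][v])
        depth[v1] = 0
        edges.append((start, v1))
    # best[v] = (weight, parent) of the cheapest edge from v to an eligible tree vertex
    best = {v: min((w[u][v], u) for u in depth) for v in range(n) if v not in depth}
    while best:
        v = min(best, key=lambda x: best[x][0])
        _, u = best.pop(v)
        depth[v] = depth[u] + 1
        edges.append((u, v))
        if depth[v] < max_depth:         # v may itself become a parent
            for x in best:
                if w[v][x] < best[x][0]:
                    best[x] = (w[v][x], v)
    return edges

# Illustrative 4-vertex instance (weights chosen for the example only).
w = [[0, 1, 4, 4],
     [1, 0, 1, 4],
     [4, 1, 0, 1],
     [4, 4, 1, 0]]
```

Because a vertex's depth never changes after it joins the tree, the update after each insertion is a single linear pass, and it is skipped entirely when the new vertex sits at the maximum depth.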
Julstrom also modified the CBTC algorithm by choosing the starting vertex and all subsequent vertices at random from those not yet in the spanning tree. The connection of each new vertex v to the tree remains greedy: it always uses the lowest-weight edge that connects v to a vertex in the tree whose depth is less than $\lfloor D/2 \rfloor$. The modified algorithm is called Randomized Center-Based Tree Construction (RTC). The time complexity of RTC, like that of CBTC, is $O(n^2)$. Running the randomized heuristic n times and reporting the best solution is thus $O(n^3)$.
2.3.2.3 Randomized Greedy Heuristic Algorithm
Raidl and Julstrom proposed in [40] a modified version of OTTC, called Randomized Greedy Heuristic (RGH). RGH starts from a center by randomly selecting a vertex and keeping it as the fixed center during the search. It then repeatedly extends the spanning tree from the center by adding a randomly chosen vertex from the remaining vertices, and connecting it to a vertex that is already in the tree via an edge with the smallest weight.
The algorithm also differs from OTTC in that it begins by fixing the center of the tree. The starting vertex v0 is chosen randomly. If D is even, v0 is the center. If D is odd, another vertex v1 is chosen at random and v0, v1 are the centers; the edge joining them is the first in the tree. Instead of maintaining the eccentricity of each vertex and the path lengths between vertices, the randomized heuristic stores the depth of each connected vertex: the number of edges on the path from it to the center. This value is set when a vertex joins the tree and does not subsequently change. No vertex may have a depth greater than $\lfloor D/2 \rfloor$; otherwise the diameter constraint is violated or v0 (v1) is displaced from the center.
A sketch of the RGH algorithm is presented in Algorithm 1.
Identifying the vertex u ∈ C that is nearest to v requires time O(|C|) = O(n).
Algorithm 1 Randomized Greedy Heuristic Algorithm
v ← a random vertex from U;
u ← vertex from C with smallest w((u, v));
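Since only two lines of Algorithm 1 survive here, the following sketch reconstructs the procedure from the prose description and should be read as an illustration, not as the original pseudocode. The names are assumptions: `unconnected` plays the role of U, `eligible` the role of C (tree vertices whose depth is below the limit), and `rng` is an optional random number generator passed in for reproducibility.

```python
import random

def rgh(w, D, rng=random):
    """Randomized greedy heuristic on a complete graph with weight matrix w (D >= 2)."""
    n = len(w)
    max_depth = D // 2
    unconnected = list(range(n))         # U: vertices not yet in the tree
    rng.shuffle(unconnected)             # popping from a shuffled list picks at random
    v0 = unconnected.pop()
    depth = {v0: 0}
    edges = []
    if D % 2 == 1:                       # odd D: a second random center vertex
        v1 = unconnected.pop()
        depth[v1] = 0
        edges.append((v0, v1))
    eligible = list(depth)               # C: tree vertices with depth < max_depth
    while unconnected:
        v = unconnected.pop()            # v <- a random vertex from U
        u = min(eligible, key=lambda x: w[x][v])  # u <- nearest vertex in C
        edges.append((u, v))
        depth[v] = depth[u] + 1
        if depth[v] < max_depth:
            eligible.append(v)
    return edges, depth
```

A call such as `rgh(w, 4, random.Random(0))` returns the tree edges together with the depth of each vertex. Because the result depends on the random choices, the heuristic is typically run several times and the cheapest tree is kept.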
2.3.2.4 Improved Greedy Heuristics (RGH-I and CBTC-I)
Singh and Gupta [46] extended the greedy constructive heuristic with a local search step that re-evaluates previous vertex connections after appending each new vertex.
For each vertex v, they check whether it can be connected to a better parent vertex than the one to which it is currently connected without violating the diameter constraint. The parent vertex that offers the maximum reduction in the cost of the BDST is selected, and the whole subtree rooted at v is deleted from its current location and reconnected to the tree via the selected vertex.
This improvement is also applicable to CBTC, and the resulting algorithm is denoted by CBTC-I.
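The re-parenting step can be sketched as follows. This is a simplified illustration and not the exact procedure of [46]: the tree is assumed to be stored as a `parent` dictionary in which center vertices map to `None`, depths are recomputed by walking up the parent pointers, and one pass over all vertices is made. A move of v under a new parent u is allowed only if the deepest vertex of v's subtree stays within the depth limit.

```python
def subtree(children, v):
    """Return (set of vertices in the subtree rooted at v, height of that subtree)."""
    nodes, height, frontier = {v}, 0, [v]
    while True:
        frontier = [c for x in frontier for c in children[x]]
        if not frontier:
            return nodes, height
        nodes.update(frontier)
        height += 1

def depth_of(parent, v):
    """Number of edges from v up to its center vertex."""
    d = 0
    while parent[v] is not None:
        v = parent[v]
        d += 1
    return d

def improve(w, parent, max_depth):
    """One pass of re-parenting; returns the total weight saved (parent is updated)."""
    saved = 0
    for v in list(parent):
        if parent[v] is None:            # center vertices are never moved
            continue
        children = {x: [] for x in parent}
        for x, p in parent.items():
            if p is not None:
                children[p].append(x)
        nodes, height = subtree(children, v)
        old_w = w[parent[v]][v]
        best_u, best_w = parent[v], old_w
        for u in parent:
            if u in nodes:               # attaching v below itself would create a cycle
                continue
            if depth_of(parent, u) + 1 + height <= max_depth and w[u][v] < best_w:
                best_u, best_w = u, w[u][v]
        parent[v] = best_u               # moves the whole subtree rooted at v
        saved += old_w - best_w
    return saved

# Illustrative instance: a star around vertex 1, to be improved with max_depth = 2.
w = [[0, 1, 4, 4],
     [1, 0, 1, 4],
     [4, 1, 0, 1],
     [4, 4, 1, 0]]
star = {1: None, 0: 1, 2: 1, 3: 1}
```

In the example, vertex 3 is moved from the center to vertex 2 when the depth limit is 2 (that is, D = 4), which saves 3 units of weight. With a depth limit of 1, no move is possible.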
Figures 2.4 and 2.5 show the best BDSTs found by CBTC and CBTC-I, respectively, on the Euclidean instance with 100 vertices and D = 10. The tree in figure 2.5 is obtained by applying the local search to the best tree found by the CBTC algorithm (figure 2.4); the changes are marked with circles.
Figure 2.4: The best BDST found by CBTC algorithm on the Euclidean problem instance with n = 100, D = 10
Figure 2.5: The best BDST found by CBTC-I algorithm on the Euclidean problem instance with n = 100, D = 10
Figures 2.6 and 2.7 show the best BDSTs found by RGH and RGH-I, respectively, on the Euclidean problem instance with n = 100, D = 10. The tree in figure 2.7 is obtained by applying the local search to the best tree found by the RGH algorithm (figure 2.6); the changes are marked with circles.
Figure 2.6: The best BDST found by RGH algorithm on the Euclidean problem instance with n = 100, D = 10
Figure 2.7: The best BDST found by RGH-I algorithm on the Euclidean problem instance with n = 100, D = 10
Singh and Gupta [46] experiment on Euclidean instances with 50, 100, 250, 500 and 1000 vertices, with the diameter bound set to 5, 10, 15, 20 and 25, respectively.
2.3.2.5 Hierarchical clustering heuristic algorithm - HCH
In [21], Gruber and Raidl propose a constructive heuristic that exploits a hierarchical clustering to guide the process of building a backbone. The clustering heuristic constructs diameter-constrained trees in three steps: determining a hierarchical clustering, reducing the height of this clustering according to the diameter bound, and finally deriving a BDMST from this height-restricted clustering.
They experiment on the Euclidean instances from Beasley's OR-Library [7], using the first 15 instances with |V| = 1000. On large Euclidean instances, the BDMSTs obtained by HCH outperform those of other construction heuristics significantly, especially when the diameter bound is tight, and the heuristic takes only a few seconds, but it cannot be applied to Non-Euclidean instances.
2.3.2.6 Comments
Singh and Gupta [46] experiment with and compare the results of OTTC, RGH, RGH-I, CBTC and CBTC-I on Euclidean and Non-Euclidean instances in which the number of vertices is 50, 100, 250, 500 and 1000 and the diameter bound is set to 5, 10, 15, 20 and 25, respectively. The experimental results show that:
On the Non-Euclidean instances, RGH-I and CBTC-I give better best and average results than RGH and CBTC, respectively. Both RGH and RGH-I perform much worse than OTTC, CBTC and CBTC-I. Even RGH-I cannot compete with OTTC, CBTC and CBTC-I. On almost all instances, OTTC gives the best minimum and mean values.
In [28], Julstrom experiments on 240 graphs: 120 Euclidean and an equal number with edge weights chosen at random. The Euclidean graphs consist of points randomly placed in the unit square, 30 graphs each of n = 100, 250, 500 and 1,000 points. In each set of graphs, 15 instances can be found in the OR-Library [7], where they are listed as instances of the Euclidean Steiner problem, and 15 more were randomly generated. In each set, the points are the vertices of complete graphs whose edge weights are the Euclidean distances between the points.
Four more sets of 30 complete graphs also consist of n = 100, 250, 500 and 1,000 vertices. The edge weights of these graphs were chosen at random from the interval [0.01, 0.99].
On the Euclidean instances, the diameter bound is set to 5, 10, 15, 25 for |V| = 100; 10, 15, 20, 40 for |V| = 250; 15, 30, 45, 60 for |V| = 500; and 20, 40, 60, 100 for |V| = 1000. On the random edge weight instances, the diameter bound is set to 5, 7, 10, 15 for |V| = 100; 5, 10, 15, 20 for |V| = 250; 10, 15, 20, 30 for |V| = 500; and 10, 20, 30, 50 for |V| = 1000.
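The two instance families described above can be generated along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the random generator is passed in explicitly, and the actual OR-Library instances are of course read from files rather than generated.

```python
import math
import random

def euclidean_instance(n, rng):
    """n random points in the unit square and their Euclidean distance matrix."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    w = [[math.dist(p, q) for q in pts] for p in pts]
    return pts, w

def random_weight_instance(n, rng, lo=0.01, hi=0.99):
    """Complete graph whose edge weights are drawn uniformly from [lo, hi]."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.uniform(lo, hi)
    return w
```

For example, `euclidean_instance(100, random.Random(1))` produces one instance of the smallest Euclidean size used in these experiments. The random-weight matrices have no geometric structure, which is the property that separates the two families in the results discussed in this section.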
The experimental results in [28, 46, 40] show that:
On the Euclidean instances, the best and average results found by RGH-I are better than those of RGH, OTTC, CBTC and CBTC-I. When D is small, CBTC identifies BDSTs that are slightly shorter than those OTTC finds, but RTC trees are much shorter than those of OTTC and CBTC. When OTTC and CBTC are applied to problem instances whose vertices are points in Euclidean space and whose edge weights are the distances between the points, the weights of the trees found by these heuristics are much larger than the minimum, especially when D is small relative to n. OTTC and CBTC build backbones of short edges; the remaining points connect to these backbones via longer edges, so OTTC and CBTC build longer trees than necessary. This observation holds for almost all BDMST problem instances. With larger diameter bounds, the differences in the three algorithms' results diminish, to the particular advantage of CBTC.
On the random weight instances, the trees that CBTC identifies have on average lower weights than those of OTTC. RTC is always worse than both OTTC and CBTC. The lack of Euclidean structure in the random-weight instances makes OTTC and CBTC better than RTC.
2.3.3 Metaheuristic algorithms
Besides the greedy construction heuristics, several research groups have developed evolutionary algorithms (EAs) for solving the BDMST problem, in the hope of finding good results within reasonable time.
In an EA, the representation method plays an important role and determines all the operators of the algorithm.
Representation methods: There are many methods for representing individuals, especially spanning trees: Characteristic vectors, Predecessor coding, Prüfer number, Link and node, Edge-set-encoding, Permutation code. In this thesis, we will use Edge-set-encoding and Permutation code.
• Edge-set-encoding: The problem of spanning tree representation has been studied extensively in the literature. References [36, 26] and especially [45] contain substantial discussions and analysis of different representations from theoretical and practical perspectives. For the BDMST problem, three representations have been